DROID explorations and DMS alternatives

Things have started to move in our efforts to perform digital preservation on the PAWDOC collection. I’ve been running the National Archives DROID tool across the 190,000 files, and Ross’s automated analysis of the results has turned up a number of issues, including several hundred duplicates which we are investigating. Among other things, DROID identifies file types and versions, and this has helped another strand of our investigations: trying to gain access to about three hundred files which can no longer be opened. 150 of these are old PowerPoint files from the early 90s which neither the Microsoft viewer nor the earliest version of OpenOffice can open. However, the Zamzar online service, to which you upload a file and specify the format you want it converted to, successfully converted all of the examples I submitted into a version of PowerPoint I can open. Zamzar can’t deal with every problem file, especially those for which I no longer have the relevant application, such as MS Project and iThink, though it did convert Visio drawings into PDF. We’re continuing to work through these files with the intention of reaching a clear decision about what to do with each one, so that specific actions can be included in our eventual preservation project plan.
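For anyone wondering how duplicates can be picked out of a DROID export, here is a minimal sketch. It is not Ross’s actual analysis. It assumes the export is a CSV file that includes a checksum for each file, and the column names MD5_HASH and FILE_PATH are assumptions to be checked against the real export. Files that share a checksum are reported as a group of likely duplicates.

```python
import csv
from collections import defaultdict


def find_duplicates(rows, hash_col="MD5_HASH", path_col="FILE_PATH"):
    """Group file paths by checksum and keep groups with more than one file.

    The column names are assumptions about the export layout; pass the
    real header names if they differ.
    """
    by_hash = defaultdict(list)
    for row in rows:
        digest = (row.get(hash_col) or "").strip().lower()
        if digest:  # skip rows with no checksum, such as folders
            by_hash[digest].append(row[path_col])
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}


def duplicates_in_export(csv_path, **kwargs):
    """Read a CSV export from disk and return its duplicate groups."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return find_duplicates(csv.DictReader(handle), **kwargs)
```

Matching checksums only show that two files have identical content. Deciding which copy to keep still needs a look at where each one sits in the collection.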

Another substantial investigation underway is to try to identify a suitable alternative to the document management system (DMS) that controls the collection’s files. The future of the current DMS is uncertain, and the system is too complex to reinstall on upgraded hardware without expensive consultancy support. Jan’s exploration of alternative DMS and preservation repositories highlighted the fact that, while there are several free-to-use public domain systems available, they all require multiple components and appear to be relatively complex to install, configure, and maintain. This observation has prompted me to be a lot clearer about the immediate requirements for the collection. We hope to find a long-term owner, perhaps working in the field of modern history, and it’s possible that that person or organisation may require more sophisticated search and access control functions. However, until that eventual owner is found, only a minimal level of single-user functionality is needed, and minimal system management and cost demands are essential. In light of this greater clarity, we are now also considering a low-tech, low-cost alternative which would involve inserting the Index reference number into the title of every file and storing all the files in the standard Windows folder system. After identifying a required Reference No in the Index, files would be accessed by putting the reference number into the folder system’s standard search facility. As well as looking at the pros and cons of such a solution, we are also investigating the feasibility of getting the necessary information out of the current DMS and into the titles of all the document files. A further challenge that would have to be overcome is that the current DMS stores multi-page documents as a series of separate TIF files. If we were to move to the low-tech Windows folder system solution, it would first be necessary to combine the files making up a single document into one single file. This would need to be an automated process, as there are too many documents to contemplate doing it manually.
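To give a feel for what the automated step might involve, here is a rough sketch of the planning part only. It assumes that a list of page records can be exported from the DMS, each giving a reference number, a document title, a page number and the path of the TIF page. That record layout, the example reference numbers and the function names are all hypothetical. The sketch works out one target file name per document, with the reference number at the front, and lists that document’s pages in order. The merge into a multi-page file would still need an imaging tool and is not shown.

```python
import re
from collections import defaultdict

# Characters that Windows does not allow in file names.
_ILLEGAL = re.compile(r'[<>:"/\\|?*]')


def target_name(reference_no, title, extension=".tif"):
    """Build a file name that starts with the Index reference number."""
    safe_ref = _ILLEGAL.sub("-", reference_no).strip()
    safe_title = _ILLEGAL.sub("-", title).strip(" .")
    return f"{safe_ref} {safe_title}{extension}"


def plan_merges(pages):
    """Map each target file name to its source pages in page order.

    pages: iterable of (reference_no, title, page_number, source_path)
    tuples. This record layout is a hypothetical DMS export.
    """
    grouped = defaultdict(list)
    titles = {}
    for ref, title, page_no, path in pages:
        grouped[ref].append((page_no, path))
        titles.setdefault(ref, title)  # keep the first title seen
    return {
        target_name(ref, titles[ref]): [path for _, path in sorted(items)]
        for ref, items in grouped.items()
    }
```

With the reference number at the start of each name, the standard Windows search would find a document by typing that number, and a plan like this could be reviewed before any files are changed.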

All these activities and more are required in order to be able to assemble a project plan with unambiguous tasks of known duration. We are continuing to work towards this goal.
