Retrospective Preservation Observations

Yesterday I reached a major milestone. I completed the conversion of the storage of my document collection from a Document Management System (DMS) to files in Windows Folders. It feels like a huge release not to have the stress of maintaining two complicated systems – a DMS and the underlying SQL database – in order to access the documents.

From a preservation perspective, a stark conclusion has to be drawn from this particular experience. The collection started using a DMS some 22 years ago, and since then I have been through 5 changes of hardware, one laptop theft and a major system crash. In order to keep the DMS and SQL Db going I have had to try to configure and maintain complex systems I had no in-depth knowledge of; engage with support staff over phone, email, screen sharing and in person for many, many hours to overcome problems; and back up and nurture large amounts of data regularly and reliably. If I had done nothing to the DMS and SQL Db over those years I would long ago have ceased to be able to access the files they contained. In contrast, if the files had been in Windows Folders I would still be able to access them. So, from a digital preservation perspective, there can be no doubt that having the files in Windows Folders will be a far more durable solution.

When considering moving away from a DMS I was concerned it might be difficult to search for and find particular documents. I needn’t have worried. Over the last week or so I’ve done a huge amount of checking to ensure the export from the DMS into Windows Folders was error-free. This entailed constant searching of the 16,000 Windows Folders, and I’ve found it surprisingly easy and quick to find what I need. The collection has an Index, with each index entry having a Reference Number. There is a Folder for each Ref No within which there can be one or more separate files, as illustrated below.

Initially, I tried using the Windows Explorer search function to look for the Ref Nos, but I soon realised it was just as easy – and probably quicker – to scroll through the Folders to spot the Ref No I was looking for. The search function on the other hand will come in useful when searching for particular text strings within non-image documents such as Word and PDF – a facility built into Windows as standard.

I performed three main types of check to ensure the integrity of the converted collection: a check of the documents that the utility said it was unable to export; a check of the DMS files that remained after the export had finished (the utility deleted the DMS version of a file after it had exported it); and, finally, a check of all the Folder Ref Nos against the Ref Nos in the Index. These checks are described in more detail below.

Unable to export: The utility failed to export only 13 of the 27,000 documents, and most of these failures were due to missing files or missing pages of multi-page documents.

Remaining files: About 1400 files remained after the export had finished. About 1150 of these were found to be duplicates, with contents that were present in files that had been successfully exported. The duplications probably occurred in a variety of ways over the 22-year life of the DMS, including human error in backing up and in moving files from off-line media to on-line media as laptops started to acquire more storage. 70 of the files were used to recreate missing files or to augment or replace files that had been exported. Most of the rest were pages of blank or poor scans which I assume I had discovered and replaced at the point of scanning but which somehow had been retained in the system. Only 7 of the files could not be identified.
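For anyone facing a similar pile of leftover files, a first pass at spotting duplicates can be automated. The sketch below is not how the check described above was done; it is a minimal illustration which assumes the leftover files and the exported Folders are both reachable as ordinary directories, and that a duplicate is byte-for-byte identical to an exported file. The function names and the two directory arguments are hypothetical. Files with the same content saved in a different format, or rescanned pages, would not be caught and would still need checking by eye.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(leftover_dir, exported_dir):
    """Map each leftover file to an exported file with identical bytes.

    Both directory arguments are hypothetical locations chosen for
    this illustration.
    """
    # Index every exported file by its content digest.
    exported = {}
    for p in Path(exported_dir).rglob("*"):
        if p.is_file():
            exported.setdefault(file_digest(p), p)

    # Look up each leftover file in that index.
    duplicates = {}
    for p in Path(leftover_dir).rglob("*"):
        if p.is_file():
            digest = file_digest(p)
            if digest in exported:
                duplicates[p] = exported[digest]
    return duplicates
```

Anything returned by find_duplicates could be set aside with reasonable confidence, leaving only the unmatched files to be inspected manually.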

Cross-check of Ref Nos in Index and Folders: This cross-check revealed the following problems with the exported material from the DMS:

  • 9 instances in which a DMS entry was created without an Index entry being created,
  • 9 cases in which incorrect Ref Nos had been created in the DMS,
  • 6 instances in which the final digit of a longer-than-usual Ref No had been omitted (e.g. PAW-BIT-Nov2014-33-11-1148 was exported as PAW-BIT-Nov2014-33-11-114),
  • 3 cases in which documents had been marked as removed in the Index but not removed from the DMS,
  • 2 cases in which documents were missing from the DMS export.
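A cross-check of this kind boils down to comparing two sets of Ref Nos. The following is a minimal sketch of the idea, not the procedure actually used: it assumes the Index Ref Nos are available as a plain list (for example from an export of the Index) and that each Folder is named exactly after its Ref No. The function names are hypothetical. It also flags the truncation problem listed above, where a Folder name is an Index Ref No with its final character missing.

```python
from pathlib import Path


def folder_refs_from_disk(root):
    """Return the names of the Folders directly under root.

    Assumes each Folder is named exactly after its Ref No.
    """
    return [p.name for p in Path(root).iterdir() if p.is_dir()]


def cross_check(index_refs, folder_refs):
    """Compare Ref Nos in the Index with Ref Nos used as Folder names."""
    index_set = set(index_refs)
    folder_set = set(folder_refs)

    # Index entries with no matching Folder.
    missing_folders = index_set - folder_set
    # Folders with no matching Index entry.
    unindexed_folders = folder_set - index_set

    # An unmatched Folder whose name equals an unmatched Index Ref No
    # minus its last character is probably a truncated export.
    truncated = {
        folder: ref
        for folder in unindexed_folders
        for ref in missing_folders
        if ref[:-1] == folder
    }
    return sorted(missing_folders), sorted(unindexed_folders), truncated


# Example: the first Ref No is the one quoted in the list above;
# "A-1", "A-2" and "A-3" are made-up values for illustration only.
index = ["PAW-BIT-Nov2014-33-11-1148", "A-1", "A-2"]
folders = ["PAW-BIT-Nov2014-33-11-114", "A-1", "A-3"]
missing, unindexed, truncated = cross_check(index, folders)
```

Entries that appear in neither the truncated list nor both sets still need human judgement, since a script cannot tell whether a Ref No was created wrongly in the DMS or in the Index.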

It also revealed a number of problems and errors within the 17,000 index entries. These included 12 instances in which incorrect Filemaker Doc Refs had been created, and 6 cases in which duplicated Filemaker entries were identified.

The overall conclusion from this review of the integrity of the systems managing the document collection over some 37 years is that a substantial amount of human error has crept in, unobtrusively, over the years. Experience tells me that this is not specific to this particular system, but a general characteristic of all systems which are manipulated in some way or other by humans. From a digital preservation standpoint this is a specific risk in its own right since, as time goes by, as memories fade, and as people come and go, the knowledge about how and why these errors were made simply disappears, making it harder to identify and rectify them.
