From Nottingham to Manchester

Last month I heard back from the Keeper of Manuscripts and Special Collections at the University of Nottingham, Mark Dorrington, who said that my collection might not be a good fit with their archives and that, in any case, they were not geared up to deal with such a large digital collection. However, he did suggest trying the National Archive for the History of Computing at the University of Manchester, and provided a link to its web page.

I have, in fact, already been round the houses with the University of Manchester Library; however, that was not specifically in relation to this archive, and it was before I had done any digital preservation work on the collection. So today I tried to contact someone directly concerned with the archive, and was told that the archivist for it, and for a number of other special collections, is Dr. James Peters. I duly emailed him with the following opening paragraph: “Dr. Peters, I’m contacting you as the Archivist in charge of the National Archive for the History of Computing (NAHC). I have a collection of documents which reflect the development and application of computers over the last 40 years, and would be grateful for your advice as to whether the collection has any merit and where it could be placed.” I followed this with a description of the background to the collection and of its contents. I’m hoping that my rather indirect approach on this occasion might engender some discussion rather than the outright rejection I’m becoming used to.

Still looking for a home

Back in 2015 I reported on my efforts to find a permanent home for my document collection. I had no success with any of the organisations I mentioned in that post, and subsequently turned my attention to finding a contemporary historian interested in the development of computing. I came across one Daniel Wilson (no relation), based at Cambridge University, who has a particular interest in the history of science and technology, and I duly contacted him. Although interested in hearing about the contents of the collection, he felt unable to help, explaining that “this will require significant work and few people have the budget or the time, given current pressures”. He gave me the name of another contemporary historian, at Leicester University, whom I also emailed; but despite a follow-up, I got no response. I’ve concluded that individual academics just have too little time to take on the management of a collection that isn’t absolutely central to a specific piece of work they are doing.

I am now turning my attention once more to institutions, and have just sent an email to the Keeper of Manuscripts and Special Collections (MSC) at the University of Nottingham. I came across this organisation in a JISC email which advised that MSC had just joined the DPC. In my email I was able to mention not only that I have just completed a digital preservation exercise on the PAWDOC collection using templates published on the DPC website, but also that the PAWDOC collection contains much material from the Cosmos project, in which the University’s Department of Computer Science took part. Perhaps those little extra bits of information might spark an extra bit of interest.

Relief

As reported in the Preservation Planning Journey in this Blog, my document collection has just been exported from the Document Management System (DMS) that it has been in for the last 22 years, and now resides in some 16,000 Windows folders. I feel a strong sense of relief that I will no longer have to nurture two complicated systems – the DMS and its underlying SQL database – in order to access the documents.

Over the years I have had to take special measures to ensure the survival of the collection through five changes of hardware, one laptop theft and a major system crash. These have included:

  • trying to configure and maintain complex systems I had no in-depth knowledge of
  • paying out hundreds of pounds for extra specialist support (despite the software cost and most general support being very kindly provided free because this has always been a research-oriented exercise)
  • engaging with support staff over phone, email, screen sharing and in person for hundreds of hours to overcome problems (it starts to add up over 22 years…)
  • backing up and protecting large amounts of data (40GB in total) regularly and reliably.

That’s not to say that DMSs are not worth using; they have characteristics which are essential for high-usage, multi-user systems in which regulatory and legal requirements must be met. However, such constraints don’t apply to an individual. The stark conclusion has to be that, for a Personal Information System, using a DMS was serious overkill.

I guess I’d already come to that conclusion back in 2012, when I set up a filing system for my non-work files using an Excel index and a single Windows Folder for all the documents. That has worked pretty well; however, it’s slightly different from the way the newly converted work document collection is stored, which has a separate Folder for each Ref No as shown below.

Experience so far with the Windows Folder system indicates that it is very easy and quick to find documents by scrolling through the Folders – quicker than it was using the DMS since there is no need to load an application and invoke a series of commands: Windows Explorer is immediately accessible. As for the process of adding new documents, that too seems much simpler and quicker than having to import files into a DMS, because it involves using the same Windows file system within which the digital files reside in the first place.
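Because each document now sits in a folder named after its Ref No, checking the index against the stored files becomes a simple comparison of two lists. The Python sketch below is illustrative only and rests on hypothetical assumptions: that the index has been exported to a CSV file with a "Ref No" column, and that every folder is named exactly after its Ref No. It reports index entries with no folder, folders that contain no files, and folders with no index entry.

```python
"""Cross-check an index export against a one-folder-per-Ref-No layout.

Hypothetical assumptions: the index has been exported to a CSV file with
a "Ref No" column, and each document folder is named exactly after its
Ref No.
"""
import csv
from pathlib import Path


def load_ref_nos(index_csv: Path, column: str = "Ref No") -> set:
    """Return the non-blank Ref Nos listed in the index export."""
    refs = set()
    with index_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            ref = (row.get(column) or "").strip()
            if ref:  # skip blank index entries
                refs.add(ref)
    return refs


def has_files(folder: Path) -> bool:
    """True if the folder, or any subfolder, holds at least one file."""
    return any(p.is_file() for p in folder.rglob("*"))


def check_folders(root: Path, ref_nos: set) -> dict:
    """Compare index entries with the folders that exist under root."""
    folders = {p.name: p for p in root.iterdir() if p.is_dir()}
    names = set(folders)
    return {
        # Index entries with no folder at all.
        "missing": sorted(ref_nos - names),
        # Folders that exist for an index entry but hold no files.
        "empty": sorted(r for r in ref_nos & names if not has_files(folders[r])),
        # Folders with no corresponding index entry.
        "unindexed": sorted(names - ref_nos),
    }
```

For example, `check_folders(Path("PAWDOC"), load_ref_nos(Path("index.csv")))` would return three sorted lists; both paths are placeholders. Treating Ref Nos as strings keeps any leading zeros intact.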

It’s early days yet, so it’ll be a while before I have an in-depth feel for how well other aspects of the system, such as backup requirements, are working; watch this space.

Disks and DMS

As part of the digital preservation work (documented elsewhere) that I’m doing on my document collection, I’ve just completed an exercise to organise and index all the associated physical disks. It turns out that there are 156 disks, of which 16 are actually contained in the collection; the remaining 140 are backup disks (accumulated over the years) of the collection’s computer system and digitised contents. Old backup disks may not be much use for recovering from a system crash, but I have kept them to provide an audit trail over the 20+ years that the digital system has been in operation. Over that period documents have been lost, the index has had fields deleted by mistake, files have been corrupted, and no doubt other errors have occurred. Although such occurrences are few, when a problem is identified it is very useful to be able to trace back through previous states of the system.

Another activity prompted by the digital preservation work is to establish what future plans the current supplier of FISH (the document management system I use) has for the system. When I last asked, in February 2016, I was told that there were no plans to upgrade the product, and that current customers who wanted to look at alternatives were being advised to consider a product called File Stream, supplied by Filestream Ltd, which is based in Berkshire in the UK. I spoke to the FISH supplier, m-hance, again earlier today and was told there had been no change: it is unlikely that FISH will be upgraded, and Filestream is still the recommended replacement product. When I contacted Filestream last year I was told that the product would cost £750 to purchase and £250 a year for support, including upgrades.

When I was investigating Filestream last year, I also took a quick look at Open Source document management systems and found several, some of them free to use. However, further investigation would be needed to establish what other components (such as a back-end database) would have to be acquired, and whether they would also be free.

These and other options to future proof the collection will all be considered in the digital preservation project currently underway.

Digitised and Checked

I reached a milestone today: my document collection is totally digitised, and every Index entry and associated Document Management folder has been checked. It’s been a very laborious process – which is why my last entry here was over four months ago. However, the collection is now in good shape for a digital preservation exercise, and is ready for transfer to a long-term repository if one can be found.

Following the checking exercise, a detailed analysis was performed to derive statistics and rectify problems where possible. The report documenting the analysis serves as a comprehensive status report on the whole collection at the end of May 2016.

Digitisation in progress

Since my last entry I’ve been steadily digitising the remaining paper in my lifetime work document collection. These are documents I want to retain in original form (some of which have a comb binding), documents that need to be scanned in colour, or documents that were too large to go through the scanner. I acquired a better comb binding machine at the end of October, my current scanner has full colour capability, and I’ve found that photographing large items with my modern camera produces a perfectly readable on-screen image, so there have been no more obstacles to getting the job done. As each item is digitised and its file inserted into the FISH document management system, I’m checking the index entry and updating the Movement Status field with either OK or XX as described in my last entry. At the current rate of progress I should finish the digitisation work by the end of January.

Checking the Collection

Two of the remaining things to be done with my lifetime document collection are to:

a) scan the remaining paper (documents not yet scanned because they were labelled as artefacts to be retained in both their paper and electronic form); and

b) go through all the index entries making sure they contain valid information and that there is an equivalent scan in the Document Management System.

For a), some of the paper documents have comb bindings and will require a binding machine if they are to be scanned using a sheet feeder and then reassembled in their comb bindings. I acquired a very cheap comb binding machine on eBay some three weeks ago (though it seems that was a false economy: it stopped functioning properly and I had to send it back yesterday…) and have made a start on scanning the remaining paper. I’m addressing b) in parallel, and recording any issues or key points I find using the following notation in the ‘Movement Status’ field:

OK = The Index entry is as complete as possible and there is an equivalent scanned version

XX = There is a serious issue with this item.

If the index entry and scans are present but there are points to be recorded about them, the ‘OK’ notation is qualified within brackets as follows (multiple qualifications can be recorded within the brackets as necessary, separated by commas):

  • OK(multi): one or more of the equivalent scanned files in the FISH Document Management System are in the form of multiple TIF files – one for each page. FISH obscures the fact that there is a separate file for each page – but that is how the scan is actually stored.
  • OK(n docs): This identifies when there is more than one scanned document associated with this index entry – where n is the number of separate documents (this is a feature of this approach to electronic filing – multiple documents can be stored under a single Index entry).
  • OK(poor): the quality of some or all of the scanned electronic pages is poor.
  • OK(dbl): one or more of the associated scanned files came from documents with double-sided pages which were scanned one side at a time: all the front sides first, then the pile turned over and the back sides scanned. When this has been done, the scanned pages are out of order. This was done with the first two scanners I had, which were not able to handle double-sided pages.
  • OK(ord): the pages of one or more of the scanned files are out of order for a reason other than the ‘dbl’ reason above.
  • OK(left): the original document was deliberately left at the location of the employer concerned when I moved jobs.
  • OK(A5): one of the scanners I had was not able to handle A5 pages reliably and sometimes recorded a line as an image dragged down the page for an inch or more.
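The ‘dbl’ problem is mechanical enough to be undone in software, provided the scanning order is known. The minimal Python sketch below assumes, and this must be checked against each document rather than taken from the index, that a scan holds all the front sides first, followed by all the back sides, and that turning the pile over makes the backs come out last page first. The hypothetical `backs_reversed` flag covers the case where they do not.

```python
def reorder_dbl(pages, backs_reversed=True):
    """Restore page order for a double-sided document scanned one side at a time.

    `pages` is the scanned order: every front side, then every back side.
    Assumption: flipping the pile reverses the backs unless told otherwise.
    """
    if len(pages) % 2:
        raise ValueError("expected an even number of scanned sides")
    half = len(pages) // 2
    fronts, backs = pages[:half], pages[half:]
    if backs_reversed:
        backs = backs[::-1]  # the last back side was scanned first
    ordered = []
    for front, back in zip(fronts, backs):
        ordered.extend((front, back))  # page 1, page 2, page 3, ...
    return ordered
```

For a three-sheet document scanned as pages 1, 3, 5, 6, 4, 2, `reorder_dbl` gives back 1 to 6 in order. Blank back sides must still be present in the scan for the pairing to work.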

Should an XX notation be applied to an Index entry, the reason it is being noted as such is recorded in brackets with one or more of the following notations:

  • XX(lost): the paper document was lost before a scan could be taken, so the Index entry is the only trace left of this document.
  • XX(ref): The Reference No is duplicated or incorrect in some other way.
  • XX(pap): The document is still only in paper form because its form is such that it has not yet been possible to digitise it effectively.
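Because the notation is regular, it can be checked mechanically. This Python sketch parses a Movement Status value into its status and qualifiers and rejects anything not in the lists above; the function and class names are hypothetical, and it deliberately tolerates stray spaces inside the brackets.

```python
"""Parse Movement Status values such as "OK", "OK(multi, dbl)" or "XX(lost)".

Hypothetical helper names; the codes are the ones listed in the post.
"""
import re
from dataclasses import dataclass, field
from typing import List, Optional

OK_QUALIFIERS = {"multi", "poor", "dbl", "ord", "left", "A5"}
XX_QUALIFIERS = {"lost", "ref", "pap"}
ENTRY = re.compile(r"^(OK|XX)\s*(?:\((.*)\))?$")  # status, optional (...)
N_DOCS = re.compile(r"^(\d+) docs?$")             # the "OK(n docs)" form


@dataclass
class MovementStatus:
    status: str                                   # "OK" or "XX"
    qualifiers: List[str] = field(default_factory=list)
    n_docs: Optional[int] = None                  # set by "n docs"


def parse_status(text: str) -> MovementStatus:
    match = ENTRY.match(text.strip())
    if not match:
        raise ValueError(f"not a recognised Movement Status: {text!r}")
    status, inner = match.groups()
    result = MovementStatus(status)
    allowed = OK_QUALIFIERS if status == "OK" else XX_QUALIFIERS
    # Qualifiers are comma-separated; spaces around them are ignored.
    for part in (p.strip() for p in (inner or "").split(",")):
        if not part:
            continue
        docs = N_DOCS.match(part)
        if status == "OK" and docs:
            result.n_docs = int(docs.group(1))
        elif part in allowed:
            result.qualifiers.append(part)
        else:
            raise ValueError(f"unknown qualifier {part!r} for {status}")
    return result
```

Rejecting unknown qualifiers, rather than silently accepting them, means typos in the index surface during the check instead of distorting any later statistics.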

The fact that such points and issues are present in the collection in noticeable numbers simply reflects the fact that, when dealing with such large volumes of material in the course of busy jobs across many years, it is inevitable that things will go wrong and mistakes will be made. Once I have been through the whole of the index, I’ll have statistics about the overall prevalence of such issues in this particular collection.
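The prevalence statistics amount to counting notations across the whole index. Here is a minimal Python sketch, assuming the Movement Status values can be read out of the index as a list of strings (the function names are hypothetical). It counts each status and each status/qualifier pair, folds every ‘n docs’ count into a single bucket, and sets aside blank or malformed values, which are themselves worth checking.

```python
"""Tally Movement Status values to estimate how common each issue is.

Assumes the values have been read out of the index as a list of strings.
"""
import re
from collections import Counter

ENTRY = re.compile(r"^\s*(OK|XX)\s*(?:\((.*)\))?\s*$")


def tally(values):
    """Return (status counts, qualifier counts, unparsed values)."""
    statuses, qualifiers, unparsed = Counter(), Counter(), []
    for raw in values:
        match = ENTRY.match(raw or "")
        if not match:
            unparsed.append(raw)  # blank or malformed entries need checking too
            continue
        status, inner = match.groups()
        statuses[status] += 1
        for part in (inner or "").split(","):
            part = part.strip()
            if not part:
                continue
            # "2 docs", "3 docs", ... are all counted under "n docs".
            key = "n docs" if re.fullmatch(r"\d+ docs?", part) else part
            qualifiers[f"{status}({key})"] += 1
    return statuses, qualifiers, unparsed


def prevalence(counts, total):
    """Express each count as a percentage of `total` index entries."""
    return {key: round(100 * n / total, 1) for key, n in counts.items()}
```

Passing the total number of index entries to `prevalence` turns the raw counts into percentages; an entry carrying two qualifiers contributes to both.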

Search Status

In my last entry I said I’d contacted six potential repositories for my lifetime document collection. This is where those communications are up to:

Loughborough University’s Centre for Information Management: My email was forwarded to the University Library which did not respond. I followed this up on 20th September with an email to the Director of Library Services, and am waiting for a reply.

Manchester University’s Computer Science Dept Research Office: My email was forwarded to a researcher with an interest in the history of computing, but she replied saying that her work in that area had been put on hold. On 20th September I used the University library’s general enquiry form to ask whether the library would be interested. The library advised me to contact the Head of School administration in the School of Computer Science, whom I duly emailed on 2nd October, and I am awaiting a reply.

City University’s Cass Business School: My contact said he would pass my email on to his colleagues, and I have heard nothing further.

UCL’s Dept of Information Studies: My contact said she would look out for interested people at conferences.

The National Archives: I was advised by my contact to direct my question to the Archive Sector Development team which, while it does not have any direct provision for taking private collections, should be well placed to provide advice. I emailed the Development Team on 16th September and am waiting for a reply.

The Science Museum Wroughton Library and Archives: The Library asked me a number of questions about the collection but finally responded by saying: “Thank you for allowing us the time to consider your collection which we have now discussed with the Archive Collections Manager, Science Museum’s Keeper of Technologies and Engineering, and Head of Library and Archives. We have concluded that whilst we find this a most interesting idea, we do not think that the content fits within our current collecting policy criteria. You may have already contacted them, but we suggest that the National Computing Collection might be a more appropriate repository for your collection.”

This is all pretty much as expected: I know it’s going to be hard to find a repository that’s interested. However, should there be no interest from any of the above organisations, I plan to rely on interest being generated by the publication of my paper on Digital Preservation Planning.

Repository sought

In the last few months I’ve been making good progress on figuring out how to undertake a Digital Preservation project. Since I’m getting close to being ready to undertake digital preservation work on the PAW/DOC collection, I decided to try to find a home for the collection before I start. That way, I can tailor the digital preservation work to the requirements of the receiving repository, should I be lucky enough to find anyone who is interested. I now have a short two-page summary to send to repositories which might be interested. This is the second version: Dave Thompson of the Wellcome Foundation (whom I met on the UCL online Digital Curation course) was kind enough to comment on the first version, and his observations resulted in a substantial rewrite. I’ve sent it to six organisations: Loughborough University’s Centre for Information Management, Manchester University’s Computer Science Dept, City University’s Cass Business School, UCL’s Dept of Information Studies, the National Archives, and The Science Museum Wroughton Library and Archives. If I get a positive response from any of these, all well and good. If I do not, I shall proceed with the Digital Preservation work as planned.

Some three years ago I made a list of activities I wanted to undertake with the PAW/DOC collection, and this seems a good moment to summarise where I’m up to – the activities and their status are described below:

  • Scan the remaining 4 boxes of paper. Take the opportunity to explore scanning in colour and using PDF. Possibly also use OCR, though this is of much lower priority. DONE (but not OCR’d)
  • Write a paper on “The paper artefact in the digital age” using an analysis of the contents of PAW/DOC as the basis for the paper. DONE
  • Explore the issues of longevity and survivability of file formats and of digital indexing and file management systems, using PAWDOC as the basis for the work. This could also include moving the material from FISH and even Filemaker. STILL TO BE DONE
  • Revisit all the requirements listed in my 2001 BIT paper to identify current status and opportunities for further work. STILL TO BE DONE
  • Scan all remaining PAW/DOC paper, i.e. all those items in the three archive boxes (most of which have been identified as artefacts to be retained in their physical form). STILL TO BE DONE, but next on the list; I’m trying to find a binding machine so that I can sheet-feed the documents with comb bindings
  • Check that all index entries are valid (i.e. not blank and with an appropriate Movement Field entry) and have an associated populated FISH entry. STILL TO BE DONE
  • Write up a guide to the material and to the technology supporting it. STILL TO BE DONE
  • Hand over PAW/DOC and its supporting technology to the new owner and provide training for the people who will be managing it going forward. STILL TO BE DONE

An Update – This Work on Hold

This work has lain dormant for a little while now, but only because I’ve been focusing on other supporting activities. In particular, I’m exploring the field of Digital Preservation with the aim of ensuring that the contents of my work document collection are long-lasting. In the process I’m also trying to publicise the existence of the collection in order to find someone who might be interested in giving it a long-term home. So I don’t intend to do any further work on Personal Document Management until I’ve finished the Digital Preservation investigation.

For the record, I did actually go and talk to Jenny Bunn’s Digital Curation students at UCL on 27 February 2014. I talked for about 20 minutes, provided a handout (the odd layout is because it is designed to be printed double-sided), and there was some Q&A at the end. I also had an interesting conversation afterwards with Jenny. However, it prompted no further interest in the work document collection.

Finally, a word about Anne O’Brien of Loughborough University, with whom I started collaborating on this topic in early 2013. The last contact I had with her was in September of that year, and I had heard nothing more from or about her until I read in the November 2014 issue of the Loughborough University Alumni magazine that she had died in May 2014. Tom Jackson of Loughborough’s Centre for Information Management, where she worked, confirmed in an email that she had died of a heart attack and that her death had come as a huge shock. I’d like to record here that, in our brief collaboration, Anne was very helpful to me and gave me a number of substantial steers which moved my work forward in terms of both content and contacts.