Paper published by the DPC

Over the last few weeks I’ve been working with Sara Thomson at the Digital Preservation Coalition (DPC) to get my paper revised and on to the DPC website. That work was duly completed last Monday, and the paper is now accessible on the DPC website in the Publications/Case Studies section at http://www.dpconline.org/advice/case-notes. Publishing the paper on a website is much better than having it in a conventional journal, because web publication makes it easy to provide the template documents as downloadable attachments that people can use. Let’s hope there are some takers. I’ve also been able to add a section at the end explaining that I am seeking a permanent repository for my PAWDOC collection and/or collaborators to apply the digital preservation process described in the paper to the PAWDOC collection.

The DPC will be sending out a press release about the paper in the next few days to various mailing lists; and, beyond that, I believe the DPC want to arrange a webinar sometime in the next few months to air the contents of the paper. I’m hoping that all this publicity may spark some interest in the PAWDOC collection.

Case Study in DPC Report

Having completed my paper on ‘Preservation Planning for Personal Collections’, I sent it to the people mentioned in the Acknowledgements section to get their permission to include their names. In October, I received an offer from two of the people I had acknowledged – Neil Beagrie and William Kilbride – to make the paper available as a case note on the Digital Preservation Coalition’s (DPC) web site early in 2016. I was pleased to accept the DPC offer. Since then, the work I had done for the paper has been included as a case study in the DPC Technology Watch Report on Personal Digital Archiving, which was published on 15 December 2015. This is a very informative document with useful advice for individuals and I’m pleased to be a part of it.

I am now waiting to hear from the DPC about what changes to the paper its reviewers have suggested, and the timescale for the paper to be published on the DPC website. My hope is that, once it is published, I might hear from some people interested in collaborating in applying the process described in the paper to my work document collection.

Personal Preservation Planning Paper

I see my last entry was in late June, a few weeks before moving house in mid-July. Since then, a full programme of packing, unpacking and house renovation has slowed progress. However, just over a week ago (on the day my granddaughter was born!), I completed a paper describing my experiences in identifying a preservation planning process for personal collections. It includes templates for a Scoping Document, a Digital Preservation Plan, a Digital Preservation Chart and a Preservation Maintenance Plan, which I have produced in the belief that the document formats derived in the course of this work may be of use to others. I am now in the process of trying to find a publisher.

The completion of the preservation planning trials and the associated paper now puts me in a position to undertake preservation planning on my lifetime collection of work documents. I would prefer to conduct this exercise (which will be a major challenge) in conjunction with the destination repository for the collection; but, in the absence of such a repository, with any other organisations and individuals who might be interested. I am devoting my efforts in the immediate future to identifying such organisations or individuals. Please get in touch if you would like to be involved.

Second Trial Finished

Yesterday I finished my second trial of the Preservation Planning process on the 17,000+ files of our photo collection. While the preservation activities were a little laborious and time-consuming by their very nature, there’s no doubt that the planning process was a whole lot easier the second time around with the benefit of the experience and template documentation from the first trial. The first trial highlighted a need to undertake pre-planning work on file formats, and I duly did this to good effect – that aspect will become embedded in the process going forward.

I believe I now have enough knowledge to embark on Preservation work on my PAW-DOC document collection (the original objective of the two trials) sometime early next year. First, though, I’ll update the documentation and templates to reflect the findings of this second trial. I think I may also write up my findings as a journal paper since, as I discovered when I started this work, such process guidance for owners of personal collections does not seem to be freely available.

Photo Preservation Plans done – with DROID’s help

The UCL Digital Curation course finished at the end of March. It was an excellent introduction to the field in general and provided links to a great deal of relevant material elsewhere on the net. Two aspects were particularly important for me. First, the tutor prompted us to use The National Archives’ DROID software on our personal collections, and I discovered what an easy and effective tool it is for identifying the formats in a set of files and how many files there are of each. I used it to great effect in the pre-planning work on the 17,000+ files of the Photo collection. I also emailed the DROID support team for help in understanding parts of the DROID report, and found them very helpful. With DROID’s assistance, and by comparing its results with Windows Explorer searches, I was able to find and cull spurious files and to decide what format changes would be needed when carrying out the Preservation work on the SUPAUL-PHOTO collection.
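DROID can export its results as a CSV file, and a few lines of script can turn that export into the kind of per-format tally that drives the pre-planning work. The Python sketch below is a hypothetical helper, not part of DROID: it assumes the export has `TYPE`, `PUID` (PRONOM Unique Identifier) and `FORMAT_NAME` columns, which is worth checking against your own export, and that files DROID could not identify have an empty `PUID`.

```python
import csv
import io
import sys
from collections import Counter


def summarise_droid_csv(csv_text: str) -> Counter:
    """Tally files per (PUID, format name) from a DROID CSV export.

    Assumes TYPE, PUID and FORMAT_NAME columns (an assumption, check your
    export). Folder rows are skipped; files with no PUID are counted
    under ("unidentified", "").
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("TYPE") == "Folder":
            continue  # folders are not files to preserve
        puid = (row.get("PUID") or "").strip()
        name = (row.get("FORMAT_NAME") or "").strip()
        counts[(puid, name) if puid else ("unidentified", "")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python droid_summary.py export.csv
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        tally = summarise_droid_csv(f.read())
    for (puid, name), n in tally.most_common():
        print(f"{n:7d}  {puid:14s} {name}")
```

Sorting by count puts the dominant formats first, while the "unidentified" bucket is a natural starting point when looking for spurious files to cull.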

The second aspect of the UCL course which I found particularly useful was the interaction with the other students (I believe there were more than 200 from all over the world). It was fascinating to read about where they were from and what jobs they were doing. It was also very interesting to read the discussions in the course’s forums, though, sadly, after a lively beginning, discussion did fall away as the course progressed. Nevertheless, I was able to ask some questions about file formats and got clear guidance on whether it was worth converting JPGs to JPEG 2000 (it’s not). I also met Dave Thompson, Digital Curator at the Wellcome Foundation, on the course, and I was able to get some invaluable advice from him outside the course about how to word a flyer seeking a repository for my collection of work documents (details of my attempts to find such a repository are recorded in the Personal Document Management section of this site).

Regarding my Preservation Planning work on the SUPAUL-PHOTO collection, I have now completed the pre-planning work and the writing of the Project Plan Description and the associated Project Plan Chart, so I am now all set to conduct the work itself. Having undertaken two trials of the planning process, I’m satisfied that I understand what’s required and that the documentation will support the work. Of course, the PAW-DOC collection (which is what I developed the planning process for) is a far larger and more complex collection than either of the collections on which I have been testing the process. However, I’m confident that I will at least be able to start the PAW-DOC preservation work in a coherent and comprehensive way, and that the knowledge and experience I have gained so far will help me to figure out how to make any adjustments that may be needed in the course of the project. Beyond that, I hope I’ll be able to draw on the advice of the contacts I’ve made in the course of exploring the digital preservation field. Anyway, that’s all for the future. First I need to actually carry out the preservation work on the SUPAUL-PHOTO collection as specified in the project plan.

Scoping Document for Photo Collection

With the experience of doing Preservation Planning on the PAW-PERS collection under my belt, I’ve started work on my collection of photos and videos. To manage the process, I shall produce a Scoping Document, a Plan document, and a Maintenance document as originally envisaged. However, the Scoping document has now changed substantially to reflect the lessons learned in the PAW-PERS work, and I’ve incorporated those changes in a reusable Preservation Planning Scoping Document Template. By the end of this exercise on the Photo collection I aim to have templates for each of the three documents, which I’ll be able to use for the original target piece of work – preservation planning on PAW-DOC (my lifetime collection of work documents).

But back to the immediate task – the photo collection. I’ve completed the Photo Scoping Document so I now need to address the pre-planning tasks – which include assessing each of the different file types in the collection, deciding what formats to change them to (if at all) and becoming familiar with any conversion tools that are to be used. The need to do this before the Preservation Project Plan is created was one of the key findings from the earlier work on the PAW-PERS collection.

First Preservation Planning Trial – Done!

Yesterday I completed the Preservation Planning work on my PAW-PERS collection. The final activity was to create a Preservation Maintenance Plan which will initiate future work at intervals down the years. I had never undertaken preservation planning before, so the whole exercise was designed to be one of learning by trial and error. The Preservation Maintenance Plan essentially captures all that learning by specifying a preferred process for the next scheduled preservation maintenance activity on the PAW-PERS collection.

Having completed work on PAW-PERS, I shall now try out the improved preservation planning process on my collection of 17,000 photos. This exercise will identify any further process refinements that are needed before I start on the original objective of all this work: undertaking preservation planning on my lifetime collection of working documents (referred to as PAW-DOC). This is a very large and diverse collection, some of it very old, all managed in a proprietary document management system integrated with an index held in a FileMaker database. Performing preservation planning on this beast is going to be a stretch for me as an individual with no corporate resources to draw on; I will need all the knowledge and expertise I can glean from working on the PAW-PERS and Photo collections.

UCL’s Online Digital Curation Course

A couple of weeks ago I joined UCL’s free – and excellent – 8-week online Digital Curation course. It has several hundred participants from all over the world, many of them professionals and students in the Archiving and Curation field. The course covers what digital curation is, how it is performed, and its major activities and communities worldwide, as well as leading participants through some practical digital curation work on their own files. This latter activity is a perfect fit with my current trial of an approach to Preservation Planning. The course also encourages participants to discuss what is being taught and, although I’m actually doing digital curation work, I’m an amateur with no training, so I’m finding it very valuable to hear the perspectives of specialists in the area.

Last week in the course we were asked to write about our digital mindset – our early experiences with computers and any turning points where we suddenly became more aware of the digital world we are now in. This was my (slightly augmented) contribution:

I first came across computers at university, where we handed our punched-card programs in to the Computer Dept and collected the results a day or two later. In my first job, at Kodak, I experienced computerised stock control, sales estimating and factory production planning, and was fascinated. I became a Needs Analyst. However, it wasn’t till I joined the National Computing Centre’s newly formed Office Systems division in 1980 that the digital penny really dropped. The job was to seek out best practice and spread it to UK organisations. It was a time when Word Processing was gaining ground, personal computers were being introduced and electronic mail was just emerging. Within a year I knew that the future for the individual, both in the office and at home, was digital. I plunged in enthusiastically. I started filing all my documents using an index, knowing that eventually the index would be computerised and that the documents themselves would be digitised; I replaced my pocket and desk diaries with a constantly updated folded A4 page that I kept in my wallet; and I rushed to work early in the morning to communicate furiously with distant colleagues in the British Library electronic journal project BLEND. By the time I took my next job in 1984 my path was set, and the remaining 26 years of my career were spent harnessing the increasing power and falling costs of computers to pursue my digital visions. At home, we started budgeting on the EazyCalc spreadsheet, our addresses were held in a database, and I started indexing and scanning every family photo. At work, my wallet diary was eventually replaced by an Organiser and then a mobile phone (though my wallet diary sheets are the best diary records I have); and I immersed myself in email, Computer Conferencing services, and research in configurable message systems. My file index was computerised on a Mac and eventually I started scanning my documents into a document management system.
Shortly afterwards I started to experience preservation anxiety when I realised that this ever expanding, increasingly precious collection of all my work knowledge was utterly dependent on the next 30 years of effective back-up procedures and flawless migrations through many upgrades of three software products, the operating system, and my laptop.

When I retired in 2012 and was released from the overload hell that email had become, I had time to digitise the boxes of mementos accumulated since 1958. So, now I have a 33 GB digital collection of all my work documents (approximately 180,000 scanned pages), which is in serious need of a preservation plan and a final destination. I also have a 44 GB collection of 17,000 family photos and a 7 GB collection of 1,600 digitised family mementos; both have a destination (my offspring), but both also require a preservation plan and a mechanism for informing the unsuspecting recipients and handing the collections over to them. My digital vision for the workplace has long since been achieved; but there is much left to explore in the home – how to show, share and bring to life our physical and digital objects, and how to ensure they are reliably passed through the generations; and of course, ways to allay the ever-present preservation anxiety associated with such precious collections.

PDF/A Flavours and Error Messages

A week ago I acquired an updated version of my eCopy PDF Pro Office software with much more comprehensive facilities for creating PDF/A documents. Since then I’ve been exploring those facilities and using them on the files I’m converting in my PAW-PERS collection. The updated eCopy software supports PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b and PDF/A-2u. Broadly speaking, PDF/A-1b seems to be the most basic level of conformance and aims to achieve a reliably rendered visual appearance. PDF/A-1a adds requirements for features such as tags and language specification, while PDF/A-2 (which was published after PDF/A-1) also ensures that layers, transparency and embedded files are preserved. eCopy lets you check whether a document conforms to any one of these standards, and I used this facility to check that the documents I was converting to PDF from other formats complied with PDF/A-1b. On almost every occasion, even though I was using the eCopy software itself to convert the documents into PDF, the compliance check threw up errors; below are examples of some of the most common ones.

  • xmp: CreateDate Bad XMP Date: ‘2015-01-27T09:56:01Z:P’ Page;1 Number;1 (The XMP metadata stream should conform to XMP specification);
  • Mismatch between xmp:CreateDate (‘2015-01-27T09:56:01Z:P’) and CreationDate (‘D:20150127095601Z’) (The XMP metadata stream should conform to XMP specification);
  • DeviceRGB used in image but no output intent (Device-specific colour space used, but no Output Intent is defined for the file)
  • Output Intent missing (Non-Device-independent colour space is used but no OutputIntent is not defined).
  • Missing PDF/A identifier (the PDF/A version and conformance level of a file shall be specified using the PDF/A identification extension schema in the XMP packet)
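The first two errors look related: the XMP `xmp:CreateDate` value carries a stray `:P` suffix, so it is not a valid XMP date, which is presumably also why the checker reports a mismatch with the document-information `CreationDate`. A minimal Python sketch of that consistency check is below. It assumes the PDF date syntax `D:YYYYMMDDHHmmSS` followed by `Z` or `+HH'mm'`/`-HH'mm'`, and the ISO 8601 subset used for XMP dates; the function names are hypothetical, this is not how eCopy performs its check, and it works only on the two date strings (extracting them from a file is left out).

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# PDF date: D:YYYYMMDDHHmmSS then Z or +HH'mm' / -HH'mm'; trailing parts optional.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Z+\-])(\d{2})?'?(\d{2})?'?)?$"
)
# XMP date: ISO 8601 subset, e.g. 2015-01-27T09:56:01Z or ...T09:56:01+01:00.
_XMP_DATE = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})"
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?(Z|[+\-]\d{2}:\d{2})?)?)?)?$"
)


def _tz(sign: Optional[str], hh: Optional[str], mm: Optional[str]):
    """Build a tzinfo from 'Z', '+' or '-' plus optional hour/minute offsets."""
    if sign is None:
        return None  # no time zone given: naive datetime
    if sign == "Z":
        return timezone.utc
    offset = timedelta(hours=int(hh or 0), minutes=int(mm or 0))
    return timezone(offset if sign == "+" else -offset)


def _build(parts, tzinfo) -> datetime:
    year, month, day, hour, minute, second = parts
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0),
                    tzinfo=tzinfo)


def parse_pdf_date(value: str) -> datetime:
    m = _PDF_DATE.match(value)
    if not m:
        raise ValueError(f"bad PDF date: {value!r}")
    g = m.groups()
    return _build(g[:6], _tz(g[6], g[7], g[8]))


def parse_xmp_date(value: str) -> datetime:
    m = _XMP_DATE.match(value)
    if not m:
        raise ValueError(f"bad XMP date: {value!r}")
    g = m.groups()
    tz = g[6]
    if tz is None or tz == "Z":
        return _build(g[:6], _tz(tz, None, None))
    return _build(g[:6], _tz(tz[0], tz[1:3], tz[4:6]))


def check_create_dates(xmp_create: str, info_creation: str) -> List[str]:
    """Return the problems found comparing xmp:CreateDate with CreationDate."""
    problems: List[str] = []
    parsed = []
    for parse, value in ((parse_xmp_date, xmp_create),
                         (parse_pdf_date, info_creation)):
        try:
            parsed.append(parse(value))
        except ValueError as exc:
            problems.append(str(exc))
    # Aware datetimes compare as instants, so +01:00 and Z can still match.
    if len(parsed) == 2 and parsed[0] != parsed[1]:
        problems.append(f"mismatch: {xmp_create!r} vs {info_creation!r}")
    return problems
```

Run on the values from the first error, `check_create_dates("2015-01-27T09:56:01Z:P", "D:20150127095601Z")` reports the bad XMP date, while the same timestamp without the `:P` suffix produces no problems.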

eCopy also provides a “Fix” facility, which in most cases cleared the errors, though only if the resulting file was saved under a different file name. In some cases, however, even the fixed file still had errors, which were only cleared by a further “Fix” and saving the file under yet another file name.

This turned out to be a rather tortuous process, so I decided to check and ensure PDF/A-1b compliance only for the files that were not in PDF format at all at the start of this exercise. The remaining 800+ files, which were already in PDF format at the start, will have to stay as they are for now. I’ve checked a few of them and they all have several compliance errors, but making them all comply with PDF/A-1b would take more time than I am prepared to spend right now.

The key findings from this phase of the work so far are that it is vital to fully understand the file formats you are targeting, and to become very familiar with the software you intend to use, before creating the Preservation Plan. Without that knowledge the Plan is likely to be unrealistic and almost impossible to stick to.

A first attempt at a Preservation Plan

Shortly after my last entry, I set about creating a Preservation Plan for my PAW-PERS collection of personal documents and mementos. I combined elements of Project Planning that I had experienced while working as an IT professional together with aspects of the preservation planning concepts already documented in the Scoping Document. The Project Plan consists of two documents – a Project Plan Description and a Project Plan Chart.

I sent drafts of these two documents to Chris Hilton, William Kilbride and Neil Beagrie asking for comments. Chris Hilton of the Wellcome Foundation very kindly sent me back his views just before Christmas. In summary, he thought the plans were thorough and that the decision to convert most documents to PDF or PDF/A was a good one. He also suggested keeping the original versions of any documents containing processing components (such as spreadsheets) which may not be captured in PDF/A format; and he endorsed keeping off-site copies.

Neil Beagrie put me in touch with Gabriela Redwine of Yale University, who is doing work on Personal Digital Archiving for the DPC (Digital Preservation Coalition). She too reacted positively to the Preservation Plan. So, with these two endorsements, I set about implementing the plan, the first part of which requires identifying the documents that need to be retained in their original form and converting the remaining files to PDF/A.
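That first step, deciding which files get a PDF/A copy and which must also keep their original, can be sketched as a simple triage. In the Python sketch below the extension sets are hypothetical examples: the "keep original too" set stands in for formats with processing components such as spreadsheet formulas, and the "leave as is" set for formats not being converted in this pass. A real plan would base these decisions on proper format identification rather than trusting file extensions.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical policy sets; adjust to the formats actually in the collection.
# Formats with processing components: convert, but keep the original as well.
KEEP_ORIGINAL_TOO = {".xls", ".xlsx", ".mdb", ".accdb"}
# Formats not converted in this pass (e.g. existing PDFs, JPEG images).
LEAVE_AS_IS = {".pdf", ".jpg", ".jpeg"}


def triage(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Sort file paths into the three actions of the plan's first step."""
    plan: Dict[str, List[str]] = {
        "convert_to_pdfa": [],
        "convert_and_keep_original": [],
        "leave_as_is": [],
    }
    for p in paths:
        ext = Path(p).suffix.lower()  # case-insensitive extension match
        if ext in LEAVE_AS_IS:
            plan["leave_as_is"].append(p)
        elif ext in KEEP_ORIGINAL_TOO:
            plan["convert_and_keep_original"].append(p)
        else:
            plan["convert_to_pdfa"].append(p)
    return plan


if __name__ == "__main__":
    example = ["letter.doc", "budget.XLS", "scan001.jpg", "report.pdf", "page.htm"]
    for action, files in triage(example).items():
        print(action, files)
```

Defaulting everything unlisted to "convert" mirrors the plan's premise that most documents go to PDF/A, with the exceptions named explicitly.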

Unfortunately, the rigours of Christmas and a subsequent call on my time to help my son and his wife with initial renovation work on their new house have interrupted progress. However, even the little I have done so far has identified a number of issues: a) converting an htm document into PDF using my PDF package (eCopy PDF Pro Office) did not produce a faithful likeness; the most reliable rendition was achieved by copying the htm screen into a Word document and then turning that into a PDF; b) a 2010 article from Ohio State University alerts readers that Word 2007 only produces a so-called PDF/A-1b version, which does not include tags and mark-up and is suitable for documents that are primarily image-based and do not have alternate text, whereas the more complete PDF/A-1a version enables screen reader technology to read the document correctly to disabled people; c) it seems that even if you have software that can convert to PDF/A format, it still only places the “PDF” extension at the end of the file name, so the name gives no explicit confirmation of whether the file has been converted successfully to PDF/A or whether it is PDF/A compliant.
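On point c), the PDF/A part and conformance level a file claims are recorded inside it, in the XMP metadata, using the PDF/A identification schema (the `pdfaid:part` and `pdfaid:conformance` properties); a missing entry is what the “Missing PDF/A identifier” error complains about. The Python sketch below looks for those properties in a file’s raw bytes. It is a rough heuristic under stated assumptions: the metadata stream must be stored uncompressed and use the conventional `pdfaid` prefix, and it reports only what the file claims, not whether it actually complies. Confirming compliance needs a validator such as veraPDF.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Attribute form: pdfaid:part="1"   Element form: <pdfaid:part>1</pdfaid:part>
_PART = re.compile(
    rb"pdfaid:part\s*=\s*[\"'](\d+)[\"']|<pdfaid:part>\s*(\d+)\s*</pdfaid:part>"
)
_CONF = re.compile(
    rb"pdfaid:conformance\s*=\s*[\"']([A-Za-z])[\"']"
    rb"|<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>"
)


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return e.g. 'PDF/A-1b' if an XMP PDF/A identifier is found, else None.

    Heuristic only: assumes an uncompressed XMP packet using the 'pdfaid'
    prefix, and says nothing about whether the file really conforms.
    """
    part = _PART.search(data)
    conf = _CONF.search(data)
    if part is None or conf is None:
        return None
    p = (part.group(1) or part.group(2)).decode("ascii")
    c = (conf.group(1) or conf.group(2)).decode("ascii").lower()
    return f"PDF/A-{p}{c}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python pdfa_claim.py file1.pdf file2.pdf ...
    for name in sys.argv[1:]:
        level = claimed_pdfa_level(Path(name).read_bytes())
        print(f"{name}: {level or 'no PDF/A identifier found'}")
```

A quick scan like this can at least separate files that claim a PDF/A level from those that don’t, leaving the full validator run for the files that matter.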