Repository sought

In the last few months I’ve been making good progress on figuring out how to undertake a Digital Preservation project. Since I’m getting close to being ready to undertake digital preservation work on the PAW/DOC collection, I decided to make an attempt to find a home for the collection before I start. That way, I can tailor the digital preservation work to the requirements of the receiving repository – should I be lucky enough to find anyone who is interested. Anyway, I now have a short two-pager to send to repositories which might be interested. This is the second version. Dave Thompson of the Wellcome Foundation (whom I met on the UCL online Digital Curation course) was kind enough to comment on the first version and his observations resulted in a substantial rewrite. I’ve sent it to 6 organisations – Loughborough University’s Centre for Information Management, Manchester University’s Computer Science Dept, City University’s Cass Business School, UCL’s Dept of Information Studies, the National Archives, and The Science Museum Wroughton Library and Archives. If I get a positive response from any of these, all well and good. If I do not, I shall proceed with the Digital Preservation work as planned.

Some three years ago I made a list of activities I wanted to undertake with the PAW/DOC collection, and this seems a good moment to summarise where I’m up to – the activities and their status are described below:

  • Scan the remaining 4 boxes of paper. Take the opportunity to explore scanning in colour and using PDF. Possibly also using OCR – though this is of much lower priority. DONE (but not OCRd)
  • Write a paper on “The paper artefact in the digital age” using an analysis of the contents of PAW/DOC as the basis for the paper. DONE
  • Explore the issues of longevity and survivability of file formats and of digital indexing and file management systems, using PAWDOC as the basis for the work. This could also include moving the material from FISH and even Filemaker. STILL TO BE DONE
  • Revisit all the requirements listed in my 2001 BIT paper to identify current status and opportunities for further work. STILL TO BE DONE
  • Scan all remaining PAW/DOC paper i.e. all those items in the three archive boxes (most of which have been identified as artefacts to be retained in their physical form). STILL TO BE DONE – but next on list – I’m trying to find a binding machine to be able to sheet feed the documents with comb binding
  • Check that all index entries are valid (i.e. not blank and with an appropriate Movement Field entry) and have an associated populated FISH entry. STILL TO BE DONE
  • Write up a guide to the material and to the technology supporting it. STILL TO BE DONE
  • Hand over PAW/DOC and its supporting technology to the new owner and provide training for the people who will be managing it going forward. STILL TO BE DONE

Waiting for the hedge to bud

The basic T-Shirt photos were taken successfully last week by putting a large sheet of white paper on the conservatory floor to act as a background, and by attaching the camera on a mini tripod to a plank extending over the top of the background by weighing one end of it down on a table with heavy books. It worked well. One of the resulting photos is below.

[Photo: Lotus T-Shirt]

Since then I’ve been concentrating on cutting out the lifesize heads that are to be used for the ‘Feeling’ photos, and designing and painting the images on the different heads. This is all finished now – the heads are in the picture below.

[Photo: the painted heads (IMG_4215)]

Now I’m just awaiting the best moment to actually take the photos with the hedge as background. The hedge is just starting to bud and I want to take the photos when the bright green buds are clear and prominent but still sufficiently small as not to totally obscure the bare branches inside the hedge. I’m all set to go so it’s just a matter of keeping an eye on the hedge and deciding when to do it. In the meantime I need to get on and design the climate change collage – which will involve me finding out a little bit more about the basic physics that are causing the changes to occur.

Halfway through the photo-shoot

After my last entry I got stuck in and spent a day creating selfie photos of me in the T-Shirts in a separate location for each – these will be the ‘trophy’ set of photos. I then turned to the challenge of selecting the objects that would go with each T-Shirt for the ‘reminder’ set of photos. This has been a good deal more demanding and a much slower process – but it’s finished now. For each trophy and reminder setup, I’ve taken multiple photos so that I’ll have plenty of choice when it comes to picking the images to use. But that will come later – I still have the ‘basic T-Shirt’ and the ‘Feeling’ photos to plan and take. For the latter, I’ve decided to suspend the T-Shirts using a stick through the arms and to put a painted paper head cut-out on top. The colour and shape of the painting on the heads will reflect the feelings the T-Shirts inspire. I’m planning to take these photos with the T-Shirts leaning against the low hedge in the garden, with the painted heads sticking up above the hedge and the field beyond providing the background.

As for how all these photos will go together, I’m aiming to create a collage with a climate change theme which will go in my 35 x 23 in poster frame – plenty to think about!

Some thinking about T-Shirts

Yesterday I started musing about the T-Shirts I have collected and what I can do with them. I concluded that they could be photographed being worn; that they could be photographed as items in their own right; that their images could be displayed together in a variety of ways (I kept thinking they would look good on a mug); and that the original physical artefacts could be transformed into other things such as cushions.

Today I took the T-Shirts out and had a good look at them. There are 10 (with one duplicate which I shall ignore). Then I thought some more – but this time about what they meant to me. This time I concluded that they are trophy items – I have them as evidence that I was somewhere or did something; that, as I thought previously, they could be photographed as items in their own right; that they remind me about the purpose or event they were created for; and that, finally, they invoke the personal feelings I have about those purposes and events. These will be the four aspects which I shall be trying to capture in coming weeks.

Scoping Document for Photo Collection

With the experience of doing Preservation Planning on the PAW-PERS collection under my belt, I’ve started work on my collection of photos and videos. To manage the process, I shall produce a Scoping Document, a Plan document, and a Maintenance document as originally envisaged. However, the Scoping document has now changed substantially to reflect the lessons learned in the PAW-PERS work and I’ve incorporated those changes in a reusable Preservation Planning Scoping Document Template. By the end of this exercise on the Photo collection I aim to have templates for each of the three documents which I’ll be able to use for the original target piece of work – preservation planning on PAW-DOC (my lifetime collection of work documents).

But back to the immediate task – the photo collection. I’ve completed the Photo Scoping Document so I now need to address the pre-planning tasks – which include assessing each of the different file types in the collection, deciding what formats to change them to (if at all) and becoming familiar with any conversion tools that are to be used. The need to do this before the Preservation Project Plan is created was one of the key findings from the earlier work on the PAW-PERS collection.

First Preservation Planning Trial – Done!

Yesterday I completed the Preservation Planning work on my PAW-PERS collection. The final activity was to create a Preservation Maintenance Plan which will initiate future work at intervals down the years. I had never undertaken preservation planning before, so the whole exercise was designed to be one of learning by trial and error. The Preservation Maintenance Plan essentially captures all that learning by specifying a preferred process for the next scheduled preservation maintenance activity on the PAW-PERS collection.

Having completed work on PAW-PERS, I shall now try out the improved preservation planning process on my collection of 17,000 photos. This exercise will identify any further process refinements that are needed, before I start on the original objective of all this work – to undertake preservation planning on my lifetime collection of working documents (referred to as PAW-DOC). This is a very large collection of diverse and some very old files all managed in a proprietary document management system integrated with an index held in a Filemaker database. Performing preservation planning on this beast is going to be a stretch for me as an individual with no corporate resources to draw on – I will need all the knowledge and expertise I can glean from working on the PAW-PERS and the Photo collections.

UCL’s Online Digital Curation Course

A couple of weeks ago I joined UCL’s free – and excellent – 8 week online Digital Curation course. It has several hundred participants from all over the world – many of them professionals and students in the Archiving and Curation field. The course covers what digital curation is, how it is performed, and its major activities and communities worldwide, as well as leading participants through some practical digital curation work on their own files. This latter activity is a perfect fit with my current trial of an approach to creating a Preservation Plan. The course also encourages participants to discuss what is being taught and, although I’m actually doing digital curation work, I’m an amateur with no training, so I’m finding it very valuable to listen to the perspectives of specialists in the area.

Last week in the course we were asked to write about our digital mindset – our early experiences with computers and any turning points where we suddenly became more aware of the digital world we are now in. This was my (slightly augmented) contribution:

I first came across computers at university where we handed our punched card programs into the Computer Dept and collected the results a day or two later. In my first job in Kodak I experienced computerised stock control, sales estimating and factory production planning, and was fascinated. I became a Needs Analyst. However, it wasn’t till I joined the National Computing Centre’s newly formed Office Systems division in 1980 that the digital penny really dropped. The job was to seek out best practice and spread it to UK organisations. It was a time when Word Processing was gaining ground, personal computers were being introduced and electronic mail was just emerging. Within a year I knew that the future for the individual, both in the office and at home, was digital. I plunged in enthusiastically. I started filing all my documents using an index knowing that eventually the index would be computerised and that the documents themselves would be digitised; I replaced my pocket and desk diaries with a constantly updated folded A4 page that I kept in my wallet; and I rushed to work early in the morning to furiously communicate with distant colleagues in the British Library electronic journal project BLEND. By the time I took my next job in 1984 my path was set and the remaining 26 years of my career were spent harnessing the increasing power and lowering costs of computers to augment my digital visions. At home, we started budgeting on the EazyCalc spreadsheet, our addresses were held in a database, and I started indexing and scanning every family photo. At work, my wallet diary was eventually replaced by an Organiser and then mobile phone (though my wallet diary sheets are the best diary records I have); and I immersed myself in email, Computer Conferencing services, and research in configurable message systems. My file index was computerised on a Mac and eventually I started scanning my documents into a document management system. 
Shortly afterwards I started to experience preservation anxiety when I realised that this ever expanding, increasingly precious collection of all my work knowledge was utterly dependent on the next 30 years of effective back-up procedures and flawless migrations through many upgrades of three software products, the operating system, and my laptop.

When I retired in 2012 and was released from the overload hell that email had become, I had time to digitise the boxes of mementos accumulated since 1958. So, now I have a 33GB digital collection of all my work documents (approx 180,000 scanned pages) which is in serious need of a preservation plan and a final destination. I also have a 44GB collection of 17,000 family photos, and a 7GB collection of 1600 digitised family mementos – both of which have a destination (my offspring) but which also require a preservation plan and a mechanism for informing, and handing them over to, the unsuspecting recipients. My digital vision for the workplace has long since been achieved; but there is much left to explore in the home – how to show, share and bring to life our physical and digital objects, and how to ensure they are reliably passed through the generations; and of course, ways to allay the ever-present preservation anxiety associated with such precious collections.

PDF/A Flavours and Error Messages

A week ago I acquired an updated version of my eCopy PDF Pro Office software with much more comprehensive facilities for creating PDF/A documents. Since then I’ve been exploring what those facilities are and using them on the files I’m converting in my PAW-PERS collection. The updated eCopy software provides support for PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, and PDF/A-2u. Broadly speaking, PDF/A-1b seems to be the most basic level of conformance required and aims to achieve a reliably rendered visual appearance. PDF/A-1a supports additional features such as Tags and Language, while PDF/A-2 (which was published after PDF/A-1) also ensures that layers, transparency and embedded files are preserved. eCopy enables you to check whether a document conforms to any one of these standards, and I used this facility to check that the documents I was converting to PDF from other formats complied with PDF/A-1b. On almost every occasion, even though I was using the eCopy software to convert the documents into PDF, the compliance check threw up errors – below are examples of some of the most common ones.

  • xmp: CreateDate Bad XMP Date: ‘2015-01-27T09:56:01Z:P’ Page;1 Number;1 (The XMP metadata stream should conform to XMP specification);
  • Mismatch between xmp:CreateDate (‘2015-01-27T09:56:01Z:P’) and CreationDate (‘D:20150127095601Z’) (The XMP metadata stream should conform to XMP specification);
  • DeviceRGB used in image but no output intent (Device-specific colour space used, but no Output Intent is defined for the file)
  • Output Intent missing (Non-Device-independent colour space is used but no OutputIntent is not defined).
  • Missing PDF/A identifier (the PDF/A version and conformance level of a file shall be specified using the PDF/A identification extension schema in the XMP packet)
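The first two messages both concern the same date held in two places: once in the XMP metadata and once in the PDF's own CreationDate entry. As a rough illustration of what the checker appears to be objecting to, here is a small Python sketch that parses both forms and compares them. It is only a sketch under assumptions: the function names are my own invention, it handles only the full date-and-time form (the XMP and PDF specifications also allow shorter forms and fractional seconds), and it treats a PDF date with no time zone as UTC.

```python
import re
from datetime import datetime, timedelta, timezone

# Simplified patterns (assumption: full date-time form only).
# XMP date:  YYYY-MM-DDThh:mm:ss followed by Z or +hh:mm / -hh:mm
XMP_DATE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(Z|[+-]\d{2}:\d{2})"
)
# PDF date:  D:YYYYMMDDHHmmSS optionally followed by Z or +HH'mm' / -HH'mm'
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(Z|[+-]\d{2}'\d{2}'?)?"
)


def _zone(tz):
    """Turn 'Z', '+01:00' or "+01'00'" into a tzinfo (None is treated as UTC)."""
    if tz is None or tz == "Z":
        return timezone.utc
    sign = 1 if tz[0] == "+" else -1
    return timezone(sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[4:6])))


def _parse(pattern, text):
    """Return an aware datetime, or None if the whole string does not match."""
    m = pattern.fullmatch(text)
    if m is None:
        return None
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    return datetime(year, month, day, hour, minute, second, tzinfo=_zone(m.group(7)))


def parse_xmp_date(text):
    return _parse(XMP_DATE, text)


def parse_pdf_date(text):
    return _parse(PDF_DATE, text)


def dates_agree(xmp_text, pdf_text):
    """True only if both dates are well formed and denote the same instant."""
    xmp, pdf = parse_xmp_date(xmp_text), parse_pdf_date(pdf_text)
    return xmp is not None and pdf is not None and xmp == pdf


# The value from the error message fails to parse because of the trailing ':P'.
print(parse_xmp_date("2015-01-27T09:56:01Z:P"))
print(dates_agree("2015-01-27T09:56:01Z", "D:20150127095601Z"))
```

With the stray ‘:P’ removed, the XMP value and the CreationDate describe the same moment, which is presumably what the “Fix” facility arranges.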

eCopy also provides a “Fix” facility which in most cases cleared the errors – though only if the resulting file was saved with a different file name. In some cases however, even the Fixed file still had errors in it which were only cleared by a further “Fix” and saving the file to yet another file name.

This turned out to be a rather tortuous process so I decided that I was only going to check and ensure PDF/A-1b compliance for the files that, at the start of this exercise, were not in PDF format at all. The remaining 800+ files which were already in PDF format at the start of this exercise will have to stay as they are for now. I’ve checked a few of them and they all have several compliance errors, but ensuring they all comply with PDF/A-1b would consume more time than I am prepared to spend right now.

The key findings from this phase of the work so far are that it is vital to fully understand the file formats you are targeting, and to become very familiar with the software you intend to use, before creating the Preservation Plan. Without that knowledge the Plan is likely to be unrealistic and almost impossible to stick to.

An antidote for a brainwash

These two statements in William Keegan’s article in today’s Observer have prompted me to flesh out this idea: “it should never be forgotten that the coalition inherited a burgeoning economic recovery in the summer of 2010 and proceeded to bring it to a halt with its misguided programme of austerity” and “I think I heard the prime minister come out yet again on the wireless the other day with that pre-Keynesian howler – much in vogue with the German economic establishment – that when the private sector cuts back, it makes sense for the public sector to cut back too. On the contrary, it does not make sense, and was the reverse of what was needed after the depression which followed the financial crash of 2008-09”. I’m fed up with politicians, clerics, lobbyists and other people with an axe to grind, brainwashing us with stuff that I suspect they do not fully understand, or that they are twisting to their own ends, or, worse still, that they are simply lying about. I’d like to see these points tested in the media by sending them to the Telegraph, Times/Sunday Times, Financial Times, Guardian/Observer, and The Independent, and asking them to research the following aspects: a) was the quote an accurate record of the statement? b) was any additional meaning imparted by the context in which the statement was made? c) what evidence did the person making the statement base it on? d) what are the findings of the research that has been done on the subject? e) what experiments/empirical tests have been performed to validate each main set of findings? f) what is the broad consensus of the professionals in the field concerned regarding each of the main sets of findings?

A first attempt at a Preservation Plan

Shortly after my last entry, I set about creating a Preservation Plan for my PAW-PERS collection of personal documents and mementos. I combined elements of Project Planning that I had experienced while working as an IT professional together with aspects of the preservation planning concepts already documented in the Scoping Document. The Project Plan consists of two documents – a Project Plan Description and a Project Plan Chart.

I sent drafts of these two documents to Chris Hilton, William Kilbride and Neil Beagrie asking for comments. Chris Hilton of the Wellcome Foundation very kindly sent me back his views just before Christmas – in summary, he thought the plans were thorough and that the decision to convert most documents to PDF or PDF/A was a good one. He also suggested keeping the original versions of any documents containing some processing components (such as spreadsheets) which may not be captured within the PDF format; and he endorsed keeping off-site copies.

Neil Beagrie put me in touch with Gabriela Redwine of Yale University who is doing work on Personal Digital Archiving for the DPC (Digital Preservation Coalition). She too provided a positive reaction to the Preservation Plan. So with these two endorsements I set about implementing the plan – the first part of which requires that those documents that need to be retained in their original form are identified; and that the remaining files are converted to PDF/A.

Unfortunately the rigours of Christmas and a subsequent call on my time to help my son and his wife do initial renovation work on their new house have interrupted progress. However, even the little I have done so far has identified a number of issues: a) conversion of an htm document into a PDF document using my PDF package (eCopy PDF Pro Office) did not reproduce the original page faithfully. The most reliable rendition was achieved by copying the htm screen into a Word document and then turning that into a PDF; b) a 2010 article from Ohio State University alerts readers that Word 2007 only produces a so-called PDF/A-1b version which does not include tags and mark-ups and which is suitable for documents which are primarily image-based and do not have alternate text. The more complete PDF/A-1a version enables screen reader technology to correctly read the document to disabled persons; c) it seems that even if you have software that can convert to PDF/A format, it still only places the “PDF” extension at the end of the file name, thereby providing no explicit confirmation of whether the file has been converted successfully to PDF/A or whether a file is or is not PDF/A compliant.
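On point c), one way to see what a file says about itself, without relying on the extension, is to look inside it for the PDF/A identification entries (pdfaid:part and pdfaid:conformance) that a PDF/A file is expected to carry in its XMP metadata. The Python sketch below does just that by scanning the raw bytes. It is a minimal sketch with assumptions: the function names are hypothetical, it assumes the metadata is stored uncompressed as UTF-8 text, and it reports only what the file claims to be. It does not check that the file actually complies, which still needs a proper validator.

```python
import re
from pathlib import Path

# The identifier can appear as an XML attribute (pdfaid:part="1")
# or as an element (<pdfaid:part>1</pdfaid:part>); both are matched.
PART = re.compile(rb'''pdfaid:part(?:=["']|>)\s*(\d)''')
CONFORMANCE = re.compile(rb'''pdfaid:conformance(?:=["']|>)\s*([A-Za-z])''')


def claimed_pdfa_flavour(data):
    """Return e.g. 'PDF/A-1b' if the bytes carry a PDF/A identifier, else None."""
    part = PART.search(data)
    if part is None:
        return None
    conformance = CONFORMANCE.search(data)
    level = conformance.group(1).decode("ascii").lower() if conformance else ""
    return "PDF/A-" + part.group(1).decode("ascii") + level


def claimed_pdfa_flavour_of_file(path):
    """Convenience wrapper: read a file from disk and inspect its bytes."""
    return claimed_pdfa_flavour(Path(path).read_bytes())


# Illustrative fragments only, not real files.
sample = b'<rdf:Description pdfaid:part="1" pdfaid:conformance="B"/>'
print(claimed_pdfa_flavour(sample))
print(claimed_pdfa_flavour(b"%PDF-1.4 no identifier here"))
```

Run over a folder of converted files, something like this would at least separate those that declare a PDF/A flavour from those that are plain PDF.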