Retrospective Preservation Observations

Yesterday I reached a major milestone: I completed the conversion of my document collection from storage in a Document Management System (DMS) to files in Windows Folders. It is a huge relief no longer to have the stress of maintaining two complicated systems – a DMS and the underlying SQL database – just to access the documents.

From a preservation perspective, a stark conclusion has to be drawn from this particular experience. The collection has used a DMS for some 22 years, during which I have undergone five changes of hardware, one laptop theft and a major system crash. To keep the DMS and SQL Db going I have had to configure and maintain complex systems I had no in-depth knowledge of; engage with support staff over phone, email, screen sharing and in person for many, many hours to overcome problems; and back up and nurture large amounts of data regularly and reliably. If I had done nothing to the DMS and SQL Db over those years I would long ago have ceased to be able to access the files they contained. In contrast, if the files had been in Windows Folders I would still be able to access them. So, from a digital preservation perspective, there can be no doubt that having the files in Windows Folders is a hugely more durable solution.

When considering moving away from a DMS I was concerned it might be difficult to search for and find particular documents. I needn’t have worried. Over the last week or so I’ve done a huge amount of checking to ensure the export from the DMS into Windows Folders had been error-free. This entailed constant searching of the 16,000 Windows Folders, and I’ve found it surprisingly easy and quick to find what I need. The collection has an Index, with each index entry having a Reference Number. There is a Folder for each Ref No, within which there can be one or more separate files.

Initially, I tried using the Windows Explorer search function to look for the Ref Nos, but I soon realised it was just as easy – and probably quicker – to scroll through the Folders to spot the Ref No I was looking for. The search function on the other hand will come in useful when searching for particular text strings within non-image documents such as Word and PDF – a facility built into Windows as standard.
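For completeness, this kind of Ref No lookup can also be scripted. A minimal sketch in Python, assuming each folder name begins with its Ref No (the function name is my own invention):

```python
from pathlib import Path

def find_ref_folders(root: str, ref_no: str):
    """Return the folders directly under root whose names start with
    the given Reference Number."""
    return sorted(p for p in Path(root).iterdir()
                  if p.is_dir() and p.name.startswith(ref_no))
```

A one-off script like this sits somewhere between scrolling and the built-in Explorer search, and could be handy for batch checks.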

I performed three main types of check to ensure the integrity of the converted collection: a check of the documents that the utility said it was unable to export; a check of the DMS files that remained after the export had finished (the utility deleted the DMS version of a file after it had exported it); and, finally, a check of all the Folder Ref Nos against the Ref Nos in the Index. These checks are described in more detail below.

Unable to export: The utility was unable to export only 13 of the 27,000 documents and most of these were due to missing files or missing pages of multi-page documents.

Remaining files: About 1400 files remained after the export had finished. About 1150 of these were found to be duplicates whose contents were present in files that had been successfully exported. The duplications probably occurred in a variety of ways over the 22-year life of the DMS, including human error in backing up and in moving files from offline media to online media as laptops started to acquire more storage. 70 of the files were used to recreate missing files, or to augment or replace files that had been exported. Most of the rest were blank or poorly scanned pages which I assume I had discovered and replaced at the point of scanning but which had somehow been retained in the system. I was unable to identify only 7 of the files.

Cross-check of Ref Nos in Index and Folders: This cross-check revealed the following problems with the exported material from the DMS:

  • 9 instances in which a DMS entry was created without an Index entry being created,
  • 9 cases in which incorrect Ref Nos had been created in the DMS,
  • 6 instances in which the final digit of a longer-than-usual Ref No had been omitted (e.g. PAW-BIT-Nov2014-33-11-1148 was exported as PAW-BIT-Nov2014-33-11-114),
  • 3 cases in which documents had been marked as removed in the Index but not removed from the DMS,
  • 2 cases in which documents were missing from the DMS export.

It also revealed a number of problems and errors within the 17,000 index entries. These included 12 instances in which incorrect Filemaker Doc Refs had been created, and 6 cases in which duplicated Filemaker entries were identified.
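A cross-check like this is essentially a set comparison between two lists of Ref Nos. A minimal sketch of how it could be automated, assuming the Ref Nos can be extracted from the Index and from the folder names (the function and key names are illustrative):

```python
def cross_check(index_refs, folder_refs):
    """Compare the Ref Nos recorded in the Index with the folder
    names produced by the export, reporting discrepancies in
    both directions."""
    index_set, folder_set = set(index_refs), set(folder_refs)
    return {
        "in_folders_not_index": sorted(folder_set - index_set),
        "in_index_not_folders": sorted(index_set - folder_set),
    }
```

The first list surfaces DMS entries with no Index entry (and truncated or mistyped Ref Nos); the second surfaces documents missing from the export.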

The overall conclusion from this review of the integrity of the systems managing the document collection over some 37 years is that a substantial amount of human error has crept in, unobtrusively, over the years. Experience tells me that this is not specific to this particular system, but a general characteristic of all systems which are manipulated in some way or other by humans. From a digital preservation standpoint this is a specific risk in its own right since, as time goes by, as memories fade, and as people come and go, the knowledge of how and why these errors were made simply disappears, making them harder to identify and rectify.

Started and Exported

A week ago the Pawdoc DP project started in earnest after 14 months of Scoping work. The Project Plan DESCRIPTION document and associated Project Plan CHART define a five-month period of work in 10 separate sections. The Scoping work proved extremely valuable in ensuring, as far as possible, that the tasks in the plan are doable and of a fixed size. No doubt there will be hiccups, but they should be self-contained within a specific area and not affect the viability of the whole project.

It took rather longer than anticipated to get the m-Hance utility to a position where it could be used to export the PAWDOC files – though I guess such delays are typical in these kinds of transactions. First there was an issue around payment, caused by the m-Hance accounting system not being able to cope with a non-company which could not be credit checked. I paid up front and the utility was released to me once the payment had gone through the bank transfer system. There then followed a period of testing and some adjustment using the export facility WITHOUT deletion in Fish. At that point I finalised the Plan and the Schedule and started work. However, although it was believed that the utility was working as it should, there followed a frustrating week during which its operation to export WITH delete (needed so that I could check any remaining files) kept producing exception reports, and the m-Hance support staff produced modified versions of the utility. There’s an obvious reminder here that nothing can be assumed until you try it out and verify it. Anyway, all is well now, and the export WITH delete completed successfully late last night. I decided against re-planning to accommodate the delays, in the belief that I can make up the time in the course of the three weeks planned to check the output from the export.

Principles, Assumptions, Constraints, Risks

The export utility to move the PAWDOC files out of the Fish document management system and into files residing in Windows Explorer folders has been completed by the Fish supplier, m-Hance. Broadly speaking, it will deliver files whose titles start with the Reference Number; then have three spaces followed by the file description that I originally input to Fish (truncated after 64 characters); and end with the date when the file was originally placed in Fish. I have already received the utility documentation, which provides full instructions on how to install and run it, and am confident I know what to do. So all that remains is for me to receive the utility (which I expect early next week) and to give it an initial test run on the PAWDOC collection in Fish.
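The naming scheme can be expressed as a one-line rule. A minimal sketch, assuming a single space separates the truncated description from the date (m-Hance’s exact separator may differ, and the function name is my own):

```python
def export_filename(ref_no: str, description: str, date_filed: str) -> str:
    """Build a file title in the agreed format: Reference Number,
    three spaces, the original Fish description truncated to 64
    characters, then the date the file was first placed in Fish."""
    return f"{ref_no}   {description[:64]} {date_filed}"
```

Having the rule written down like this also gives a way to validate the utility’s output later, by checking each exported file title against it.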

I’ve already created a full draft of the Project Plan Description document and the Project Plan Chart, so the test run will inform me of any final changes that I need to make to the plan. After that, all that will be left to do is to fix an overall start date and then to insert the start and end dates for each task.

One part of the Project Plan Description that was of particular interest to construct was the section on Principles, Assumptions, Constraints and Risks. Since some of them really require expert digital preservation knowledge and experience – commodities which I don’t have – I’ve sent these out to my colleagues Matt Fox-Wilson, Jan Hutar, and Ross Spencer in the hope that they will let me know of any serious errors of judgement that I may have made. The text of the section I sent them is shown below:


The Principles below have been followed in the construction of this Project Plan, and will be applied throughout the performance of the project:

  • No action will be taken which will increase the cost or effort required to maintain the collection
  • Backup, disaster recovery and process continuity arrangements are considered to be significant factors in ensuring the longevity of a collection and will therefore be included as an integral part of this preservation project plan.
  • All Preservation actions on individual document files will be undertaken after the files have been transferred out of Fish into stand-alone files in Windows folders, so that a substantial number of transferred documents will be subjected to detailed scrutiny, thereby improving the chances of identifying any generic errors that may have occurred in the transference process.


The Assumptions below have been followed in the course of constructing this Project Plan.

It is assumed that:

  • The analysis of the files remaining in Fish after the ‘Export and Delete’ utility has been run, will take no longer than three weeks elapsed time.
  • There is no publicly available mechanism to convert Microsoft Project (.mpp) files earlier than version 4.0.
  • There is no publicly available mechanism to convert Lotus ScreenCam (.scm) files produced earlier than mid 1998.
  • Application and configuration files that were included in the collection do not need to be able to run in the future as they do not contain content information. The mere presence of the files in the collection is sufficient.
  • The zipping of a website is currently the easiest and most effective way of storing it and providing subsequent easy access.
  • Versions of Microsoft Excel from 1997 onwards are not in immediate danger of being unreadable and therefore require no preservation work. Earlier versions are best converted to the latest version of Excel that is currently possessed – Excel 2007.
  • Versions of Microsoft Word for Windows from 6.0/1995 onwards are not in immediate danger of being unreadable and therefore require no preservation work. Earlier versions, including those for Macintosh, are best converted to the latest version of Word that is currently possessed – Word 2007.
  • Versions of Microsoft PowerPoint from 1997 onwards are not in immediate danger of being unreadable and therefore require no preservation work. Earlier versions, including those for Macintosh, are best converted to the latest version of PowerPoint that is currently possessed – PowerPoint 2007.
  • None of the versions of HTML, including those pre-dating HTML 2.0, are in immediate danger of being unreadable; and therefore no preservation work is required on any of the Collection’s HTML files.


This project may be limited by the following constraints:

  • Some of the disks and zipped files in the collection contain huge numbers of files of various types and organised in complex arrangements. To address the preservation requirements of these particular items could delay the project indefinitely. Therefore no attempt will be made to undertake preservation work on these items; but, instead, a note will be included in section 3 of the Preservation Maintenance Plan (Possible future preservation issues).
  • Disks that can’t be opened must remain in the Collection in physical form only.
  • No automated tools are available for undertaking conversions of large numbers of files; and the use of macros has been discounted as being too error-prone and risky. Therefore, all the Preservation work defined in this Project Plan has to be undertaken manually by a single individual.


There is a risk that:

  • The Zamzar service may be unable to convert some of the files submitted to it, despite tests having been completed successfully.
    Mitigation: record the need to take further actions on specified files in the future, in section 3 of the Preservation Maintenance Plan.
  • The analysis of the files remaining undeleted after the Fish file export has taken place may throw up unexpected issues and take much longer than anticipated.
    Mitigation: after two and a half weeks’ work on this activity, the issues will be recorded in a document, and the need to address them in the future will be recorded in section 3 of the Preservation Maintenance Plan.

Final Planning underway

Since about last April, I’ve been planning various aspects of the project to preserve my PAWDOC document collection.  This has included:

  • Deciding what to do with zip files
  • Analysing problem files identified by the DROID tool
  • Figuring out how to deal with files that won’t open
  • Investigating all the physical disks associated with the collection including backup disks

All of this work has now been completed, and a clear plan identified for each individual item that requires some preservation work.

In parallel, I have been exploring the possibility of moving the collection’s documents out of the Document Management System it currently resides in (Fish), to standard Windows application files residing in Windows Explorer folders. This has included detailed planning of the structure of the target files, and of the process that would have to be undertaken to achieve the transformation. The Fish supplier has recently told me that a utility to undertake this move is now available, and I have confirmed that I want to go ahead with this approach. We are now entering a phase of detailed testing and further planning to verify that this is a viable and sensible way forward. Should no significant obstacles be identified, I anticipate being ready to undertake the move out of the Fish system sometime in January 2018.

Since the bulk of the planning work has now been completed, it has been possible to assemble a draft Preservation Project Plan CHART which itemises each piece of work that will be required. Using this as a base, and incorporating the outcome of the work on the utility with the Fish supplier, I shall start to assemble the overall Preservation Project Plan Description document, and to allocate timescales and effort to each task on the plan.

Dealing with Disks

One very specific aspect of digital preservation is ensuring that the contents of physical disks can be accessed in the future. I found I had four types of challenge in this area: 1) old 5.25 and 3.5 disks that I no longer have the equipment to read; 2) a CD with a protected video on it that couldn’t be copied; 3) two CDs with protected data on them that couldn’t be copied; and 4) about 120 CDs and DVDs containing backups taken over a 20-year period. My experiences with each of these challenges are described below:

1) Old 5.25 and 3.5 disks: I looked around the net for services that read old disks and eventually decided to go with LuxSoft after making a quick phone call to reassure myself that this was a bona fide operation and the price would be acceptable. I duly followed the instructions on the website to number and wrap each disk, before dispatching a package of 17 disks in all (14 x 5.25, 2 x 3.5, 1 x CD). Within a week I’d received a zip file by email of the contents of those disks that had been read, and an invoice for what I consider to be a very reasonable £51.50. The two 3.5 disks and the CD presented no problems and I was provided with the contents. The 5.25 disks included eight which had been produced on Apple II computers in the mid 1980s and which LuxSoft had been unable to read. I was advised that there are services around that can deal with such disks but that they are very expensive; and that perhaps my best bet would be to ask the people at Bletchley Park (of Enigma fame) who apparently maintain a lot of old machines and might be willing to help. However, since these disks were not part of my PAWDOC collection and I didn’t believe there was anything particularly special on them, I decided to do nothing further with them and consigned them to the loft with a note attached saying they could be used for displays etc. or destroyed. Of the six 5.25 disks that were read, most of the material was either in formats which could be read by Notepad or Excel, or in a format that LuxSoft had been able to convert to MS Word, and this was sufficient for me to establish that there was nothing of great import on them. However, one of the 5.25 disks (dating from 1990) contained a ReadMe file explaining that the other three files were self-extracting zip files – one to run a communication package called TEAMterm; one to run a TEAMterm tutorial; and one to produce the TEAMterm manual.
Since this particular disk was part of the PAWDOC collection (none of the other 5.25 disks were), I asked LuxSoft to do further work to actually run the self-extracting zips and to provide me with whatever contents and screen shots that could be obtained. I was duly provided with about 30 files which included the manual in Word format and several screen shots giving an idea of what the programme was like when it was running. LuxSoft charged a further £25 for this additional piece of work, and I was very pleased with the help I’d been given and the amount I’d been charged.

2) CD with Protected Video files: This CD contained files in VOB format and had been produced for me from the original VHS tape back in 2010. The inbuilt protection prevented me from copying them onto my laptop and converting them to an MP4 file. After searching the net, I found a company called Digital Converters based in the outbuildings of Newby Hall in North Yorkshire which charged a flat rate of £10.99 + postage to convert a VHS tape and to provide the resulting MP4 file in the cloud ready to be downloaded. It worked like a dream: I created the order online, paid the money, sent the tape off, and a few days later I downloaded my mp4 file.

3) CDs with protected data: I’d been advised that one way to preserve the contents of disks is to create an image of them – a sector-by-sector copy of the source medium stored in a single file in ISO image file format. This seemed to be the best way to preserve these two application installation disks which had resisted all my attempts to copy and zip their contents. After reading reviews on the net, I decided to use the AnyBurn software which is free and which is portable (i.e. it doesn’t need to be installed on your machine – you just double click it when you want to use it). This proved extremely easy to use and it duly produced image files of the two CDs in question in the space of a few minutes.
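The idea behind imaging is simple enough to sketch in code: read the source end to end and write every byte to a single destination file, ideally recording a checksum for later integrity checks. This is only an illustration of the concept, not a substitute for a dedicated tool like AnyBurn (the function name and chunk size are my own, and reading a raw optical drive directly would need platform-specific device paths):

```python
import hashlib

def image_and_hash(source: str, dest: str, chunk: int = 1024 * 1024) -> str:
    """Copy a source file or device byte-for-byte into a single image
    file, returning a SHA-256 checksum so the image can be verified
    against future copies."""
    h = hashlib.sha256()
    with open(source, "rb") as src, open(dest, "wb") as out:
        while True:
            block = src.read(chunk)
            if not block:
                break
            out.write(block)
            h.update(block)
    return h.hexdigest()
```

Keeping the checksum alongside the image means any later migration of the image to new storage can be verified exactly.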

4) Backup CDs and DVDs: The files on these disks were all accessible, so I had a choice of either creating zip files or creating ISO image files. I chose to create zips for two reasons: first, I wanted to minimise the size of the resulting file and I believe that the ISO format is uncompressed; and, second, on some of the disks I only needed to preserve part of the contents and I wasn’t sure if that can be done when creating a disk image.
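Selective zipping of this kind is straightforward with standard tools. A minimal sketch using Python’s built-in zipfile module (the suffix filter is an illustrative assumption; any rule for choosing which files to keep would do):

```python
import zipfile
from pathlib import Path

def zip_selected(folder: str, dest_zip: str, wanted_suffixes=(".doc", ".xls")):
    """Zip only the files of interest from a disk's copied contents,
    compressing them and preserving their relative paths - the kind
    of partial capture that is awkward with whole-disk imaging."""
    with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in Path(folder).rglob("*"):
            if path.is_file() and path.suffix.lower() in wanted_suffixes:
                zf.write(path, path.relative_to(folder))
```

This also confirms the reasoning above: the zip is compressed (ISO images are not), and it can hold an arbitrary subset of the disk.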

Having been through each of these 4 exercises, there are some general conclusions that can be drawn:

  • The way to preserve disks is to copy their contents onto other types of computer storage.
  • The file size capacities of old disk formats are much smaller than the capacities of contemporary computer storage formats. For example, none of the 5.25 disks contained files totalling more than 2 Mb; the CDs contain up to about 700 Mb; and even the DVDs contain no more than 4.7 Gb. In an era where 1Tb hard disks are commonplace, these file sizes aren’t a problem.
  • There are three stages in preserving disk contents: first, just getting the contents from the disk onto other storage technology; second, being able to read the files; and third, should the contents include executables, being able to actually run the programs.
  • The decision about whether you want to achieve stages 2 or 3 will depend on whether you think the contents, and what they will be used for, merit the extra effort and cost involved. In the case of the 5.25 disk containing TEAMterm software described above, providing a capability to run the application would have involved finding an emulator to run on my current platform and getting the programme to work on it. I judged that not worth the effort for the purpose the disk’s contents were being preserved for (to be a record of the artefacts received by an individual working through that stage of the development of computer technology).

Hikes through the preservation hinterland

I’ve just finished dealing with two particular digital preservation challenges that exist within the document collection I’m currently working on. The first involved two Lotus Notes files; the second concerned some Windows Help files. My experience with these issues illustrates a) how just a few files can take a lot of work to resolve, and b) that there’s often an answer out there to seemingly impossible preservation problems, provided you are prepared to look diligently enough.

I really didn’t believe I was going to find a way to unlock the Lotus Notes files since Notes is a major and very expensive piece of software that I don’t possess; and, in any case, it applies sophisticated time-limited password and encryption controls for its use. Despite being aware of these issues, I thought I’d take a quick look on the net to see if I could find any relevant advice. It was time well spent; I discovered that it’s possible to download a local evaluation copy of Notes for 90 days, and that, because it doesn’t run on a server, this sometimes enables old Lotus Notes files to be opened. I duly downloaded the software and installed it; and then, regardless of the mysteries of Notes access controls, had access to the whole of one of the files (which contained conference-type material) and to parts of the other (which contained sent messages). I still had the username and expired password from the time the files were created and I think this may have helped to access the latter – though I’m not sure about that. Anyway, in both cases, I was able to print out the material to PDF files. I had to manually reorder the conference-type material and to reinstate a few hundred links in it, but that was it – job done!

The Windows Help files were a lot more demanding. Microsoft stopped supporting the WinHelp system (.HLP files) in 2006 in favour of its replacement, Compiled HTML Help (.CHM files). Although Microsoft did issue a WinHelp viewer for Windows 7 in 2009, WinHelp is essentially an obsolete format – it isn’t supported in Windows 10. I’m still running a Windows 7 system so am still able to view the HLP files – but they had to be converted now if they are ever to be accessed again in the future.

There is much material on the net about how to convert HLP files into CHM files, but, as someone with no knowledge at all about how files in either of these systems are constructed, I didn’t find it easy to understand. I soon realised that converting from one to the other was going to be a challenge. However, I did eventually find a website which offered clear, practical advice that I could follow, and I duly downloaded the recommended HLP decompiler and the Microsoft HTML Help Workshop software. The process to be followed went something like this:

  • Decompile the HLP file into its component parts (consisting of a help project file with the extension .hpj, along with one or more .rtf documents, an optional .cnt contents file, and any image files – .bmp, .wmf, or .shg – that are used within the Help file).
  • Convert the various HLP files into HTML Help files using a wizard in the HTML Help Workshop tool (the new files consist of a project file with the extension .hhp, one or more HTML files, a .hhc contents file, an optional .hhk index file, and any image files that are used within the Help file).
  • Set parameters in the hhp file to specify a standard Window name and size; and to have a search capability created when the files are compiled into a single CHM file.
  • Reconstruct the Table of Contents using the original HLP file as a guide (in many cases no Table of Contents information comes through the conversion process – and, even when some did, it had lost its numbering). Where the contents had to be created from scratch, each new content item created had to be linked to the specific HTML file to be displayed when that content item is selected.
  • Re-insert spacings in headings: The conversion process also loses the spacing in headings in the base material resulting in headings that look like this, ‘9.1Revised System’ instead of like this ‘9.1  Revised System’. To rectify this problem, the spacings have to be manually re-inserted into each HTML file of base material.
  • Compile the revised files into a single CHM file.
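Of the steps above, the heading-spacing repair lends itself particularly well to scripting rather than manual editing. A hedged sketch (the regex and function name are my own, and assume headings follow the ‘section number immediately followed by a capitalised word’ pattern seen in the example):

```python
import re

# Matches a section number (e.g. "9.1" or "9.1.2") that is glued
# directly onto a capital letter, as produced by the conversion.
_HEADING = re.compile(r"\b(\d+(?:\.\d+)*)(?=[A-Z])")

def reinstate_heading_spaces(html: str) -> str:
    """Re-insert the two spaces lost between a heading number and its
    text, e.g. '9.1Revised System' -> '9.1  Revised System'."""
    return _HEADING.sub(r"\1  ", html)
```

Run across each HTML file of base material, something like this would avoid re-inserting the spacings by hand, though the output would still need spot-checking for false matches.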

The first HLP file I tried this out on contained just a single Help document with some 130 pages. It took a bit of figuring out, but I eventually got the hang of it. However, the second HLP item was in fact made up of 86 separate HLP files all stitched together to present a unified Table of Contents in a single window in which the base material was also displayed. Many of these 86 separate files had 50 or more pages, and some had many more than that; and each page had to be represented separately in the Table of Contents. It was a very long, tortuous job converting all 86 HLP files and ensuring that each one had a correct Table of Contents (I didn’t attempt to re-introduce the spacing in the headings – that would have been a torture too far). However, that was not the end of it; the files then had to be stitched together in a single overall file that combined all the individual Tables of Contents and displayed all the base material. This involved inserting a heading for each document in the master file, and inserting a linking command to call up the Table of Contents for that particular document. Oh, and I should also mention that the HTML Help Workshop software was very prone to crashing – not a little irritating – I soon learnt to save regularly…..

This overall task must have taken at least 30 or 40 hours – but I did get there in the end. The new CHM file works fine and is perfectly usable, despite three of the documents being displayed in separate windows instead of the single main window (although I spent some time on this issue I was unable to eliminate the problem). Of course, the lack of spacing in the headers is immediately noticeable – but that’s just cosmetics!

No doubt there are specialists out there who would have made a quicker and better job of these conversion activities. However, if you can’t find such people or you haven’t got the money to throw at them, the experiences recounted above show that, with the help of the net, it’s worth having a go yourself at what you may consider to be your most difficult digital preservation challenges.

Scoping Document Finalised

Back in February, work started on the draft Scoping Document for the digital preservation actions required on the PAWDOC collection. Having spent some months actually doing bits of the work identified in the document and refining the document with the insights gained in the process, the final version of the Scoping Document has now been completed. It includes the following list of things that have to be done before a Project Plan can be produced:

  • Decide what document management system or alternative, and any associated databases, are to be used going forward.
  • Decide if Filemaker is to be retained as the platform for the Index or if it is to be replaced going forward.
  • Establish the future platform strategy.
  • Research and understand the actions required to:
    • make any moves planned from one piece of software to another; or from one platform to another;
    • be able to open those documents that don’t currently open;
    • promote the long term accessibility and survivability of all categories of document in the collection;
    • mitigate against the collection’s CDs and DVDs becoming unreadable;
    • mitigate against the electronic part of the collection being separated from the physical part.

Unfortunately Jan Hutar and Ross Spencer have decided they are unable to make any further substantial contributions to the project due to time pressures and other reasons. However, I continue to hope that they will remain associated with the work and be prepared to answer questions by email as needed. Their input to the early part of the work has been invaluable in getting the project to the point where I am actively investigating the practicalities of moving the electronic documents out of the Fish document management system into flat files in a Windows directory. The Fish supplier has a utility which will perform such a transformation, but much will depend on whether it can be customised to produce the file title format required and how much it will cost.

Alongside this activity, work continues on files that can’t be opened and on issues identified by the DROID analysis. Given the position that the project is in at present I would anticipate being able to complete the project plan sometime in the next 9 – 12 months.

Droid explorations and DMS alternatives

Things have started to move in our efforts to perform digital preservation on the PAWDOC collection. I’ve been running the National Archives DROID tool across the 190,000 files, and Ross’s automated analysis of the results has turned up a number of issues, including several hundred duplicates which we are investigating. Among other things, DROID identifies file types and versions, and this has helped another strand of our investigations to try and gain access to about three hundred files which can no longer be opened. 150 of these are old PowerPoint files from the early 90s which neither the Microsoft viewer nor the earliest version of OpenOffice can open. However, the Zamzar online service, to which you upload a file and specify what format you want it to become, successfully converted all of the examples I submitted into a version of PowerPoint I can open. Zamzar can’t deal with every problem file, especially those for which I no longer have the relevant application, for example MS Project and iThink, though it did convert Visio drawings into PDF. We’re continuing to work through these files with the intention of getting a clear decision about what to do with each one so that specific actions can be included in our eventual preservation project plan.

Another substantial investigation underway is to try and identify a suitable alternative to the document management system (DMS) that controls the collection’s files. The future of the current DMS is uncertain, and it is too complex to reinstall on upgraded hardware without expensive consultancy support. Jan’s exploration of alternative DMSs and preservation repositories highlighted the fact that, while there are several free-to-use public domain systems available, they all require multiple components and appear to be relatively complex to install, configure, and maintain. This observation has prompted me to be a lot clearer about the immediate requirements for the collection. We hope to find a long-term owner, perhaps working in the field of modern history, and it’s possible that that person or organisation may require more sophisticated search and access control functions. However, until that eventual owner is found, only a minimal level of single-user functionality is needed, and minimal system management and cost demands are essential. In light of this greater clarity, we are now also considering a low-tech, low-cost alternative which would involve inserting the Index reference number into the title of every file and storing all the files in the standard Windows folder system. After identifying a required Reference Number in the Index, files would be accessed by putting that number into the folder system’s standard search facility. As well as weighing the pros and cons of such a solution, we are also investigating the feasibility of getting the necessary information out of the current DMS and into the titles of all the document files. A further challenge that would have to be overcome is that the current DMS stores multi-page documents as a series of separate TIF files. If we were to move to the low-tech Windows folder solution, it would first be necessary to combine the files making up a single document into one file. This would need to be an automated process, as there are too many documents to contemplate doing it manually.
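
To give a feel for what such automation might involve, the sketch below groups per-page TIF files by document reference and then stitches each group into a single multi-page TIF. The `<ref>_<page>.tif` naming convention is purely hypothetical (the DMS’s real naming scheme would have to be established first), and the merge step assumes the third-party Pillow library.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <ref>_<page>.tif, e.g. "123.45_001.tif".
PAGE_PATTERN = re.compile(r"^(?P<ref>.+)_(?P<page>\d+)\.tif$", re.IGNORECASE)

def group_pages(filenames):
    """Group per-page TIF filenames by document reference, pages in order."""
    docs = defaultdict(list)
    for name in filenames:
        m = PAGE_PATTERN.match(name)
        if m:
            docs[m.group("ref")].append((int(m.group("page")), name))
    return {ref: [n for _, n in sorted(pages)] for ref, pages in docs.items()}

def merge_tifs(page_paths, out_path):
    """Combine single-page TIF files into one multi-page TIF (requires Pillow)."""
    from PIL import Image  # third-party: pip install Pillow
    first, *rest = [Image.open(p) for p in page_paths]
    first.save(out_path, save_all=True, append_images=rest)

print(group_pages(["123.45_001.tif", "123.45_002.tif", "200.1_001.tif"]))
```

A real run would walk the exported folders, call `group_pages` on each folder’s contents, and pass every multi-page group to `merge_tifs`, logging any files that fail to match the expected pattern for manual review.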

All these activities, and more, are required before we can assemble a project plan with unambiguous tasks of known duration. We are continuing to work towards this goal.

Work Underway

Digital preservation work on the PAWDOC collection has now started in earnest. The first couple of months were spent getting participants up to speed with a common understanding of what the collection consists of and the process we are going to follow. This was achieved with the following reading list:

PawdocDP-N1 – Ergonomic aspects of computer supported personal filing systems, April1990

PawdocDP-N2 – 20 years in the life of a long term personal electronic filing system, Sep2001

PawdocDP-N3 – Checking PAW-DOC, v1.0, 31May2016

PawdocDP-N4 – Preservation Planning for a Personal Digital Archive – DPC Webinar with presenter notes, 29Jun2016, v1.1

PawdocDP-N5 – Preservation Planning for Personal Digital Collections – DPC Case Note, Apr2016

PawdocDP-N6 – Preservation Planning SCOPING Document Template – v1.1, 11Sep2015

PawdocDP-N7 – Preservation Project Plan DESCRIPTION Template, v1.2 – 10Apr2016

PawdocDP-N8 – Preservation Project Plan CHART Template, v1.1 – 11Sep2015

PawdocDP-N9 – Preservation MAINTENANCE PLAN Template – v1.0, 11Sep2015

On Sunday 2nd April we held our first conference call, during which I gave a demo of the collection’s Index and Document Management System, went through the first draft of the Scoping paper, and allocated some of the activities that need to be completed before we will be in a position to create a project plan. These include the following:

  1. Document the FISH supplier’s recommended replacement route – Paul – End April
  2. Document possible alternative Document Management systems/Database Systems and their costs – Ross/Jan to advise on how to redefine tasks 2&3
  3. Document any alternative solutions to using a Document Management System for storing and retrieving the collection’s electronic documents – Jan/Ross to advise on how to redefine tasks 2&3
  4. Perform a DROID analysis of the current version of the PAWDOC collection and send out to the team – Paul – End April
  5. List the documents that can’t currently be opened and categorise them – Paul – End April
  6. Decide what should be done for each category of document that can’t currently be opened – Ross/Jan
  7. Identify what categories of document may not be able to be opened in future – Matt
  8. Decide what should be done for each category of document that may not be able to be opened in future – Ross/Jan
  9. List the CDs and DVDs that may become unreadable and categorise them – Paul – End April
  10. Decide what should be done for each of the categories of CD and DVD that may become unreadable – Matt to decide if he can do this after seeing the list
  11. Document possible solutions to the possibility of the electronic documents becoming separated from the physical documents, and recommend a course of action – Paul – End June

Work is now proceeding on these tasks, though timescales are uncertain since all team members other than myself have full-time jobs. Our next conference call is scheduled for Sunday 21st May.

PawdocDP Participants

The project to perform digital preservation on the PAWDOC collection (PawdocDP) started at the beginning of 2017 with myself and four other participants: Matt Fox-Wilson (Ambient Design), Ross Spencer and Jan Hutar (Archives New Zealand), and Nicolaie Constantinescu (Kosson). We have exchanged introductory emails, and the team is now reading background material to get up to speed with what the collection is and what state it is in. We aim to hold a screen-sharing conference call in March to demonstrate and explore the digital collection and its supporting systems. The introductory texts sent by each member of the team are shown below.

From: Paul Wilson [] Saturday, 14 January 2017 12:15 a.m.

Hello Ross, Jan, Nicolaie and Matt. Very pleased to have you all on board at the start of this project undertaking Digital Preservation on the pawdoc collection (PawdocDP). To give us all some background on each other, I suggest you reply-to-all to this email with a brief intro about yourself. My intro is below. I’m a retired computer consultant living in Lavendon between Northampton and Bedford in the UK. I got a degree in Ergonomics from Loughborough University in 1972 and got my first job with Kodak where I first started working on the application of computers.  In 1978, I joined the UK’s National Computing Centre where I investigated best practice in Office Automation – that’s when the pawdoc collection came into being. I then spent 28 years with Computer Sciences Corporation (CSC) as, first, a computer consultant, and then as a Bid Manager for IT outsourcing deals. During my professional career I’ve been particularly involved in Office Systems, Requirements Analysis, Process Definition, Workflow Technology, HCI, CSCW, and Architecture Definition and Management. I play golf, and collect stamps and first edition books. My study window looks out on a side road with open fields beyond and 7 wind turbines in the far distance.

From: Constantinescu Nicolaie <> 16 Jan at 7:25 AM

Hello! I’m an information architect for a library and information science community online – and a private enterprise manager. I have been involved in building useful content for all parties in my country interested in digital practices and resource preservation for over 10 years now. Right now I’m toiling on a JavaScript manual in Romanian, much needed for a solid foundation, which will be followed by a series on data management for librarians. My languages are JavaScript and the Web APIs. Part of my time is now dedicated to writing essential learning materials and advocating for Open Access in Romania.

From: Ross Spencer <> Monday, 16 January 2017 7:34 a.m.

Hello everyone! Thank you Paul.  I am a digital preservation analyst at Archives New Zealand. My background is in software engineering and digital humanities. I have worked at Archives New Zealand for three years, and before then, The National Archives, UK. My primary interests are developing tools for others to use to analyse and sentence digital records within an archival context. I release open source tools on GitHub. My languages are Python and Golang, with a modern day preference for Golang because of its easy portability across platforms without the need to run an interpreter. Outside of work I’m still a programmer, but, I’m also a cyclist. Interested in movies and music. I’m also attempting to learn French – but I have been attempting that endeavour for a long long time now!

From: Jan Hutar <> 17 Jan at 11:33 PM

Hi all, Similarly to Ross I am a digital preservation analyst at Archives New Zealand – same role, different focus, as you would expect. My background is classic archival science and then libraries. Before joining Archives NZ in February 2012 I was at the National Library of the Czech Republic in Prague, managing the digital preservation team there for 5 years. I have a PhD; my dissertation was about metadata for digitisation and digital preservation, and the proposed metadata standard and schema has been used across Czech Republic libraries since 2012. My main focus at Archives NZ is keeping our digital preservation system in shape, managing the data in it, getting data in, and dealing with all sorts of digital preservation problems, as well as digital preservation-related policies. Mountain biking is my thing.

From: Matthew Fox-Wilson <> 22 Jan at 9:52 AM

Hi everyone, Sorry for the slow introduction! My exact job is sort of hard to describe, but technically I’m the director/owner of a software development company here in New Zealand specialising in creative software for the consumer/prosumer market. My main focus here is on application architecture and UI design, but I’m also responsible for coding application structure and front-end systems for our products. We’ve been in operation since 2001, but before then I worked for a variety of companies in NZ, and remotely for the US, on consumer and pro-level graphics software, and consulted on a variety of projects relating to data sorting and natural methods for presentation – hence my interest in this project. When I’m coding I’m mainly old-school, focused primarily on C++ with a bit of Objective-C, for Windows, MacOS, and iOS. Outside work I enjoy trying to recover from work, which mainly takes the form of gym, running, and a sword-based martial art.