Preservation Maintenance Plan LITE template

Addenda to ‘Preservation Planning’

In 2021 I published v3.0 of a set of Preservation Planning templates designed to enable a rigorous Preservation regime to be applied to large collections of digital documents and their accompanying hardcopy material. However, my recent investigations into combining collections made it apparent that a simpler and quicker approach would be more appropriate for multiple smaller collections with less complex formats. Therefore, a new Preservation Maintenance Plan LITE template has been produced and initially tested on two sets of 10 collections each. Further testing will be done over preservation cycles in the coming years, prior to issuing a version that can be said to be fit for purpose. In the meantime, the current version is available for use at the link below.

Preservation MAINTENANCE PLAN LITE Template – v1.0, 09Sep2025

A Lite Touch

In the previous post I identified a need to understand the additional digital preservation requirements of the overall combined set of collections. To investigate this, I listed all the individual collections in a spreadsheet and noted some points which have a potentially significant impact on preservation work, including:

  • Does the collection have an index? (if there is no index there is no way to check the inventory – the items themselves define what is in the collection).
  • Does the collection have digital items with or without physical equivalents, and/or physical items with or without digital files? (when an item exists in both digital and physical form, there is more preservation work to do).
  • The number of digital and physical items (there is substantially less preservation work to do on a folder of 30 digital items, than there is on a collection of 500 digital items of which 175 have physical equivalents).
  • Whether there is any duplication with other collections (if a collection is part of a larger set of objects which already has a Preservation Plan, there is no need to specify a separate Preservation Plan for it).

Having populated this Preservation Assessment spreadsheet with its long list of 38 collections that might need preservation work, I was filled with some dismay, as I’ve now had several years of implementing Preservation Plans on many hundreds, if not thousands, of objects: it’s time-consuming and exacting work. I knew that I needed to minimise the time and effort spent on this new set of preservation activities if it was going to be workable and successful. Furthermore, I realised that for many of the collections on the list I was not really that concerned about the long term: they were currently accessible – many without needing an index – required little intervention, and might be of little interest many years hence.

With these thoughts in the back of my mind, I went through the list deciding what preservation work, if any, was to be done on each collection. Fortunately, 8 of the collections either already had a Preservation Plan or were part of one of those which had; I discounted another one altogether as it only had one insignificant digital file; and another seven were part of another collection on the list. I also combined 3 of the remaining 22 collections into a single overall Healthcare collection (because there were fewer than 90 files across them all), and 2 of the Book collections into a single overall Physical Books collection (because I knew the two would need to be done together). Finally, I added one other collection to the list – my other general laptop folders which I concluded would also benefit from being under the control of a preservation plan. Consequently, I was left with 20 collections to define Preservation Plans for. This was far too many to be practical, and, in any case, the more I looked at the digital files involved, the more I realised that they mainly consisted of pdf, jpg, png, doc/docx, xls/xlsx, and ppt/pptx formats – not very problematic. For the most part, an eyeball check would be all that was necessary to identify doc, xls, and ppt files that needed converting to docx, xlsx, and pptx respectively, so the detailed 16-step process required in my comprehensive Preservation Maintenance Plan template would be overkill. I needed to create a LITE version of the Preservation Plan with fewer steps, capable of addressing multiple collections. What I came up with were the following 4 steps:

  • Populate a ‘Changes’ section with the significant changes that have occurred to the collection and its digital platform between the previous maintenance exercise and the maintenance you are about to carry out.
  • Populate a ‘Hardware and operating system strategy’ section with the strategy you envisage for the future.
  • List the collections you want to undertake Preservation activities on in a ‘Contents & Location’ section together with the specific actions you want to take for each one (for example, ‘Check file extensions’ or ‘check inventory’).
  • Record a summary of the actions taken and associated results for each collection, in an ‘Actions taken’ section.
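For larger folders, the ‘Check file extensions’ eyeball pass can be assisted by a short script. Here is a minimal sketch in Python – the function name and folder layout are my own invention, not part of the template:

```python
from pathlib import Path

# Legacy Office formats flagged for conversion to their XML successors.
LEGACY = {".doc": ".docx", ".xls": ".xlsx", ".ppt": ".pptx"}

def check_file_extensions(folder):
    """Scan a collection folder (recursively) and report files still in a
    legacy format, as a dict mapping each file path to its suggested new
    extension."""
    findings = {}
    for f in sorted(Path(folder).rglob("*")):
        if f.is_file() and f.suffix.lower() in LEGACY:
            findings[str(f)] = LEGACY[f.suffix.lower()]
    return findings
```

Printing the result gives a worklist of candidate files to convert; anything not listed can be left alone.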

With this structure in mind, I separated the 20 collections into two groups – one which included substantial numbers of physical objects, and one which consisted mainly of digital files. The result was two Lite Preservation Plans, each dealing with 10 collections (it’s just coincidence that each has the same number of collections).

The actions specified for each collection were established by assessing what I wanted to protect against and how much effort I was prepared to make. Six different types of possible action emerged:

  • Check file formats: Check that the current file formats will enable the files to be accessed in the future, and if not make changes to ensure they will.
  • Check Inventory: Check that the index entries have a corresponding physical item and/or digital file, and rectify any inconsistencies.
  • Ensure physical docs are up to date: Ensure that the physical documents are the latest versions.
  • Ensure Index is up to date: Ensure that the latest additions to the collection are included in the Index.
  • Ensure Digital collection is up to date: Ensure that the latest additions are all included in the digital collection.
  • Ensure Physical collection is up to date: Ensure that the latest additions are all included in the physical collection.
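The ‘Check Inventory’ action above can also be part-automated when the index is available as a plain list of file names. This sketch is illustrative only (the function and its arguments are assumptions, not part of either plan):

```python
from pathlib import Path

def check_inventory(index_entries, folder):
    """Compare an index (a list of expected file names) against the digital
    files actually present in a folder, returning the two kinds of
    inconsistency: indexed items with no file, and files missing from the
    index."""
    on_disk = {f.name for f in Path(folder).iterdir() if f.is_file()}
    indexed = set(index_entries)
    missing_files = sorted(indexed - on_disk)
    unindexed_files = sorted(on_disk - indexed)
    return missing_files, unindexed_files
```

Each inconsistency still needs a human decision – rescan, re-index, or delete – but the comparison itself takes seconds rather than an afternoon.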

The two Preservation Plans fully populated with the results of the preservation work carried out on them can be accessed at the links below:

Objects Preservation Maintenance Plan Lite dealing with 10 collections

Files Preservation Maintenance Plan Lite dealing with 10 collections

The preservation work, as specified and recorded in both plans, took approximately 20 hours over about a week. This included filling in the Plan documents with the results as each collection was tackled. Overall, the main actions taken were:

1,976 .doc files converted to .docx: 1,937 of these were converted in bulk using the VBA code kindly provided by ExtendOffice (see https://www.extendoffice.com/documents/word/1196-word-convert-doc-to-docx.html). The remainder were simply opened in Word and saved as .docx files (a few of these were originally .rtf files).

150 .xls files converted to .xlsx: 141 of these were converted in bulk using another set of VBA code provided by ExtendOffice (see https://www.extendoffice.com/documents/excel/1349-excel-batch-convert-xls-to-xlsx.html), with the remainder being opened in Excel and saved as .xlsx files (a few of these were originally .csv files).
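The VBA route requires Word and Excel to be installed. For anyone without them, LibreOffice offers an equivalent bulk converter via its `soffice --headless --convert-to` command line. The sketch below merely builds the command lines for review (the function name and defaults are my own; run each list with `subprocess.run` once you are happy with it):

```python
from pathlib import Path

def libreoffice_commands(folder, src_ext=".doc", target="docx"):
    """Build one LibreOffice headless conversion command per legacy file
    found under `folder`. Each converted file is written alongside the
    original (via --outdir)."""
    return [["soffice", "--headless", "--convert-to", target,
             "--outdir", str(f.parent), str(f)]
            for f in sorted(Path(folder).rglob(f"*{src_ext}")) if f.is_file()]
```

Reviewing the generated commands before running them is a cheap safeguard against converting files you meant to leave alone.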

564 files deleted: 464 of these files were in an iTunes folder – and I no longer use iTunes. 36 were CD case covers/spines which I created in an application I no longer have – and the CD covers are all now printed out and in place on the CD cases so I no longer need these files. Most of the remainder were odd files which I no longer have a use for. As is apparent from this description, such files tend to be from folders containing more general material rather than specifically collected and indexed items. Many computers probably have an array of such unneeded material.

Around 9 new items added: 7 of these were added to get a collection up to date, and the others were the two new Lite Preservation Plans which were included in the Backing-up collection.

2 Hardcopies updated: One was a physical A5 ring binder of the addresses in my address database; and the other was my Backing-up and Disaster Recovery document which I print out and keep a copy in my desk drawer. It’s really a bit of an effort to update such documents regularly and so they often get out of date. Having a scheduled Preservation Plan does help to keep them relatively current.

The next cycles of these two Preservation Maintenance Plans are now scheduled for 2027 and 2028 respectively: I can now relax, confident that I have done as much as I wish to future-proof the 20 collections that they deal with.

I have included most of my workings in this post largely to help me be clear about what I did. However, the details are of little consequence to readers interested in undertaking digital preservation work on their own collections. They only serve to show that you can call anything a collection, and that you can slice and dice collections any way you want. The key point is that, using this approach, it is feasible to exert a measure of preservation control over a large number of collections, including the files on your computer, with relatively little effort. If you try this out, you may find this Preservation MAINTENANCE PLAN LITE Template helpful.

Published!

Events have moved on apace since my last post three weeks ago. For a start, the publication date moved in stages out to 7th August before coming back in to 4th August, and the Waterstones web advert which had vanished reappeared. Then, suddenly, on Saturday 28th June we received an email from the Production Editor saying that the book had been published, with information available at https://link.springer.com/book/10.1007/978-3-031-86470-4. We have subsequently received a Congratulatory email from Springer, and this, together with the website information, provides a revealing example of how academic publishing now operates.

The Congratulatory email includes advice on how to ‘Maximize the impact of your book’ and offers use of ‘a suite of bespoke marketing assets to help you spread the word’. Also included was a link to a PDF version of the published text. The Springer site advises that the ebook (£119.50) was published on 27 June, the hardback (£149.99) on 28 June, and that the softback will be published on 12 July 2026 (price not yet specified). The site also provides a list of the book’s chapters, each of which can be opened to reveal the summary abstract we had been asked to provide, and the full set of references together with any digital links we had included. Each chapter can be purchased separately for £19.95, or one can take out a Springer subscription for £29.99 a month entitling you to download 10 chapters/articles per month (which, interestingly, would get you pretty much the whole of Collecting in the Icon Age!). Those with appropriate credentials may also be able to log in via their institution and get content for free if the institution concerned has come to a separate arrangement with the publisher.

Since hearing that the book has been published, I’ve been working on the supplementary material we are providing on the pwofc website. This includes a single document containing all the references, each with an appropriate web link. In searching for such links over the last week I’ve noticed that in several cases extracts from our book are already appearing in the hit lists. Furthermore, I discovered that previews of many pages of the book (including the whole of chapter 1) are available in Google Books ‘displayed by permission of Springer Nature. Copyright’. All this in less than 7 days since publication.

Two things stand out to me from all this: first, there is a surprisingly large amount of information available for free about the book. It is probably not sufficient if you really are interested in the subject – but you can get a pretty good idea about what the book contains. Second, there is clearly a focused effort to monetise the publication in every possible way.

Now that we’ve achieved publication, I don’t intend to provide any further running commentaries on progress. The material we are providing to supplement the book is in the Icon Age Collecting section of this website, and that is where we intend to conduct any dialogues about the book that should arise.

Welcome to Icon Age Collecting!

Hello. Welcome to this set of materials in support of the book ‘Collecting in the Icon Age’. There are four main items – all listed below and available in separate files that can be downloaded and viewed on your own device at your leisure. We would be very happy to engage with people who are interested in asking about or using our material. To make contact, just reply to the relevant post. We need time to moderate the replies, so please be aware that they may not appear on the website for a few days. However, be assured that we will get back to you one way or the other.

To get to the material you’re interested in, click on the relevant item below:

  1. Questionnaire – answers to questions asked about some of the research materials, which were used to derive the practice hierarchy.
  2. Practice Categorisation – a spreadsheet ordering the practices identified from the questionnaire answers and assessing the impact that IT has had on them.
  3. Practice Hierarchy – image files containing the process hierarchy in single diagrams for both the pre-Icon Age and in the Icon Age.
  4. Expanded References – a single document listing all the references in ‘Collecting in the Icon Age’ and the locations in the referenced texts which are being referred to.

Expanded References

The references in the book ‘Collecting in the Icon Age’ (citia) appear at the end of each chapter and do not give the page number where the referenced text is located. This Expanded References document combines all the references into a single, alphabetically-ordered set, complete with the page on which each appears in citia AND the relevant location in the referenced work AND part of the text being referenced, for example, “p268 in citia: p75 (The application of…)”.

  • p268 in citia is the page number in the book Collecting in the Icon Age
  • p75 is the page number of the work being referenced
  • (The application of…) is the start of the text being referenced

An internet link to the referenced text is also provided where available. Printed errors in the citia text are identified in the Expanded References document in italics prefaced by ‘NB’.
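Since the location string has a fixed shape, it can be split mechanically if you ever want the expanded references in a spreadsheet. A sketch (the function name and output fields are invented for illustration):

```python
import re

# Pattern for the location string illustrated above,
# e.g. "p268 in citia: p75 (The application of...)".
PATTERN = re.compile(r"p(\d+) in citia: p(\d+) \((.+)\)")

def parse_location(entry):
    """Split an expanded-reference location into its three parts: the citia
    page, the referenced work's page, and the start of the referenced text.
    Returns None if the entry does not follow the standard shape."""
    m = PATTERN.fullmatch(entry)
    if not m:
        return None
    return {"citia_page": int(m.group(1)),
            "source_page": int(m.group(2)),
            "text_start": m.group(3)}
```

Entries that return None (for example, those carrying an ‘NB’ correction) would need handling by hand.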

Plot profile for the movie ‘Eerie AI’

Gronk Pistolbury knew quite a bit about AI. After doing a PhD on ‘Extreme perturbationery and calmic episodes in deeply embedded AI neuron nodes’, he had moved around various high-profile organisations operating LLMs (Large Language Models) in the 2020s and 30s. During those years he had continued to develop his PhD ideas, and, by the mid-2030s, had come to the conclusion that something odd was going on.

His research was based around the analysis of AI hallucinations, and he collected instances of them from both his own vast bank of automatically generated content and from whatever other sources reported such events. His analysis of this material had started to show up similarities and even some duplications across the more recent data sets – and Gronk couldn’t figure out why. He suspected that the hallucinatory material was going back into the internet data pool and affecting the content of the LLM – but he had no real evidence to back up his theory.

In 2038, he had used a large chunk of his savings to take out a three-year subscription to the Jonah Vault – the most extensive and advanced AI Data Centre conglomerate in the world – and to acquire an extremely powerful computing configuration for his own home. His idea was to test out his theory by using the Jonah Vault to produce enormous numbers of AI outputs for analysis by his own specialised system. The analysis would identify hallucinations and map similarities between them – and insert them back into the training data for his own LLM in the Jonah Vault. This was to be done at scale – over a billion instances a month.

By 2041, his research was beginning to show some significant convergences in hallucinatory events; but his Jonah Vault lease had only a few weeks to run and he had no money available to continue to fund his work. It was at this point, however, that Gronk Pistolbury won the Inter-Continental Lottery and pocketed a cool $7.9 billion.

2041 was also the year when Quantum Computing became truly commercially accessible. There had been a few start-ups in the late 30s offering both hardware systems and cloud services. However, it was the arrival of Quiver inc. in 2041, that made Quantum a practical and affordable alternative to conventional digital systems. Gronk took out a $500 million, one-year service contract with Quiver and hired half a dozen of the best quantum/compute engineers he could find, and built a quantum version of his hallucination test bed.

When Gronk set his Quantum operation going, he had hoped that it would significantly speed up the circulatory process of hallucination production and LLM development. However, the system was far more powerful than he had dared hope: it reduced the cycle time by a factor of tens of thousands. After 3 months’ operation it became clear that the LLM was converging on a relatively small number of answers to any question asked of it; and after 6 months it was down to a few hundred characters. Needless to say, the answers now bore no relation to the questions that had been asked. Pistolbury and his engineers watched in puzzled fascination as the LLM continued to narrow its answers to the questions put to it relentlessly by the Quiver Quantum machine. Finally, after 7 months, 26 days, 14 hours, 9 minutes and 4.278 seconds, the LLM settled on its final answer to any question about anything – 42.

They had seen it coming but couldn’t quite believe it would happen. It was bewildering, weird, crazy, eerie, but the hallucination machine had said that the answer to any question was 42; and some 63 years earlier, Douglas Adams had said in The Hitch Hiker’s Guide to the Galaxy that the answer to the great question of Life, the Universe and everything was 42. From that answer onwards the hallucination model LLM would give no other answer to any question. It did not reduce the number or change the number or add to it. It stayed, unmoving, at the two characters that a humorous author had just thought up on the spur of the moment in the previous century.

…Should the movie be a success, a possible sequel could follow Pistolbury over the following three decades on an epic quest to understand what had happened, by undertaking a whole variety of way-out experiments producing eerie LLM results. For example, neural node pairing, star refraction hypnosis, and, in all its gory detail, LLM brain fluid crossover.

Note: All of the above is pure fiction. None of the names or dates or scientific claims are real (and some of the science bits don’t even make sense!). Should any of this material find its way into AI answers, it will be because it has been purloined for AI training data; and it would be a graphic example of AI’s inability to distinguish reality from fantasy. This little idea for a (really bad) movie plot might even end up playing a supporting role in an AI hallucination… now that would be amusing!

Revised Proofing

Despite my thinking that the proofing process was closed, Springer sent us ‘Revised Proofs’ on Saturday 7th June to check and return by Monday 9th June. This was good news as far as I was concerned, as it provided opportunities both to check that the proofing changes we had specified had all been done correctly (and, indeed, I did spot 27 shortcomings), and to specify a further 15 changes which my continuing checks on the references had identified (I might add that the vast majority of these changes were relatively minor, involving changes to only a few words, if that). This time round, we had been asked to specify changes as annotations to a revised PDF, so I used the pdf callout facility to document each change needed in a box with an arrow next to the relevant text. My co-author, Peter, had work priorities over these few days, so the changes – and anything missed – are all down to me.

I duly submitted the annotated proof around 9pm on the night of Monday 9th June; and the next day we received an email from Springer acknowledging receipt of our comments and saying that they would review and incorporate them in accordance with Springer’s guidelines, after which they would proceed with the online publication process. I’m not too clear about what ‘the online publication process’ entails; nor do I understand why the publication date continues to move – as at the date of this post it stands at 26th July on Springer’s web site. However, I do think that the proofing process is now truly complete. In an interesting development, Waterstones appears to have pulled its web page advertising the book, and I wonder if that is because they have grown impatient with the continual movement of the publication date. Beck-Shop and Amazon, however, are still offering the title.