An AI Roadmap

Having got the Collecting book published, I’ve been wondering what’s left to do in this OFC journey. Clearly, I need to update the OFC tutorial which is now some 8 years old. However, it was an email some ten days ago announcing a report on AI Preparedness for Archives that started me on a mini-voyage of realisation, and that prompted me to write this post.

I took each piece of the report’s guidance and wrote notes on how I would apply it to my PAWDOC document collection. Then I asked ChatGPT how I could focus an AI chatbot onto a specific archive of data. In the follow-up Q&A, and after I had provided a 10-line description of PAWDOC, ChatGPT designed a production architecture for such a research-grade archive followed by a grant-fundable academic architecture and a cost estimate. At this point, I resisted ChatGPT’s offer to create a ‘ready-to-copy grant proposal document (including abstract, methodology, outcomes, evaluation plan)’ and went for a walk, stunned BOTH by ChatGPT’s capabilities AND by the potential for enhancing PAWDOC with an AI interrogation capability.

I should say at this point that the ChatGPT descriptions I had read were very general in nature and assumed an understanding of many activities – they were most definitely not a cook book with ‘do this then that’ instructions. I was aware that my actual knowledge and understanding of what would be required was pretty much zero, and actually doing it for real would be a steep learning curve.

Having mulled all this around in my head for a few days, it seems clear to me that closure of this OFC journey cannot occur until I understand and experience how AI can be used to augment the interrogation of two types of private collection:

  1. primarily text-based archives; and
  2. collections more focused on objects.

These are the types of collections that, in my experience, are most likely to be possessed by private individuals. Note that I am explicitly focusing on ‘private’ collections, because institutions undoubtedly manage their archives and collections differently from private individuals: processes are formally designed to ensure effectiveness and longevity; tasks get done because staff are paid to do them; and IT support is usually at a far greater scale and complexity. Much work is underway to apply AI to institutional collections; however, my focus is to understand how individuals can apply it to their own private collections. With my current level of understanding, I believe I need to investigate the following specific aspects:

  1. The practicalities of preparing a private archive for AI. To explore this, I would most likely use a. PAWDOC (a primarily text-based archive), and b. my Mementos collection (more focused on objects).
  2. Researching if AI is capable of accessing and understanding the contents of files other than text – sound, image, and video (Large Language Models – LLMs – specifically deal with text). This is particularly relevant to collections more focused on objects.
  3. The practicalities of building an AI interrogation capability for a private archive which has only an index and information within its digital file names. This would probably be the simplest implementation and so the best one to do first to learn some basics. I would use my Mementos collection to investigate this.
  4. The practicalities of building an AI interrogation capability for a private archive which does have machine readable text content. To investigate this, I could just use the app-generated content within PAWDOC (for example, all the Microsoft Office documents within PAWDOC). Alternatively, and more ambitiously, I could try to OCR some or all of the PAWDOC scanned documents and include them in the investigation.
  5. The practicalities of building an AI interrogation facility for a private collection containing a combination of text, sound, image, and video. The viability of this investigation would depend on the outcome of 2. above. It would probably involve extending one or both of the implementations described in 3 and 4.
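To make item 3 a little more concrete, here is a minimal sketch of what the simplest case might look like, assuming an index in which each entry is just a file name plus a short description. Everything in it is hypothetical: the sample entries are invented, the function names are my own, and the keyword-overlap ranking is a deliberately crude stand-in for whatever retrieval method a real implementation would use. The idea is only that matching index entries get selected first and are then handed to a chatbot as context.

```python
import re

# Hypothetical index entries: (file name, description). Invented for illustration.
INDEX = [
    ("MEM-0001 School report 1962.pdf", "School report from primary school"),
    ("MEM-0002 Concert ticket 1975.jpg", "Ticket stub from a rock concert"),
    ("MEM-0003 Holiday postcard 1981.jpg", "Postcard sent from a seaside holiday"),
]


def tokens(text):
    """Lower-case a string and return its alphanumeric words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query, index, top_n=3):
    """Rank index entries by how many query words occur in name + description."""
    query_words = tokens(query)
    scored = []
    for name, description in index:
        score = len(query_words & tokens(name + " " + description))
        if score > 0:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically by file name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


def build_prompt(query, index):
    """Assemble a prompt that restricts a chatbot to the matching index entries."""
    hits = search(query, index)
    context = "\n".join(f"- {name}" for name in hits)
    return f"Using only these archive items:\n{context}\nAnswer: {query}"
```

Calling `build_prompt("concert ticket", INDEX)` would produce a short prompt listing only the matching item. A real system would need something far better than word overlap, but the select-then-ask shape is probably the part worth learning first.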

If I get round to any or all of these journeys, they would be recorded in their own separate spaces within this pwofc.com website; and, should they get completed, their results would be used to update the OFC tutorial. Only then would I consider closing this OFC journey.

Published!

Events have moved on apace since my last post three weeks ago. For a start, the publication date moved in stages out to 7th August before coming back in to the 4th August, and the Waterstones web advert which had vanished, reappeared. Then, suddenly, on Saturday 28th June we received an email from the Production Editor saying that the book had been published with information available at https://link.springer.com/book/10.1007/978-3-031-86470-4. We have subsequently received a Congratulatory email from Springer and this together with the website information provides a revealing example of how academic publishing is now operating.

The Congratulatory email includes advice on how to ‘Maximize the impact of your book’ and offers use of ‘a suite of bespoke marketing assets to help you spread the word’. Also included was a link to a PDF version of the published text. The Springer site advises that the ebook (£119.50) was published on 27 June, the hardback (£149.99) on 28 June, and that the softback will be published on 12 July 2026 (price not yet specified). The site also provides a list of the book’s chapters, each of which can be opened to reveal the summary abstract we had been asked to provide, and the full set of references together with any digital links we had included. Each chapter can be purchased separately for £19.95, or one can take out a Springer subscription for £29.99 a month entitling you to download 10 Chapters/articles per month (which, interestingly, would get you pretty much the whole of Collecting in the Icon Age!). Those with appropriate credentials may also be able to log in via their institution and get content for free if the institution concerned has come to a separate arrangement with the publisher.

Since hearing that the book has been published, I’ve been working on the supplementary material we are providing on the pwofc website. This includes a single document containing all the references, each with an appropriate web link. In searching for such links over the last week I’ve noticed that in several cases, extracts from our book are already appearing in the hit lists. Furthermore, I discovered that previews of many pages of the book (including the whole of chapter 1) are available in Google Books ‘displayed by permission of Springer Nature. Copyright’. All this in less than 7 days since publication.

Two things stand out to me from all this: first, there is a surprisingly large amount of information available for free about the book. It is probably not sufficient if you really are interested in the subject – but you can get a pretty good idea about what the book contains. Second, there is clearly a focused effort to monetise the publication in every possible way.

Now that we’ve achieved publication, I don’t intend to provide any further running commentaries on progress. The material we are providing to supplement the book is in the Icon Age Collecting section of this website, and that is where we intend to conduct any dialogues about the book that should arise.

Revised Proofing

Despite my thinking that the proofing process was closed, Springer sent us ‘Revised Proofs’ on Saturday 7th June to check and return by Monday 9th June. This was good news as far as I was concerned as it provided opportunities both to check that the proofing changes we had specified had all been done correctly (and, indeed, I did spot 27 shortcomings); and to specify a further 15 changes which my continuing checks on the references had identified (I might add that the vast majority of all these changes were relatively minor involving changes to only a few words, if that). This time round, we had been asked to specify changes in annotations to a revised PDF, so I used the PDF callout facility to document the change needed in a box with an arrow next to the relevant text. My co-author, Peter, had work priorities over these few days, so the changes – and anything missed – are all down to me.

I duly submitted the annotated proof around 9pm on the night of Monday 9th June; and the next day we received an email from Springer acknowledging receipt of our comments and saying that they would review and incorporate them in accordance with Springer’s guidelines after which they would proceed with the online publication process. I’m not too clear about what ‘the online publication process’ entails; nor do I understand why the publication date continues to move – as at the date of this post, on Springer’s web site it currently stands at 26th July. However, I do think that the proofing process is now truly complete. In an interesting development, Waterstones appears to have pulled its web page advertising the book, and I wonder if that is because they have grown impatient with the continual movement of the publication date. Beck-Shop and Amazon, however, are still offering the title.

Proofs Submitted

The proofs for Collecting in the Icon Age arrived, as scheduled by Springer, on Friday 9th May in the form of a web site providing unformatted web pages for each chapter which could be edited to a certain degree. In addition, formatted versions of each chapter were provided in separate PDFs. We duly completed the editing after getting answers to some queries; and we submitted the revised chapters yesterday morning.

We were advised to provide comments adjacent to issues for which no editing functions were available, so we hope these will be sufficient to prompt the revisions we want. We also requested changes to the layout of some figures and tables, but they are subject to house style, so we are less confident that they will be enacted. However, we have done all we can – the proofing process is now closed. The only remaining influence we can have on the book is if Springer asks us questions or asks us for advice on specific points.

The Springer web site is currently advertising 6th July as the publication date – though this does seem quite fluid – a week or so ago it was 3rd July and then it went to 10th July for a day or so. However, the site has been consistent in advertising a softcover version and an ebook version – though no prices are provided. I also believe the book’s chapters will be available for purchase separately – but have seen no information about that. I have no idea if anything special happens on the day of publication, though I’m hoping we will be sent our copies of the book on the day or shortly afterwards. The next couple of months will be an interesting eye-opener for me of how contemporary publishers operate.

Proofs w/c 05May, Publication 02June

We received the schedule from Springer yesterday. It’s planned to send the book page proofs to us in the week commencing 5th May, and we have 12 days to review them and provide corrections. The Springer web site is now specifying a publication date of 2nd June. In the meantime, we’re working on some supplementary material that we plan to publish elsewhere in pwofc.com also on 2nd June – the analysis we undertook to identify the practice hierarchy, an overall practice hierarchy diagram, and an expanded overall set of references.

Publication Date – 16th May 2025

Springer are now advertising that our book – Collecting in the Icon Age – will be published on 16th May 2025. A write up of the book, its contents, authors, and the formats it will appear in, is on the Springer website. Various booksellers such as Waterstones are also advertising the book and enabling customers to pre-order it. The price of the hardback version is rather high….

We have had no communications from Springer about the detailed contents of the book as yet, but have been advised that we will be sent a schedule which will include the expected date the proofs will be sent to us.

Collecting in the Icon Age – delivered

It’s been just over a year since Peter Tolmie and I signed a contract with Springer for a book on collecting – a year in which we’ve put a lot of work in. But at last, this morning we uploaded ‘Collecting in the Icon Age: IT’s impact on collecting practices’ to the publisher’s portal. It has ten chapters:

  1. The Icon Age and collecting practices – a primer
  2. Collecting contexts
  3. Source materials and their analysis
  4. Collecting practices
  5. IT’s impact on collecting practices
  6. The objects of collecting
  7. IT impacts on collectors forty years into the Icon Age
  8. The slide towards collecting context conformity
  9. Notes on collecting in the digital future
  10. Closing summary

I’m just hoping that it’ll still have 10 chapters and most of the contents when it emerges from the publisher’s editing machine…. Publication should be sometime in 2025.

Springing into action

Yesterday Peter Tolmie and I reached a significant milestone in our work on a book about collecting in the IT era: we signed a contract with the publisher Springer. It commits us to deliver the completed text to their editors by the end of June 2024. We would expect to have a firm publication date by the end of that year. So now, it’s a matter of feeding in some additional material, refining our arguments, and modifying the layout and text to match the Springer Style Guide.

The Spreadsheet – an OFC Superstar

Since my last post here, over 7 months ago, we’ve completed first substantial drafts of all 10 chapters of the book on Collecting in the IT era. The literature survey has made a substantial contribution to the material; and the use of an Excel spreadsheet enabled the process. This is just another example of the massive contribution that the humble spreadsheet has made to modern life since its inception in 1979. Designed ostensibly for manipulating numbers, it has proved equally useful for organising text.

In my first foray into writing books at the National Computing Centre in the 1980s, I tried recording key points that I read or discovered about a subject in a Word document, and then rearranging them into separate chapters. It was a pretty effective method – but only worked for fairly concise units of text and relatively few of them. For this book I have used a spreadsheet to assemble more than 3,400 chunks of relevant points from over 300 books, papers and other sources; many of the chunks consist of part-paragraphs of over 80 words of text either copied from digital texts or typed in by hand. Against each chunk are columns of reference details and allocations to particular chapters. The ability to apply consistent organisation over such a large volume of material, and to be able to search and filter every column, provides a huge advancement in capability over my 1980s efforts; a capability to identify key points, to assess differing views, and to construct new thoughts and ideas around a particular topic.
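For anyone who prefers to see the idea in code, the search-and-filter capability described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (chunk, source, chapters), the semicolon-separated chapter allocations, and the three sample rows are all invented here and are not the real spreadsheet’s layout or contents.

```python
import csv
import io

# Invented sample rows standing in for an exported spreadsheet; the column
# names and the ';'-separated chapter allocations are assumptions.
SAMPLE = """chunk,source,chapters
Collectors often catalogue items digitally,Source A,4;5
Online marketplaces changed acquisition,Source B,5
Display practices vary by object type,Source C,6
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per chunk."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows, chapter=None, keyword=None):
    """Keep rows allocated to a chapter and/or containing a keyword."""
    result = []
    for row in rows:
        chapters = row["chapters"].split(";")
        if chapter is not None and str(chapter) not in chapters:
            continue
        if keyword is not None and keyword.lower() not in row["chunk"].lower():
            continue
        result.append(row)
    return result


rows = load_rows(SAMPLE)
chapter_five = filter_rows(rows, chapter=5)
```

Here `chapter_five` holds the two invented rows allocated to chapter 5. The spreadsheet does all of this with a couple of clicks, which is rather the point.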

The simplicity and power of its structures across both numbers and text make the spreadsheet a premier performer in creating order from chaos; it is the hammer and wheel for 21st century individuals.