Having got the Collecting book published, I’ve been wondering what’s left to do in this OFC journey. Clearly, I need to update the OFC tutorial which is now some 8 years old. However, it was an email some ten days ago announcing a report on AI Preparedness for Archives that started me on a mini-voyage of realisation, and that prompted me to write this post.
I took each piece of the report’s guidance and wrote notes on how I would apply it to my PAWDOC document collection. Then I asked ChatGPT how I could focus an AI chatbot onto a specific archive of data. In the follow-up Q&A, and after I had provided a 10-line description of PAWDOC, ChatGPT designed a production architecture for such a research-grade archive, followed by a grant-fundable academic architecture and a cost estimate. At this point, I resisted ChatGPT’s offer to create a ‘ready-to-copy grant proposal document (including abstract, methodology, outcomes, evaluation plan)’ and went for a walk, stunned BOTH by ChatGPT’s capabilities AND by the potential for enhancing PAWDOC with an AI interrogation capability.
I should say at this point that the ChatGPT descriptions I had read were very general in nature and assumed an understanding of many activities – they were most definitely not a cook book with ‘do this then that’ instructions. I was aware that my actual knowledge and understanding of what would be required was pretty much zero, and that actually doing it for real would be a steep learning curve.
Having mulled all this around in my head for a few days, it seems clear to me that closure of this OFC journey cannot occur until I understand and experience how AI can be used to augment the interrogation of two types of private collection:
- primarily text-based archives; and
- collections more focused on objects.
These are the types of collections that, in my experience, are most likely to be possessed by private individuals. Note that I am explicitly focusing on ‘private’ collections, because institutions undoubtedly manage their archives and collections differently from private individuals: processes are formally designed to ensure effectiveness and longevity; tasks get done because staff are paid to do them; and IT support is usually at a far greater scale and complexity. Much work is underway to apply AI to institutional collections; however, my focus is to understand how individuals can apply it to their own private collections. With my current level of understanding, I believe I need to investigate the following specific aspects:
1. The practicalities of preparing a private archive for AI. To explore this, I would most likely use a. PAWDOC (a primarily text-based archive), and b. my Mementos collection (more focused on objects).
2. Researching whether AI is capable of accessing and understanding the contents of files other than text – sound, image, and video (Large Language Models – LLMs – specifically deal with text). This is particularly relevant to collections more focused on objects.
3. The practicalities of building an AI interrogation capability for a private archive which has only an index and information within its digital file names. This would probably be the simplest implementation and so the best one to do first to learn some basics. I would use my Mementos collection to investigate this.
4. The practicalities of building an AI interrogation capability for a private archive which does have machine-readable text content. To investigate this, I could just use the app-generated content within PAWDOC (for example, all the Microsoft Office documents within PAWDOC). Alternatively, and more ambitiously, I could try to OCR some or all of the PAWDOC scanned documents and include them in the investigation.
5. The practicalities of building an AI interrogation facility for a private collection containing a combination of text, sound, image, and video. The viability of this investigation would depend on the outcome of 2 above. It would probably involve extending one or both of the implementations described in 3 and 4.
If I get round to any or all of these journeys, they would be recorded in their own separate spaces within this pwofc.com website; and, should they get completed, their results would be used to update the OFC tutorial. Only then would I consider closing this OFC journey.