Some simple Evaluation Metrics

Once I got the software installed and working, I started to ask questions about my Memento collection. However, the answers that came back were not very encouraging: some were incomplete and others were just completely wrong. I hoped that the suggestions ChatGPT had previously made about structuring the CSV file would improve the results, but I realised that in order to check if that was indeed the case, I would need some way of evaluating how well the system was performing – just as had been suggested in the Preparedness guidelines.

I deliberately decided not to go overboard with my evaluation metrics at this stage: what I needed was a small, simple set that could be applied relatively quickly and that would produce some numbers I could compare across different versions of the input documents and the system configuration. I came up with the following six questions (the percentages are the first set of results, as described further below):

  • What items are to do with the KRS? [KRS standing for Kodak Recreational Society] (0%)
  • What happened on the 20th? (0%)
  • List the items relating to exam results (25%)
  • What linen is in the collection? (50%)
  • Are there any items relating to Aston Martin cars? [there are some individuals called Martin in the Index] (100%)
  • What documents are there about finances? (50%)

For each question I knew that, by opening the CSV file in Excel and using the filter facility, I could get a definitive list of items answering the question. To score the AI's answer, I took the number of filter-identified items that the AI had reported correctly, added the number of additional correct answers the AI identified (together, the Total correct answers); and then divided that by the sum of a) the number of answers identified by the filter, b) the number of additional correct answers the AI identified, and c) the number of incorrect answers the AI gave (the Total number of answers overall).

For example, for the question about listing the items relating to exam results, the filter identified 2 items (2 FILTER) but the AI didn’t report either of them (0 CORRECT). However, it did report two items in which the words ‘exam’ and ‘results’ appeared separately (2 ADDITIONAL CORRECT). It also reported 3 items in which just the word ‘exam’ or ‘exams’ appeared (3 INCORRECT), and another item concerning an assessment in which neither ‘exam’ nor ‘results’ is present (1 INCORRECT). This produced a result of (0+2)/(2+2+3+1) = 2/8 = 25%.
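The scoring method can be expressed as a small helper function (a sketch of my own making; the parameter names are mine, not part of any tool):

```python
def answer_score(filter_correct, additional_correct, filter_total, incorrect):
    """Score one question: total correct answers divided by all answers
    that either the Excel filter or the AI produced.

    filter_correct     - filter-identified items the AI actually reported
    additional_correct - extra correct items the AI found beyond the filter
    filter_total       - total items the Excel filter identified
    incorrect          - wrong items the AI reported
    """
    total_correct = filter_correct + additional_correct
    total_answers = filter_total + additional_correct + incorrect
    return total_correct / total_answers

# The 'exam results' question: the filter found 2, the AI reported neither,
# but found 2 additional correct items and 4 incorrect ones.
print(answer_score(0, 2, 2, 4))  # 0.25, i.e. 25%
```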

The results for this rudimentary version of the CSV file were as shown against each of the questions listed above. The overall result was 38%. While this is in no way a definitive analysis, it nevertheless will enable a comparison to be made between different implementations. I intend to use it at least for the remainder of this first phase.

Installing the AI Software

The software that ChatGPT had advised me to install was called AnythingLLM (LLM standing for, of course, Large Language Model). I duly opened its website (https://anythingllm.com/) and selected the ‘Download for desktop’ box. It took about 13 minutes to download and install the 370MB program. On opening the application, I was told that a) it had selected the best model (Qwen3Vision2BInstruct) for my hardware (a 9-year-old Windows 11 laptop with 8GB of RAM); b) that I was to use the LanceDB Vector Database; c) that these settings could be modified anytime; and d) that the model, chats, vectors and document text would all be stored privately on that instance of AnythingLLM and would only be accessible on that device.

I uploaded my mementos CSV file, and then got a Warning Message saying something like ‘the workspace is using 102,000 of its 3,500 available tokens. Choose how you want to proceed – Continue anyway or Embed file’. The associated help page says:

“Continue Anyway: Will continue to add the document full text to the chat window, but data will be lost in this process as AnythingLLM will automatically prune the context to fit. You should not do this as you will experience inaccurate LLM behaviour.

Embed: Will embed the document (RAG) and add it to the workspace. This will allow the LLM to use the document as a source of information, but it will not be able to use the full text of the document. This option may or may not be visible depending on your permissions on the workspace.”

I selected Embed and that process took about 5 minutes.

I then asked some questions such as ‘Show me items relating to xxx’ but consistently got the reply: ‘Could not respond to message. Model requires more system memory (8.7 GiB) than is available (5.4 GiB)’. I asked ChatGPT how much memory AnythingLLM needed to run and it said that the model that had been selected wasn’t suitable for a machine with 8GB of RAM. Instead, it said I should use the Ollama phi3:mini model and advised how to obtain it. However, that didn’t work, so ChatGPT said that meant Ollama wasn’t on my machine and that I needed to download and install it first, and provided a website link to do so.

I installed Ollama (which included installing a redistributable version of Visual C++) and restarted my laptop as instructed by ChatGPT. Then I installed phi3:mini by typing ‘ollama pull phi3:mini’ at the Command Line prompt, again as instructed. Next I had to select the Ollama LLM in AnythingLLM by going into the Workspace settings (the little rose icon) and selecting Ollama. Within that Ollama section of the drop-down there was another settings rose icon which had to be clicked to access the Ollama-specific settings screen, in which ChatGPT had advised me to place ‘http://localhost:11434’ in the ‘Ollama Base URL’ field.

At this point I noted that ‘phi3:mini’ was correctly displayed in the ‘Ollama Model’ field. Having done all this I was able to select the Mementos CSV document in AnythingLLM and have it embedded, after which I was able to ask some questions and get some answers.

Now, what was going on in all of this? This is what I discovered after having a few exchanges with ChatGPT:

The software that is needed for AI has three layers:

  1. The model (Phi-3 Mini) → the “brain” that generates text.
  2. Ollama → the engine that runs the model locally.
  3. AnythingLLM → the interface and workflow tool you interact with.

The Model (Phi-3 Mini) is the AI brain – the trained neural network that produces answers by:

  • Predicting the next token in text
  • Generating responses to prompts
  • Using knowledge learned during training.

Ollama is the system that runs the AI model on the computer. It does the following:

  • Loads the model
  • Sends your prompt to it
  • Streams the response back
  • Performs other functions such as loading models into RAM and providing an API server for other applications.
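That API server is what listens at the Base URL entered earlier (http://localhost:11434). A request to Ollama's /api/generate endpoint looks roughly as follows; this sketch only assembles the request without sending it, since sending would require Ollama to be running:

```python
import json

OLLAMA_BASE_URL = "http://localhost:11434"  # the value entered in AnythingLLM

def build_generate_request(model, prompt):
    """Assemble the URL and JSON body for Ollama's /api/generate endpoint.
    Actually sending it (e.g. with urllib.request) needs Ollama running locally."""
    url = OLLAMA_BASE_URL + "/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return url, json.dumps(payload)

url, body = build_generate_request("phi3:mini", "Show me items relating to linen")
print(url)  # http://localhost:11434/api/generate
```

AnythingLLM builds and sends requests of broadly this shape on your behalf every time you type a question into the chat window.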

AnythingLLM is the user interface and AI workflow platform which does the following:

  • Connects to Ollama
  • Sends prompts to the model
  • Displays responses
  • Manages workspaces
  • Embeds knowledge sources (RAG)
  • Keeps chat history
  • Handles embeddings and document search.

This architecture is flexible: it allows a different product to be swapped into any of the three layers while keeping the other two the same.

AnythingLLM embeds knowledge sources by retrieving information from external documents, encoding that information, and placing it into a Vector Database (in this case, LanceDB). The steps it takes to do this are:

  • Turning text into numbers. This is known as Embedding, in which text is converted into numerical vectors that represent meaning. For example, “The cat sat on the mat” becomes something like: [0.213, -0.551, 0.889, …]
  • Storing the embeddings in the Vector Database along with chunks of text and references to the original documents. For example, Vector: [0.213, -0.551, 0.889 …]; Text: “The mitochondria is the powerhouse of the cell.”; Source: biology_notes.pdf.
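Similarity between these vectors is what retrieval is built on; a common measure is cosine similarity. A toy illustration with made-up three-number embeddings (real embeddings have hundreds of dimensions, and the store entries here are invented):

```python
import math

def cosine_similarity(a, b):
    """How closely two embedding vectors point the same way (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny made-up vector store: embedding, chunk text, source document
store = [
    {"vector": [0.9, 0.1, 0.0], "text": "M1 - KRS dinner menu",   "source": "mementos.csv"},
    {"vector": [0.1, 0.9, 0.2], "text": "M2 - exam results slip", "source": "mementos.csv"},
]

query = [0.8, 0.2, 0.1]  # pretend embedding of the user's question
best = max(store, key=lambda item: cosine_similarity(query, item["vector"]))
print(best["text"])  # M1 - KRS dinner menu
```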

When you ask a question, AnythingLLM finds the closest matching vectors. So, overall, the RAG (Retrieval-Augmented Generation) process works as follows:

Step 1 — Document chunking. When you upload a document to AnythingLLM, the document is split into small sections (PDF → paragraphs → chunks).

Step 2 — Embedding creation. Each chunk is converted to a vector.

Step 3 — Storage. The vectors are stored in the vector database.

Step 4 — Question time. When you ask a question such as “What causes tides?”, AnythingLLM:

  • converts the question into an embedding
  • searches the vector database
  • retrieves the most similar chunks.

Step 5 — Context injection. The retrieved chunks are added to the question and the combined prompt is sent to Phi-3 Mini.

Step 6 — AI generates answer. Now the model answers using your documents, not just its training data; and the answer is shown in AnythingLLM.
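Steps 4-6 boil down to pasting the retrieved chunks in front of the question before it ever reaches the model. A minimal sketch (the prompt wording is my own guess; AnythingLLM's actual template will differ):

```python
def build_rag_prompt(question, retrieved_chunks):
    """Combine retrieved document chunks with the user's question (context injection)."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# Pretend these chunks came back from the vector-database search
chunks = ["Tides are caused mainly by the Moon's gravity.",
          "The Sun also contributes to tidal range."]
prompt = build_rag_prompt("What causes tides?", chunks)
print(prompt)
```

The combined string is what actually gets sent to Phi-3 Mini, which is why the model can answer from your documents without ever having been trained on them.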

Unfortunately, even after I had successfully installed a working configuration, the system occasionally could not respond, and the results were often incomplete or incorrect. In ChatGPT’s opinion these problems are most likely being caused by:

  • The model temporarily exhausting RAM: I only have 8GB of RAM and the AI components probably take up between 5 and 6GB (AnythingLLM app ~500 MB, Vector database + embeddings ~200–400 MB, Phi-3 Mini loaded in Ollama ~3–4 GB, Prompt + generation buffers ~0.5–1 GB). Adding the 2–3 GB taken up by the Operating System means that every now and again I’m probably hitting a 7–8 GB total, resulting in the Operating System occasionally swapping memory to disk, Ollama pausing or timing out, and AnythingLLM reporting “Ollama not responding”.
  • Poor data structure in the CSV file: RAG systems like AnythingLLM perform best with short natural-language passages, not table rows. When CSVs are embedded directly, column relationships are lost, retrieval becomes noisy, and the model guesses incorrectly. Hence ChatGPT’s suggestions in the previous post for how to refine the contents of the CSV file.
  • A limitation in the Phi-3 Mini model’s capability. Phi-3 Mini is optimized for low-memory environments, while larger models typically provide higher completeness and accuracy.
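Taking midpoints of the ranges in the first bullet above, the arithmetic works out like this (all figures are ChatGPT's rough estimates, not measurements):

```python
# Rough component estimates in GB (midpoints of the ranges quoted above)
components = {
    "AnythingLLM app": 0.5,
    "Vector database + embeddings": 0.3,
    "Phi-3 Mini loaded in Ollama": 3.5,
    "Prompt + generation buffers": 0.75,
    "Operating System": 2.5,
}
total = sum(components.values())
print(f"Estimated total: {total:.2f} GB of 8 GB installed")
```

With only fractions of a gigabyte of headroom, any spike in prompt size or background OS activity is enough to tip the machine into swapping.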

Still, I do at least now have a working system which I can experiment with – even if I have to occasionally put up with “Ollama not responding”. The following post documents how well or otherwise this initial configuration performed.

Index Adjustments for AI

Having completed the Preparedness steps, I asked ChatGPT the following question:

“I have a collection of 2993 mementos which has an Index containing a Reference No and Description for each item. I want to create a RAG interrogation capability on the Reference No and Description information. The Index file is named ‘Memento Collection Index for AI’ and it is located in my laptop at C:\Users\pwils\Documents\AI. The first two rows of the Index file contain descriptive information about the file and can be ignored. The 3rd row contains the headers for each of the Index fields. There are fourteen fields in all with the first two titled ‘Reference No’ and ‘Description’. What’s the first thing I should do to create the RAG interrogation capability?”

ChatGPT responded with advice to remove the first two rows in the spreadsheet, and to convert it to a csv file. In subsequent exchanges, ChatGPT suggested the following changes and additions to the csv file which would enable the AI to provide more insightful answers:

  • Create a new column called ‘Item Label’ which combines the Reference No and the Description separated by a hyphen (see the relevant ChatGPT conversation).
  • Normalize the two Facet fields (the Index has a Facet 1 and a Facet 2 field. If there is only 1 entry in Facet 1, Facet 2 is empty. If there is a second keyword in Facet 1 (separated from the first keyword by a comma), then both keywords appear in Facet 2 but in reverse order). Normalizing means a) lowercasing all the words, b) avoiding plurals, and c) keeping the facets short – preferably just 1 word.
  • Add a ‘Primary Facet’ column which contains whichever of the two facets is considered to be the dominant one.
  • Add an ‘AI Context’ column which combines the ‘Item Label’ text with the ‘Facet 1’ text in the format [Item Label text]. Facets: [Facet 1 text].
  • Add a ‘Collection Themes’ column which contains 1-3 thematic categories that are broader than the more specific Facets. For a collection this size there should be between 12 and 20 Themes. These do not currently exist in the Index and would have to be identified and then allocated to each line item. However, it seems that the AI could come up with an initial list of themes by analysing the contents of the ‘Item Label’ and ‘Facet’ fields.
  • Add a ‘Theme Cluster’ column – containing a short name representing a group of objects that share a pattern. For a collection this size there should be between 25 and 40 clusters. Again, it seems that the AI could come up with an initial list of clusters by analysing the ‘Item Label’ and ‘Facet’ fields.
  • Add a ‘Cluster Signature’ column which combines the ‘Primary Facet’ and the ‘Collection Theme’ fields in the format [Primary Facet text] | [Collection Theme text].
  • Add a ‘Related concepts’ column which contains 1-3 broader conceptual ideas associated with the object. For a collection this size there should be 20-30 of these – preferably single words. These do not currently exist in the Index and would have to be identified and allocated. I’m not sure if the AI could help to identify them or not.
  • Add an ‘Outlier score’ column which indicates how unusual an item is within the collection. Possible values could be: 1 Very typical object, 2 Moderately distinctive, 3 Unusual, 4 Very Unusual, 5 Unique or rare in the collection. This information does not currently exist in the database and would have to be specified for each item (though among the fields that have been removed for this AI exercise, ‘Unusual’ items are identified).
  • Add an ‘Object links’ column which lists the Reference Numbers of other objects that are meaningfully related, in the format RefNo, RefNo, RefNo. This information does not currently exist in the Index and would have to be specified for each item – potentially quite a big job.

At this point I decided that, for this first stage in this journey, I would simply stick with the very first suggestion – to create a new column called ‘Item Label’ combining the Reference No and the Description separated by a hyphen. Once I have something working, I can return to these other sophistications.
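That first ‘Item Label’ suggestion is a one-line transformation per row. A sketch using Python's csv module (the column names come from the post; the sample rows are invented, and in practice the input would be the real CSV file rather than an in-memory string):

```python
import csv
import io

def add_item_label(rows):
    """Add an 'Item Label' column combining Reference No and Description
    separated by a hyphen, as ChatGPT suggested."""
    for row in rows:
        row["Item Label"] = f"{row['Reference No']} - {row['Description']}"
    return rows

# In practice: open('Mementos Collection Index for AI Phase 1.csv').
# An in-memory sample is used here so the snippet is self-contained.
sample = "Reference No,Description\nM1,KRS dinner menu\nM2,Exam results slip\n"
rows = add_item_label(list(csv.DictReader(io.StringIO(sample))))
print(rows[0]["Item Label"])  # M1 - KRS dinner menu
```

Writing the augmented rows back out with csv.DictWriter would complete the round trip; the other suggested columns (‘AI Context’, ‘Cluster Signature’ and so on) are similar string concatenations over the same rows.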

In the course of this extended exchange, ChatGPT also offered to provide “the exact 40-line Python script that will turn your spreadsheet into a working RAG search system for the 2993 mementos”. I accepted and, in the subsequent interchange, was offered an easier approach: acquiring a desktop RAG tool called AnythingLLM which would run locally and require no programming. The latter sounded exactly like what I needed, and I set about downloading and installing it.

Preparing the Memento Collection

This is the start of my attempt to undertake Phase 1 of my investigation into providing an AI interrogation capability for archives. Phase 1 concerns providing AI support for my Memento collection’s index entries.

The first step was to apply the advice in the recent publication “AI preparedness guidelines for archivists” by Prof. Giovanni Colavizza and Prof. Lise Jaillant. This suggests addressing four main areas (referred to as Pillars). My analysis of the Memento collection’s preparedness relating to each of the four areas is recorded in a Memento Preparedness document and summarised below:

Pillar 1 – Completeness and excluded data. The collection is complete; all items in the Index are to be interrogated by AI in this phase; no items have been excluded.

Pillar 2 – Metadata and access. 14 fields are in columns in the Index and these are available for use for AI interrogation. However, there are many additional columns (containing various analytical data) which are to be removed for this exercise. All information remaining in the Index after the additional columns have been removed, will be available for AI interrogation; no information will be subjected to restricted access in this exercise. Provenance and relationship information is embedded in the Reference No, and sometimes in the Notes field. An extensive range of narrative information about the collection and the Index is contained in a Guide worksheet within the Index spreadsheet.

Pillar 3 – Data types, formats, and file structures. Before making any changes to the Index file an assessment will be made as to whether the change is wanted in the original index or not. If it is not, a copy of the Index will be made. A variety of different file formats are present in the digital files of the collection, but the vast majority are either .pdf, .docx, or .jpg documents. Some standardisation changes may be required in some of the index fields. Folder names distinguish between the different collection components and link back into the overall Collection folder structure. All item digital File Titles contain the relevant Reference Number.

Pillar 4 – Application-specific metrics and evaluation. The ability to find what you are looking for in the Index is the primary requirement of collection users. Another requirement is to find the last Reference Number used overall or in a particular series, in order to specify the appropriate next Reference Number for an item you are adding to the collection. How these and other criteria should be translated into evaluation metrics will be considered through the course of the project.

As a result of the above analysis the following 14 actions were identified (the notation ‘P1.1’ stands for the first action in Phase 1; ‘P1.2’ for the second action in Phase 1; and so on).

P1.1 Mem(Index). Ensure completeness and normalisation. All 14 fields were checked to eliminate blanks and to normalise content where necessary.

P1.2 Mem(Index). Remove columns O to BX from the file used for this AI work. Columns O to BX were removed.

P1.3 Mem(All). Document all the Provenance and Relationship info embedded within the Index and the File Titles. The Guide was expanded to describe, a) the 14 fields, b) how the digital filename is constructed, and c) how the collection came about (which includes references to posts in the pwofc.com website).

P1.4 Mem(All). Observe how the Provenance and Relationship info is used to create guidelines for producing such documentation. To be revisited during the implementation of Phase 1.

P1.5 Mem(Index). Identify any extra narrative info that is available or is needed. None needed.

P1.6 Mem(Index). Produce any extra narrative info that is required. None needed.

P1.7 Mem(All). Carry out the ‘wanted or not in the original index’ check before each action. Done.

P1.8 Mem(Items). Check what formats exist in the collection files. 16 different file formats are present in the collection – DOC, DOCX, FMP12, HTM, JPG, M4A, MP3, MP4, PDF, PDF-A1-b, PPTX, TIFF, XLSM, XLSX, XLS, ZIP.

P1.9 Mem(Items). Define AI-friendly standard formats: Only the Index to the collection (an XLSX document) is to be used in this phase, and this will be converted into a csv file for the purpose.

P1.10 Mem(Items). Make any changes to existing formats to conform to new standards. An AI friendly file in csv format was derived from the original Index document. Since you can’t create a csv with multiple worksheets, two new files were created: one with the file name ‘Mementos Collection Index for AI Phase 1.csv’, and another with the file name ‘Mementos Collection Guide for AI Phase 1’.

P1.11 Mem(All). Document the folder structure for the derivative file. For all this AI work, a new folder was created into which these derived files, and all other files derived for AI purposes, will be placed: C:\Users\pwils\Documents\AI.

P1.12 Mem(All) – Find out what ‘supports programmatic retrieval’ means in practice. ChatGPT advised that this usually means: querying a vector database, calling a search API, pulling documents from a content repository, and fetching structured data from a database.

P1.13 Mem(All) – Make any changes necessary to support programmatic retrieval. I don’t have enough knowledge yet to understand if any changes are needed to support that process. I will have to revisit this question when I actually start to try to implement the capability.

P1.14 Mem(All) – Prompt for ideas about success metrics as each action is taken in the course of the project. This question will be revisited as work on this phase progresses.

This brought to an end the Preparedness work, which took a total of approximately 7 hours. The next step was to try to implement an AI capability. To start this process I asked ChatGPT what should be the first thing I do to create a RAG interrogation capability for the Memento collection’s Index (RAG stands for Retrieval-Augmented Generation – whereby the AI is not trained on the archive, but instead the archive data is provided to the AI at answer time). What followed is reported in the next post.

A Plan with a rather empty Kitbag

A couple of weeks ago I realised that there was a gap in the research I’ve been doing on Collecting by Individuals – how to apply AI to Personal Collections. After a further two weeks thinking it over, I’ve decided that I need to bite the bullet and to learn by doing it myself on some of my own collections.

I’m embarking on this journey with very little relevant knowledge. However, I do at least have the recent report on AI Preparedness guidelines for Archivists, as well as some exchanges with ChatGPT about what can be done. I’m hoping that these will get me up and running, and that ChatGPT may be able to help me with things I don’t understand. With these tools in my (rather empty) kitbag the notes below outline what I plan to do.

My overall objective is to provide detailed guidelines for individuals who want to apply AI to interrogate their own private collections. This may also involve enhancing the current OFC Tutorial and/or creating a separate tutorial.

The strategy I intend to follow is to conduct the work in a series of phases, going from the simplest possible implementation I can define to progressively more comprehensive and complex implementations. I will use two of my own collections: a collection of Mementos which has an index of some 2,390 entries and 2,730 digital files; and my PAWDOC collection of work files which has an index of some 17,380 entries and around 31,300 digital files.

The phases I currently plan to undertake are as follows (though these may well be rejigged as I gain more experience and knowledge):

  1. AI support for the Memento collection’s index entries;
  2. AI support for the Memento collection’s combined index entries and file titles;
  3. AI support for the Memento collection’s index entries, file titles and textual items;
  4. AI support for PAWDOC’s index entries;
  5. AI support for PAWDOC’s combined index entries and file titles;
  6. AI support for PAWDOC’s combined index entries, file titles, and some or all of the born digital items;
  7. AI support for a subset of PAWDOC’s scanned items;
  8. AI support for a combination of index entries, file titles, some born digital material and some scanned items;
  9. AI support for the whole of PAWDOC.

Timescales: With my current lack of knowledge I don’t know how long this is all going to take. However, I shall aim to have the first phase completed in not more than 1 year.

Ideally, I would like to find some knowledgeable collaborators who have relevant experience and who would guide me through the work (please do get in touch if you are interested). However, it could be hard to find the right people who have sufficient interest and the time to spare. I shall take some steps to try and find some such individuals, but won’t let that endeavour delay my start on Phase 1. I am reconciled to probably having to do most of the work without any permanent collaborator support.

An AI Roadmap

Having got the Collecting book published, I’ve been wondering what’s left to do in this OFC journey. Clearly, I need to update the OFC tutorial which is now some 8 years old. However, it was an email some ten days ago announcing a report on AI Preparedness for Archives that started me on a mini-voyage of realisation, and that prompted me to write this post.

I took each piece of the report’s guidance and wrote notes of how I would apply it to my PAWDOC document collection. Then I asked ChatGPT how I could focus an AI chatbot onto a specific archive of data. In the follow-up Q&A, and after providing a 10-line description of PAWDOC, ChatGPT designed a production architecture for such a research grade archive followed by a grant-fundable academic architecture and a cost estimate. At this point, I resisted ChatGPT’s offer to create a ‘ready-to-copy grant proposal document (including abstract, methodology, outcomes, evaluation plan)’ and went for a walk, stunned BOTH by ChatGPT’s capabilities AND by the potential for enhancing PAWDOC with an AI interrogation capability.

I should say at this point that the ChatGPT descriptions I had read were very general in nature and assumed an understanding of many activities – they were most definitely not a cookbook with ‘do this then that’ instructions. I was aware that my actual knowledge and understanding of what would be required was pretty much zero, and that actually doing it for real would be a steep learning curve.

Having mulled all this around in my head for a few days, it seems clear to me that closure of this OFC journey cannot occur until I understand and experience how AI can be used to augment the interrogation of two types of private collection:

  1. primarily text-based archives; and
  2. collections more focused on objects.

These are the types of collections that, in my experience, are most likely to be possessed by private individuals. Note that I am explicitly focusing on ‘private’ collections, because institutions undoubtedly manage their archives and collections differently from private individuals: processes are formally designed to ensure effectiveness and longevity; tasks get done because staff are paid to do them; and IT support is usually at a far greater scale and complexity. Much work is underway to apply AI to institutional collections; however, my focus is to understand how individuals can apply it to their own private collections. With my current level of understanding, I believe I need to investigate the following specific aspects:

  1. The practicalities of preparing a private archive for AI. To explore this, I would most likely use a) PAWDOC (a primarily text-based archive), and b) my Mementos collection (more focused on objects).
  2. Researching whether AI is capable of accessing and understanding the contents of files other than text – sound, image, and video (Large Language Models – LLMs – specifically deal with text). This is particularly relevant to collections more focused on objects.
  3. The practicalities of building an AI interrogation capability for a private archive which has only an index and information within its digital file names. This would probably be the simplest implementation and so the best one to do first to learn some basics. I would use my Mementos collection to investigate this.
  4. The practicalities of building an AI interrogation capability for a private archive which does have machine readable text content. To investigate this, I could just use the app-generated content within PAWDOC (for example, all the Microsoft office documents within PAWDOC). Alternatively, and more ambitiously, I could try to OCR some or all of the PAWDOC scanned documents and include them in the investigation.
  5. The practicalities of building an AI interrogation facility for a private collection containing a combination of text, sound, image, and video. The viability of this investigation would depend on the outcome of 2. above. It would probably involve extending one or both of the implementations described in 3 and 4.

If I get round to any or all of these journeys, they would be recorded in their own separate spaces within this pwofc.com website; and, should they get completed, their results would be used to update the OFC tutorial. Only then would I consider closing this OFC journey.

My Life in a Book

In December 2024, I was given a rather unusual birthday present by my daughter and her husband. It was a subscription to an internet service called My Life in a Book. The service sends you a different question to “help you reflect on key moments in your life” every Monday for a year; and provides a web site in which you can write your answers (which can include images as well). After 52 weeks you tidy up your manuscript, select and edit a cover design, and then press ‘Print’ to have your stories “beautifully bound into a cherished keepsake book”. The gift included 1 physical copy of the book.

After receiving an introductory email advising me of the gift, I duly received my first question the following Monday. It read “Hi Paul. Your question of the week is ready” and provided a button to take me to the writing web site and the question “What are your favourite childhood memories?”. This was the pattern for the following 12 months (I believe that the person who buys the subscription selects the questions from a pick-list). Sometimes I was too busy to answer straight away, or simply wanted some time to think about my answer; in these cases I either did not respond to the email until I was ready, or else I accessed the question, inserted some placeholder text, and labelled it as ‘draft’. I found some of the questions really quite hard to answer; for example, “What were your greatest fears about becoming a parent?” or “How do you navigate decision-making when confronted with uncertainty or fear?”. Initially, I dealt with these by selecting the option to skip a question but, as the year wore on, I thought better of it on the assumption that any honest answer – even along the lines of ‘I don’t know’ – would be worthwhile. However, if I had skipped a question I could simply have replaced it using the facility to create your own questions at will (the answers are essentially text blocks which don’t have to relate to a question – they can be sections of any kind – Foreword, Contents, Introduction, Index etc.).

The editing facilities in the writing platform have clearly been designed to help people unfamiliar with word processing systems produce their answers: the margins, font, and font size are all predefined with no choice. However, bold and italics can be specified for selected text. The facilities to edit imported images are also limited: there are three size options – small, medium and large – and the ability to crop. This overall limitation of choice is quite refreshing, relieving the writer of having to take a variety of actions.

The final version of the book is produced as a PDF file with page numbers, which can be reviewed at will. Unfortunately, this disconnect between the editing facility and the PDF version does mean that, while you are writing, you can’t be certain whether imported images will fit onto the bottom of a page or will get moved to the following page leaving a large gap. To check, the PDF has to be generated which, in December 2025, for my book of around 230 pages, was taking at least 40 seconds and sometimes a lot more (I’m guessing that response time is dependent on system load and that a lot of subscriptions were coming due around Christmas). A further annoyance was that, for a reason I don’t understand, the changes I made to image size didn’t seem to appear until I had generated the PDF version for a second time. Hence to ascertain whether an adjusted image was going to fit onto the bottom of a page was taking me around a minute and a half or more – very frustrating, especially when finding that the adjustment you have made was insufficient and the image is still being pushed to the following page leaving a large gap. A fourth option – “Fit at bottom of current page” – to go with the small, medium and large image sizes would really improve the system’s usability.

Other than the issue described above, I found the system generally easy to use and flexible enough to include whatever content you want. For example, although each answer comes with a suggested heading, the user can change the heading text at will. I used that ability to add numbers to each heading and then created a Contents page (which is not automatically generated). I also created a Preface section.

Having completed the content of the book, the web site guides you through a completion process which first advises detailed checking of the contents using the preview PDF (essential, as bitter experience has proved to me it is almost impossible to spot and remove all typos, grammatical errors and factual mistakes from the draft of a book). I was then asked to choose a template book cover from a choice of several dozen designs; and to supply a title, author and image, which were automatically included in the template. When I was satisfied with the book cover, I was taken into the ordering process where I specified where I wanted the book to be sent, and the number of copies I wanted (whoever bought you the gift will have paid for one or more printings; however, additional copies can be purchased). That was the 10th of December; then it was time to sit back and wait for delivery. I received confirmation that the book had been printed and shipped just two days later; and it was delivered by Royal Mail five days after that on 17Dec – which I thought was an impressively fast turnaround.

The book itself is around A5 size and looks quite good. The text block appears to be secured to the case only by the end papers, so I’m not sure how long-lasting the joint will be – but the book does open satisfactorily. The text is clear and an easy-to-read 12pt in size; but the images, though perfectly adequate, are less than pin-sharp. However, one thing was wrong: the printed Contents list had slipped over onto three pages, whereas the PDF I had checked showed it on just two. Consequently, all the page numbers quoted in the Contents list were out by one page. I immediately used the web site chat facility to report the problem and was told that someone would get back to me.

The following morning, I was asked to provide photos of the problem and to specify the type of device and browser that I was using. I responded saying, “The device I’m using is a Windows 11 laptop with the Firefox browser (version 146.0 – 64-bit)”. I was then told that,

“It seems the issue may be related to the browser you used. Please know that while this is rare, we’re working to ensure all browsers display the correct format. That said, we will take full responsibility and send you your books again at no extra cost. To ensure everything appears perfectly this time and to prevent the same issue from happening again, I encourage you to log in to your account using Google Chrome and make any necessary adjustments. Once you’ve made the changes, please let us know, and we’ll provide a PDF copy for your final review. After you approve it, we will reprint your book and ensure it is sent to you as quickly as possible.”

A subsequent exchange confirmed that it would be worth trying with Microsoft Edge, which I duly did; and after comparing with the PDF I was sent, all seemed to be well and the book was sent for reprinting on 21st December. The two replacement copies were delivered to me by Royal Mail on 27Dec – and they did have the correct pagination.

I felt so pleased with the support I had been given that I was prompted to send the following message to the support team: “I must say that the response of you and your colleagues in the Support team has been an outstanding example of prompt and excellent customer service.” Having said that, though, the problem I encountered should not have happened, and my euphoric response probably also had something to do with the fact that my general experience with online support these days is poor. Furthermore, my subsequent dealings with the support team were less satisfactory – and revealing – but more of that at the end of this post.

So, having completed the whole 12-month cycle of My Life in a Book (MLIAB), how do I feel about the experience? Well, it certainly prompted the exploration, re-use and perhaps rethinking of old memories and artefacts; and it’s satisfying to have the results all neatly packaged up and sitting on my bookshelf. The completed book is intended as much for the current family and future generations as it is for me (as is pointed out in much of the MLIAB promotional material), and, as yet, I have no idea what my daughter and her husband think of the artefact they commissioned; nor what my other offspring, who will be the lucky recipients of the copies with incorrect paging, will think. Perhaps they won’t even read it. However, as the author, I do know that I made some specific choices about the content. First, being conscious that the book might well be perused by all members of the family, I was careful to be inclusive and not to favour anyone in particular. Second, I naturally only included material I was happy with other people knowing about. Third, some of the contents are things that the family almost certainly will not have been aware of. Fourth, after I’d finished, I began to have doubts about some of the material I had included, and inevitably started to think of other things I could have included – but I certainly wasn’t going to take up the service’s offer to extend the process: one year was quite enough answering questions, researching, and editing. Overall, I think it’s a pretty effective way of exploiting one’s collections of mementos, photos, correspondence and other personal material – but it does require work and persistence.

I should mention a couple of other things at this point. One is that the marketing effort by MLIAB is one of the most intensive I have ever experienced: during December I received over 40 general marketing emails unrelated to my account or the book I was producing. The other is that there are several other similar services available on the net (for example, The Story Keepers, Storyworth, Remento, and No Story Lost), but I haven’t investigated any of them.

Now, to return to my further dealings with the Support team. Throughout my exchanges with them I’d been a little bemused by the gushy nature of the responses. It wasn’t normal, and smacked of AI (see messages 1-6 in this linked file). This view was cemented by the next message (Message 7) that I received, in answer to my asking if they had encountered the problem with the generation of the PDFs and if they knew what the cause was. The response was strangely imbalanced. It ignored the issues associated with PDF generation and instead explicitly described how large images are placed onto the following page leaving gaps – a fact I was very familiar with – and gushed about a potential solution I had offered. At that point I replied with the question, “Julia, are you and your colleagues Pauline and Sandra real? How much of your reply below was generated by an AI Large Language Model (LLM)?”. The reply insisted that they were real people aided by tools such as AI (see Message 8 in the linked file).

Now, despite my satisfaction with the way my book had been reprinted, there are some hard facts about customer service to be taken away from these exchanges:

  1. The positive impact of warm, gushy language just disappears after it becomes obvious it’s machine generated. Once I had confirmed what was happening, I ceased to feel I was dealing with people and became rather hard-nosed and cynical – as will become apparent from my comments below.
  2. The fact that my PDF question had been ignored was very frustrating; but I decided not to follow it up because, in my experience, bots are useless when they are dealing with unfamiliar issues, and the organisations that implement them always seem more intent on saving headcount than addressing customer problems. The issue with PDF generation is a genuine problem that the MLIAB organisation should know about and be able to advise customers about. It’s disappointing that it wasn’t addressed in the response.
  3. Despite Message 8 insisting that all named members of the support team that I had been dealing with were real people, I’m not sure whether to believe it or not. LLMs are notorious for getting things wrong and saying what suits their prediction algorithms; and I think that organisations are all too often happy to obscure the real capabilities of their customer support operations. I may be wrong about the MLIAB support operation, but I’m afraid this is the view I now have after my experiences with a variety of bots and support operations; and after reading quite a bit about contemporary AI systems.
  4. Even if the messages to me were being reviewed by real people, the fact that my question about PDF generation had been so studiously ignored suggests either that the reviewing wasn’t very good (or that there was insufficient bandwidth to scrutinise my question properly), or that the AI/people combination had deliberately decided to ignore it.
  5. My overall attitude towards the MLIAB support operation is now one of ambivalence – despite its excellent response to the incorrect pagination of my book. I really have no idea how many real people they have in their support team, what their real names are, or how they actually operate. Does the AI create all responses as soon as messages are received, with the replies being quickly reviewed by real people (or even just a single person); or do the real people look at messages from clients and then enlist the AI to create a response? Knowing that this is the way the world is going, I’ll inevitably have to draw on this experience when I interact with other customer service operations in the future. This will be a self-perpetuating vicious circle until customer service is once again considered important enough to give a sufficient number of human representatives the time to interact in detail with all customers wanting help, support and answers to questions.

Donating Documents to an Archive

Around 1927, a group of people from Yorkshire who were working in Malaya established a social club for themselves. They called it ‘The Society of Yorkshiremen in Malaya’, and it operated successfully until the fall of Singapore to the Japanese in 1942. After the Second World War had ended there were still some of the original members living in Singapore and Malaya, and, at a meeting held in 1949, they decided to reconstitute the Society.

My parents were both born in Yorkshire, so when they arrived in Singapore in 1953, they duly joined the Society. From 1959 onwards, one or both of them were members of the Society’s Committee, acting in various capacities (Secretary, Treasurer or President) until the dissolution of the Society in 1970 due to lack of members. The Society’s minutes were initially recorded in a hefty, foolscap notebook with 296 numbered pages, and, after the notebook ran out of pages, on loose sheets of paper. They have been in a briefcase belonging to my parents for the last 55 years – until this summer, when I decided to try to find them a home.

I decided I would try to create a book to accompany the Minute Book and Loose Papers: a book that would provide some summary information about the Society, and which would also include scans of the loose documents. I hoped that the accompanying book would make the whole package attractive to an archive somewhere in Yorkshire.

The book ended up with the following contents in 8 folios of 16 pages – 128 pages altogether:

Front matter (Preface & Contents)
1. A short history                                                         3
2. Lists of Members 1953 – 1969                                8
3. Lists of Committee Members 1950 – 1969            43
4. Committee Meeting Minutes 1950 – 1969             57

Appendices
A. Correspondence regarding the pre-war Society    92
B. The Benevolent Fund                                          101
C. Scroll given to ex-President Fred Wilson             105
D. 1966 Annual Dinner menu                                  106
E. 1968 Annual Dinner menu                                  113
F. Photos from two Annual Dinners                         120
G. The Society’s Minute Book                                 121

Having finished the text, I researched possible archives in Yorkshire and decided to contact York Libraries and Archives. I was told that decisions about acquiring new material are taken at Collection Meetings held at the end of every month; so, I provided a copy of the first folio of the book for consideration by the meeting and held my breath. On 2nd September I got an email saying the organisation would like to acquire the material.

Meanwhile, I’d been producing the hardcopy book. The first task had been to print each of the 8 folios using the ‘Bookfold’ Page Setup in MS Word. This requires the pages to be printed in landscape on both sides of (in this case) A3 paper.

Although the Bookfold printing process has been described in previous posts, here’s a recap of what to do. When you press Print in Bookfold mode, the first sides are printed – two pages side by side on each A3 sheet. The sheets must then be reordered by moving the top sheet to one side, placing the next sheet on top of it, then the next on top of that, and so on – in effect reversing the stack. The reordered sheets are then placed back into the paper tray pointing in the same direction as they came out. Instructing the printer to continue will then produce the four sheets printed similarly on the other side. Reordering the sheets in the same way as before, and folding them in half, will magically produce the 16-page set in the correct page number order.
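Word works out the page pairing automatically, but it can help to see what Bookfold is actually doing behind the scenes. Here’s a small illustrative sketch (not part of my process) of the standard imposition for a folded booklet:

```python
def bookfold_imposition(n_pages):
    """Page pairs for each sheet of a folded booklet (saddle-stitch
    imposition). n_pages must be a multiple of 4. Each sheet holds two
    pages per side: the outermost sheet carries the last and first pages
    on its front, and the second and second-to-last pages on its back."""
    assert n_pages % 4 == 0, "booklet pages must come in multiples of 4"
    sheets = []
    for i in range(n_pages // 4):
        front = (n_pages - 2 * i, 1 + 2 * i)      # (left page, right page)
        back = (2 + 2 * i, n_pages - 1 - 2 * i)
        sheets.append((front, back))
    return sheets

# For a 16-page folio: the outermost sheet carries pages (16, 1) / (2, 15),
# and the innermost sheet carries (10, 7) / (8, 9).
```

Folding the four printed sheets in half then brings pages 1 to 16 into reading order – which is why the reordering step matters: the stack has to match this pairing before the second sides are printed.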

With all 8 folios printed out and folded, the next step was to sew them together; and then to paint PVA glue over the stitching (but not the tapes) to hold the set firm during subsequent steps.

The edges of the text block were then trimmed and squared in a bookbinder’s plough.

The final work on the text block was to glue on the end papers, the end bands, and a piece of fraynot on the spine – with a 2cm overlap on either side – and a piece of Kraft paper on top of it. Next, a cover was made from 2mm board covered with buckram fabric; and the end papers were glued to the cover to complete the hardback book.

Now that the dimensions of the completed hardback book could be measured, work on the Dust Jacket (DJ) started in PowerPoint. The design included 8.2 cm wide flaps, making the total length of the DJ 62cm. The maximum length of paper that my HP Officejet Pro 7720 A3 printer will deal with is 43.1 cm, so printing this DJ required the image to be split in two, with one part rotated through 180 degrees. The first part was printed; the paper was then turned round and fed back through the printer to print the other part at the other end.
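For anyone facing the same two-pass printing problem, the feasible split points follow from simple arithmetic: each of the two parts must fit within the printer’s maximum printable length. A quick sketch, using the figures from my DJ:

```python
def split_range(total_cm, max_print_cm):
    """Feasible positions (measured from one end) at which a long image
    can be split into two parts, the second printed from the other end
    of the sheet after rotating it 180 degrees. Returns None if even a
    two-part split cannot cover the length."""
    lo = total_cm - max_print_cm   # second part must fit: total - split <= max
    hi = max_print_cm              # first part must fit: split <= max
    if lo > hi:
        return None
    return (max(lo, 0.0), hi)

# A 62 cm dust jacket on a printer limited to 43.1 cm:
# any split between about 18.9 cm and 43.1 cm from one end will work.
```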

Having printed the DJ, it was cut to size and folded accurately around the physical book so the vertical title was central on the spine of the book. Finally, archival plastic was fitted around the DJ with a 5/6cm overlap folded over along the top and bottom edges (the folds hold the plastic in place so no fixative is required).

Although I had committed to giving all the physical materials (the hardback book, the Minute Book and the Loose Papers) to York Libraries & Archives, I wanted to have an electronic version of all the material which we could keep in the family. I already had an electronic version of the hardback book that I had created, and that already included all the Loose Papers; all that was missing were the pages in the Minute Book. So, I took photographs with my iPhone of the Minute Book opened up at each double page (a modern mobile phone camera produces images of more than sufficient quality for such a job). It was surprisingly simple and quick: I found a coffee table of just the right height on which I could place my iPhone with the camera end sticking out over the side; and I put the opened Minute Book on the floor, positioned so that its full extent appeared in the photo image. I held the camera in position with one hand and pressed the photo button with the other. Then, using the photo-button hand, I turned the Minute Book page over and took the next photo. In all there were 169 photos (because, while most pages were written on, some had minutes glued in – some with multiple sheets of paper stuck into them).

Armed with digital versions of all the material, I simply assembled them all into a single PDF, starting with the pages from the Hardback Book and adding the Minute Book images at the end. Then I added the front of the DJ and a page with both flaps on it to the very front of the file; and a page with the back of the DJ to the very end of the file.

There was now only one task left to do: to make the references in the digital version of the Hardback Book actually link to the pages they were referring to. So, for example, the reference to MB63 in the first para of Chapter 2 needed to be linked to the 170th digital page in the PDF. It was a long job, given that there were over 400 references to deal with. However, with it completed, the whole electronic book provides quick and easy access to all the interlinked material.

I now turned to the practicalities of shipping the physical material to York. I decided to drive to York so that I could take a look at where the collection would be stored and accessed, and I agreed a date of 28th November with my contact at the York Archives. The legalities of the transfer of the material also needed to be dealt with: there was a 4-page Gift Agreement, two copies of which were signed by my mother (the owner of the Minute Book and Loose Pages) and a witness. The agreement essentially passed all rights to “The Council of the City of York (‘the Council’) acting by Explore York Libraries and Archives Mutual Ltd” subject to any specified limitations; we specified no limitations.

On Friday 28th November, I set out for York, arrived at York Park & Ride at around 1.30pm, and was deposited by the bus in York town centre next to Clifford’s Tower just after 2pm. A 20-minute walk through the vibrant York shopping centre, teeming with Black Friday shoppers, took me to Museum Street where York Archives and Library is located. I met with the Archivist I had been dealing with and handed over the books and papers. She signed the two copies of the Gift Agreement and gave one back to me; and then very kindly gave me a short tour of the three main areas (the Archives Reading Room, the Family History Room and the Local History Room). I left to celebrate the completion of my mission with a cup of coffee and an excellent bacon sandwich in one of York’s many coffee shops.

A few weeks later I received a thank-you letter from York Archives which included the following:

“Thank you for depositing the records of the Society of Yorkshiremen in Malaya with the archive here at Explore. Your records are a unique part of Yorkshire’s heritage and depositing them with us will ensure that the history of the Society of Yorkshiremen in Malaya is not lost. Your deposit will help us to share these stories with future generations and enable researchers to gain a richer picture of life in York.
Now that your records have been deposited, we will put them through a programme of cataloguing and packaging that will aid online discovery and the preservation of your collection.  Once this process is complete, we will be able to make them available to researchers, subject to any access restrictions.
Your records will form part of the city’s c450 cubic metres of physical collections and our growing digital archive. Together, these collections document nearly 900 years of York’s history…”

If you’ve read all this way to the end of the story you may be interested in reading a bit about what the Society of Yorkshiremen in Malaya actually did. So, here are the 5 pages of Chapter 1 which provide a brief history of the Society (note that page number references in this text are preceded by either MB or TV. MB refers to the Society’s Minute Book. TV refers to pages in This Volume – the Hardback Book). I believe that the documents will have been indexed, packaged and made available in the York Archives within approximately 6 months – around the middle of 2026.

Figuring on playing to your Age?

Recently, a couple of people at my golf club have spoken to me about playing a round with a gross score that was equal to or lower than their age. One had just managed it, and the other would have done if one hole hadn’t been closed. Over the years several people have told me about their desire to achieve this feat; and I would certainly like to – though, being realistic, I’m probably not good enough. Anyway, these conversations got me thinking that a measure indicating how close you are to achieving it might be quite easy to calculate and maintain in the automated scoring and handicap systems that we use these days. It could simply be the gross score minus your age. To take account of courses having different par ratings, the score could be multiplied by course-par divided by 72. So, the formula would be: Gross Score x (Course Par/72) – Age. It could be called the RoundAge number (capital A to distinguish it from roundage which apparently is a local tax paid by a ship for the ground or space it occupies while in port). For example, if a 75-year-old got a gross score of 82 on a par 72 course, the RoundAge would be 82(72/72)-75 = 7. If, by some miracle, the golfer had a stellar round of 74 gross the following week, that RoundAge would be 74(72/72)-75 = -1. The RoundAge could be calculated for every card recorded, and averaged over each year to provide a longer-term graphical view of progress.
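The calculation would be trivial to automate. Here’s a sketch of the RoundAge formula as it might sit in a scoring system (the function name is just my suggestion, of course):

```python
def roundage(gross_score, course_par, age):
    """RoundAge = gross score, scaled to a par-72 course, minus the
    player's age. Zero or below means a round at or under your age."""
    return gross_score * (course_par / 72) - age

# The two examples from the text: a 75-year-old scoring 82 and then 74
# on a par-72 course gets RoundAge values of 7.0 and -1.0 respectively.
```

Averaging these values over each year’s recorded cards would then give the longer-term graphical view of progress described above.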

Preservation Maintenance Plan LITE template

Addenda to ‘Preservation Planning’

In 2021 I published v3.0 of a set of Preservation Planning templates which were designed to enable a rigorous Preservation regime to be applied to large collections of digital documents and their accompanying hardcopy material. However, in my recent investigations into the combination of collections it became apparent that a simpler and quicker approach would be more appropriate for multiple smaller collections with less complex formats. Therefore, a new Preservation Maintenance Plan LITE template has been produced and initially tested on two sets of 10 collections each. Further testing will be done over preservation cycles in the coming years, prior to issuing a version that can be said to be fit for purpose.  In the meantime, the current version is available for use at the link below.

Preservation MAINTENANCE PLAN LITE Template – v1.0, 09Sep2025