A software stack for 64GB

When I wrote the previous post just over a week ago, I thought I had an operational AI configuration; but it turned out not to be the case. I was getting “Ollama not responding” more often than not when I sent in my prompts, and I eventually concluded that my 8-year-old laptop just wasn’t up to the job. I had already planned to upgrade it later this year, so I decided to bring that forward and do it right away. I elected to buy an HP Omen laptop with 64GB of RAM and it duly arrived on Monday 16th March. There followed an intense period of installing applications and transferring data from my old laptop. There were some problems – there always are when you get a new machine – but by the following afternoon I was ready to restart my AI journey.

I started again by asking ChatGPT what tools and models it would suggest for my new 64GB laptop, and it recommended LM Studio running the “Mistral 7B Instruct” model with AnythingLLM providing the front end and RAG capability. I duly downloaded and installed all this software, but hit a problem when I entered my first query: AnythingLLM is set up to provide a variety of system prompts (instructions that shape the AI’s responses and behaviour) which are not recognised by LM Studio and the Mistral model. ChatGPT first advised me to run another model, and when that didn’t work either, it suggested disabling AnythingLLM’s System Prompts. Unfortunately, AnythingLLM wouldn’t let me do that. Eventually, after about two and a half hours, I gave up trying to troubleshoot the problem and took up another of ChatGPT’s suggestions to replace LM Studio with Ollama running another Mistral model. This change only took about 15 minutes – and it worked! I started running my test questions through the new configuration and was getting answers back in 2-6 seconds – every time!

Now, throughout this process I was following ChatGPT’s guidance. I simply don’t have the knowledge to do any of this on my own, and, I must say, ChatGPT has been very clear and helpful; most answers provide options, a rationale for its suggestions, and a final summary of what should be done. However, as demonstrated by my above experiences, ChatGPT is not necessarily familiar with all aspects of all available products, nor fully aware of all potential problems. If it was, it wouldn’t have suggested the initial pairing of LM Studio with Mistral and AnythingLLM. Furthermore, when asked about functionality in a particular product it often offers various possibilities depending on which version is being used, suggesting a general knowledge rather than specific expertise. Of course, this is exactly what should be expected from an AI system. After all it is only predicting the next word based on a whole load of training data.

Let me be clear: the guidance I’ve already received from ChatGPT has enabled me to make considerable progress in a relatively short period of time; and I plan to continue to rely on it to guide my future steps in this journey – after all I have no other option. However, I will remain alert to the possibility of its advice being incomplete or unsound or even wrong; and I will rely on the actual experiences I have with the software itself, to draw my own conclusions.

Some simple Evaluation Metrics

Once I got the software installed and working, I started to ask questions about my Memento collection. However, the answers that came back were not very encouraging: some were incomplete and others were just completely wrong. I hoped that the suggestions ChatGPT had previously made about structuring the CSV file would improve the results, but I realised that in order to check if that was indeed the case, I would need some way of evaluating how well the system was performing – just as had been suggested in the Preparedness guidelines.

I deliberately decided not to go overboard with my evaluation metrics at this stage: what I needed was a small, simple set that could be applied relatively quickly and that would produce some numbers I could compare across different versions of the input documents and the system configuration. I came up with the following six questions (the percentages are the first set of results, described further below):

  • What items are to do with the KRS? [KRS standing for Kodak Recreational Society] (0%)
  • What happened on the 20th? (0%)
  • List the items relating to exam results (25%)
  • What linen is in the collection? (50%)
  • Are there any items relating to Aston Martin cars? [there are some individuals called Martin in the Index] (100%)
  • What documents are there about finances? (50%)

For each question I knew that, when I opened the CSV file in Excel, I could use the filter facility to get a definitive number of items that answered the question. So, to assess the answers provided by the AI, I added the number of filter-identified items that the AI had reported correctly to the number of additional correct answers the AI identified (the Total correct answers); and then divided that figure by the sum of a) the number of items identified by the filter, b) the number of additional correct answers the AI identified, and c) the number of incorrect answers the AI gave (the Total number of answers overall).

For example, for the question about listing the items relating to exam results, the filter identified 2 items (2 FILTER) but the AI didn’t report either of them (0 CORRECT). However, it did report two items in which the words exam and results appeared separately (2 ADDITIONAL CORRECT). It also reported 3 items in which just the word exam or exams appeared (3 INCORRECT), and another item concerning an assessment in which neither the word exam nor results is present (1 INCORRECT). This produced a result of (0+2)/(2+2+3+1) = 2/8 = 25%.
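Expressed as a small function (a sketch of my own – the parameter names are not part of any tool), the calculation is:

```python
def evaluation_score(filter_correct, additional_correct, filter_total, incorrect):
    """Score = total correct answers / total number of answers overall."""
    total_correct = filter_correct + additional_correct
    total_overall = filter_total + additional_correct + incorrect
    return total_correct / total_overall

# The 'exam results' question: the filter found 2 items, the AI reported
# neither of them, but found 2 additional correct items and 4 incorrect ones.
print(evaluation_score(0, 2, 2, 4))  # 0.25, i.e. 25%
```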

The results for this rudimentary version of the CSV file were as shown against each of the questions listed above. The overall result was 38%. While this is in no way a definitive analysis, it will nevertheless enable a comparison to be made between different implementations. I intend to use it at least for the remainder of this first phase.

Installing the AI Software

The software that ChatGPT had advised me to install was called AnythingLLM (LLM standing for, of course, Large Language Model). I duly opened its website (https://anythingllm.com/) and selected the ‘Download for desktop’ box. It took about 13 minutes to download the 370MB program and install it. On opening the application, I was told that a) it had selected the best model (Qwen3Vision2BInstruct) for my hardware (a 9-year-old Windows 11 laptop with 8GB of RAM); b) that I was to use the LanceDB Vector Database; c) that these settings could be modified anytime; and d) that the model, chats, vectors and document text would all be stored privately on that instance of AnythingLLM and would only be accessible on that device.

I uploaded my mementos CSV file, and then got a Warning Message saying something like ‘the workspace is using 102,000 of its 3,500 available tokens. Choose how you want to proceed – Continue anyway or Embed file’. The associated help page says:

“Continue Anyway: Will continue to add the document full text to the chat window, but data will be lost in this process as AnythingLLM will automatically prune the context to fit. You should not do this as you will experience inaccurate LLM behaviour.

Embed: Will embed the document (RAG) and add it to the workspace. This will allow the LLM to use the document as a source of information, but it will not be able to use the full text of the document. This option may or may not be visible depending on your permissions on the workspace.”

I selected Embed and that process took about 5 minutes.

I then asked some questions such as ‘Show me items relating to xxx’ but consistently got the reply: ‘Could not respond to message. Model requires more system memory (8.7 GiB) than is available (5.4 GiB)’. I asked ChatGPT how much memory AnythingLLM needed to run, and it said that the model that had been selected wasn’t suitable for a machine with 8GB of RAM. Instead, it said I should use the Ollama phi3:mini model and advised how to obtain it. However, that didn’t work, so ChatGPT said that meant Ollama wasn’t on my machine and that I needed to download and install it first, and provided me with a website link to do so.

I installed Ollama (which included installing a redistributable version of Visual C++) and restarted my laptop as instructed by ChatGPT. Then I installed phi3:mini by typing ‘ollama pull phi3:mini’ at the Command Line prompt, again as instructed by ChatGPT. Next I had to select the Ollama LLM in AnythingLLM by going into the Workspace settings (the little rose icon) and selecting Ollama. Within that Ollama section of the drop-down there was another settings icon which had to be clicked to access the Ollama-specific settings screen, in which ChatGPT had advised me to place ‘http://localhost:11434’ in the ‘Ollama Base URL’ field.

At this point I noted that ‘phi3:mini’ was correctly displayed in the ‘Ollama Model’ field. Having done all this, I was able to select the Mementos CSV document in AnythingLLM and have it embedded; after which I was able to ask some questions and get some answers.

Now, what was going on in all of this? This is what I discovered after having a few exchanges with ChatGPT:

The software that is needed for AI has three layers:

  1. The model (Phi-3 Mini) → the “brain” that generates text.
  2. Ollama → the engine that runs the model locally.
  3. AnythingLLM → the interface and workflow tool you interact with.

The Model (Phi-3 Mini) is the AI brain – the trained neural network that produces answers by:

  • Predicting the next token in text
  • Generating responses to prompts
  • Using knowledge learned during training.

Ollama is the system that runs the AI model on the computer. It does the following:

  • Loads the model
  • Sends your prompt to it
  • Streams the response back
  • Performs other functions such as loading models into RAM and providing an API server for other applications.

AnythingLLM is the user interface and AI workflow platform which does the following:

  • Connects to Ollama
  • Sends prompts to the model
  • Displays responses
  • Manages workspaces
  • Embeds knowledge sources (RAG)
  • Keeps chat history
  • Handles embeddings and document search.

This architecture is flexible: it enables different products to be switched into any of the three components while keeping the other two the same.
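This flexibility exists because the middle layer is just a local HTTP service: Ollama listens at the Base URL mentioned earlier (http://localhost:11434), and any front end – AnythingLLM or a plain script – can send prompts to it. As a minimal sketch using only the Python standard library (it assumes Ollama is running locally with phi3:mini pulled, and the helper names are my own):

```python
import json
import urllib.request

def build_payload(prompt, model="phi3:mini"):
    """The JSON body that Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt, base_url="http://localhost:11434"):
    """Send a prompt straight to the local Ollama server and return its reply text."""
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Only works while the Ollama server is running, e.g.:
# print(ask_ollama("What is RAG in one sentence?"))
```

When AnythingLLM is in the picture, it is making calls of exactly this shape on your behalf, after adding the retrieved document chunks to the prompt.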

AnythingLLM embeds knowledge sources by retrieving information in external documents, encoding that information and placing it into a Vector Database (in this case, LanceDB). The steps it takes to do this are:

  • It turns text into numbers. This is known as Embedding, in which text is converted into numerical vectors that represent meaning. For example, “The cat sat on the mat” becomes something like: [0.213, -0.551, 0.889, …]
  • It stores the embeddings in the Vector Database along with Chunks of text and References to the original documents. For example, Vector: [0.213, -0.551, 0.889 …]; Text: “The mitochondria is the powerhouse of the cell.”; Source: biology_notes.pdf.

When you ask a question, AnythingLLM finds the closest matching vectors. So, overall, the RAG (Retrieval-Augmented Generation) process works as follows:

Step 1 — Document chunking. When you upload a document to AnythingLLM, the document is split into small sections (PDF → paragraphs → chunks).

Step 2 — Embedding creation. Each chunk is converted to a vector.

Step 3 — Storage. The vectors are stored in the vector database.

Step 4 — Question time. When you ask a question such as “What causes tides?”, AnythingLLM:

  • converts the question into an embedding
  • searches the vector database
  • retrieves the most similar chunks.

Step 5 — Context injection. The retrieved chunks are added to the question and the combined prompt is sent to Phi-3 Mini.

Step 6 — AI generates answer. Now the model answers using your documents, not just training data; and the answer is shown in AnythingLLM.
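The six steps can be sketched end-to-end in a few lines. This is purely illustrative: the ‘embedding’ here is a toy bag-of-words count (a real embedder, like the one inside AnythingLLM, produces dense neural vectors), and all the names are my own.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    words = text.lower().replace("?", "").replace(".", "").split()
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-3: chunk the 'document' and store (vector, text) pairs.
chunks = [
    "Tides are caused by the gravitational pull of the moon and sun.",
    "The mitochondria is the powerhouse of the cell.",
]
store = [(embed(c), c) for c in chunks]

# Step 4: embed the question and retrieve the most similar chunk.
question = "What causes tides?"
best_chunk = max(store, key=lambda pair: cosine(embed(question), pair[0]))[1]

# Step 5: inject the retrieved chunk into the prompt sent to the model.
prompt = f"Context: {best_chunk}\n\nQuestion: {question}"
print(best_chunk)
```

The retrieval correctly picks the tides sentence, and the combined prompt is what would be passed to Phi-3 Mini in Step 6.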

Unfortunately, even after I had successfully installed a working configuration, the system occasionally could not respond, and the results were often incomplete or incorrect. In ChatGPT’s opinion these problems are most likely being caused by:

  • The model temporarily exhausting RAM: I only have 8GB of RAM and the AI components probably take up between 5 and 6GB (AnythingLLM app ~500 MB, Vector database + embeddings ~200–400 MB, Phi-3 Mini loaded in Ollama ~3–4 GB, Prompt + generation buffers ~0.5–1 GB). Adding the 2–3 GB taken up by the Operating System means that every now and again I’m probably hitting a 7–8 GB total, resulting in the Operating System occasionally swapping memory to disk, Ollama pausing or timing out, and AnythingLLM reporting “Ollama not responding”.
  • Poor data structure in the CSV file: RAG systems like AnythingLLM perform best with short natural-language passages, not table rows. When CSVs are embedded directly, column relationships are lost, retrieval becomes noisy, and the model guesses incorrectly. Hence ChatGPT’s suggestions in the previous post for how to refine the contents of the CSV file.
  • A limitation in the Phi-3 Mini model’s capability. Phi-3 Mini is optimized for low-memory environments, while larger models typically provide higher completeness and accuracy.

Still, I do at least now have a working system which I can experiment with – even if I have to occasionally put up with “Ollama not responding”. The following post documents how well or otherwise this initial configuration performed.

Index Adjustments for AI

Having completed the Preparedness steps, I asked ChatGPT the following question:

“I have a collection of 2993 mementos which has an Index containing a Reference No and Description for each item. I want to create a RAG interrogation capability on the Reference No and Description information. The Index file is named ‘Memento Collection Index for AI’ and it is located in my laptop at C:\Users\pwils\Documents\AI. The first two rows of the Index file contain descriptive information about the file and can be ignored. The 3rd row contains the headers for each of the Index fields. There are fourteen fields in all with the first two titled ‘Reference No’ and ‘Description’. What’s the first thing I should do to create the RAG interrogation capability?”

ChatGPT responded with advice to remove the first two rows in the spreadsheet, and to convert it to a csv file. In subsequent exchanges, ChatGPT suggested the following changes and additions to the csv file which would enable the AI to provide more insightful answers:

  • Create a new column called ‘Item Label’ which combines the Reference No and the Description separated by a hyphen (see the relevant ChatGPT conversation).
  • Normalize the two Facet fields. (The Index has a Facet 1 and a Facet 2 field: if there is only one keyword in Facet 1, Facet 2 is empty; if Facet 1 contains a second keyword, separated from the first by a comma, then both keywords appear in Facet 2 but in reverse order.) Normalizing means a) lowercasing all the words, b) avoiding plurals, and c) keeping the facets short – preferably just one word.
  • Add a ‘Primary Facet’ column which contains whichever of the two facets is considered to be the dominant one.
  • Add an ‘AI Context’ column which combines the ‘Item Label’ text with the ‘Facet 1’ text in the format [Item Label text]. Facets: [Facet 1 text].
  • Add a ‘Collection Themes’ column which contains 1-3 broader thematic categories than the more specific Facets. For a collection this size there should be between 12 and 20 Themes. These do not currently exist in the Index and would have to be identified and then allocated to each line item. However, it seems that the AI could come up with an initial list of themes by analysing the contents of the ‘Item Label’ and the ‘Facet’ fields.
  • Add a ‘Theme Cluster’ column – containing a short name representing a group of objects that share a pattern. For a collection this size there should be between 25 and 40 clusters. Again, it seems that the AI could come up with an initial list of clusters by analysing the ‘Item Label’ and ‘Facet’ fields.
  • Add a ‘Cluster Signature’ column which combines the ‘Primary Facet’ and the ‘Collection Theme’ fields in the format [Primary Facet text] | [Collection Theme text].
  • Add a ‘Related concepts’ column which contains 1-3 broader conceptual ideas associated with the object. For a collection this size there should be 20-30 of these – preferably single words. These do not currently exist in the Index and would have to be identified and allocated. I’m not sure if the AI could help to identify them or not.
  • Add an ‘Outlier score’ column which indicates how unusual an item is within the collection. Possible values could be: 1 Very typical object, 2 Moderately distinctive, 3 Unusual, 4 Very Unusual, 5 Unique or rare in the collection. This information does not currently exist in the database and would have to be specified for each item (though among the fields that have been removed for this AI exercise, ‘Unusual’ items are identified).
  • Add an ‘Object links’ column which lists the Reference Numbers of other objects that are meaningfully related, in the format RefNo, RefNo, RefNo. This information does not currently exist in the Index and would have to be specified for each item – potentially quite a big job.

At this point I decided that, for this first stage in this journey, I would simply stick with the very first suggestion – to create a new column called ‘Item Label’ combining the Reference No and the Description separated by a hyphen. Once I have something working, I can return to these other sophistications.
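As it happens, that first change needs no AI at all – a few lines of Python using the standard csv module can add the column. This is just a sketch: the function name and output file name are my own, and it assumes the ‘Reference No’ and ‘Description’ headers described above.

```python
import csv

def add_item_label(in_path, out_path):
    """Add an 'Item Label' column combining 'Reference No' and 'Description',
    separated by a hyphen, and write the result to a new csv file."""
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["Item Label"] = f"{row['Reference No']} - {row['Description']}"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

# e.g. add_item_label("Mementos Collection Index for AI Phase 1.csv",
#                     "Mementos Index with Item Label.csv")
```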

In the course of this extended exchange, ChatGPT also offered to provide “the exact 40-line Python script that will turn your spreadsheet into a working RAG search system for the 2993 mementos”. I accepted and in the course of the subsequent interchange was offered an easier approach which involves acquiring a desktop RAG tool called AnythingLLM which would run locally and require no programming. The latter sounded exactly what I needed and I set about downloading and installing it.

Preparing the Memento Collection

This is the start of my attempt to undertake Phase 1 of my investigation into providing an AI interrogation capability for archives. Phase 1 concerns providing AI support for my Memento collection’s index entries.

The first step was to apply the advice in the recent publication “AI preparedness guidelines for archivists” by Prof. Giovanni Colavizza and Prof. Lise Jaillant. This suggests addressing four main areas (referred to as Pillars). My analysis of the Memento collection’s preparedness relating to each of the four areas is recorded in a Memento Preparedness document and summarised below:

Pillar 1 – Completeness and excluded data. The collection is complete; all items in the Index are to be interrogated by AI in this phase; no items have been excluded.

Pillar 2 – Metadata and access. 14 fields are in columns in the Index and these are available for use for AI interrogation. However, there are many additional columns (containing various analytical data) which are to be removed for this exercise. All information remaining in the Index after the additional columns have been removed, will be available for AI interrogation; no information will be subjected to restricted access in this exercise. Provenance and relationship information is embedded in the Reference No, and sometimes in the Notes field. An extensive range of narrative information about the collection and the Index is contained in a Guide worksheet within the Index spreadsheet.

Pillar 3 – Data types, formats, and file structures. Before making any changes to the Index file an assessment will be made as to whether the change is wanted in the original index or not. If it is not, a copy of the Index will be made. A variety of different file formats are present in the digital files of the collection, but the vast majority are either .pdf, .docx, or .jpg documents. Some standardisation changes may be required in some of the index fields. Folder names distinguish between the different collection components and link back into the overall Collection folder structure. All item digital File Titles contain the relevant Reference Number.

Pillar 4 – Application-specific metrics and evaluation. The ability to find what you are looking for in the Index is the primary requirement of collection users. Another requirement is to find the last Reference Number used overall or in a particular series, in order to specify the appropriate next Reference Number for an item you are adding to the collection. How these and other criteria should be translated into evaluation metrics will be considered through the course of the project.

As a result of the above analysis the following 14 actions were identified (the notation ‘P1.1’ stands for ‘the first action in Phase 1’; P1.2 is the second action in Phase 1; and so on).

P1.1 Mem(Index). Ensure completeness and normalisation. All 14 fields were checked to eliminate blanks and to normalise content where necessary.

P1.2 Mem(Index). Remove columns O to BX from the file used for this AI work. Columns O to BX were removed.

P1.3 Mem(All). Document all the Provenance and Relationship info embedded within the Index and the File Titles. The Guide was expanded to describe, a) the 14 fields, b) how the digital filename is constructed, and c) how the collection came about (which includes references to posts in the pwofc.com website).

P1.4 Mem(All). Observe how the Provenance and Relationship info is used to create guidelines for producing such documentation. To be revisited during the implementation of Phase 1.

P1.5 Mem(Index). Identify any extra narrative info that is available or is needed. None needed.

P1.6 Mem(Index). Produce any extra narrative info that is required. None needed.

P1.7 Mem(All). Carry out a ‘wanted or not in the original index’ check before each action. Done.

P1.8 Mem(Items). Check what formats exist in the collection files. 16 different file formats are present in the collection – DOC, DOCX, FMP12, HTM, JPG, M4A, MP3, MP4, PDF, PDF-A1-b, PPTX, TIFF, XLSM, XLSX, XLS, ZIP.

P1.9 Mem(Items). Define AI-friendly standard formats: Only the Index to the collection (an XLSX document) is to be used in this phase, and this will be converted into a csv file for the purpose.

P1.10 Mem(Items). Make any changes to existing formats to conform to new standards. An AI friendly file in csv format was derived from the original Index document. Since you can’t create a csv with multiple worksheets, two new files were created: one with the file name ‘Mementos Collection Index for AI Phase 1.csv’, and another with the file name ‘Mementos Collection Guide for AI Phase 1’.

P1.11 Mem(All). Document the folder structure for the derivative file. For all this AI work, a new folder was created into which these derived files, and all other files derived for AI purposes, will be placed: C:\Users\pwils\Documents\AI.

P1.12 Mem(All) – Find out what ‘supports programmatic retrieval’ means in practice. ChatGPT advised that this usually means: querying a vector database, calling a search API, pulling documents from a content repository, and fetching structured data from a database.

P1.13 Mem(All) – Make any changes necessary to support programmatic retrieval. I don’t have enough knowledge yet to understand if any changes are needed to support that process. I will have to revisit this question when I actually start to try to implement the capability.

P1.14 Mem(All) – Prompt for ideas about success metrics as each action is taken in the course of the project. This question will be revisited as work on this phase progresses.

This brought to an end the Preparedness work which took a total of approximately 7 hours. The next step was to try and implement an AI capability: to start this process I asked ChatGPT what should be the first thing I do to create a RAG interrogation capability for the Memento’s Index (RAG stands for Retrieval-Augmented Generation – whereby the AI is not trained on the archive, but instead the archive data is provided to the AI at answer time). What followed is reported in the next post.

A Plan with a rather empty Kitbag

A couple of weeks ago I realised that there was a gap in the research I’ve been doing on Collecting by Individuals – how to apply AI to Personal Collections. After a further two weeks thinking it over, I’ve decided that I need to bite the bullet and learn by doing it myself on some of my own collections.

I’m embarking on this journey with very little relevant knowledge. However, I do at least have the recent report on AI Preparedness guidelines for Archivists, as well as some exchanges with ChatGPT about what can be done. I’m hoping that these will get me up and running, and that ChatGPT may be able to help me with things I don’t understand. With these tools in my (rather empty) kitbag the notes below outline what I plan to do.

My overall objective is to provide detailed guidelines for individuals who want to apply AI to interrogate their own private collections. This may also involve enhancing the current OFC Tutorial and/or creating a separate tutorial.

The strategy I intend to follow is to conduct the work in a series of phases, going from the simplest possible implementation I can define to progressively more comprehensive and complex implementations. I will use two of my own collections: a collection of Mementos which has an index of some 2390 entries and 2730 digital files; and my PAWDOC collection of work files which has an index of some 17380 entries and around 31,300 digital files.

The phases I currently plan to undertake are as follows (though these may well be rejigged as I gain more experience and knowledge):

  1. AI support for the Memento collection’s index entries;
  2. AI support for the Memento collection’s combined Index entries and file titles;
  3. AI support for the Memento collection’s index entries, file titles and textual items;
  4. AI support for PAWDOC’s index entries;
  5. AI support for PAWDOC’s combined index entries and file titles
  6. AI support for PAWDOC’s combined index entries, file titles, and some or all of the born digital items
  7. AI support for a subset of PAWDOC’s scanned items
  8. AI support for a combination of index entries, file titles, some born digital material and some scanned items
  9. AI support for the whole of PAWDOC

Timescales: With my current lack of knowledge I don’t know how long this is all going to take. However, I shall aim to try and have the first phase completed in not more than 1 year.

Ideally, I would like to find some knowledgeable collaborators who have relevant experience and who would guide me through the work (please do get in touch if you are interested). However, it could be hard to find the right people who have sufficient interest and the time to spare. I shall take some steps to try and find some such individuals, but won’t let that endeavour delay my start on Phase 1. I am reconciled to probably having to do most of the work without any permanent collaborator support.