A software stack for 64GB

When I wrote the previous post just over a week ago, I thought I had an operational AI configuration; but it turned out not to be the case. I was getting “Ollama not responding” more often than not when I sent in my prompts, and I eventually concluded that my 8-year-old laptop just wasn’t up to the job. I had already planned to upgrade it later this year, so I decided to bring that forward and do it right away. I elected to buy an HP Omen laptop with 64GB of RAM and it duly arrived on Monday 16th March. There followed an intense period of installing applications and transferring data from my old laptop. There were some problems – there always are when you get a new machine – but by the following afternoon I was ready to restart my AI journey.

I started again by asking ChatGPT what tools and models it would suggest for my new 64GB laptop, and it recommended LM Studio running the “Mistral 7B Instruct” model with AnythingLLM providing the front end and RAG capability. I duly downloaded and installed all this software, but hit a problem when I entered my first query: AnythingLLM is set up to provide a variety of system prompts (instructions that shape the AI’s responses and behaviour) which are not recognised by LM Studio and the Mistral model. ChatGPT first advised me to run another model, and when that didn’t work either, it suggested disabling AnythingLLM’s System Prompts. Unfortunately, AnythingLLM wouldn’t let me do that. Eventually, after about two and a half hours, I gave up trying to troubleshoot the problem and took up another of ChatGPT’s suggestions to replace LM Studio with Ollama running another Mistral model. This change only took about 15 minutes – and it worked! I started running my test questions through the new configuration and was getting answers back in 2-6 seconds – every time!

Now, throughout this process I was following ChatGPT’s guidance. I simply don’t have the knowledge to do any of this on my own, and, I must say, ChatGPT has been very clear and helpful; most answers provide options, a rationale for its suggestions, and a final summary of what should be done. However, as demonstrated by my above experiences, ChatGPT is not necessarily familiar with all aspects of all available products, nor fully aware of all potential problems. If it was, it wouldn’t have suggested the initial pairing of LM Studio with Mistral and AnythingLLM. Furthermore, when asked about functionality in a particular product it often offers various possibilities depending on which version is being used, suggesting a general knowledge rather than specific expertise. Of course, this is exactly what should be expected from an AI system. After all it is only predicting the next word based on a whole load of training data.

Let me be clear: the guidance I’ve already received from ChatGPT has enabled me to make considerable progress in a relatively short period of time; and I plan to continue to rely on it to guide my future steps in this journey – after all I have no other option. However, I will remain alert to the possibility of its advice being incomplete or unsound or even wrong; and I will rely on the actual experiences I have with the software itself, to draw my own conclusions.

Some simple Evaluation Metrics

Once I got the software installed and working, I started to ask questions about my Memento collection. However, the answers that came back were not very encouraging: some were incomplete and others were just completely wrong. I hoped that the suggestions ChatGPT had previously made about structuring the CSV file would improve the results, but I realised that in order to check if that was indeed the case, I would need some way of evaluating how well the system was performing – just as had been suggested in the Preparedness guidelines.

I deliberately decided not to go overboard with my evaluation metrics at this stage: what I needed was a small, simple set that could be applied relatively quickly and that would produce some numbers I could compare across different versions of the input documents and the system configuration. I came up with the following six questions (the percentages are the first set of results, described further below):

  • What items are to do with the KRS? [KRS standing for Kodak Recreational Society] (0%)
  • What happened on the 20th? (0%)
  • List the items relating to exam results (25%)
  • What linen is in the collection? (50%)
  • Are there any items relating to Aston Martin cars? [there are some individuals called Martin in the Index] (100%)
  • What documents are there about finances? (50%)

For each question I knew that, when I opened the CSV file in Excel, I could use the filter facility to get a definitive number of items that answered the question. So, to assess the answers provided by the AI, I added the number of filter-identified items that the AI had reported correctly to the number of additional correct answers the AI identified (the Total correct answers); and then divided that figure by the sum of a) the number of items identified by the filter, b) the number of additional correct answers the AI identified, and c) the number of incorrect answers the AI gave (the Total number of answers overall).

For example, for the question about listing the items relating to exam results, the filter identified 2 items (2 FILTER) but the AI didn’t report either of them (0 CORRECT). However, it did report two items in which the words exam and results appeared separately (2 ADDITIONAL CORRECT). It also reported 3 items in which just the word exam or exams appeared (3 INCORRECT), and another item concerning an assessment in which neither the word exam nor results is present (1 INCORRECT). This produced a result of (0+2)/(2+2+3+1) = 2/8 = 25%.
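Expressed as a small function (a sketch of my own – the parameter names are not part of any tool), the calculation is:

```python
def evaluation_score(filter_correct, additional_correct, filter_total, incorrect):
    """Score = total correct answers / total number of answers overall."""
    total_correct = filter_correct + additional_correct
    total_overall = filter_total + additional_correct + incorrect
    return total_correct / total_overall

# The 'exam results' question: the filter found 2 items, the AI reported
# neither of them, but found 2 additional correct items and 4 incorrect ones.
print(evaluation_score(0, 2, 2, 4))  # 0.25, i.e. 25%
```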

The results for this rudimentary version of the CSV file were as shown against each of the questions listed above. The overall result was 38%. While this is in no way a definitive analysis, it will nevertheless enable a comparison to be made between different implementations. I intend to use it at least for the remainder of this first phase.

Installing the AI Software

The software that ChatGPT had advised me to install was called AnythingLLM (LLM standing for, of course, Large Language Model). I duly opened its website (https://anythingllm.com/) and selected the ‘Download for desktop’ box. It took about 13 minutes to download the 370MB program and install it. On opening the application, I was told that a) it had selected the best model (Qwen3Vision2BInstruct) for my hardware (a 9-year-old Windows 11 laptop with 8GB of RAM); b) that I was to use the LanceDB Vector Database; c) that these settings could be modified anytime; and d) that the model, chats, vectors and document text would all be stored privately on that instance of AnythingLLM and would only be accessible on that device.

I uploaded my mementos CSV file, and then got a Warning Message saying something like ‘the workspace is using 102,000 of its 3,500 available tokens. Choose how you want to proceed – Continue anyway or Embed file’. The associated help page says:

“Continue Anyway: Will continue to add the document full text to the chat window, but data will be lost in this process as AnythingLLM will automatically prune the context to fit. You should not do this as you will experience inaccurate LLM behaviour.

Embed: Will embed the document (RAG) and add it to the workspace. This will allow the LLM to use the document as a source of information, but it will not be able to use the full text of the document. This option may or may not be visible depending on your permissions on the workspace.”

I selected Embed and that process took about 5 minutes.

I then asked some questions such as ‘Show me items relating to xxx’ but consistently got the reply: ‘Could not respond to message. Model requires more system memory (8.7 GiB) than is available (5.4 GiB)’. I asked ChatGPT how much memory AnythingLLM needed to run, and it said that the model that had been selected wasn’t suitable for a machine with 8GB of RAM. Instead, it said I should use the Ollama phi3:mini model and advised how to obtain it. However, that didn’t work, so ChatGPT said that meant Ollama wasn’t on my machine and that I needed to download and install it first, and provided me with a website link to do so.

I installed Ollama (which included installing a redistributable version of Visual C++) and restarted my laptop as instructed by ChatGPT. Then I installed phi3:mini by typing ‘ollama pull phi3:mini’ at the Command Line prompt, again as instructed by ChatGPT. Next I had to select the Ollama LLM in AnythingLLM by going into the Workspace settings (the little rose icon) and selecting Ollama. Within that Ollama section of the drop-down there was another settings icon which had to be clicked to access the Ollama-specific settings screen, in which ChatGPT had advised me to place ‘http://localhost:11434’ in the ‘Ollama Base URL’ field.

At this point I noted that ‘phi3:mini’ was correctly displayed in the ‘Ollama Model’ field. Having done all this, I was able to select the Mementos CSV document in AnythingLLM and have it embedded; after which I was able to ask some questions and get some answers.

Now, what was going on in all of this? This is what I discovered after having a few exchanges with ChatGPT:

The software that is needed for AI has three layers:

  1. The model (Phi-3 Mini) → the “brain” that generates text.
  2. Ollama → the engine that runs the model locally.
  3. AnythingLLM → the interface and workflow tool you interact with.

The Model (Phi-3 Mini) is the AI brain – the trained neural network that produces answers by:

  • Predicting the next token in text
  • Generating responses to prompts
  • Using knowledge learned during training.

Ollama is the system that runs the AI model on the computer. It does the following:

  • Loads the model
  • Sends your prompt to it
  • Streams the response back
  • Performs other functions such as loading models into RAM and providing an API server for other applications.

AnythingLLM is the user interface and AI workflow platform which does the following:

  • Connects to Ollama
  • Sends prompts to the model
  • Displays responses
  • Manages workspaces
  • Embeds knowledge sources (RAG)
  • Keeps chat history
  • Handles embeddings and document search.

This architecture is flexible: it enables different products to be switched into any of the three components while keeping the other two the same.
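This flexibility exists because the middle layer is just a local HTTP service: Ollama listens at the Base URL mentioned earlier (http://localhost:11434), and any front end – AnythingLLM or a plain script – can send prompts to it. As a minimal sketch using only the Python standard library (it assumes Ollama is running locally with phi3:mini pulled, and the helper names are my own):

```python
import json
import urllib.request

def build_payload(prompt, model="phi3:mini"):
    """The JSON body that Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt, base_url="http://localhost:11434"):
    """Send a prompt straight to the local Ollama server and return its reply text."""
    req = urllib.request.Request(
        base_url + "/api/generate",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Only works while the Ollama server is running, e.g.:
# print(ask_ollama("What is RAG in one sentence?"))
```

When AnythingLLM is in the picture, it is making calls of exactly this shape on your behalf, after adding the retrieved document chunks to the prompt.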

AnythingLLM embeds knowledge sources by retrieving information in external documents, encoding that information and placing it into a Vector Database (in this case, LanceDB). The steps it takes to do this are:

  • It turns text into numbers. This is known as Embedding, in which text is converted into numerical vectors that represent meaning. For example, “The cat sat on the mat” becomes something like: [0.213, -0.551, 0.889, …]
  • It stores the embeddings in the Vector Database along with Chunks of text and References to the original documents. For example, Vector: [0.213, -0.551, 0.889 …]; Text: “The mitochondria is the powerhouse of the cell.”; Source: biology_notes.pdf.

When you ask a question, AnythingLLM finds the closest matching vectors. So, overall, the RAG (Retrieval-Augmented Generation) process works as follows:

Step 1 — Document chunking. When you upload a document to AnythingLLM, the document is split into small sections (PDF → paragraphs → chunks).

Step 2 — Embedding creation. Each chunk is converted to a vector.

Step 3 — Storage. The vectors are stored in the vector database.

Step 4 — Question time. When you ask a question such as “What causes tides?”, AnythingLLM:

  • converts the question into an embedding
  • searches the vector database
  • retrieves the most similar chunks.

Step 5 — Context injection. The retrieved chunks are added to the question and the combined prompt is sent to Phi-3 Mini.

Step 6 — AI generates answer. Now the model answers using your documents, not just training data; and the answer is shown in AnythingLLM.
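The six steps can be sketched end-to-end in a few lines. This is purely illustrative: the ‘embedding’ here is a toy bag-of-words count (a real embedder, like the one inside AnythingLLM, produces dense neural vectors), and all the names are my own.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    words = text.lower().replace("?", "").replace(".", "").split()
    return Counter(words)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-3: chunk the 'document' and store (vector, text) pairs.
chunks = [
    "Tides are caused by the gravitational pull of the moon and sun.",
    "The mitochondria is the powerhouse of the cell.",
]
store = [(embed(c), c) for c in chunks]

# Step 4: embed the question and retrieve the most similar chunk.
question = "What causes tides?"
best_chunk = max(store, key=lambda pair: cosine(embed(question), pair[0]))[1]

# Step 5: inject the retrieved chunk into the prompt sent to the model.
prompt = f"Context: {best_chunk}\n\nQuestion: {question}"
print(best_chunk)
```

The retrieval correctly picks the tides sentence, and the combined prompt is what would be passed to Phi-3 Mini in Step 6.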

Unfortunately, even after I had successfully installed a working configuration, the system occasionally could not respond, and the results were often incomplete or incorrect. In ChatGPT’s opinion these problems are most likely being caused by:

  • The model temporarily exhausting RAM: I only have 8GB of RAM and the AI components probably take up between 5 and 6GB (AnythingLLM app ~500 MB, Vector database + embeddings ~200–400 MB, Phi-3 Mini loaded in Ollama ~3–4 GB, Prompt + generation buffers ~0.5–1 GB). Adding the 2–3 GB taken up by the Operating System means that every now and again I’m probably hitting a 7–8 GB total, resulting in the Operating System occasionally swapping memory to disk, Ollama pausing or timing out, and AnythingLLM reporting “Ollama not responding”.
  • Poor data structure in the CSV file: RAG systems like AnythingLLM perform best with short natural-language passages, not table rows. When CSVs are embedded directly, column relationships are lost, retrieval becomes noisy, and the model guesses incorrectly. Hence ChatGPT’s suggestions in the previous post for how to refine the contents of the CSV file.
  • A limitation in the Phi-3 Mini model’s capability. Phi-3 Mini is optimized for low-memory environments, while larger models typically provide higher completeness and accuracy.

Still, I do at least now have a working system which I can experiment with – even if I have to occasionally put up with “Ollama not responding”. The following post documents how well or otherwise this initial configuration performed.

Index Adjustments for AI

Having completed the Preparedness steps, I asked ChatGPT the following question:

“I have a collection of 2993 mementos which has an Index containing a Reference No and Description for each item. I want to create a RAG interrogation capability on the Reference No and Description information. The Index file is named ‘Memento Collection Index for AI’ and it is located in my laptop at C:\Users\pwils\Documents\AI. The first two rows of the Index file contain descriptive information about the file and can be ignored. The 3rd row contains the headers for each of the Index fields. There are fourteen fields in all with the first two titled ‘Reference No’ and ‘Description’. What’s the first thing I should do to create the RAG interrogation capability?”

ChatGPT responded with advice to remove the first two rows in the spreadsheet, and to convert it to a csv file. In subsequent exchanges, ChatGPT suggested the following changes and additions to the csv file which would enable the AI to provide more insightful answers:

  • Create a new column called ‘Item Label’ which combines the Reference No and the Description separated by a hyphen (see the relevant ChatGPT conversation).
  • Normalize the two Facet fields. (The Index has a Facet 1 and a Facet 2 field: if there is only one keyword in Facet 1, Facet 2 is empty; if Facet 1 contains a second keyword, separated from the first by a comma, then both keywords appear in Facet 2 but in reverse order.) Normalizing means a) lowercasing all the words, b) avoiding plurals, and c) keeping the facets short – preferably just one word.
  • Add a ‘Primary Facet’ column which contains whichever of the two facets is considered to be the dominant one.
  • Add an ‘AI Context’ column which combines the ‘Item Label’ text with the ‘Facet 1’ text in the format [Item Label text]. Facets: [Facet 1 text].
  • Add a ‘Collection Themes’ column which contains 1-3 broader thematic categories than the more specific Facets. For a collection this size there should be between 12 and 20 Themes. These do not currently exist in the Index and would have to be identified and then allocated to each line item. However, it seems that the AI could come up with an initial list of themes by analysing the contents of the ‘Item Label’ and the ‘Facet’ fields.
  • Add a ‘Theme Cluster’ column – containing a short name representing a group of objects that share a pattern. For a collection this size there should be between 25 and 40 clusters. Again, it seems that the AI could come up with an initial list of clusters by analysing the ‘Item Label’ and ‘Facet’ fields.
  • Add a ‘Cluster Signature’ column which combines the ‘Primary Facet’ and the ‘Collection Theme’ fields in the format [Primary Facet text] | [Collection Theme text].
  • Add a ‘Related concepts’ column which contains 1-3 broader conceptual ideas associated with the object. For a collection this size there should be 20-30 of these – preferably single words. These do not currently exist in the Index and would have to be identified and allocated. I’m not sure if the AI could help to identify them or not.
  • Add an ‘Outlier score’ column which indicates how unusual an item is within the collection. Possible values could be: 1 Very typical object, 2 Moderately distinctive, 3 Unusual, 4 Very Unusual, 5 Unique or rare in the collection. This information does not currently exist in the database and would have to be specified for each item (though among the fields that have been removed for this AI exercise, ‘Unusual’ items are identified).
  • Add an ‘Object links’ column which lists the Reference Numbers of other objects that are meaningfully related, in the format RefNo, RefNo, RefNo. This information does not currently exist in the Index and would have to be specified for each item – potentially quite a big job.

At this point I decided that, for this first stage in this journey, I would simply stick with the very first suggestion – to create a new column called ‘Item Label’ combining the Reference No and the Description separated by a hyphen. Once I have something working, I can return to these other sophistications.
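As it happens, that first change needs no AI at all – a few lines of Python using the standard csv module can add the column. This is just a sketch: the function name and output file name are my own, and it assumes the ‘Reference No’ and ‘Description’ headers described above.

```python
import csv

def add_item_label(in_path, out_path):
    """Add an 'Item Label' column combining 'Reference No' and 'Description',
    separated by a hyphen, and write the result to a new csv file."""
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        row["Item Label"] = f"{row['Reference No']} - {row['Description']}"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

# e.g. add_item_label("Mementos Collection Index for AI Phase 1.csv",
#                     "Mementos Index with Item Label.csv")
```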

In the course of this extended exchange, ChatGPT also offered to provide “the exact 40-line Python script that will turn your spreadsheet into a working RAG search system for the 2993 mementos”. I accepted and in the course of the subsequent interchange was offered an easier approach which involves acquiring a desktop RAG tool called AnythingLLM which would run locally and require no programming. The latter sounded exactly what I needed and I set about downloading and installing it.

Preparing the Memento Collection

This is the start of my attempt to undertake Phase 1 of my investigation into providing an AI interrogation capability for archives. Phase 1 concerns providing AI support for my Memento collection’s index entries.

The first step was to apply the advice in the recent publication “AI preparedness guidelines for archivists” by Prof. Giovanni Colavizza and Prof. Lise Jaillant. This suggests addressing four main areas (referred to as Pillars). My analysis of the Memento collection’s preparedness relating to each of the four areas is recorded in a Memento Preparedness document and summarised below:

Pillar 1 – Completeness and excluded data. The collection is complete; all items in the Index are to be interrogated by AI in this phase; no items have been excluded.

Pillar 2 – Metadata and access. 14 fields are in columns in the Index and these are available for use for AI interrogation. However, there are many additional columns (containing various analytical data) which are to be removed for this exercise. All information remaining in the Index after the additional columns have been removed, will be available for AI interrogation; no information will be subjected to restricted access in this exercise. Provenance and relationship information is embedded in the Reference No, and sometimes in the Notes field. An extensive range of narrative information about the collection and the Index is contained in a Guide worksheet within the Index spreadsheet.

Pillar 3 – Data types, formats, and file structures. Before making any changes to the Index file an assessment will be made as to whether the change is wanted in the original index or not. If it is not, a copy of the Index will be made. A variety of different file formats are present in the digital files of the collection, but the vast majority are either .pdf, .docx, or .jpg documents. Some standardisation changes may be required in some of the index fields. Folder names distinguish between the different collection components and link back into the overall Collection folder structure. All item digital File Titles contain the relevant Reference Number.

Pillar 4 – Application-specific metrics and evaluation. The ability to find what you are looking for in the Index is the primary requirement of collection users. Another requirement is to find the last Reference Number used overall or in a particular series, in order to specify the appropriate next Reference Number for an item you are adding to the collection. How these and other criteria should be translated into evaluation metrics will be considered through the course of the project.

As a result of the above analysis the following 14 actions were identified (the notation ‘P1.1’ stands for ‘the first action in Phase 1’; P1.2 is the second action in Phase 1; and so on).

P1.1 Mem(Index). Ensure completeness and normalisation. All 14 fields were checked to eliminate blanks and to normalise content where necessary.

P1.2 Mem(Index). Remove columns O to BX from the file used for this AI work. Columns O to BX were removed.

P1.3 Mem(All). Document all the Provenance and Relationship info embedded within the Index and the File Titles. The Guide was expanded to describe, a) the 14 fields, b) how the digital filename is constructed, and c) how the collection came about (which includes references to posts in the pwofc.com website).

P1.4 Mem(All). Observe how the Provenance and Relationship info is used to create guidelines for producing such documentation. To be revisited during the implementation of Phase 1.

P1.5 Mem(Index). Identify any extra narrative info that is available or is needed. None needed.

P1.6 Mem(Index). Produce any extra narrative info that is required. None needed.

P1.7 Mem(All). Carry out a ‘wanted or not in the original index’ check before each action. Done.

P1.8 Mem(Items). Check what formats exist in the collection files. 16 different file formats are present in the collection – DOC, DOCX, FMP12, HTM, JPG, M4A, MP3, MP4, PDF, PDF-A1-b, PPTX, TIFF, XLSM, XLSX, XLS, ZIP.

P1.9 Mem(Items). Define AI-friendly standard formats: Only the Index to the collection (an XLSX document) is to be used in this phase, and this will be converted into a csv file for the purpose.

P1.10 Mem(Items). Make any changes to existing formats to conform to new standards. An AI friendly file in csv format was derived from the original Index document. Since you can’t create a csv with multiple worksheets, two new files were created: one with the file name ‘Mementos Collection Index for AI Phase 1.csv’, and another with the file name ‘Mementos Collection Guide for AI Phase 1’.

P1.11 Mem(All). Document the folder structure for the derivative file. For all this AI work, a new folder was created into which these derived files, and all other files derived for AI purposes, will be placed: C:\Users\pwils\Documents\AI.

P1.12 Mem(All) – Find out what ‘supports programmatic retrieval’ means in practice. ChatGPT advised that this usually means: querying a vector database, calling a search API, pulling documents from a content repository, and fetching structured data from a database.

P1.13 Mem(All) – Make any changes necessary to support programmatic retrieval. I don’t have enough knowledge yet to understand if any changes are needed to support that process. I will have to revisit this question when I actually start to try to implement the capability.

P1.14 Mem(All) – Prompt for ideas about success metrics as each action is taken in the course of the project. This question will be revisited as work on this phase progresses.

This brought to an end the Preparedness work which took a total of approximately 7 hours. The next step was to try and implement an AI capability: to start this process I asked ChatGPT what should be the first thing I do to create a RAG interrogation capability for the Memento’s Index (RAG stands for Retrieval-Augmented Generation – whereby the AI is not trained on the archive, but instead the archive data is provided to the AI at answer time). What followed is reported in the next post.

A Plan with a rather empty Kitbag

A couple of weeks ago I realised that there was a gap in the research I’ve been doing on Collecting by Individuals – how to apply AI to Personal Collections. After a further two weeks thinking it over, I’ve decided that I need to bite the bullet and learn by doing it myself on some of my own collections.

I’m embarking on this journey with very little relevant knowledge. However, I do at least have the recent report on AI Preparedness guidelines for Archivists, as well as some exchanges with ChatGPT about what can be done. I’m hoping that these will get me up and running, and that ChatGPT may be able to help me with things I don’t understand. With these tools in my (rather empty) kitbag the notes below outline what I plan to do.

My overall objective is to provide detailed guidelines for individuals who want to apply AI to interrogate their own private collections. This may also involve enhancing the current OFC Tutorial and/or creating a separate tutorial.

The strategy I intend to follow is to conduct the work in a series of phases, going from the simplest possible implementation I can define to progressively more comprehensive and complex implementations. I will use two of my own collections: a collection of Mementos which has an index of some 2390 entries and 2730 digital files; and my PAWDOC collection of work files which has an index of some 17380 entries and around 31,300 digital files.

The phases I currently plan to undertake are as follows (though these may well be rejigged as I gain more experience and knowledge):

  1. AI support for the Memento collection’s index entries;
  2. AI support for the Memento collection’s combined Index entries and file titles;
  3. AI support for the Memento collection’s index entries, file titles and textual items;
  4. AI support for PAWDOC’s index entries;
  5. AI support for PAWDOC’s combined index entries and file titles
  6. AI support for PAWDOC’s combined index entries, file titles, and some or all of the born digital items
  7. AI support for a subset of PAWDOC’s scanned items
  8. AI support for a combination of index entries, file titles, some born digital material and some scanned items
  9. AI support for the whole of PAWDOC

Timescales: With my current lack of knowledge I don’t know how long this is all going to take. However, I shall aim to try and have the first phase completed in not more than 1 year.

Ideally, I would like to find some knowledgeable collaborators who have relevant experience and who would guide me through the work (please do get in touch if you are interested). However, it could be hard to find the right people who have sufficient interest and the time to spare. I shall take some steps to try and find some such individuals, but won’t let that endeavour delay my start on Phase 1. I am reconciled to probably having to do most of the work without any permanent collaborator support.