Installing the AI Software

The software that ChatGPT had advised me to install was called AnythingLLM (LLM standing, of course, for Large Language Model). I duly opened its website (https://anythingllm.com/) and selected the ‘Download for desktop’ box. It took about 13 minutes to download and install the 370 MB program. On opening the application, I was told that a) it had selected the best model (Qwen3Vision2BInstruct) for my hardware (a 9-year-old Windows 11 laptop with 8 GB of RAM); b) I was to use the LanceDB Vector Database; c) these settings could be modified at any time; and d) the model, chats, vectors and document text would all be stored privately on that instance of AnythingLLM and would only be accessible on that device.

I uploaded my mementos CSV file and then got a warning message saying something like ‘the workspace is using 102,000 of its 3,500 available tokens. Choose how you want to proceed – Continue anyway or Embed file’. The associated help page says:

“Continue Anyway: Will continue to add the document full text to the chat window, but data will be lost in this process as AnythingLLM will automatically prune the context to fit. You should not do this as you will experience inaccurate LLM behaviour.

Embed: Will embed the document (RAG) and add it to the workspace. This will allow the LLM to use the document as a source of information, but it will not be able to use the full text of the document. This option may or may not be visible depending on your permissions on the workspace.”

I selected Embed and that process took about 5 minutes.

I then asked some questions such as ‘Show me items relating to xxx’ but consistently got the reply: ‘Could not respond to message. Model requires more system memory (8.7 GiB) than is available (5.4 GiB)’. I asked ChatGPT how much memory AnythingLLM needed to run, and it said that the model that had been selected wasn’t suitable for a machine with 8 GB of RAM. Instead, it said I should use the Ollama phi3:mini model and advised how to obtain it. However, that didn’t work, so ChatGPT said that Ollama must not be on my machine and that I needed to download and install it first, providing a website link to do so.

I installed Ollama (which included installing a redistributable version of Visual C++) and restarted my laptop as instructed by ChatGPT. Then I installed phi3:mini by typing ‘ollama pull phi3:mini’ at the command-line prompt, again as instructed by ChatGPT. Next, I had to select the Ollama LLM in AnythingLLM by going into the Workspace settings (the little rose icon) and selecting Ollama. Within that Ollama section of the drop-down there was another settings rose icon, which had to be clicked to access the Ollama-specific settings screen; there, as ChatGPT had advised, I placed ‘http://localhost:11434’ in the ‘Ollama Base URL’ field.

At this point I noted that ‘phi3:mini’ was correctly displayed in the ‘Ollama Model’ field. Having done all this, I was able to select the Mementos CSV document in AnythingLLM and have it embedded, after which I was able to ask some questions and get some answers.

Now, what was going on in all of this? This is what I discovered after having a few exchanges with ChatGPT:

The software that is needed for AI has three layers:

  1. The model (Phi-3 Mini) → the “brain” that generates text.
  2. Ollama → the engine that runs the model locally.
  3. AnythingLLM → the interface and workflow tool you interact with.
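The division of labour between the three layers can be sketched with toy stand-ins. None of this is AnythingLLM’s or Ollama’s actual code (the class names and stub replies here are invented for illustration); it only mirrors the way responsibilities are split:

```python
class ToyModel:
    """Layer 1: the 'brain' (stand-in for Phi-3 Mini)."""
    def generate(self, prompt: str) -> str:
        # A real model predicts tokens; this stub just echoes the prompt.
        return f"[model reply to: {prompt}]"

class ToyEngine:
    """Layer 2: runs the model locally (stand-in for Ollama)."""
    def __init__(self, model: ToyModel):
        self.model = model                    # 'loads' the model
    def run(self, prompt: str) -> str:
        return self.model.generate(prompt)    # sends prompt, returns reply

class ToyInterface:
    """Layer 3: what the user interacts with (stand-in for AnythingLLM)."""
    def __init__(self, engine: ToyEngine):
        self.engine = engine                  # connects to the engine
        self.history = []                     # keeps chat history
    def chat(self, prompt: str) -> str:
        reply = self.engine.run(prompt)
        self.history.append((prompt, reply))
        return reply

ui = ToyInterface(ToyEngine(ToyModel()))
print(ui.chat("What causes tides?"))
```

Because each layer only talks to the one below it through a narrow interface, any layer can in principle be replaced without disturbing the other two.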

The Model (Phi-3 Mini) is the AI brain – the trained neural network that produces answers by:

  • Predicting the next token in text
  • Generating responses to prompts
  • Using knowledge learned during training.
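‘Predicting the next token’ can be illustrated with a deliberately crude stand-in: count which word follows which in a tiny corpus and pick the most frequent follower. Real models use trained neural networks over subword tokens, not word counts; this toy (corpus and all) is invented purely to show the idea:

```python
from collections import Counter, defaultdict

# Tiny corpus: count, for each word, which word follows it and how often.
corpus = "the cat sat on the mat and the cat slept".split()

followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word: str) -> str:
    # 'Generate' by choosing the most frequent follower seen in training.
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))   # 'cat' follows 'the' twice, 'mat' only once
```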

Ollama is the system that runs the AI model on the computer. It does the following:

  • Loads the model
  • Sends your prompt to it
  • Streams the response back
  • Performs other functions such as loading models into RAM and providing an API server for other applications.
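The ‘API server for other applications’ is what the ‘Ollama Base URL’ field above points AnythingLLM at: by default Ollama listens on http://localhost:11434 and accepts HTTP requests such as POST /api/generate. As a sketch, this only constructs the request body a client would send; it does not contact a running Ollama server, and the prompt text is invented:

```python
import json

base_url = "http://localhost:11434"   # the value entered in AnythingLLM
payload = {
    "model": "phi3:mini",             # the model pulled earlier
    "prompt": "What causes tides?",
    "stream": False,                  # one complete reply, not a token stream
}
body = json.dumps(payload)
print(base_url + "/api/generate")
print(body)
```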

AnythingLLM is the user interface and AI workflow platform which does the following:

  • Connects to Ollama
  • Sends prompts to the model
  • Displays responses
  • Manages workspaces
  • Embeds knowledge sources (RAG)
  • Keeps chat history
  • Handles embeddings and document search.

This architecture is flexible: any one of the three components can be swapped for a different product while the other two stay the same.

AnythingLLM embeds knowledge sources by retrieving information from external documents, encoding it, and placing it into a Vector Database (in this case, LanceDB). The steps it takes to do this are:

  • It turns text into numbers. This is known as embedding, in which text is converted into numerical vectors that represent meaning. For example, “The cat sat on the mat” becomes something like: [0.213, -0.551, 0.889, …]
  • It stores the embeddings in the Vector Database along with chunks of text and references to the original documents. For example, Vector: [0.213, -0.551, 0.889, …]; Text: “The mitochondria is the powerhouse of the cell.”; Source: biology_notes.pdf.
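The two steps above can be sketched in miniature. The ‘embedding’ here is just a vowel-frequency vector, nothing like a real trained embedder, and the file names are invented; only the record shape (vector + text chunk + source reference) matches what goes into a store like LanceDB:

```python
def embed(text: str) -> list[float]:
    # Toy 'embedding': relative frequency of each vowel in the text.
    text = text.lower()
    return [round(text.count(c) / max(len(text), 1), 3) for c in "aeiou"]

store = []   # stand-in for the vector database

def add_chunk(text: str, source: str) -> None:
    # Each record keeps the vector, the chunk, and where it came from.
    store.append({"vector": embed(text), "text": text, "source": source})

add_chunk("The mitochondria is the powerhouse of the cell.", "biology_notes.pdf")
add_chunk("The cat sat on the mat.", "pets.pdf")
print(store[0]["vector"], store[0]["source"])
```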

When you ask a question, AnythingLLM finds the closest matching vectors. So, overall, the RAG (Retrieval-Augmented Generation) process works as follows:

Step 1 — Document chunking. When you upload a document to AnythingLLM, the document is split into small sections (PDF → paragraphs → chunks).

Step 2 — Embedding creation. Each chunk is converted to a vector.

Step 3 — Storage. The vectors are stored in the vector database.

Step 4 — Question time. When you ask a question such as “What causes tides?”, AnythingLLM:

  • converts the question into an embedding
  • searches the vector database
  • retrieves the most similar chunks.

Step 5 — Context injection. The retrieved chunks are added to the question and the combined prompt is sent to Phi-3 Mini.

Step 6 — AI generates answer. Now the model answers using your documents, not just its training data, and the answer is shown in AnythingLLM.
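Steps 1–6 can be run end to end in miniature. A bag-of-words vector stands in for a real neural embedding, the ‘document’ is invented, and the final model call is omitted (a real system would now send the combined prompt to Phi-3 Mini); only the pipeline shape matches what AnythingLLM actually does:

```python
import math

def tokenize(text: str) -> list[str]:
    return text.lower().replace("?", "").replace(".", "").split()

def embed(text: str, vocab: list[str]) -> list[int]:
    words = tokenize(text)
    return [words.count(w) for w in vocab]   # one dimension per vocab word

def cosine(a: list[int], b: list[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

# Step 1: split the 'document' into small chunks.
document = ("Tides are caused by the gravitational pull of the Moon. "
            "The Sun also contributes a smaller tidal force.")
chunks = [c for c in document.split(". ") if c]

# Steps 2-3: embed each chunk and store vector + text together.
question = "What causes tides?"
vocab = sorted({w for c in chunks for w in tokenize(c)} | set(tokenize(question)))
db = [(embed(c, vocab), c) for c in chunks]

# Step 4: embed the question and retrieve the most similar chunk.
best = max(db, key=lambda rec: cosine(embed(question, vocab), rec[0]))[1]

# Step 5: inject the retrieved chunk into the prompt for the model.
prompt = f"Context: {best}\n\nQuestion: {question}"
print(prompt)
```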

Unfortunately, even after I had successfully installed a working configuration, the system occasionally could not respond, and the results were often incomplete or incorrect. In ChatGPT’s opinion these problems are most likely being caused by:

  • The model temporarily exhausting RAM: I only have 8 GB of RAM, and the AI components probably take up between 5 and 6 GB (AnythingLLM app ~500 MB, vector database + embeddings ~200–400 MB, Phi-3 Mini loaded in Ollama ~3–4 GB, prompt + generation buffers ~0.5–1 GB). Adding the 2–3 GB taken up by the operating system means that every now and again I’m probably hitting a 7–8 GB total, resulting in the operating system occasionally swapping memory to disk, Ollama pausing or timing out, and AnythingLLM reporting “Ollama not responding”.
  • Poor data structure in the CSV file: RAG systems like AnythingLLM perform best with short natural-language passages, not table rows. When CSVs are embedded directly, column relationships are lost, retrieval becomes noisy, and the model guesses incorrectly. Hence ChatGPT’s suggestions in the previous post for how to refine the contents of the CSV file.
  • A limitation in the Phi-3 Mini model’s capability. Phi-3 Mini is optimized for low-memory environments, while larger models typically provide higher completeness and accuracy.
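Adding up the RAM budget in the first bullet above (taking rough midpoints of the ranges quoted, which are ChatGPT’s estimates rather than measured figures) shows how little headroom an 8 GB machine has:

```python
# Rough midpoints, in GB, of the estimates quoted above.
components = {
    "AnythingLLM app": 0.5,
    "Vector database + embeddings": 0.3,    # midpoint of 200-400 MB
    "Phi-3 Mini loaded in Ollama": 3.5,     # midpoint of 3-4 GB
    "Prompt + generation buffers": 0.75,    # midpoint of 0.5-1 GB
    "Operating system": 2.5,                # midpoint of 2-3 GB
}
total = sum(components.values())
print(f"Estimated total: {total:.2f} GB of 8 GB RAM")
```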

Still, I do at least now have a working system which I can experiment with – even if I have to occasionally put up with “Ollama not responding”. The following post documents how well or otherwise this initial configuration performed.
