{"id":2750,"date":"2026-03-10T13:12:07","date_gmt":"2026-03-10T13:12:07","guid":{"rendered":"https:\/\/www.pwofc.com\/ofc\/?p=2750"},"modified":"2026-03-11T17:05:29","modified_gmt":"2026-03-11T17:05:29","slug":"installing-the-ai-software","status":"publish","type":"post","link":"https:\/\/www.pwofc.com\/ofc\/2026\/03\/10\/installing-the-ai-software\/","title":{"rendered":"Installing the AI Software"},"content":{"rendered":"<p>The software that ChatGPT had advised me to install was called AnythingLLM (LLM standing for, of course, Large Language Model). I duly opened its website (<a href=\"https:\/\/anythingllm.com\/\">https:\/\/anythingllm.com\/<\/a>) and selected the \u2018Download for desktop\u2019 box. It took about 13 minutes to download the 370Mb programme and install it. On opening the application, I was told that a) it had selected the best model (Qwen3Vision2BInstruct) for my hardware (a 9-year old Windows 11 laptop with 8Mb of RAM); b) that I was to use the LanceDB Vector Database; c) that these settings could be modified anytime; and d) that the model, chats, vectors and document text would all be stored privately on that instance of AnythingLLM and would only be accessible on that device.<\/p>\n<p>I uploaded my mementos CSV file, and then got a Warning Message saying something like \u2018<em>the workspace is using 102,OOO of its 3,500 \u00a0available tokens. Choose how you want to proceed \u2013 Continue anyway or Embed file\u2019<\/em>. The associated help page says:<\/p>\n<p><strong><em>\u201cContinue Anyway<\/em><\/strong><em>: Will continue to add the document full text to the chat window, but data will be lost in this process as AnythingLLM will automatically prune the context to fit. You should not do this as you will experience inaccurate LLM behaviour.<\/em><\/p>\n<p><strong><em>Embed<\/em><\/strong><em>: Will embed the document (RAG) and add it to the workspace. 
This will allow the LLM to use the document as a source of information, but it will not be able to use the full text of the document. This option may or may not be visible depending on your permissions on the workspace.\u201d<\/em><\/p>\n<p>I selected <em>Embed<\/em> and that process took about 5 minutes.<\/p>\n<p>I then asked some questions such as \u2018Show me items relating to xxx\u2019 but consistently got the reply: <em>\u2018Could not respond to message. Model requires more system memory (8.7 GiB) than is available (5.4 GiB)\u2019<\/em>. I asked ChatGPT how much memory AnythingLLM needed to run and it said that the model that had been selected wasn\u2019t suitable for a machine with 8GB of RAM. Instead, it said I should use the Ollama phi3:mini model and advised how to obtain it. However, that didn\u2019t work, so ChatGPT said that meant that Ollama wasn\u2019t on my machine and that I needed to download and install that first, and provided me with a website link to do so.<\/p>\n<p>I installed Ollama (which included installing a redistributable version of Visual C++) and restarted my laptop as instructed by ChatGPT. Then I installed phi3:mini by typing \u2018ollama pull phi3:mini\u2019 at the Command Line prompt. Next, I had to select the Ollama LLM in AnythingLLM by going into the Workspace settings (the little rose icon) and selecting Ollama. Within that Ollama section of the drop-down there was another settings rose icon which had to be clicked to access the Ollama-specific settings screen, in which ChatGPT had advised me to place <em>\u2018http:\/\/localhost:11434\u2019<\/em> in the \u2018Ollama Base URL\u2019 field.<\/p>\n<p>At this point I noted that \u2018<em>phi3:mini\u2019<\/em> was correctly displayed in the \u2018Ollama Model\u2019 field. 
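Before going further it is worth confirming that the server behind that Base URL is actually reachable. The sketch below is my own illustration, not anything from AnythingLLM itself; it assumes Ollama's standard REST endpoints (GET \/api\/tags to list installed models, POST \/api\/generate to run a prompt) on the default port 11434, and the function names are invented for the example.

```python
# Sketch (illustrative, not part of AnythingLLM): check that the Ollama
# server configured in the 'Ollama Base URL' field is reachable, using
# only the Python standard library.
import json
import urllib.request

OLLAMA_BASE_URL = "http://localhost:11434"  # the value entered in AnythingLLM


def build_generate_request(model: str, prompt: str) -> dict:
    """Payload shape a client sends to Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def list_models(base_url: str = OLLAMA_BASE_URL) -> list:
    """Return the names of locally installed models (e.g. 'phi3:mini')."""
    with urllib.request.urlopen(base_url + "/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]


if __name__ == "__main__":
    payload = build_generate_request("phi3:mini", "Say hello in five words.")
    print(json.dumps(payload))
    # list_models() needs a live Ollama server, so it is left commented out:
    # print(list_models())
```

If list_models() returns a list containing \u2018phi3:mini\u2019, then AnythingLLM should be able to find the model at the same URL.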
Having done all this I was able to select the Mementos CSV document in AnythingLLM and have it embedded, after which I was able to ask some questions and get some answers.<\/p>\n<p>Now, what was going on in all of this? This is what I discovered after having a few exchanges with ChatGPT:<\/p>\n<p>The software needed to run AI locally has three layers:<\/p>\n<ol>\n<li><strong>The model (Phi-3 Mini)<\/strong> \u2192 the \u201cbrain\u201d that generates text.<\/li>\n<li><strong>Ollama<\/strong> \u2192 the engine that runs the model locally.<\/li>\n<li><strong>AnythingLLM<\/strong> \u2192 the interface and workflow tool you interact with.<\/li>\n<\/ol>\n<p><strong>The Model (Phi-3 Mini)<\/strong> is the AI brain &#8211; the trained neural network that produces answers by:<\/p>\n<ul>\n<li>Predicting the next token in text<\/li>\n<li>Generating responses to prompts<\/li>\n<li>Using knowledge learned during training.<\/li>\n<\/ul>\n<p><strong>Ollama <\/strong>is the system that runs the AI model on the computer. 
It does the following:<\/p>\n<ul>\n<li>Loads the model into RAM<\/li>\n<li>Sends your prompt to it<\/li>\n<li>Streams the response back<\/li>\n<li>Performs other functions such as managing model downloads and providing an API server for other applications.<\/li>\n<\/ul>\n<p><strong>AnythingLLM <\/strong>is the user interface and AI workflow platform which does the following:<\/p>\n<ul>\n<li>Connects to Ollama<\/li>\n<li>Sends prompts to the model<\/li>\n<li>Displays responses<\/li>\n<li>Manages workspaces<\/li>\n<li>Embeds knowledge sources (RAG)<\/li>\n<li>Keeps chat history<\/li>\n<li>Handles embeddings and document search.<\/li>\n<\/ul>\n<p>This architecture is flexible: a different product can be swapped into any of the three layers while keeping the other two the same.<\/p>\n<p><strong>AnythingLLM <\/strong>embeds knowledge sources by extracting text from external documents, encoding that text and placing it in a Vector Database (in this case, LanceDB). The steps it takes to do this are:<\/p>\n<ul>\n<li><strong>Turning text into numbers<\/strong>. This is known as Embedding, in which text is converted into numerical vectors that represent meaning. For example, &#8220;The cat sat on the mat&#8221; becomes something like: [0.213, -0.551, 0.889, &#8230;]<\/li>\n<li><strong>Storing the embeddings<\/strong> in the Vector Database along with Chunks of text and References to the original documents. For example, <em>Vector<\/em>: [0.213, -0.551, 0.889 &#8230;]; <em>Text<\/em>: &#8220;The mitochondria is the powerhouse of the cell.&#8221;; <em>Source<\/em>: biology_notes.pdf.<\/li>\n<\/ul>\n<p>When you ask a question, AnythingLLM finds the closest matching vectors. So, overall, the RAG (Retrieval-Augmented Generation) process works as follows:<\/p>\n<p><strong>Step 1 \u2014 Document chunking<\/strong>. 
When you upload a document to AnythingLLM, the document is split into small sections (PDF \u2192 paragraphs \u2192 chunks).<\/p>\n<p><strong>Step 2 \u2014 Embedding creation. <\/strong>Each chunk is converted to a vector.<\/p>\n<p><strong>Step 3 \u2014 Storage. <\/strong>The vectors are stored in the vector database.<\/p>\n<p><strong>Step 4 \u2014 Question time. <\/strong>When you ask a question such as &#8220;What causes tides?&#8221;, AnythingLLM:<\/p>\n<ul>\n<li>converts the question into an embedding<\/li>\n<li>searches the vector database<\/li>\n<li>retrieves the most similar chunks.<\/li>\n<\/ul>\n<p><strong>Step 5 \u2014 Context injection. <\/strong>The retrieved chunks are added to the question and the combined prompt is sent to <strong>Phi-3 Mini<\/strong>.<\/p>\n<p><strong>Step 6 \u2014 AI generates answer. <\/strong>Now the model answers using your documents, not just its training data, and the answer is shown in AnythingLLM.<\/p>\n<p>Unfortunately, even after I had successfully installed a working configuration, the system occasionally could not respond, and the results were often incomplete or incorrect. In ChatGPT\u2019s opinion these problems are most likely caused by:<\/p>\n<ul>\n<li><strong>The model temporarily exhausting RAM: <\/strong>I only have 8GB of RAM and the AI components probably take up between 5 and 6 GB (AnythingLLM app ~500 MB, Vector database + embeddings ~200\u2013400 MB, Phi-3 Mini loaded in Ollama ~3\u20134 GB, Prompt + generation buffers ~0.5\u20131 GB). 
Adding the 2\u20133 GB taken up by the Operating System means that every now and again I\u2019m probably hitting a 7\u20138 GB total, resulting in the Operating System occasionally swapping memory to disk, Ollama <strong>pausing or timing out<\/strong>, and AnythingLLM reporting <em>\u201cOllama not responding\u201d<\/em>.<\/li>\n<li><strong>Poor data structure in the CSV file:<\/strong> RAG systems like AnythingLLM perform best with short natural-language passages, not table rows. When CSVs are embedded directly, column relationships are lost, retrieval becomes noisy, and the model guesses incorrectly. Hence ChatGPT\u2019s suggestions in the previous post for how to refine the contents of the CSV file.<\/li>\n<li><strong>A limitation in the Phi-3 Mini model\u2019s capability.<\/strong> Phi-3 Mini is optimized for low-memory environments, while larger models typically provide higher completeness and accuracy.<\/li>\n<\/ul>\n<p>Still, I do at least now have a working system which I can experiment with \u2013 even if I have to occasionally put up with <em>\u201cOllama not responding\u201d<\/em>. The following post documents how well or otherwise this initial configuration performed.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The software that ChatGPT had advised me to install was called AnythingLLM (LLM standing for, of course, Large Language Model). I duly opened its website (https:\/\/anythingllm.com\/) and selected the \u2018Download for desktop\u2019 box. 
It took about 13 minutes to download &hellip; <a href=\"https:\/\/www.pwofc.com\/ofc\/2026\/03\/10\/installing-the-ai-software\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[40],"tags":[],"class_list":["post-2750","post","type-post","status-publish","format-standard","hentry","category-ai-for-personal-archives"],"_links":{"self":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/2750","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/comments?post=2750"}],"version-history":[{"count":4,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/2750\/revisions"}],"predecessor-version":[{"id":2758,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/2750\/revisions\/2758"}],"wp:attachment":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/media?parent=2750"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/categories?post=2750"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/tags?post=2750"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}