{"id":2786,"date":"2026-04-15T20:59:34","date_gmt":"2026-04-15T19:59:34","guid":{"rendered":"https:\/\/www.pwofc.com\/ofc\/?p=2786"},"modified":"2026-04-16T06:59:42","modified_gmt":"2026-04-16T05:59:42","slug":"enter-copilot-and-chatgpt","status":"publish","type":"post","link":"https:\/\/www.pwofc.com\/ofc\/2026\/04\/15\/enter-copilot-and-chatgpt\/","title":{"rendered":"Enter Copilot and ChatGPT"},"content":{"rendered":"<p>In the <a href=\"https:\/\/www.pwofc.com\/ofc\/2026\/04\/10\/embedding-parameters-and-new-evaluation-questions\/\">previous post<\/a> I explained why I changed the evaluation criteria I was using to the following:<\/p>\n<ul>\n<li>What are the main themes that run through the entire index?<\/li>\n<li>Are there distinct phases or periods in the collection?<\/li>\n<li>Which items suggest important life events or transitions?<\/li>\n<li>What patterns or motifs repeat across the collection?<\/li>\n<li>What are the top 5 most important entries, and why?<\/li>\n<\/ul>\n<p>I tried using them with a version of the Index which had extraneous fields such as \u2018Physical Location\u2019 and \u2018No of digital files\u2019 removed leaving just Ref No, Description, Item Label, Set, Facets, AI Context, and Year all in a single column. This produced a result of 4.8 out of 10 using the Mistral model and 5 out of 10 for Mixtral \u2013 and I proved to myself that I was able to apply the new evaluation criteria, albeit with rather more subjective scoring.<\/p>\n<p>At this point it dawned on me that it might be worth trying to use the MS Copilot AI provided as an integral part of my Windows operating system. 
After making some initial enquiries on the net about its possible use with Excel, I got the impression it could only be used with MS Office 365, which I do not have (I have the home version of Office); but Copilot itself set me straight, explaining that the 365 version enabled support for the functionality within the Office applications, whereas the free-to-use version of Copilot simply uploads documents to the cloud, where it works out the answer to the question it has been asked. I tried it out using a very simple version of the Mementos index with just the Ref No, Description, Set, Facets, and Year fields in their own columns, and was excited by the result, which I described as follows:<\/p>\n<p><em>\u201cThis is a strong result (8 out of 10): comprehensive answers with just a few poor interpretations, but with no obvious hallucinations. Importantly, Copilot was able to deal with the whole attached Index in one go which delivers far better answers than the RAG approach which can only work on a pre-selected subset of the material. The response time (average 4.4 seconds) was very quick considering that the whole document had to be sent to, and analysed in, the cloud before delivering its answer.\u201d<\/em><\/p>\n<p>I then tried to compare the AnythingLLM and the Copilot systems by using yet another version of the Mementos index based on suggestions from ChatGPT. It still contained just Ref No, Description, Set, Facets, and Year with all the fields in a single cell, but this time with the Description replaced by keywords derived automatically using an Excel formula supplied by ChatGPT. The results were very clear: Mistral scored 3.1, Mixtral 4.8, and Copilot 8.6. For the Copilot test I wasn\u2019t sure if the size of the file would cause a problem, so I split it into three files of between 200kb and 300kb each, and these seemed to have been ingested successfully. 
However, I subsequently discovered that the files had been truncated so that only about the first 120 lines of each were ingested &#8211; making about 360 entries in all out of the total 2393 (I guess Copilot must have truncated the file I used in the previous test as well). Given this, Copilot\u2019s 8.6 score was even more impressive.<\/p>\n<p>Finding that Copilot had truncated the files prompted me to do some digging and experimentation to find out just what its limits are. I established that it will ingest up to 20 whole documents of up to about 30,000 characters\/30kb file size each in a single turn, and will work across all those documents to construct its answer. Furthermore, more batches of 20 can be submitted in subsequent turns up to a total of about 150, after which \u2018the conversation becomes unwieldy\u2019 (I\u2019m not sure what is meant by this). I duly split the Mementos Index into 17 files, all between 20,000 and 30,000 characters, and conducted the test again, this time including a version of the Guide document. This produced an even better score of 9.1.<\/p>\n<p>Discovering Copilot&#8217;s capabilities made me wonder what ChatGPT could do. I found that the free version I was using allows you to upload a maximum of 3 files of up to about 512Mb in any one day. However, to be sure that all the contents of all the files will be taken into account in answering a question, the total of the 3 files needs to be a maximum of about 300k characters, with 250k being a safe limit. I put ChatGPT to the test with the same index used in the first Copilot test but broken into three files of between 111k and 130k characters each. 
This produced a result of 5.9, which was probably to be expected given that the file size limits had been exceeded.<\/p>\n<p>As I was exploring the file size limits for Copilot and ChatGPT, it became clear that the ultimate determinant of the amount of text that can be reliably reasoned about at once is the AI model\u2019s maximum \u2018Context Window\u2019. This is effectively the model\u2019s working memory, which contains both the input prompt and the answer. If the inputs exceed the Context Window then some content may simply be left out and the answer may be less complete. The table below summarises the Context Window limits for the different AI models I\u2019ve been using.<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"102\"><strong>AI System<\/strong><\/td>\n<td width=\"151\"><strong>Model<\/strong><\/td>\n<td width=\"363\"><strong>Maximum Context Window<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"102\">AnythingLLM<\/td>\n<td width=\"151\">Mistral 7B Instruct (32k context), Q4_K_M quantization<\/td>\n<td width=\"363\">32k tokens (~65k\u2013100k characters in csv files, assuming 2-3 characters\/token)<\/td>\n<\/tr>\n<tr>\n<td width=\"102\">AnythingLLM<\/td>\n<td width=\"151\">Mixtral 8x7B Instruct (32k context), Q4_0 quantization<\/td>\n<td width=\"363\">32k tokens (~65k\u2013100k characters in csv files, assuming 2-3 characters\/token)<\/td>\n<\/tr>\n<tr>\n<td width=\"102\">Copilot<\/td>\n<td width=\"151\">MS LLM (Microsoft does not publish the names of its models)<\/td>\n<td width=\"363\">The MS LLM doesn\u2019t have a single fixed Context Window; it\u2019s designed around task\u2011adaptive context management, and hence the effective context it can use is much larger and more flexible than a single token number would suggest.<\/td>\n<\/tr>\n<tr>\n<td width=\"102\">ChatGPT (free version)<\/td>\n<td width=\"151\">GPT-5.3<\/td>\n<td width=\"363\">The maximum Context Window is 128k tokens, but because this includes system instructions, 
conversation history, and output tokens, the usable Context Window is about 80k \u2013 100k tokens (160k \u2013 300k characters in csv files, assuming 2-3 characters\/token)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>If one wishes Index files to be considered in full by an AI system, the primary requirement is to ensure that the whole set fits into the Context Window. However, when it comes to assembling and submitting those files there are additional considerations to be taken into account, as summarised below.<\/p>\n<table>\n<tbody>\n<tr>\n<td width=\"93\"><strong>AI system<\/strong><\/td>\n<td width=\"73\"><strong>Approach<\/strong><\/td>\n<td width=\"172\"><strong>Max total file size<\/strong><\/td>\n<td width=\"132\"><strong>Max No of files<\/strong><\/td>\n<td width=\"146\"><strong>Max size of each file<\/strong><\/td>\n<\/tr>\n<tr>\n<td width=\"93\">AnythingLLM<\/td>\n<td width=\"73\">Embed (RAG)<\/td>\n<td width=\"172\">No limit on total size; the effective limit is the size of the Chunks the files are divided into. The default is set at 256 tokens (800-1000 characters).<\/td>\n<td width=\"132\">No limit, though more than 50K Chunks may cause retrieval problems.<\/td>\n<td width=\"146\">No limit.<\/td>\n<\/tr>\n<tr>\n<td width=\"93\">AnythingLLM<\/td>\n<td width=\"73\">Attach function<\/td>\n<td width=\"172\">Default is set to 30k-50k characters for csv files. 
Can be adjusted within AnythingLLM settings.<\/td>\n<td width=\"132\">No limit.<\/td>\n<td width=\"146\">No limit.<\/td>\n<\/tr>\n<tr>\n<td width=\"93\">Copilot<\/td>\n<td width=\"73\">Attach function<\/td>\n<td width=\"172\">No absolute limit, but above about 150 files (4,500k characters) the conversation becomes unwieldy.<\/td>\n<td width=\"132\">Up to 20 files in a single prompt; about 7 such batches can be submitted in successive prompts.<\/td>\n<td width=\"146\">No absolute limit, but for csv files, 10,000 rows or approximately 30k characters should work fine.<\/td>\n<\/tr>\n<tr>\n<td width=\"93\">ChatGPT (free version)<\/td>\n<td width=\"73\">Attach function<\/td>\n<td width=\"172\">To be confident that the model will read everything and not overlook anything, the total should be kept to about 250k characters.<\/td>\n<td width=\"132\">3 within any one day. This limit can be circumvented by putting multiple files in a zip file, or by putting the contents of a file into the prompt with the question.<\/td>\n<td width=\"146\">512Mb is the absolute limit, but to be sure all the contents will be acted on, keep it to 200k-300k characters. If 3 files are to be used, keep their total to 200k-300k characters.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>With these constraints in mind, I conducted a final set of tests to compare the three AI systems \u2013 AnythingLLM\u2019s RAG approach, Copilot, and ChatGPT. I assembled a cut-down version of the Index (by removing the MW set), which was small enough (around 274k characters, 270kb file size) to fit within the limits of all three systems. 
As can be seen in the table below, there was a clear winner.<\/p>\n<table width=\"621\">\n<thead>\n<tr>\n<td width=\"93\"><strong>System<\/strong><\/td>\n<td width=\"77\"><strong>Average Evaluation Score out of 10<\/strong><\/td>\n<td width=\"348\"><strong>Performance Summary<\/strong><\/td>\n<td width=\"104\"><strong>Average time taken to start responding (seconds)<\/strong><\/td>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td width=\"93\">AnythingLLM (Mistral)<\/td>\n<td width=\"77\">4.4<\/td>\n<td width=\"348\">Overall, this was a disappointing result. The answers were very sparse with little rationale or summation. Some of the answers given were of dubious worth.<\/td>\n<td width=\"104\">6.6<\/td>\n<\/tr>\n<tr>\n<td width=\"93\">AnythingLLM (Mixtral)<\/td>\n<td width=\"77\">3.6<\/td>\n<td width=\"348\">This score was a little worse than Mistral&#8217;s result, despite Mixtral taking a lot longer to start printing out its answers. Two of the five answers seemed mainly to regurgitate the different sets that were described in the Index, and some of the answers were of rather dubious relevance. Of most concern, however, is that there were at least 4 instances of hallucinations &#8211; in two cases, Reference Numbers which don&#8217;t exist were cited.<\/td>\n<td width=\"104\">36.4<\/td>\n<\/tr>\n<tr>\n<td width=\"93\">ChatGPT (GPT-5.3)<\/td>\n<td width=\"77\">6.3<\/td>\n<td width=\"348\">All the responses were comprehensive with extensive rationale and good summation. However, the content wasn&#8217;t always appropriate: it wasn&#8217;t necessarily wrong, but was sometimes just a little dubious. Furthermore, there was rather too much emphasis on the way the collection was organised rather than on its contents. 
On two occasions, examples of individual entries were specifically asked for but generalisations were delivered.<\/td>\n<td width=\"104\">6<\/td>\n<\/tr>\n<tr>\n<td width=\"93\">Copilot (MS LLM)<\/td>\n<td width=\"77\">8.7<\/td>\n<td width=\"348\">Three of the five answers were exceptionally good, and all the responses were well illustrated with rationale and examples. The summaries at the end were well constructed and useful. I only spotted one error across the five answers, though there were three or four things which stood out as having been omitted.<\/td>\n<td width=\"104\">16.4<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Copilot was a clear winner, with ChatGPT following on behind. Both provided substantial answers which included rationale, several examples, and a summary. In contrast, the AnythingLLM RAG answers were sparse, sometimes not very good, and occasionally included complete hallucinations. The RAG approach just doesn\u2019t seem very effective for Index material.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous post I explained why I changed the evaluation criteria I was using to the following: What are the main themes that run through the entire index? Are there distinct phases or periods in the collection? 
Which items &hellip; <a href=\"https:\/\/www.pwofc.com\/ofc\/2026\/04\/15\/enter-copilot-and-chatgpt\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[40],"tags":[],"class_list":["post-2786","post","type-post","status-publish","format-standard","hentry","category-ai-for-personal-archives"],"_links":{"self":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/2786","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/comments?post=2786"}],"version-history":[{"count":18,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/2786\/revisions"}],"predecessor-version":[{"id":2804,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/2786\/revisions\/2804"}],"wp:attachment":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/media?parent=2786"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/categories?post=2786"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/tags?post=2786"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}