{"id":2742,"date":"2026-03-07T09:32:20","date_gmt":"2026-03-07T09:32:20","guid":{"rendered":"https:\/\/www.pwofc.com\/ofc\/?p=2742"},"modified":"2026-03-07T10:49:03","modified_gmt":"2026-03-07T10:49:03","slug":"index-adjustments","status":"publish","type":"post","link":"https:\/\/www.pwofc.com\/ofc\/2026\/03\/07\/index-adjustments\/","title":{"rendered":"Index Adjustments for AI"},"content":{"rendered":"<p>Having completed the Preparedness steps, I asked ChatGPT the following question:<\/p>\n<p>\u201cI have a collection of 2993 mementos which has an Index containing a Reference No and Description for each item. I want to create a RAG interrogation capability on the Reference No and Description information. The Index file is named \u2018Memento Collection Index for AI\u2019 and it is located in my laptop at C:\\Users\\pwils\\Documents\\AI. The first two rows of the Index file contain descriptive information about the file and can be ignored. The 3rd row contains the headers for each of the Index fields. There are fourteen fields in all with the first two titled \u2018Reference No\u2019 and \u2018Description\u2019. What\u2019s the first thing I should do to create the RAG interrogation capability?\u201d<\/p>\n<p>ChatGPT responded with advice to remove the first two rows in the spreadsheet, and to convert it to a csv file. In subsequent exchanges, ChatGPT suggested the following changes and additions to the csv file which would enable the AI to provide more insightful answers:<\/p>\n<ul>\n<li><strong>Create a new column called \u2018Item Label\u2019<\/strong> which combines the Reference No and the Description separated by a hyphen <a href=\"https:\/\/www.pwofc.com\/ofc\/wp-content\/uploads\/2026\/03\/2026-03-06-ChatGPT-re-creating-a-new-column-called-\u2018Item-Label.docx\">(see the relevant ChatGPT conversation)<\/a>.<\/li>\n<li><strong>Normalize the two Facet fields<\/strong> (the index has a Facet 1 and a Facet 2 field. 
If there is only 1 entry in Facet 1, Facet 2 is empty. If there is a second keyword in Facet 1 (separated from the first keyword by a comma), then both keywords appear in Facet 2 but in reverse order). Normalizing means: a) lowercasing all the words, b) avoiding plurals, c) keeping the facets short &#8211; preferably just 1 word.<\/li>\n<li><strong>Add a \u2018Primary Facet\u2019 column which<\/strong> contains whichever of the two facets is considered to be the dominant one.<\/li>\n<li><strong>Add an \u2018AI Context\u2019 column which<\/strong> combines the \u2018Item Label\u2019 text with the \u2018Facet 1\u2019 text in the format [Item Label text]. Facets: [Facet 1 text].<\/li>\n<li><strong>Add a \u2018Collection Themes\u2019 column<\/strong> which contains 1-3 thematic categories that are broader than the more specific Facets. For a collection this size there should be between 12 and 20 Themes. These do not currently exist in the Index and would have to be identified and then allocated to each line item. However, it seems that the AI could come up with an initial list of themes by analysing the contents of the &#8216;Item Label&#8217; and the &#8216;Facet&#8217; fields.<\/li>\n<li><strong>Add a<\/strong> <strong>\u2018Theme Cluster\u2019<\/strong> <strong>column<\/strong> \u2013 containing a short name representing a group of objects that share a pattern. For a collection this size there should be between 25 and 40 clusters. Again, it seems that the AI could come up with an initial list of clusters by analysing the &#8216;Item Label&#8217; and &#8216;Facet&#8217; fields.<\/li>\n<li><strong>Add a \u2018Cluster Signature\u2019 column<\/strong> which combines the &#8216;Primary Facet&#8217; and the &#8216;Collection Theme&#8217; fields in the format [Primary Facet text] | [Collection Theme text].<\/li>\n<li><strong>Add a \u2018Related concepts\u2019 column<\/strong> which contains 1-3 broader conceptual ideas associated with the object. 
For a collection this size there should be 20-30 of these \u2013 preferably single words. These do not currently exist in the Index and would have to be identified and allocated. I\u2019m not sure if the AI could help to identify them or not.<\/li>\n<li><strong>Add an \u2018Outlier score\u2019 column<\/strong> which indicates how unusual an item is within the collection. Possible values could be: 1 Very typical object, 2 Moderately distinctive, 3 Unusual, 4 Very Unusual, 5 Unique or rare in the collection. This information does not currently exist in the database and would have to be specified for each item (though among the fields that have been removed for this AI exercise, \u2018Unusual\u2019 items are identified).<\/li>\n<li><strong>Add an \u2018Object links\u2019 column<\/strong> which lists the Reference Numbers of other objects that are meaningfully related, in the format RefNo, RefNo, RefNo. This information does not currently exist in the Index and would have to be specified for each item \u2013 potentially quite a big job.<\/li>\n<\/ul>\n<p>At this point I decided that, for this first stage in this journey, I would simply stick with the very first suggestion \u2013 to create a new column called \u2018Item Label\u2019 combining the Reference No and the Description separated by a hyphen. Once I have something working, I can return to these other sophistications.<\/p>\n<p>In the course of this extended exchange, ChatGPT also offered to provide \u201cthe exact 40-line Python script that will turn your spreadsheet into a working RAG search system for the 2993 mementos\u201d. I accepted and in the course of the subsequent interchange was offered an easier approach which involves acquiring a desktop RAG tool called AnythingLLM which would run locally and require no programming. 
The latter sounded exactly what I needed and I set about downloading and installing it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Having completed the Preparedness steps, I asked ChatGPT the following question: \u201cI have a collection of 2993 mementos which has an Index containing a Reference No and Description for each item. I want to create a RAG interrogation capability on &hellip; <a href=\"https:\/\/www.pwofc.com\/ofc\/2026\/03\/07\/index-adjustments\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[40],"tags":[],"class_list":["post-2742","post","type-post","status-publish","format-standard","hentry","category-ai-for-personal-archives"],"_links":{"self":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/2742","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/comments?post=2742"}],"version-history":[{"count":3,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/2742\/revisions"}],"predecessor-version":[{"id":2747,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/2742\/revisions\/2747"}],"wp:attachment":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/media?parent=2742"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/categories?post=2742"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/tags?post=2742"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}