Index Adjustments for AI

Having completed the Preparedness steps, I asked ChatGPT the following question:

“I have a collection of 2993 mementos which has an Index containing a Reference No and Description for each item. I want to create a RAG interrogation capability on the Reference No and Description information. The Index file is named ‘Memento Collection Index for AI’ and it is located in my laptop at C:\Users\pwils\Documents\AI. The first two rows of the Index file contain descriptive information about the file and can be ignored. The 3rd row contains the headers for each of the Index fields. There are fourteen fields in all with the first two titled ‘Reference No’ and ‘Description’. What’s the first thing I should do to create the RAG interrogation capability?”

ChatGPT advised removing the first two rows of the spreadsheet and converting it to a CSV file. In subsequent exchanges, it suggested the following changes and additions to the CSV file, which would enable the AI to provide more insightful answers:

  • Create a new column called ‘Item Label’ which combines the Reference No and the Description separated by a hyphen (see the relevant ChatGPT conversation).
  • Normalize the two Facet fields. The Index has a Facet 1 and a Facet 2 field: if Facet 1 holds only one keyword, Facet 2 is empty; if Facet 1 holds a second keyword (separated from the first by a comma), then both keywords appear in Facet 2 but in reverse order. Normalizing means a) lowercasing all the words, b) avoiding plurals, and c) keeping the facets short, preferably just 1 word.
  • Add a ‘Primary Facet’ column which contains whichever of the two facets is considered to be the dominant one.
  • Add an ‘AI Context’ column which combines the ‘Item Label’ text with the ‘Facet 1’ text in the format [Item Label text]. Facets: [Facet 1 text].
  • Add a ‘Collection Themes’ column which contains 1-3 thematic categories that are broader than the more specific Facets. For a collection this size there should be between 12 and 20 Themes. These do not currently exist in the Index and would have to be identified and then allocated to each line item. However, it seems that the AI could come up with an initial list of themes by analysing the contents of the ‘Item Label’ and the ‘Facet’ fields.
  • Add a ‘Theme Cluster’ column – containing a short name representing a group of objects that share a pattern. For a collection this size there should be between 25 and 40 clusters. Again, it seems that the AI could come up with an initial list of clusters by analysing the ‘Item Label’ and ‘Facet’ fields.
  • Add a ‘Cluster Signature’ column which combines the ‘Primary Facet’ and the ‘Collection Themes’ fields in the format [Primary Facet text] | [Collection Theme text].
  • Add a ‘Related concepts’ column which contains 1-3 broader conceptual ideas associated with the object. For a collection this size there should be 20-30 of these, preferably single words. These do not currently exist in the Index and would have to be identified and allocated. I’m not sure if the AI could help to identify them or not.
  • Add an ‘Outlier score’ column which indicates how unusual an item is within the collection. Possible values could be: 1 Very typical object, 2 Moderately distinctive, 3 Unusual, 4 Very Unusual, 5 Unique or rare in the collection. This information does not currently exist in the database and would have to be specified for each item (though among the fields that have been removed for this AI exercise, ‘Unusual’ items are identified).
  • Add an ‘Object links’ column which lists the Reference Numbers of other objects that are meaningfully related, in the format RefNo, RefNo, RefNo. This information does not currently exist in the Index and would have to be specified for each item – potentially quite a big job.
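
To make the facet normalization and the two combined columns a little more concrete, here is a minimal Python sketch of how they might be produced. It is my own illustration, not the script ChatGPT offered. The function names are hypothetical, the plural handling is a deliberately naive assumption (it only strips a trailing "s"), and I have assumed the AI Context text carries no trailing full stop.

```python
def normalize_facet(text):
    """Lowercase each comma-separated keyword and strip a simple plural 's'."""
    keywords = []
    for raw in text.split(","):
        word = raw.strip().lower()
        if not word:
            continue
        # Naive singularisation (an assumption): drop a trailing "s" unless
        # the word ends in "ss". Irregular plurals would need manual review.
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        keywords.append(word)
    return ", ".join(keywords)


def ai_context(item_label, facet_1):
    """Build the 'AI Context' text: [Item Label text]. Facets: [Facet 1 text]"""
    return f"{item_label}. Facets: {normalize_facet(facet_1)}"


def cluster_signature(primary_facet, collection_theme):
    """Build the 'Cluster Signature' text: [Primary Facet text] | [Collection Theme text]"""
    return f"{primary_facet} | {collection_theme}"


# Made-up example values, not taken from the real Index.
print(normalize_facet("Tickets, Music"))
print(ai_context("X-001 - Concert ticket", "Tickets, Music"))
print(cluster_signature("ticket", "entertainment"))
```

The harder columns (Collection Themes, Theme Cluster, Related concepts, Outlier score, Object links) are not covered here, because their values have to be decided item by item rather than derived from existing fields.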

At this point I decided that, for the first stage of this journey, I would simply stick with the very first suggestion: to create a new column called ‘Item Label’ combining the Reference No and the Description separated by a hyphen. Once I have something working, I can return to these other sophistications.
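
For anyone who would rather script this first step than do it by hand in the spreadsheet, the following is a rough sketch using only Python's standard library. It assumes the Index has already been exported to CSV with the two descriptive rows still in place, and that the hyphen separator is written with a space on either side. The function name build_item_labels and the sample rows are hypothetical.

```python
import csv
import io


def build_item_labels(source, target):
    """Copy a CSV, dropping the two descriptive rows and adding 'Item Label'.

    source and target are open text file objects. Returns the number of
    item rows written.
    """
    reader = csv.reader(source)
    next(reader, None)  # skip descriptive row 1
    next(reader, None)  # skip descriptive row 2
    header = next(reader)  # row 3 holds the field headers
    ref_idx = header.index("Reference No")
    desc_idx = header.index("Description")

    writer = csv.writer(target, lineterminator="\n")
    writer.writerow(header + ["Item Label"])
    count = 0
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # ignore completely blank rows
        label = f"{row[ref_idx].strip()} - {row[desc_idx].strip()}"
        writer.writerow(row + [label])
        count += 1
    return count


# Small in-memory demonstration with made-up rows.
sample = (
    "Memento Collection Index\n"
    "Descriptive information row\n"
    "Reference No,Description,Facet 1\n"
    "X-001,Concert ticket,Tickets\n"
    "X-002,School report,Reports\n"
)
output = io.StringIO()
rows_written = build_item_labels(io.StringIO(sample), output)
print(rows_written)
print(output.getvalue())
```

With real files, both would be opened with newline="" as the csv module documentation recommends, and the encoding would need to match whatever the spreadsheet export produced.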

During this extended exchange, ChatGPT also offered to provide “the exact 40-line Python script that will turn your spreadsheet into a working RAG search system for the 2993 mementos”. I accepted, and in the subsequent interchange was offered an easier approach: acquiring a desktop RAG tool called AnythingLLM, which would run locally and require no programming. This sounded like exactly what I needed, and I set about downloading and installing it.
