PAWDOC: Searching

The ability to search for and find a document is just about the most important aspect of a filing system, and that capability has undoubtedly been improved by increases in the power of the modern computer. For example, when the index for the PAWDOC collection was computerised in 1986, it took 228 seconds to conduct a standard search on 3,200 records. By 2001, that standard search conducted on over 14,000 records took 7 seconds. That same search, now performed on over 17,000 records, takes less than a second on my current laptop – in fact it’s virtually instantaneous.

Of course, search speed is only half the story, since a targeted document also has to be selected and retrieved. Because whole digitised collections can now be held on a modern laptop, this second part of the process can also be very quick. In fact, the total end-to-end search and retrieval time for the current PAWDOC system is typically between 15 and 30 seconds.

However, speed is not the most important element of a successful search. What matters most is the ability to find what you are looking for, whether you are after a specific document or just doing a general search to see what you have on a particular subject. In the PAWDOC system, searches are conducted on the collection’s Index, so success depends critically on there being a match between the words specified in the search query and the words in the index entry of the targeted document(s). In a personal system, both elements (index entry and search query) come from a single individual’s mind, so more often than not a match is achieved. Inevitably, however, there are cases where a match isn’t achieved first time and, sometimes, not at all. There are a variety of reasons for this, including the passage of time (I placed my first document into PAWDOC some 38 years ago) and changing terminology as one becomes more familiar with a topic or as technology develops. Strategies that can help with these problems include trying terms that appear in the results of unsuccessful searches, and adding keywords to index entries when an item is eventually found.

Specific questions relating to this aspect are answered below. Note that the status of each answer will fall into one of the following 5 categories: Not Started, Ideas Formed, Experience Gained, Partially Answered, Fully Answered.

Q22. How long does it take to find items in the filing system?

2001 Answer: Partially answered:

  • Using the ‘t f o a d r’ test (find every record with the combination of t, f, o, a, d and r somewhere in it) on the 1988 Filemaker Index with 3,200 records running on a Macintosh computer, 27 hits were found in 228 seconds. The equivalent test on the 2001 Filemaker Index with 14,111 records running under Windows 98 on a Pentium II PC took seven seconds to identify 612 hits. A search for all records containing the syllable ‘man’ across the same two systems took four seconds to identify 211 hits in the 1988 system, and one second to identify 1,137 records in the 2001 system (Wilson 1992a: 38, 85).
  • Having identified the correct index entry (in the 2001 system), it takes between 10 and 20 seconds to obtain the reference number, go to the hardcopy cabinet/box, find the item and pull it out.
  • Should the index entry concerned refer to an electronic file, it takes between 6 and 10 seconds in the 2001 system to have the document management software display the relevant folder, double-click the required item and have it open in the relevant application.
  • The total end-to-end retrieval time for the 2001 system is between 10 and 30 seconds for hardcopy and between 10 and 20 seconds for electronic files.
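To make the two benchmark queries concrete, they can be expressed as simple filters over a list of index entries. This sketch assumes one reading of the ‘t f o a d r’ test, namely that every one of the six letters must appear somewhere in the entry; the exact matching rules Filemaker applied are not stated here, so treat it as an approximation. The sample entries are invented for illustration.

```python
def letters_test(entry: str, letters: str = "tfoadr") -> bool:
    """True if every letter in `letters` appears somewhere in `entry` (case-insensitive)."""
    text = entry.lower()
    return all(ch in text for ch in letters)


def syllable_test(entry: str, syllable: str = "man") -> bool:
    """True if `syllable` occurs anywhere in `entry`, including inside longer words."""
    return syllable.lower() in entry.lower()


# Hypothetical sample index entries (real PAWDOC records are Filemaker rows).
sample_index = [
    "Document management software trial report",
    "Notes from a talk on information overload",
    "Draft of a paper for a records conference",
    "Manual for the scanner",
]

tfoadr_hits = [e for e in sample_index if letters_test(e)]
man_hits = [e for e in sample_index if syllable_test(e)]
print(len(tfoadr_hits), len(man_hits))
```

Real hit counts depend entirely on the contents of the index; the sample list only shows how the two filters behave differently.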

2019 Answer: Fully Answered: Using the ‘t f o a d r’ test (find every record with the combination of t, f, o, a, d and r somewhere in it) on the 2019 Filemaker Pro 15 Index with 17,294 records running on a Chillblast Intel i7 computer with 8GB of RAM, 1,224 hits were found in less than a second – in fact, almost instantaneously. A search for all records containing the syllable ‘man’ on the same system took less than a second to identify 1,538 hits (in fact, it too was almost instantaneous).

Having identified the correct index entry, it takes between 10 and 15 seconds to copy the reference number, go to the Windows File Explorer screen, select the main PAWDOC folder, paste the number into the search field, press Enter, and double-click the folder when it appears. Since there may or may not be multiple files in the folder, and since different files may require different applications which probably open at different speeds, it is difficult to provide a reliable figure for selecting a file and opening it. However, as a very rough guide it is likely to take between 3 and 15 seconds.

Therefore, the total end-to-end retrieval time in the current system is approximately 13 – 30 seconds.
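The overall range is simply the sum of the two stage ranges, adding the lower bounds together and the upper bounds together:

```latex
T_{\text{total}} = T_{\text{find folder}} + T_{\text{open file}}, \qquad
T_{\text{total}} \in [\,10 + 3,\; 15 + 15\,] = [\,13,\; 30\,]\ \text{seconds}
```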

Q23. What can you do to speed up retrieval?

2001 Answer: Not started:

2019 Answer: Fully Answered: There are two key factors that affect retrieval times: Physical Proximity and System Integration. Physical Proximity relates to how close you are physically to the system and to the hardcopy and/or digital documents. If you haven’t got the system with you, then you can’t even identify the document you require, let alone retrieve it, regardless of whether it is a hardcopy or digital document. If you are able to identify the document you require, and it is a hardcopy document, then the closer you are to the hardcopy documents the faster retrieval is likely to be (for example, retrieval will be faster if the hardcopies are in the same room that you are in as opposed to in a room down the corridor). If the document you require is an electronic document, then it will probably be a little faster to retrieve if it is on the same system you are using to search for documents than if it is on some remote server elsewhere. Therefore, from a Physical Proximity perspective, retrieval can be speeded up by making sure that the Index, the digital store and any hardcopies are all as close to the user as possible.

System Integration refers to the linkage between the searchable Index and a collection’s digital files. Zero integration requires the user to remember the Reference Number selected in the search process, to go to the store of digital files, and to use the Reference Number to open the relevant folder. In contrast, a very high level of integration might be achieved by having the files stored under a particular Reference Number appear in the Index screen for that Reference Number, from where any of them could be opened directly. A halfway house might be a macro that takes the Reference Number of the selected Index entry and automatically opens the folder for that Reference Number. Therefore, from a System Integration perspective, retrieval can be speeded up by reducing the keystrokes required to go from selecting an Index entry to viewing the files associated with that Index entry.
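As an illustration of the ‘halfway house’, here is a minimal Python sketch of such a macro. It assumes, hypothetically, that each Reference Number has its own folder, named exactly after the number, somewhere below the main PAWDOC folder; the function names and folder layout are illustrative and not part of the actual PAWDOC system.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Optional


def find_reference_folder(pawdoc_root: Path, reference: str) -> Optional[Path]:
    """Return the first folder below pawdoc_root whose name is exactly `reference`."""
    for candidate in pawdoc_root.rglob(reference):
        if candidate.is_dir():
            return candidate
    return None  # no folder with that Reference Number


def open_folder(folder: Path) -> None:
    """Ask the operating system's file manager to display `folder`."""
    if sys.platform.startswith("win"):
        os.startfile(str(folder))  # Windows only: opens the folder in File Explorer
    elif sys.platform == "darwin":
        subprocess.run(["open", str(folder)], check=False)
    else:
        subprocess.run(["xdg-open", str(folder)], check=False)


def open_reference(pawdoc_root: Path, reference: str) -> bool:
    """Hypothetical macro: locate the Reference Number's folder and open it."""
    folder = find_reference_folder(pawdoc_root, reference)
    if folder is None:
        return False
    open_folder(folder)
    return True
```

For example, `open_reference(Path(r"C:\PAWDOC"), "1234")` would open the folder for a placeholder Reference Number in a placeholder PAWDOC location. The keystroke saving comes from triggering this from the selected Index entry rather than copying and pasting the number by hand.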

Q24. In what circumstances are searches conducted?

2001 Answer: Partially answered:

  • ‘Start Work’: focused assembly of information while under no pressure.
  • ‘Mid Work’: a search for a specific piece of information while pre-occupied with the interrupted activity.
  • ‘Visitor’: a search while you are talking to someone at your desk.
  • ‘Phone Call’: a search while you are on the phone to someone.

2019 Answer: Fully Answered: There are probably three intersecting dimensions to the circumstances in which searches are conducted: Activity (what you’re doing at the time you conduct the search); Work Content (the topic you are working on when the search is conducted); and Location (the type of place in which the search is conducted).

Four different types of Activity are described in the original BIT answer in 2001 (Start Work, Mid Work, Visitor, Phone Call), though I have no data on the relative frequency of each. However, from experience I would guess that Mid Work occurred most often, followed in descending order of frequency by Start Work, Phone Call and Visitor.

Work Content might be the subject you are looking into, or the project you are working on, or the organisation you are working for, or any other categorisation that summarises the type of work being undertaken. Again, no data is available to identify what types of work content have been most associated with the searches made on the PAWDOC collection.

Location can be categorised as Employer’s Office, Other Organisation’s Premises, Travelling, and Home. I know that, over the years, I have indeed conducted many searches at all these types of location.

Q25. What are the most common types of searches?

2001 Answer: Partially answered:

  • The ‘Familiar Item’ search for an item that you have accessed several times before.
  • The ‘Long Lost Friend’ search for an item you are sure is there but have not accessed recently.
  • The ‘Shot in the Dark’ search to see if there is any material on a subject.
  • The ‘Literature Search’ to find everything you have on a subject.

2019 Answer: Partially answered: The 2001 BIT answer provides one perspective on the most common types of searches conducted on the PAWDOC collection (the ‘Familiar Item’ search, the ‘Long Lost Friend’ search, the ‘Shot in the Dark’ search and the ‘Literature Search’). However, another perspective might be to categorise the types of document content most frequently searched for. This analysis might be feasible using the ‘Date Last Accessed’ field in the Filemaker Index. Although this field may not be an entirely accurate record of which items have or haven’t ever been searched for, it nevertheless provides some indication. Therefore, by categorising the 4,551 records which have an entry in the Date Last Accessed field (out of a total of 17,294 records) and ranking the categories by number of occurrences, some indication would be gained of the types of documents that have been searched for and their relative frequency.
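The proposed analysis amounts to filtering on the Date Last Accessed field and then counting categories. A rough Python sketch, assuming the Index has been exported as (Reference Number, Date Last Accessed, category) rows; the category column does not yet exist, and the sample rows below are invented:

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (reference number, date last accessed, category)

# Hypothetical export; an empty date means the record has never been accessed.
records: List[Row] = [
    ("0001", "2015-03-02", "Project report"),
    ("0002", "", "Magazine article"),
    ("0003", "2018-11-20", "Project report"),
    ("0004", "2012-07-14", "Conference paper"),
    ("0005", "", "Project report"),
    ("0006", "2019-01-05", "Magazine article"),
]


def rank_accessed_categories(rows: Iterable[Row]) -> List[Tuple[str, int]]:
    """Count categories among records with a Date Last Accessed entry, most frequent first."""
    accessed = [category for _ref, last_accessed, category in rows if last_accessed]
    return Counter(accessed).most_common()


for category, count in rank_accessed_categories(records):
    print(f"{category}: {count}")
```

Run against the full export, the same filter would keep only the records with a Date Last Accessed entry before ranking.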

Q26. What are the most effective search strategies?

2001 Answer: Ideas formed:

  • Get into the habit of searching the filing system when you need some information even when you don’t think you have anything relevant; after several years you forget what you have (Wilson 1992a: 4, 25).
  • Let your mind roam freely when selecting search words; you are more likely to come up with words you originally specified as keywords (Wilson 1992a: 4).
  • Specify searches with minimal parts of words to avoid problems where spelling errors have been made.
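The benefit of searching on minimal parts of words shows up even with a plain substring match. In this sketch (invented entries, and simple case-insensitive matching rather than Filemaker’s own find behaviour), the full word misses both the American spelling and a misspelt entry, while the shorter fragment catches all three:

```python
def search(entries, fragment):
    """Return entries containing `fragment` anywhere, ignoring case."""
    frag = fragment.lower()
    return [e for e in entries if frag in e.lower()]


index = [
    "Organisation chart for the records team",
    "Notes on organization of shared drives",  # American spelling
    "Organistion review",                      # misspelt when the entry was typed
    "Budget forecast",
]

print(len(search(index, "organisation")))  # full word → 1
print(len(search(index, "organi")))        # minimal part of the word → 3
```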

2019 Answer: Fully Answered: In addition to the suggestions made in the 2001 BIT article (search just in case, let your mind roam freely, and use minimal parts of search words), I would add:

  • Use any older terminology you can think of if your current terminology isn’t coming up with the goods;
  • Check the results of unsuccessful searches to see if there are any terms which you might try in subsequent searches;
  • If eventually you are successful in a search that has taken some time, consider adding some additional search terms to the index entry to give yourself a better chance of quicker success in the future.
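The last two suggestions can be combined: once an item is finally found, the terms tried in the unsuccessful searches can be added to its index entry. A sketch assuming a simplified, hypothetical entry structure with a free-text keyword list (the real Filemaker record layout is not described here):

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class IndexEntry:
    """Hypothetical, simplified index entry: a title plus free-text keywords."""
    reference: str
    title: str
    keywords: List[str] = field(default_factory=list)


def add_search_terms(entry: IndexEntry, tried_terms: Iterable[str]) -> IndexEntry:
    """Append each tried term that does not already appear in the entry's text.

    Matching is a simple case-insensitive substring test, so a term that occurs
    inside a longer word (e.g. "man" in "management") is treated as present.
    """
    existing = (entry.title + " " + " ".join(entry.keywords)).lower()
    for term in tried_terms:
        if term.lower() not in existing:
            entry.keywords.append(term)
            existing += " " + term.lower()
    return entry


entry = IndexEntry("0123", "Electronic document management trial")
add_search_terms(entry, ["EDMS", "document", "records software"])
print(entry.keywords)
```

Here "document" is skipped because it is already in the title, so only the two genuinely new terms are added.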
