PAWDOC: Technology requirements and problems

To operate a personal electronic filing system, you need a computer with a screen, a scanner, software to manage an Index and the documents, and a general approach. My colleague, John Pritchard, and I decided to explore what it would be like to operate such a system after visiting Amoco in the USA, and we followed the approach that we had seen there: every document was given a reference number and an index entry and was then stored in reference number order. Searches were performed on the index and retrieval was achieved by using the reference number.

We were able to apply the approach immediately using index cards. However, the supporting technology took a long time to become powerful and cheap enough for an individual to use, and it took many more years before it could be considered to fully support personal electronic filing systems. Consequently, much of the experience gained in using hardware and software to support PAWDOC has been in how to manage imperfect technology solutions. This has been particularly the case with computer storage, which was insufficient and expensive when I first started scanning PAWDOC documents in 1996. The bulk of scanned documents had to be held offline on Magneto-Optical disks, and this not only imposed a whole set of management requirements but also constrained the portability of the system. Today, however, storage is plentiful and cheap and the whole of the digitised PAWDOC collection is held on my laptop.

Scanners too have become better and cheaper since 1996. The first one I had was only capable of scanning in Black & White and one side of the paper at a time. Consequently, scanning large documents took a long time, and any colour on documents I scanned at that time has been lost. The scanner I have today takes less time to scan a page despite the fact it is also scanning in colour and both sides of the paper as it goes through the machine.

In many ways the software to support personal filing has always been in place, but its performance has been constrained by computing power. For example, the indexing software I use took over three minutes to conduct a complex search on fewer than 4,000 records in 1988, whilst my current version of the same software takes less than one second to conduct the same search on over 17,000 records.

The software to manage the stored documents has also been constrained by computer power – but in a rather unexpected way. In the 1980s and 90s when I first started using the PAWDOC system the conventional thinking was that a dedicated Document Management System was needed for the purpose. Such software applications were large complex beasts with numerous features and they relied on an underlying database application. Today, PAWDOC documents are stored in Windows folders labelled with a Reference Number. My laptop and the Windows 10 operating system are more than powerful enough to be able to display and search over 17,000 folders in just a few seconds. Such a solution would not have been feasible in the mid 90s, but today’s power has enabled a very complicated and constraining element of the personal electronic filing system architecture to be dispensed with.

Specific questions relating to this aspect are answered below. Note that the status of each answer will fall into one of the following 5 categories: Not Started, Ideas Formed, Experience Gained, Partially Answered, Fully Answered.

Q38. What additional software functionality is required?

2001 Answer: Partially answered:

  • A system which eliminates the need for two systems by combining simple and flexible indexing and searching functionality, and file management functionality which keeps track of the thousands of electronic files (Wilson 1996a: 3).
  • Facilities to detect low usage and to automatically recommend the destruction of paper (after scanning).
  • Intelligent synonym functionality that can recognize relationships between frequently used abbreviations and terms, and which requests the user to confirm possible synonym relationships (Wilson 1990: 96–97).
  • The ability to automatically manage multi-part reference numbers of the type PAW/DOC/7653/01 and to be able to present the next unused number.
  • The ability to produce a KWIC (Key Words In Context) or KWOC (Key Words Out of Context) index (Wilson 1992a: 29,30).
  • The ability to store a set of web pages without losing the links between them (the FISH Document Management System is unable to do this because it stores each individual file with a new file name consisting of a combination of alphanumerics) (Wilson 1995b: 131)
  • Functionality to support the assembly, development and use of knowledge (Wilson 1997: 3–4).
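
The multi-part reference number requirement above can be illustrated with a minimal Python sketch. It assumes a letter-only prefix (such as PAW/DOC), a main number, and an optional two-digit part number, and it takes "next unused" to mean one more than the highest number already in use. The function names, the pattern and the sample list are hypothetical and are not taken from any existing indexing software.

```python
import re

# Assumed layout: letter-only prefix segments, a main number, and an optional
# part number, e.g. PAW/DOC/7653/01 (this pattern is an illustration only).
REF_PATTERN = re.compile(
    r"^(?P<prefix>[A-Z]+(?:/[A-Z]+)*)/(?P<main>\d+)(?:/(?P<part>\d+))?$"
)


def parse_ref(ref):
    """Split a reference number into (prefix, main, part); part is None if absent."""
    match = REF_PATTERN.match(ref.strip())
    if match is None:
        raise ValueError(f"Unrecognised reference number: {ref!r}")
    part = match.group("part")
    return (
        match.group("prefix"),
        int(match.group("main")),
        int(part) if part is not None else None,
    )


def next_main_ref(existing, prefix="PAW/DOC"):
    """Return the prefix plus one more than the highest main number in use."""
    mains = [main for pre, main, _ in map(parse_ref, existing) if pre == prefix]
    next_main = max(mains, default=0) + 1
    return f"{prefix}/{next_main}"


def next_part_ref(existing, main, prefix="PAW/DOC", width=2):
    """Return the next unused part number under an existing main number."""
    parts = [
        part
        for pre, m, part in map(parse_ref, existing)
        if pre == prefix and m == main and part is not None
    ]
    next_part = max(parts, default=0) + 1
    return f"{prefix}/{main}/{next_part:0{width}d}"


if __name__ == "__main__":
    # Hypothetical index contents, for illustration only.
    index = ["PAW/DOC/7652", "PAW/DOC/7653/01", "PAW/DOC/7653/02"]
    print(next_main_ref(index))
    print(next_part_ref(index, 7653))
```

Taking the highest number plus one, rather than filling gaps, keeps the numbering in step with the order in which items were added; a different policy would need a different rule.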

2019 Answer: Fully answered: My current views on the additional functionality listed in the 2001 answer are as follows:

  • Combined Indexing and file storage: Now that I have eliminated the Document Management System and replaced it with Windows folders, I no longer feel this is needed. However, despite retrieval being simple and quick, it could be made even more effective if the files associated with a particular Reference Number could be automatically listed under the Index entry for that number; and if the file you require could be selected and opened from that list.
  • Low usage detection: Now that all documents are digitised and paper is no longer taking up valuable space, there is no need to identify which hardcopies are not being accessed and could therefore be digitised and removed. Consequently, this requirement is no longer needed.
  • Intelligent synonym functionality: Terminology continues to change, so this is still required.
  • Management of multi-part Reference Numbers: This is still a requirement. It would make it quicker and easier to create new index entries.
  • Production of a KWIC index: I no longer produce paper backups of the index, so this is no longer required.
  • Store web pages without losing links: I now use zip functionality to combine and store the multiple files making up a single web site, so this is no longer required.
  • Nugget/knowledge management: I never clearly ascertained if this would be worthwhile or not (see more detailed discussion in the answers to questions 27 – 29, and also in the topic ‘Knowledge Development’ elsewhere in this web site).
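
The idea in the first point above, of listing the files held under a Reference Number so that one can be chosen and opened, might be sketched as follows. This is only an illustration: the folder naming convention (replacing "/" with "-", since "/" cannot appear in a Windows folder name), the function names and the example path are assumptions, not a description of how the PAWDOC folders are actually named.

```python
from pathlib import Path


def folder_name_for(ref):
    """Hypothetical convention: PAW/DOC/7653/01 is stored as PAW-DOC-7653-01."""
    return ref.replace("/", "-")


def files_for_reference(root, ref):
    """Return the files stored in the folder for one Reference Number, sorted by path."""
    folder = Path(root) / folder_name_for(ref)
    if not folder.is_dir():
        return []  # no folder yet for this Reference Number
    return sorted(p for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    # The root folder below is a made-up example path.
    for number, path in enumerate(files_for_reference("C:/PAWDOC", "PAW/DOC/7653/01"), 1):
        print(number, path.name)
```

On Windows, a file chosen from such a list can then be opened in its default application with `os.startfile(path)`.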

In addition, I would add the following:

  • Use of flexible Date formats: This is required to be able to specify BOTH exact dates (for, say, the date a document gets created or a letter is sent – dd/mm/yyyy); AND partial dates (for, say, the year a book is published – yyyy – or the month and year of publication of a journal or magazine – mm/yyyy).
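
One way to picture the flexible date requirement is a field that accepts a full date, a month and year, or a year alone, and still sorts sensibly. The Python sketch below is an assumption about how that might be done; the `PartialDate` type and the function names are hypothetical, and the rule of sorting a partial date ahead of fuller dates in the same period is a design choice, not something the text above specifies.

```python
from datetime import date
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    """A date where the month and day may be unknown."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def parse_partial_date(text):
    """Parse dd/mm/yyyy, mm/yyyy or yyyy into a PartialDate."""
    parts = [int(p) for p in text.strip().split("/")]
    if len(parts) == 1:
        return PartialDate(parts[0])
    if len(parts) == 2:
        month, year = parts
        date(year, month, 1)  # raises ValueError if the month is impossible
        return PartialDate(year, month)
    if len(parts) == 3:
        day, month, year = parts
        date(year, month, day)  # raises ValueError if the date is impossible
        return PartialDate(year, month, day)
    raise ValueError(f"Unrecognised date: {text!r}")


def sort_key(d):
    """Sort partial dates before fuller dates that fall in the same period."""
    return (d.year, d.month or 0, d.day or 0)


if __name__ == "__main__":
    entries = ["25/12/1996", "1988", "03/2001"]
    for d in sorted(map(parse_partial_date, entries), key=sort_key):
        print(d)
```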

Q39. What technology problems have been experienced while operating the electronic filing system?

2001 Answer: Experience gained:

  • Replacing the PC requires the re-installation of all the software, which has been problematic on the last three occasions.
  • Upgrades of software can require a complex conversion process.
  • The index software (Filemaker Pro) crashes from time to time, but the Filemaker recovery function has always been able to deal with it except for an occasion over 10 years ago when the backup file had to be used (Wilson 1992a: 65, 72, 79).
  • About 30 of the files were lost in the document management system – probably in the course of moving them to off-line storage or moving them back into the PC’s hard disk (Wilson 1995b: 137, 139).
  • One of the Magneto-Optical disks became corrupted and the backup files had to be used (Wilson 1995b: 141).

2019 Answer: Fully answered: Over the last 20 years, most of the technology problems I’ve had seem to relate to four main areas – storage, specialist software, upgrades, and obsolescence:

  • Issues associated with the lack of cheap reliable storage: This problem has largely disappeared. When I first started scanning in 1996, I had to use external Magneto-Optical disks attached to my laptop, and I did suffer some data transfer and disk corruption problems. Today I have more than enough fast storage with the 1Tb SSD in my laptop.
  • The management and cost of specialist software: I had to deal with a wide variety of issues over the years with my document management software and its associated Sybase, and subsequently SQL, database. So much so that I have concluded that it is far better to avoid all specialist software if at all possible. It introduces complexity and is costly to buy, upgrade and support. While general purpose software may have fewer features, overall it is likely to be much easier to manage and use, and is likely to be a much more viable long term solution for the individual. I am very pleased to have eliminated the document management system and associated database from the PAWDOC architecture, and to now be using the much more familiar and straightforward Windows folders to store PAWDOC files in. I still use Filemaker for the Index but I regard this also as specialist software. Although it is very reliable and presents few management issues, it still has to be upgraded every three years at a cost of over £200 a time; whereas I know that I could still operate the Index if I exported the data to an Excel spreadsheet. In conclusion, I would recommend anyone setting up a personal electronic filing system to use standard multi-purpose software, preferably which you are already using, and to avoid specialist software if at all possible.
  • The complexities associated with upgrading platforms and operating systems: Moving systems from old to new computers, or upgrading operating systems, are major changes with associated risks. That’s not to say that it will necessarily be difficult – but over the years I have encountered issues and have found the more complex the systems being used the greater the challenges. The document management system had to be totally reinstalled from scratch when it was moved to a new laptop and that was something I only ever achieved once by myself without any supplier support. Now that PAWDOC only uses a Filemaker Index and Windows folders, the risks and difficulties associated with upgrades are much lower.
  • Obsolescence: As files are accumulated over the years, they may become unreadable because you no longer have the appropriate application software running on your machine. When I conducted a Digital Preservation exercise on the PAWDOC system in 2016-2018, I discovered many examples of such files, and it took a considerable effort to deal with the problems and achieve readable files again. Similar problems can affect hardware such as disks and memory sticks – though I feel less vulnerable on this front as I have sufficient storage on my laptop to cope with all PAWDOC requirements. However, anyone operating a long term filing system is going to have to undertake periodic Digital Preservation work of one sort or another to ensure that their documents continue to be readable.
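
A first step in the kind of Digital Preservation work described above is often a simple inventory of the file types in the collection, so that formats you can no longer open stand out. The Python sketch below illustrates that step only; it is not the procedure used in the 2016-2018 exercise, and the function names, the example path and the set of "readable" extensions are assumptions.

```python
from collections import Counter
from pathlib import Path


def count_extensions(root):
    """Count files under root by lower-cased extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def unfamiliar_extensions(counts, readable):
    """Return the extensions (and file counts) not in the set you can currently open."""
    return {ext: n for ext, n in counts.items() if ext not in readable}


if __name__ == "__main__":
    # Hypothetical root folder and hypothetical list of formats still readable.
    readable = {".pdf", ".docx", ".xlsx", ".jpg", ".tif", ".zip"}
    counts = count_extensions("C:/PAWDOC")
    for ext, n in sorted(unfamiliar_extensions(counts, readable).items()):
        print(f"{ext}: {n} file(s) to check")
```

An extension is only a hint at the real format, so anything flagged this way still needs to be opened and checked by hand.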

Q40. What contingency arrangements can be made to minimize and overcome technology problems?

2001 Answer: Ideas formed:

  • Make clear notes on little used technology procedures and fixes.
  • Document system components and configuration settings.
  • Assemble support phone numbers.
  • Keep all of the above in hardcopy and in a place that does not require the filing system to find them.

2019 Answer: Fully answered: In addition to the four points made in the 2001 answer (make notes on procedures and fixes, document components and configurations, document support numbers, keep such documentation outside the system in hardcopy), I would add:

  • Be diligent about regularly backing up.
  • Ensure you know how to use backup data to reinstall applications.
  • If you have a specialist Index application, consider regularly exporting the data to a spreadsheet application so that, if the application fails, you still have immediate access to the Index.
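
To illustrate the last point, an Index exported to a CSV file (a format that spreadsheet applications can open) can still be searched with a few lines of general purpose code if the Index application is unavailable. The sketch below assumes a UTF-8 export with a header row; the file name and the search term are hypothetical.

```python
import csv


def search_index(csv_path, term):
    """Return every row of an exported index in which any field contains term."""
    term = term.lower()
    matches = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Skip non-text values that DictReader produces for ragged rows.
            if any(isinstance(v, str) and term in v.lower() for v in row.values()):
                matches.append(row)
    return matches


if __name__ == "__main__":
    # "index_export.csv" is a made-up file name for the exported Index.
    for row in search_index("index_export.csv", "scanner"):
        print(row)
```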

Q41. What equipment is needed to operate a filing system and what are the key criteria by which it should be selected?

2001 Answer: Experience gained:

  • A high-resolution monitor preferably capable of displaying a whole A4 page in a magnification you can read.
  • A laptop computer with sufficient hard disk to store all the electronic files and scanned images in the collection, and with room for the growth of the collection.
  • An off-line storage system that can be used to make backups of all the collection’s electronic files and scanned images, as well as the electronic filing software application’s configuration, control and data files.
  • The equipment should not be too noisy.

2019 Answer: Fully answered:

  • A large-screen, high-resolution colour monitor big enough to display a whole portrait page at a size that is roughly readable without magnification, and/or capable of being rotated into portrait orientation as required.
  • A light-weight laptop computer small enough to be transported in hand luggage, with a high-resolution colour screen, sufficient fast SSD storage to accommodate the whole of the personal filing collection, and which makes a minimal amount of noise.
  • A colour scanner with both a sheet feeder and a flatbed capable of scanning documents at least A4 in size, which is reasonably fast, and is small enough to fit on or next to your desk. Its software should be capable of automatically adjusting the scan to the size of the document and automatically adjusting sloping originals to produce a vertical scan. It should also provide easy-to-use facilities for adjusting contrast and brightness to deal with poor originals, and for resetting after sheet feeder jams.
  • Two or more external hard disks or flash drives with sufficient capacity to store the whole of the personal filing collection, for use as a) a local backup, b) a remote in-country backup, and c) if required, a remote out-of-country backup.

Q42. What considerations should be taken into account when physically laying out the filing system?

2001 Answer: Partially answered:

  • Paper files should be placed so they are accessible while sitting at the desk (Wilson 1990: 94)
  • The scanner should be placed so it can be operated while sitting at the desk.

2019 Answer: Fully answered: Over the years I’ve had to cope with a variety of company offices and a long period of operating out of my home study. In all these situations I have tried to arrange the physical layout so I could conduct all my filing activities while seated at my desk. I’ve found this to be feasible and effective.  Hardcopy files can be placed in an upright filing cabinet (or cardboard boxes) alongside or behind one’s desk; and a scanner can be placed on the right-hand edge of the desk. Backup external drives can be placed in a pedestal drawer.

Q43. What criteria should be used to select an electronic filing system software package?

2001 Answer: Experience gained:

  • Ability to support the desired filing schema.
  • Ability to manage both hardcopy and electronic files.
  • Enables the rapid input of new items.
  • Enables easy and quick searching.

2019 Answer: Fully answered: In addition to the points made in the 2001 answer (support for the filing schema, management of both hardcopy and electronic files, rapid input of new items, and easy and quick searching), I would add:

  • Simplicity and understandability of the architecture of the system.
  • Ease of installation.

Q44. Is it feasible to construct a filing system out of multiple different software packages?

2001 Answer: Experience gained: Yes. However, provided all the requirements are met, it would be more efficient and easier to manage if only a single package was required.

2019 Answer: Fully answered: Yes, it is feasible, provided effective integration between the packages can be achieved, and provided not too much effort is required to set up and maintain the integration. However, it undoubtedly complicates matters and requires more effort to manage and maintain, so the simpler the packages to be integrated the better. On balance, I would not recommend it if a single piece of software will do the job.

Q45. How much file space do you need to store an individual’s personal files?

2001 Answer: Experience gained: Assuming only black and white scanning, no digitizing of journals or books, and no video material, a collection built up over 70 years would require approximately 53 GB (Wilson 2001a). Until experience is gained of colour scanning and digitized video, a more realistic figure cannot be estimated.

2019 Answer: Partially answered: The current PAWDOC collection can’t be considered a total lifetime collection because:

  • A substantial number of the colour hardcopies were scanned in B&W.
  • For about 10 years when I was working in Bid Management with highly confidential and fast-moving documents, the number of documents I was putting into the collection was much reduced.
  • The collection only includes about 30 years of my 40 years of work.

It should also be remembered that about half the collection was assembled under business conditions that were in transition from paper only to paper + electronic – very different from today’s environment. Furthermore, the type of work I did and my overlapping interest in technology research dictated my coming into contact with a particular range of documents; different types of jobs and interests will dictate different numbers of documents of different types.

Having said that, all the items in PAWDOC have been digitised and the overall digital collection takes up about 46Gb.

Q46. How much file space is taken up by the average document?

2001 Answer: Partially answered: Chan’s results showed the following sizes for an A4 page: line art 87 kb; black and white 91 kb; halftone 181 kb; and colour 3347 kb (Chan 1993: 28). In practice, initial black and white scans at 240 dpi were producing an average file size of about 40 kb (Wilson 1995c: 1).

2019 Answer: Fully answered: File sizes vary depending on what application they have been created in and on whether they are scanned as colour or B&W documents. Therefore, file sizes for a number of these combinations were established using my current scanner (a Canon DR-2020U) to scan at 300dpi a single full page of typed text for the B&W document and a single page containing 5 colour photos of various sizes for the colour document.

  • B&W page created in Word 2007: 13 Kb
  • B&W page scanned in B&W to PDF: 105 Kb
  • B&W page scanned in Greyscale to JPG (the scanner would not scan in B&W to JPG): 579 Kb
  • B&W page scanned in 24 bit colour to JPG: 584 Kb
  • B&W page scanned in B&W to TIF: 69 Kb
  • Colour page created in Powerpoint 2007: 1,100 Kb
  • Colour page scanned in 24 bit colour to PDF: 808 Kb
  • Colour page scanned in 24 bit colour to JPG: 750 Kb
  • Colour page scanned in 24 bit colour to TIF: 25,389 Kb
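
The per-page figures above can be turned into a rough storage estimate for a planned collection. The Python sketch below uses only the scanned-page sizes listed above; the page counts in the example are hypothetical, the "Kb" figures are treated as kilobytes, and 1 GB is taken as 1024 × 1024 KB. Real collections will vary, since these sizes come from single test pages.

```python
# Per-page sizes in KB, taken from the scan results listed above.
KB_PER_PAGE = {
    "bw_pdf": 105,
    "bw_tif": 69,
    "greyscale_jpg": 579,
    "colour_pdf": 808,
    "colour_jpg": 750,
    "colour_tif": 25389,
}


def estimate_kb(page_counts):
    """Total size in KB for a mapping of scan type to number of pages."""
    return sum(KB_PER_PAGE[kind] * pages for kind, pages in page_counts.items())


def kb_to_gb(kb):
    """Convert KB to GB, assuming 1 GB = 1024 * 1024 KB."""
    return kb / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical collection: mostly B&W pages with some colour pages.
    planned = {"bw_pdf": 100_000, "colour_pdf": 10_000}
    print(f"{kb_to_gb(estimate_kb(planned)):.1f} GB")
```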

Q47. What’s the best type of storage media to keep electronic files on?

2001 Answer: Experience gained: A hard disk in the laptop is best because it is so quick and easy to use. CDs are good because CD writers are cheap and CD drives are available in most laptops. Having said that, this does not preclude other media with similar characteristics.

2019 Answer: Fully answered: It’s best to keep your files with you on a laptop – or on your mobile phone provided you have all the necessary applications on the phone and you feel the screen is big enough to be able to read the documents. However, since both laptops and phones are portable and therefore at higher risk of being lost or stolen, adequate measures must be taken to protect the data should the equipment fall into the wrong hands. Another possibility is to store the master set of files in a cloud-based service; however, I believe that would be unwise due to the risks of the service failing or being subject to viruses or hacking. A cloud-based service may be suitable for backup, though external hard drives or SSD flash drives are cheap and effective enough for the purpose.
