Getting my thoughts in order

The primary objects to be addressed in this journey are the books and other items on the topmost shelf in the cabinet in my study, as shown in the photo below.

To get started, I made a quick assessment of what each item means to me by looking at the spines in the photos and writing brief notes in scrawly longhand in my Journeys notebook – somehow, electronic media just can’t replicate the freedom and unconstrained nature of the longhand scrawl. Although this exercise stimulated a good selection of musings, it also made me think that there are probably several different points of view from which a book can be considered and that each of those points of view might stimulate different thoughts. For example, I have a general awareness that the bookshelf contains a set of work books that I don’t look at very often; and I’m aware of that bookshelf in the context of all the other bookshelves I have and of the availability or shortage of space across all of them. When I take a closer look at the bookshelf I recognise specific books; and sometimes when I muse on one of them, thoughts pop into my head about it, including why I have decided to keep it. Finally, if I actually take a book off the bookshelf and handle it and leaf through some of the pages, I experience further thoughts; and, sometimes, the smell of a book might stimulate a memory. To try to be as methodical as possible, these different points of view have been given the following names:

Presence: Awareness of book/s or types of book/s in a particular location.

Space: Awareness of the type, suitability and availability of storage space the book(s) is/are in and the way they are being stored in the space.

Recognition: A consciousness of the Title/Author of a particular book and possibly of what the book is about.

Reason for Keeping: Why this particular book was retained during prior clear outs.

Sight Stimulation: The emergence of thoughts and memories related to a particular book stimulated by looking at the outside of the book (and possibly just the spine).

Handling Stimulation: The emergence of thoughts and memories related to a particular book stimulated by touching the book, opening it and leafing through the pages.

Smell Stimulation: The emergence of thoughts and memories related to a particular book stimulated by the smell of the book.

The first two (Presence and Space) apply to my awareness of the bookshelf as a whole and generated the following thoughts:

Presence (top shelf): Retained work books and some other books on the top shelf of the white cabinet

Space (top shelf): The shelf is full and I’ve run out of shelf space in the whole of my study. I could do with freeing up some shelf space or getting some more shelving.

The remaining points of view were considered with respect to each individual item on the bookshelf in turn. For Sight Stimulation and Handling Stimulation there was no limit placed on the number of separate thoughts that could be recorded. The detailed results are shown in this analysis spreadsheet (note that the Smell Stimulation yielded almost no results – possibly due to a blocked nose, but maybe an indication that this type of stimulation occurs only rarely). The thoughts that were stimulated were then sorted into a two level categorisation. The top level categories are:

Topic, Publisher, Author, Acquisition, Use, Value, People, Events, Places, Experiences, Father, Borrowed, Physical characteristics, Spine, Scanning decision, Not applicable to this exercise.

The lower level categories and their frequency of occurrence are documented in this Types of Thoughts spreadsheet.

With this basic analysis work completed, the next step will be to ask the following question of each item on the shelf: “What would have to be displayed on an electronic story board to make me prepared to destroy the item in the course of scanning it?” I shall provide answers while reviewing the thoughts I have recorded for each item.

Still looking for a home

Back in 2015 I reported on my efforts to find a permanent home for my document collection. I had no success with any of the organisations I mentioned in that post, and subsequently turned my attention to trying to find a contemporary historian who is interested in the development of computing. I came across one Daniel Wilson (no relation) based at Cambridge University who has a particular interest in the history of science and technology; and I duly contacted him. Despite being interested in hearing about the contents of the collection, he felt unable to help, explaining that “this will require significant work and few people have the budget or the time, given current pressures”. He gave me the name of another contemporary historian at Leicester University who I also tried emailing, but, despite sending a follow-up, I got no response. I’ve concluded that individual academics just have too little time to take on the management of a collection that isn’t absolutely central to a specific piece of work that they are doing.

I am now turning my attention, once more, to institutions, and have just sent an email to the Keeper of Manuscripts and Special Collections (MSC) at the University of Nottingham. I came across this organisation in a JISC email which advised that MSC has just joined the DPC. I was able to mention in the email not only that I have just completed a digital preservation exercise on the PAWDOC collection using templates which are published on the DPC website, but also that the PAWDOC collection contains much material from the Cosmos project in which the University’s Department of Computer Science took part – perhaps those little extra bits of information might spark an extra bit of interest.

Under the paper wait

The paper describing the PAWDOC digital preservation work was submitted to the Digital Preservation Coalition (DPC) on 31st May and the organisation responded saying it was interested in the paper but was currently unable to provide a timescale for dealing with it due to a busy work schedule. I guess it might be several months before hearing whether the DPC will want to publish a version of the paper.

Dust Jacket design and production

This week I put the finishing touches to ‘Touch the Join’; the end papers were glued to the boards and I completed the dust jacket. The volume now awaits a place on the top shelf of my cabinet when its current contents are digitised in the Electronic Story Book journey.

The creation of the dust jacket was particularly interesting. The thickness of the book meant that there was considerably more space on the spine to do things that perhaps wouldn’t work at all in a narrower area. I decided to use the space to illustrate the title, so I took lots of photos of my fingers touching the bare leather spine and embedded the title in one of them with the result shown below.

The other two pictures on the spine are taken from within the book itself along with the other sixteen images that appear on the front and back of the jacket – all assembled in a PowerPoint slide. Fingers have been superimposed in another two of them to reiterate the message of the title. The front and back of the dust jacket are shown below.

The inside flaps offered the opportunity to provide the following rationale for producing the book:

“In the years before retiring in 2012, I had accumulated a number of projects that I didn’t have time to work on. Things like the analysis of why I kept certain documents after scanning and not others; and the comparison of my incoming communications between 1981 and 2011; and the Roundsheet. As retirement approached, I began to realise that I could undertake all of these and more under a collective banner; and that, for some of them, I might be able to find collaborators, academic or otherwise, to advise me or to work with me. I thought of these as prospective journeys of discovery, unfettered by organisations or money.

To provide a structured framework within which to work, I decided to record a journal for each journey; and to set up a website to provide an open record of my activities for prospective collaborators to see what I was doing. Consequently, in April 2012, pwofc.com opened for business.   

One of the journeys I embarked on was digital preservation work on my lifetime collection of work documents, to ensure that its contents could be accessed in the future. The collection includes some self-contained web sites, so I investigated the best way of storing web sites long term. However, there appeared to be no simple solutions. The industry standard WARC methodology seemed far too complex for my needs, so I stuck to the approach I had always taken – keeping all the files together in a zip file.

However, it did get me thinking about how to preserve pwofc.com; and I suddenly realised that a more tangible way of doing it would be to simply put it on a bookshelf. I had started attending bookbinding classes in Bedford in 2017, and I had already created, printed and bound a book of my own (Sounds for Alexa), so I knew it would be possible. I realised, also, that it would be an interesting opportunity to compare the features of a web blog and a book.

That’s how this tome came about. I wonder what its future life will be? My guess is that it’ll last longer than its electronic counterpart.”

The printing of the dust jacket was quite demanding because its length (60cm) required a custom print size to be set up in the Canon MG3550 printer driver software; but the printer driver software does not permit Borderless Printing with custom print sizes. Hence it was not possible to avoid getting a 0.5 cm blank edge all round the print. Furthermore, the height required was exactly the widest the Canon MG3550 printer was physically capable of handling (22cm) but the printer driver software only permitted a custom size of 21.59 cm, so a further 0.41 cm of blank space was introduced on the top or bottom edge. I produced 4 separate test prints and each time tried to get an equal amount of blank space on the top and bottom edge by moving the image to be printed up and down; but wasn’t able to achieve it, so I ended up with a larger space along the top edge and a smaller space along the bottom edge. I didn’t think this would look very good, so decided to fill the blank edges with gold wax gilt by using masking tape and applying the ‘King Gold’ version of Pebeo’s Gedeo gilding wax using a small paint brush with the hairs cut down to a length of about 5mm.

A further complication arose when trying to wrap the dust jacket around the book with the spine in the appropriate place. Because of the size of the book and the length of the dust jacket it was necessary to handle the print quite a lot to get it in the right place, and I found that some ink was coming off on my fingers. Despite experiencing this on the test prints and then being super careful with the final master version, some ink smears still found their way onto parts of the jacket. I’ve decided I’ll live with these for the time being. Perhaps, at my leisure in the future, I’ll have another go and leave the print for a few weeks in the hope that the ink fixes more securely.

Finally, covering the jacket with a sheet of transparent plastic (probably polypropylene) was relatively straightforward – just cut to size with an overlap of several centimetres all round and then fold over the top and bottom edges of the dust jacket. However, there’s an issue with using this material that I haven’t yet found a way to resolve: the plastic attracts all the dust and hairs that are already lying on the surface on which you cut it and fold it. I guess if I had a dedicated workbench which I could keep immaculately clean, that would do the job – but I don’t and have to make do with whatever area I can find that’s large enough to take an expanse of the 80cm wide roll. Consequently the outside of the cover had lots of bits on it which I have tried to remove using a damp cloth. However, there may also be bits on the inside of the plastic. Luckily, once the covered dust jacket is on the book, such bits are not immediately obvious to the casual eye.

So, that’s the whole story of the book of the blog. Perhaps there’ll be an accompanying volume in a few more years.

Telling the story behind emotions and feelings

In a previous journey (electronic bookshelf) I digitised unused university and work books, but retained some of those physical volumes because they had special meaning for me. However, the books I kept are still unused and taking up valuable shelf space; so this journey will digitise them and explore how to represent the meanings that they have for me, and how to stimulate the feelings that sight of the books inspires in me. This is likely to involve telling the story which generates such a response. Some non-work books that are on the same shelf will also be included in the exercise. As was the case in the electronic bookshelf work, the solutions and designs will be based on the assumption that electronic paper will eventually enable large paper-thin wall displays to be created in any shape and size. In the absence of such versatile electronic paper at an affordable price, the work will simulate such a capability using card, posters and an iPad.

Paper written – Maint Plan test to do

The follow up paper describing my recently completed preservation project is now ready for submission to the Digital Preservation Coalition (DPC). I’m hoping that, since they published my paper describing how I derived the Preservation Planning Templates in the first place, they might be interested in taking a paper describing how they have been used in practice. We’ll see. In any case it’s good to have been able to create a summarised account of what happened while it’s fresh in my mind.

Writing the first draft of the paper only took about a week. However, that piece of work made me realise that the details of what got done when appear in five main documents – the paper I was writing, the Scoping document, the Plan DESCRIPTION, the Plan CHART, and section 2 of the Preservation Maintenance Plan (Previous preservation actions taken); and that the base data for all these documents was being derived from the three major control sheets – the DROID analysis spreadsheet, the Files-that-won’t-open spreadsheet, and the Physical Disks spreadsheet. Although the facts were roughly consistent across the documents, there were several anomalies that would be apparent to readers, and the sheer number of files and types of conversions that had been performed made it difficult to check and make revisions. I decided that the only way to achieve true consistency and traceability across all the documents would be to specify columns in the control spreadsheets for all the categories I wanted to describe, and to have the spreadsheets add up the counts automatically. This is what I spent the following two weeks doing – and a very slow and tortuous exercise it was. Which is why the paper makes several mentions of the need to set up control sheets correctly in the first place to facilitate downstream needs for control and for statistical information about what’s been done….
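For anyone who would rather tally a control sheet outside the spreadsheet, the same idea of deriving every count from one category column can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (file, action) and the sample rows are hypothetical and are not taken from the actual DROID analysis spreadsheet, which would first need to be exported as CSV.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text, column):
    """Count how many rows fall under each value of one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Hypothetical control sheet exported as CSV; the column names are assumptions.
sample = """file,action
report1.doc,converted to PDF
slides.ppt,converted to PPTX
memo.doc,converted to PDF
"""

counts = count_by_column(sample, "action")
for action, total in sorted(counts.items()):
    print(action, total)
```

Because every figure comes from the same column, a number quoted in one document can always be traced back to the rows that produced it.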

I was given a lot of very useful feedback on the drafts of the paper by Ross Spencer, including suggestions to include a summary timeline for the project at the beginning of the paper, to provide more details about the DROID tool, and to include some additional references. Ross also advised making it clear that this is a personal collection with preservation decisions being made that the owners were comfortable with; and that different decisions might have been made by other people from the perspective of who the future users of the Collection might be. This prompted me to include an extra paragraph in the Conclusions section to the effect that no attempt has been made to convert some files (such as old versions of the Indexing software, or a Visio stencil file) because they don’t have content and their mere presence in the collection tells its own story. However, it’s got me thinking that there is a wider point here about what collections are for, and just how much detail of the digital form needs to be preserved. I’ll probably explore this issue further in the Personal Document Management topic in this Blog.

Writing the paper also prompted me to realise that, unfortunately, my Digital Preservation Journey can’t be completed until I’ve tested out the application of a Preservation Maintenance Plan. It’s one thing to fill in a Maintenance Plan (which was relatively quick and easy), but quite another to have it initiate and direct a full blown Preservation project. Only by using it in practice will it be known if it is an effective and useful tool; and, no doubt, its use will lead to some refinements being made to its contents. I shall explore whether I could use the Maintenance Plans I produced for photos and for mementos which were created in the course of the trials conducted when putting together the first versions of the Preservation Planning Templates. If they won’t provide an adequate test, I’ll have to wait until the date specified in the PAWDOC Preservation Maintenance Plan for the next Maintenance exercise – September 2021.

Just the Dust Jacket left to do

After about a dozen bookbinding classes, the 9cm stack of loose paper has been transformed into a tightly knit, disciplined, battalion of messengers. The metamorphosis involved 2-up stitching, attaching the end bands and hollow, securing the tapes to the boards, paring the leather and gluing it to the boards, and finally printing the gold lettered title on the spine. The photos below illustrate some of these intermediate stages.

Aside from the small matter of gluing down the end papers, there only remains the dust jacket to create, print and fit – a blank canvas which I’m looking forward to designing. Several people in my bookbinding class can’t understand why anyone would want to put a cover on a nice leather bound book, but, for me, there are two good reasons for doing so: first, my bookshelves are full of brightly coloured and good condition dust jacket spines – I don’t think plain spines look good among the rest of the books; and, secondly, the ability to personalise a book with a dust jacket design and to include additional descriptive text on the inside sleeves is a great opportunity to explain my relationship to the artefact and what it means to me – particularly for books I have created myself.

PawdocDP Preservation Project Put to Bed

Last Thursday (3 May) I completed the preservation project on my document collection – quite a relief to know that it is now in reasonably good shape for a few more years. To finish off this work I intend to write a follow up paper recounting how the processes and templates I developed in the earlier stages of this exercise fared when applied to a substantial body of files. Looking back I see that I started this Preservation Planning topic nearly four years ago, so it’s been a long haul and very labour intensive – I’m looking forward to being able to move it to the Journeys Completed section of this blog so that I can concentrate again on more creative and exciting forays!

Disk, Reordering, and Maintenance Plan Insights

Although my last post reported that I’d got through the long slog of the conversion aspects of this preservation project, in fact there was still more slog of other sorts to go. A lot more slog in fact: there was the transfer of the contents of 126 CD/DVD disks to the laptop; and there was the reordering of pages in 881 files to rectify the page order produced by scanning all front sides first and then turning over the stack of pages to scan the reverse sides at a time in the 1990s when I didn’t have a double sided scanner. In fact this exercise involved yet more conversion (from multi-page TIF file to PDF) before the reordering could be done.

This latter task really took a huge amount of time and effort and was yet another reminder of how easy it is to specify tasks in a preservation project without really appreciating how much hard graft they will entail. Having said that, it’s worth noting that my PDF application – eCopy PDF Pro – had two functions which made this task a whole lot easier. First, the option to have eCopy convert a file to PDF is available in the menu brought up by right clicking on any file; it automatically suggests a file title (based on the title of the original file) for the new PDF in the Save As dialogue box, and then automatically displays the newly created file – all of which is relatively quick and easy. Second, eCopy has a function whereby thumbnails of all the pages in a document can be displayed on the screen and each page can be dragged and dropped to a new position. I soon worked out that the front-sides-then-reverse-sides scan produces a standard order in which the last page in the file is actually page 2 of the document; and that if you drag that page to be the second page in the document, then the new last page will actually be page 4 of the document and can be dragged to just before the 4th page in the document. In effect, to reorder simply means progressively dragging the last page to before page 2 and then before page 4 and then before page 6 etc until the end of the file is reached. Both these functions (to be able to click on a file title to get it converted, and to drag and drop pages around a screenful of thumbnails) are well worth looking for in a PDF application.
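The drag-and-drop routine described above can be written down as a short Python sketch. It is not part of eCopy and works on a plain list of page labels rather than a real PDF; it also assumes that every sheet had both sides scanned, so the file holds an even number of pages. The function name is made up for illustration.

```python
def reorder_fronts_then_backs(pages):
    """Restore reading order for a fronts-first, backs-reversed scan.

    Mirrors the manual routine: repeatedly take the last page and drop it
    into the 2nd, then 4th, then 6th position, and so on.
    Assumes an even number of pages (both sides of every sheet scanned).
    """
    pages = list(pages)  # work on a copy so the caller's list is untouched
    # Zero-based indexes 1, 3, 5, ... are the 2nd, 4th, 6th ... positions.
    for target in range(1, len(pages) - 1, 2):
        pages.insert(target, pages.pop())
    return pages


# Three double sided sheets: fronts 1, 3, 5 scanned first, then the stack
# turned over, so the reverse sides arrive as 6, 4, 2.
scanned = [1, 3, 5, 6, 4, 2]
print(reorder_fronts_then_backs(scanned))
```

Each pass through the loop corresponds to one drag of the last thumbnail, so a file of 2n pages needs n - 1 moves.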

Regarding the disks, I was expecting to have trouble with some of the older ones since, during the scoping work, I had encountered a few which the laptop failed to recognise. I did try cleaning such disks with a cloth without much success. However, what did seem to work was to select ‘Computer’ on the left side of the Windows Explorer window, which displays the laptop’s own drive on the right side of the window together with any external disks that are present. For some reason, disks which kept on whirring without seeming to be recognised just appeared on this right side of the window. I don’t profess to understand why this was happening – but was just glad that, in the end, there was only one disk that I couldn’t get the machine to display and copy its contents.

I’m now in the much more relaxed final stages of the project, defining backup arrangements and creating the Maintenance Plan and User Guide documents. The construction of the Maintenance Plan has thrown up a couple of interesting points. First, since it requires a summary of what preservation actions have been completed and what preservation issues are to be addressed next time, it would have made life easier to construct the preservation working documents in such a way that the information for the Preservation Maintenance Plan is effectively pre-specified – an obvious point really but easy to overlook – and I did overlook it…. The second point is a more serious issue. The Maintenance Plan is designed to define a schedule of work to be undertaken every few years; it’s certainly not something I want to be doing very often – I’ve got other things I want to do with my time. However, some of the problem files I have specified in the ‘Possible future preservation issues’ section in the Maintenance Plan could really do with being addressed straight away – or at least sooner than 2021 when I have specified the next Maintenance exercise should be carried out. I guess this is a dilemma which has to be addressed on a case by case basis. In THIS case, I’ve decided to just leave the points as they are in the Maintenance Plan so that they don’t get forgotten; but to possibly take a look at a few of them in the shorter term if I feel motivated enough.

The Conversion Slog

I’m glad to say I’ve nearly finished the long slog through the file conversion aspects of this digital preservation project. After dealing with about 900 files I just have another 50 or so Powerpoints and a few Visios to get through. It’s been a salutary reminder of how easily large quantities of digital material could be lost simply because the sheer volume of files makes for a very daunting task to retrieve them.

Below are a few of the things I’ve learnt as I’ve been ploughing through the files.

Email .eml files: These are mail messages which opened up fine in Windows Live Mail when I did the scoping work for this project. Unfortunately, since then I’ve had a system crash and Live Mail was not loaded into my rebuilt machine; and Microsoft removed all Live Mail support and downloads at the end of 2017. On searching for a solution on the net, I found several suggestions to change the extension to .mht to get the message to open in a browser. This works well, but unfortunately the message header (From, To, Subject, Date) is not reproduced. I ended up downloading the Mozilla Thunderbird email application, opening each email in turn in it, taking screenshots of each screenful of message and copying them into Powerpoint, saving each one as a JPG, and then inserting the JPGs for all the emails in a particular category into a PDF document. A bit tortuous and maybe there are better ways of doing it – but at least I ended up with the PDFs I was aiming for.
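One possible alternative for recovering the header fields, offered only as a sketch and not something used in this project: an .eml file is a plain text message, and Python’s standard email package can read the From, To, Subject and Date lines directly. The sample message and the function name below are invented for illustration; a real file would be read from disk as bytes first.

```python
from email import policy
from email.parser import BytesParser


def read_eml_headers(raw_bytes):
    """Return the four header fields that the .mht trick does not show."""
    message = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    wanted = ("From", "To", "Subject", "Date")
    # A header that is missing from the message comes back as None.
    return {name: (str(message[name]) if message[name] is not None else None)
            for name in wanted}


# Invented sample message; a real one would come from open(path, "rb").read().
sample = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Project notes\r\n"
    b"Date: Mon, 05 Feb 2001 10:30:00 +0000\r\n"
    b"\r\n"
    b"Body text goes here.\r\n"
)

headers = read_eml_headers(sample)
for name, value in headers.items():
    print(f"{name}: {value}")
```

The extracted header text could then be pasted alongside the browser rendering of the message body, which would avoid some of the screenshot steps.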

Word for Mac 3.0 files: These files did open in MS Word 2007 – but only as continuous streams of text without any formatting. After some experimentation, I discovered that doing a carriage return towards the end of the file magically re-instated most of the formatting – though some spurious text was left at the end of the file. I saved these as DOCX files.

Word for Mac 4.0 & 5.0 and Word for Windows 1.0 & 2.0: These documents all opened up OK in Word 2007. However, I found that in longer documents which had been structured as reports with contents list, the paging had got slightly out of sync so that headings, paragraphs and bullets were left orphaned on different pages. I converted such files to DOCX format in order to have the option to reinstate the correct format in the future. Files without pagination problems, or which I had been able to fix without too much effort, were all converted to PDF.

PDF/A-1b: I had previously elected to store my PDF files in the PDF/A-1b format (designed to facilitate the long term storage of documents). However, on using the conformance checker in my PDF application (eCopy PDF Pro) I discovered that they possessed several non-conformances; and, furthermore, the first use of eCopy PDF Pro’s ‘FIX’ facility does not resolve all of them. I decided that trying to make each new PDF I created conform to PDF/A-1b would take up too much time and would jeopardise the project as a whole. So, I included the following statement in the Preservation Maintenance Plan that will be produced at the end of the project: “PDF files created in the previous digital preservation exercise were not conformant to the PDF/A-1b standard, and the eCopy PDF Pro ‘FIX’ facility was unable to rectify all of the non-conformances. Consideration needs to be given as to whether it is necessary to undertake work to ensure that all PDF files in the collection comply fully with the PDF/A-1b standard.”

PowerPoint – for Mac 4.0, Presentation 4.0, and 97-2003: All of these failed to open with Powerpoint 2007, so I used Zamzar to convert them. Interestingly Zamzar wouldn’t convert to PPTX – only to Powerpoint 1997-2003 which I was subsequently able to open with Powerpoint 2007. So far, it has converted over 100 Powerpoints and failed with only four (two Mac 4.0 and two Presentation 4.0). The conversions have mostly been perfect with the small exception that, in some of the files, some of the slides include a spurious ‘Click to insert title’ text box. I can’t be sure that these have been inserted during the conversion process, but I think it unlikely that I would have left so many of them in place when preparing the slides. Zamzar’s overall Powerpoint conversion capability is very good – but I have experienced a couple of irritating characteristics: first, on several occasions it has sent me an email saying the conversion has been successful but then fails to provide the converted file, implying that it wasn’t able to convert the file; and second, the download screen enables five or more files to be specified for conversion but if several files are included it only converts alternate files – the other files are reported to have been converted but no converted file is provided. This problem goes away if each file is specified on its own in its own download screen. The other small constraint is that the free service will only convert a maximum of 50 files in any 24 hour period – but that seems a fair limit for what is a really useful service (at the time of writing, the fee for the cheapest level of service was $9 a month).

UPDATED and ORIGINAL: I am including UPDATED in the file title of the latest version of a file, and ORIGINAL in earlier versions of the same file, because all files relating to a specific Reference No are stored in the same Windows Explorer Folder and users need to be able to pick out the correct preserved file to open. There will be only one UPDATED file – all earlier versions will have ORIGINAL in the file title. Another way of dealing with this issue of multiple file versions would be to remove all ORIGINAL versions to separate folders. However, this would make the earlier versions invisible and harder to get at, which may not be desirable. I believe this needs further thought – and the input of requirements from future users of the collection – before the best approach can be specified.

DOCX, PPTX and XLSX: When converting MS Office documents, unless I was converting to PDF, I elected to convert to the DOCX, PPTX and XLSX formats for two reasons – they are Microsoft’s future-facing formats, and – for the time being – they provide another way of distinguishing between files that have been UPDATED and those that haven’t.

Many of these experiences came as a surprise despite the amount of scoping work that was undertaken; and that is probably inevitable. To be able to nail down every aspect of each activity would take an inordinate amount of time. There will always be a trade off between time spent planning and the amount of certainty that can be built into a plan; and it will always be necessary to be pragmatic and flexible when executing a plan.