Four approaches and some decisions

I set out on this journey to find a way of archiving this pwofc.com web site. I’ve explored four different approaches, each with their own distinctive characteristics as summarised below:

Hosting Backup Facilities: No doubt different for each hosting operation, therefore my experience is limited to the hosting package that I use (which does not include a backup service).

  • Provides functions to download collections of files.
  • Creates a point-in-time replica of the content files of the web site.
  • The backup replica cannot be viewed on its own – it requires other facilities such as underlying database software to generate the web site.

HTTrack: This is a free software package that operates with a GNU General Public Licence. It downloads an internet web site such that it can be read locally in a browser.

  • Creates a point-in-time local replica which can be read offline with a browser, with PC response times.
  • It did not replicate the search facility available in my online web site.
  • It has complex configuration options and limited documentation and help – but these were not needed to undertake a simple mirror of the web site.
  • If the capture configuration specifies that external sites should not be captured, it provides URLs which can be clicked to go to those sites. (I did attempt to capture external web pages, but the first time it failed after 2 minutes, and the second time, when I thought I had specified that it should collect just 1 specific external web page, it was collecting so much that I had to stop it running – I clearly didn’t configure it correctly.)
  • The files that HTTrack produces can be zipped up into a single file and archived.

UK Web Archive (UKWA): A British Library service that stores selected web sites permanently; which captures updated versions on a yearly basis; and which makes all copies freely available on the net.

  • Requires that a web site is proposed for inclusion in the UKWA, and that approval is given.
  • Creates dated replicas of a web site which can be selected and viewed online in a browser.
  • Does not include the contents of external pages, and, in some cases, does not even provide a clickable URL of the external page.
  • In the replica site, the Home link on a page that has been linked to, doesn’t work.
  • In the replica site, two images with embedded links are not displayed and are replaced with just the text titles of the images.

Book: A copy of the web site printed on paper and bound in a book.

  • Creates a point-in-time replica of the web site on paper with some formatting adjustments to accommodate the different medium.
  • Produces a copy in a format which is very familiar to humans and which can be easily accommodated on a shelf in a house.
  • Cross referencing links work but are slower to follow than the digital equivalent.
  • May have better longevity than a digital equivalent.

As a result of these investigations, I’ve decided to:

  • Continue to use HTTrack to create mirrors of pwofc.com periodically
  • Continue to create a book of pwofc.com every five years: the next one is due in 2022 and will be called ‘Feel the Join’ (as opposed to the 2017 version which was called ‘Touch the Join’).
  • Be thankful that the British Library is archiving pwofc.com.

I think that just about wraps up all I’m prepared to do on this subject, so this journey is now complete.

The UK Web Archive

It’s been over a year since I wrote about this journey, so I’ll start this entry with a short recap of where I’m up to. Back in March 2019, I decided I would explore three different ways of archiving this pwofc website: first, by using tools provided by the company I pay to host the site; second, by using a tool called HTTrack; and third, by submitting the site for inclusion in the British Library’s UK Web Archive (UKWA).

My experience with the hosting site tools was less than satisfactory, and is documented in a post on 28th April 2019 entitled ‘A Backup Hosting Story’. My use of HTTrack was much more rewarding; it produced a complete backup of the whole of the site which could be navigated on my laptop screen with near instantaneous movement between pages, and which could be easily zipped into a single file for archiving. This is written up in the 30th April 2019 post titled ‘Getting an HTTrack copy’.

I’ve had to wait till now to relate my experience of submitting the site to the British Library’s UK Web Archive (UKWA), because the inclusion in the archive has been a little problematic. Here’s what happened: following a suggestion from Sara Thomson of the DPC, I filled in the form at https://beta.webarchive.org.uk/en/ukwa/info/nominate offering pwofc.com for archiving. Within about three weeks I received an email saying that the British Library would like to archive the site and requesting that I fill in the on-line licence form which I duly completed. A couple of days later, on 16th March 2019, I got an email confirming that the licence form had been submitted successfully and advising that: “Your website may not be available to view in the public archive for some time as we archive many thousands of websites and perform quality assurance checks on each instance. Due to the high number of submissions we receive, regrettably we cannot inform you when individual websites will be available to view in the archive at http://www.webarchive.org.uk/ but please do check the archive regularly as new sites are added every day.”

From then on I used the search facility at http://www.webarchive.org.uk/ every month or so to look for pwofc.com but with no success. Over a year later, on 21st April 2020, I replied to the licence confirmation email and asked if it was normal to wait for over a year for a site to be archived or if something had gone wrong. The very prompt reply said, “Unfortunately there is a delay between the time we index our content and when it can be searched through the public interface. We aim to update our indexes as soon as possible and this is an issue we are trying to fix, please bear with us as we do have limited resources. Your site has been archived and it can be accessed through this link: https://www.webarchive.org.uk/wayback/archive/*/http://www.pwofc.com/.”

Sure enough, the link took me to a calendar of archiving activity, which showed that the site had been archived three times – twice on 1st July 2019 (both of which seemed to be complete and to work OK); and once on 13th March 2020 (which when clicked seemed to produce an endless cycle of uploadings). I reported this back to the Archivist who scheduled some further runs, and who, after these too were unsuccessful, asked if I could supply a site map. I duly installed the Google XML Sitemaps plugin on my pwofc.com WordPress site, provided the Archivist with the site map URL, http://www.pwofc.com/ofc/sitemap.xml, and the archive crawler conducted some more runs. The 13th run of 2020, on 22nd June, seemed to have been successful: the archived site looked just as it should. I then set about doing a full check of the archived site against the current live site to ensure that all the images were present, and that the links were all in place and working. The findings are listed below:

  • External links not collected: Generally speaking, the UKWA archive had not included web pages external to pwofc.com. Instead, when such a link is selected in the archive one of the following two messages is displayed: either “The url XXX could not be found in this collection” (where XXX is the URL of the external site); or “Available in Legal Deposit Library Reading Rooms only”. However, in at least two instances the link does actually open the live external web page. I don’t know what parameters produce these different results.
  • Link doesn’t work: For one particular link (with the URL ‘http://www.dpconline.org/advice/case-notes’), which appears in two separate places in the archive, there is no response at all when the link is clicked.
  • Home link doesn’t work on linked internal pages: links to internal pages within pwofc.com all work fine in the archive. However, the Home button on the pages that are displayed after selecting such links, doesn’t produce any response.
  • Image with a link on it not displayed: The pwofc.com site has two instances of an image with a link overlaid on it. The archive displays the title of the image instead of the image itself.
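For anyone wanting to repeat this kind of page-by-page check, the site map is a convenient starting list. The following is a minimal sketch, assuming the site map follows the standard sitemaps.org format (a plugin may instead produce an index of several site maps, in which case the same function would list those). The function names and the sample data are illustrative only; the archive address prefix is the one in the link the Archivist supplied.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Prefix copied from the Archivist's link; the "*" asks for the calendar of captures.
WAYBACK_PREFIX = "https://www.webarchive.org.uk/wayback/archive/*/"


def sitemap_urls(xml_text):
    """Return every <loc> address found in a site map document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def archive_calendar_url(page_url):
    """Build the address of the archive's calendar page for one live page."""
    return WAYBACK_PREFIX + page_url


# Hypothetical two-entry site map, used only to show the idea.
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.pwofc.com/ofc/</loc></url>"
    "<url><loc>http://www.pwofc.com/ofc/2017/06/29/an-ofc-model/</loc></url>"
    "</urlset>"
)

for url in sitemap_urls(SAMPLE_SITEMAP):
    print(archive_calendar_url(url))
```

Each printed address can then be opened in a browser and compared with the live page by eye, which is essentially the check described above.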

On the whole, the archive provides quite a faithful reproduction of the site. However, the fact that no information was collected for most external web pages, and no link to the external live web pages is provided either, is quite a serious shortcoming for a site like pwofc.com which has at least 26 such links. Having said that, the archive aims to collect all the web sites on its books at least once a year; and all the different versions appear to be accessible from a calendared list of copies; so, should one be able to get on the UKWA roster, this would appear to be quite an effective way to backup or archive a blog.
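Counting the external links on a page need not be done by hand. The sketch below uses only Python's standard library to separate the links in a piece of HTML into internal and external ones; the class and function names, the sample HTML and the ‘about’ page in it are all made up for illustration, and this is not how the UKWA crawler itself decides what to collect.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page address."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def split_links(html_text, base_url, site_host="www.pwofc.com"):
    """Return (internal, external) lists of the web links found in html_text."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    internal, external = [], []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto: and similar non-web links
        if parsed.netloc.lower() == site_host:
            internal.append(link)
        else:
            external.append(link)
    return internal, external


# Hypothetical page fragment for demonstration.
SAMPLE_HTML = (
    '<a href="/ofc/about/">About</a> '
    '<a href="http://www.dpconline.org/advice/case-notes">Case notes</a>'
)
internal, external = split_links(SAMPLE_HTML, "http://www.pwofc.com/ofc/")
print(len(internal), "internal;", len(external), "external")
```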

Getting an HTTrack Copy

HTTrack is a free-to-use website copier. Its web site provides the following description:  “It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site’s relative link-structure. Simply open a page of the “mirrored” website in your browser, and you can browse the site from link to link, as if you were viewing it online.”

I downloaded and installed HTTrack very quickly and without any difficulty, then I set about configuring the tool to mirror pwofc.com. This involved simply specifying a project name, the name of the web site to be copied, and a destination folder. The Options were more complicated and, for the most part, I just left the default settings before pressing ‘Finish’ on the final screen. There was an immediate glitch when I discovered that I had not provided the full web address (I’d specified pwofc.com instead of http://www.pwofc.com/ofc/); but having made that change, I pressed ‘Finish’ again and HTTrack got on with its mirroring.  Some 2 hours 23 minutes and 48 seconds later, HTTrack completed the job, having scanned 1827 links and having copied 1538 files with a total file size of 212 Mb.
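I used HTTrack's step-by-step screens, but the same simple mirror can in principle be started from its command-line version. The following is a rough sketch, assuming the `httrack` command is installed and on the path; the function name and the `pwofc-mirror` destination folder are invented for illustration. It also guards against the glitch described above by refusing an address that lacks the leading `http://` or `https://`.

```python
import shlex
from urllib.parse import urlparse


def build_httrack_command(site_url, destination):
    """Build the argument list for a basic HTTrack mirror of site_url."""
    parsed = urlparse(site_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("Give the full web address, e.g. http://www.pwofc.com/ofc/")
    # "-O" tells the command-line httrack where to write the mirror.
    return ["httrack", site_url, "-O", destination]


command = build_httrack_command("http://www.pwofc.com/ofc/", "pwofc-mirror")
# Shown rather than run here; subprocess.run(command, check=True) would start the mirror.
print(shlex.join(command))
```

All other options are left at their defaults, which matches what I did in the Options screens.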

The mirroring had produced seven components: two folders (hts-cache and www.pwofc.com) and five files (index, external, hts-log, backblue and fade). The hts-cache folder is generated by HTTrack to enable future updates to the mirrored web site; the external file is a template page for displaying external links which have not been copied; backblue and fade are small gif images used in such templates; and the log file records what happened in the mirroring session. The remaining www.pwofc.com folder and index file contain the actual contents of the mirror.

On double clicking the Index file, the pwofc.com home page sprang to life in my browser looking exactly the same as it does when I access it over the net. As I navigated around the site the internal links all seemed to work and all the pictures were in place, though the search facility didn’t work. External links produced a standard HTTrack page headed by “Oops!… This page has not been retrieved by HTTrack Website Copier. Clic to the link below to go to the online location!” – and indeed clicking the link did take me to the correct location (I believe it is possible to specify that external links can also be copied by setting the ‘Limit’ option ‘maximum external depth’ to one, but my subsequent attempt to do so ended with errors after just two minutes; I abandoned the attempt). The only other noticeable difference was the speed with which one could navigate around the pages – it was just about instantaneous. From this cursory examination I was satisfied that the mirror had accurately captured most, if not all, of the website.

An inspection of the log file, however, identified that there had been one error – “Method Not Allowed (405) at link www.pwofc.com/ofc/xmlrpc.php (from www.pwofc.com/ofc/)”. According to the net, a PHP file ‘is a webpage that contains PHP (Hypertext Preprocessor) code. … The PHP code within the webpage is processed (parsed) by a PHP engine on the web server, which dynamically generates HTML’. Interestingly, I wasn’t aware of having any content with such characteristics, but, on closer inspection of the files in my hosting folder, I found I had lots of them – probably hundreds of them. I tried to figure out what the error file related to but had no clue other than its rather striking creation date – 23/12/2016 at 00:00:00 – the same date as several of the other PHP files. I had not created any blog entries on that day, so my investigation ground to a halt. I don’t have the knowledge to explore this, and I’m not prepared to spend the time to find out. My guess is that the PHP files do the work of translating the base content stored in the SQL database into the structured web pages that appear on the screen. I’m just glad that there was only one error – and that its occurrence isn’t obviously noticeable in the locally produced web pages.

The log file also reported 574 warnings, which came in the form of 287 pairs. A typical example pair is shown below:

19:31:13        Warning:    Moved Permanently for www.pwofc.com/ofc/?p=987
19:31:13        Warning:    File has moved from www.pwofc.com/ofc/?p=987 to http://www.pwofc.com/ofc/2017/06/29/an-ofc-model/

I tried to find a Help list of all the Warning and Error messages in the HTTrack documentation but it seems that such a list doesn’t exist. Instead there is a Help forum which has several entries relating to such warning messages – but none that I could relate to the occurrences in my log. As far as I can see, all of the pages mentioned in the warnings (in the above instance the title of the page is ‘an-OFC-Model’), have been copied successfully so I decided that it wasn’t worth spending any further time on it.
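For what it is worth, ‘Moved Permanently’ is the standard wording of an HTTP 301 redirect, so each pair most likely records a short address (such as ?p=987) being forwarded to the page's full dated address. The following sketch, which assumes the log lines look like the example pair above, matches the two halves of each pair and reports any short address left without a destination. The function name and patterns are my own and are not part of HTTrack.

```python
import re

# Patterns modelled on the example pair; other log layouts may need adjusting.
MOVED_RE = re.compile(r"Warning:\s+Moved Permanently for (\S+)")
TARGET_RE = re.compile(r"Warning:\s+File has moved from (\S+) to\s+(\S+)")


def pair_redirect_warnings(log_lines):
    """Return (pairs, unmatched): old address -> new address, plus leftovers."""
    pairs = {}
    pending = set()
    for line in log_lines:
        moved = MOVED_RE.search(line)
        if moved:
            pending.add(moved.group(1))
            continue
        target = TARGET_RE.search(line)
        if target and target.group(1) in pending:
            pending.discard(target.group(1))
            pairs[target.group(1)] = target.group(2)
    return pairs, pending


SAMPLE_LOG = [
    "19:31:13        Warning:    Moved Permanently for www.pwofc.com/ofc/?p=987",
    "19:31:13        Warning:    File has moved from www.pwofc.com/ofc/?p=987 to "
    "http://www.pwofc.com/ofc/2017/06/29/an-ofc-model/",
]
pairs, unmatched = pair_redirect_warnings(SAMPLE_LOG)
print(len(pairs), "redirect pairs;", len(unmatched), "unmatched")
```

If every warning in a log pairs up like this and nothing is left unmatched, that would support the view that the warnings are harmless.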

All in all, I judge my use of HTTrack to have been a success. It has delivered me a backup of my (relatively simple) site which I can actually see and navigate around, and which can be easily zipped up into a single file and stored.
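The zipping step can also be scripted. This is a small sketch using Python's standard library, assuming the mirror sits in a single folder; the function name and the folder names in the usage comment are placeholders rather than the names on my laptop.

```python
import shutil
from pathlib import Path


def zip_mirror(mirror_dir, archive_stem):
    """Zip the contents of mirror_dir into archive_stem + '.zip' and return its path."""
    mirror_path = Path(mirror_dir)
    if not mirror_path.is_dir():
        raise NotADirectoryError(str(mirror_dir))
    # Keep the archive outside mirror_dir so the zip file does not try to include itself.
    return shutil.make_archive(str(archive_stem), "zip", root_dir=str(mirror_path))


# Example use (placeholder paths):
# zip_mirror("pwofc-mirror", "archives/pwofc-mirror-2019-04-30")
```

Putting the date of the mirror in the file name makes it easier to keep several point-in-time copies side by side.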

A Backup Hosting Story

In the last few days I’ve been exploring making backup copies of this pwofc Blog using the facilities provided by the hosting company that I employ – 123-Reg. It was an instructive experience.

When I first set up the Blog in 2012 I had deliberately decided to spend a minimal amount of time messing around with the web site and to focus my energies on generating the stuff I was reporting in it. Consequently, most of my interactions with the hosting service had involved paying my annual fees, and I had little familiarity with the control panel functions provided to manage the web site. In 2014, I had made some enquiries about getting a backup, and the support operation had provided a zip file which was placed in my own file area. Since then I had done nothing else – I think I had always sort of assumed that, if something went wrong with the Blog, the company would have copies which could be used to regenerate the site.

However, when I asked the 123-Reg support operation about backups a few days ago, I was told that the basic hosting package I pay for does NOT include the provision of backups – and the company no longer provides zip files on request: instead, facilities are provided to download individual files, to zip up collections of files, and to download and upload files using the file transfer protocol FTP. Of these various options, I would have preferred to just zip up all the files comprising pwofc.com and then to download the zip file. However, the zipping facility didn’t seem to work and, on reporting this to the 123-Reg Support operation, I was told that it was out of action at the moment… So, I decided to take the FTP route.

I duly downloaded the free-to-use FTP client, FileZilla, set it up with the destination host IP Address, Port No, Username and Password, and pressed ‘Connect’. After a few seconds a dialogue box opened advising that the host did not support the secure FTP service and asking if I wanted to continue to transfer the files ‘in clear over the internet’. Naturally I was a little concerned, closed the connection, and asked 123-Reg Support if a secure FTP transfer could be achieved. I was told that it could be and was given a link to a Help module which would explain how. This specified that a secure transfer requires Port 2203 to be used (it had previously been set to 21), so I made the change and pressed ‘Connect’ again. Nothing happened. A search of the net indicated that secure FTP requires a Port No of 22, so I changed 2203 to 22 and, bingo, I was in.
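The port numbers make more sense once the conventional assignments are written down. The little lookup below lists the standard ports for the common file transfer protocols; port 2203 is not one of them, so whether it works presumably depends on how a particular host is set up. The table and function names are just for illustration.

```python
# Conventional port assignments for file transfer protocols.
WELL_KNOWN_PORTS = {
    21: "FTP control connection (unencrypted unless the server supports explicit TLS)",
    22: "SSH, which also carries SFTP",
    990: "FTPS (FTP over implicit TLS)",
}


def describe_port(port):
    """Describe what a port number conventionally means for file transfer."""
    return WELL_KNOWN_PORTS.get(
        port, "not a standard file transfer port; host-specific if used at all"
    )


for port in (21, 2203, 22):
    print(port, "->", describe_port(port))
```

On this reading, the ‘secure FTP’ that finally worked on port 22 would be SFTP running over SSH, which is a different protocol from ordinary FTP despite the similar name.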

FileZilla displays the local file system in a box on the left of the screen, and the remote file system (the pwofc.com files in this case) in a box on the right. Transferring the pwofc files (which comprise a folder called ‘ofc’, a file called ‘index’, and a file called ‘.htaccess’) was simply a matter of highlighting them and dragging them over to a folder in the box on the left. The transfer itself took about 12 minutes for a total file size of 246 Mb.

Of course, the copied files on my laptop are not sufficient to produce the web pages: they also require the SQL database which manages them to deliver a fully functioning web site. If you double click the ‘Index’ file it just delivers a web page with some welcome text but no links to anything else. Hence, these backup files are only of use to download back to the original hosting web site for the blog to be resurrected if the original files have become corrupted or destroyed. I guess they could also, in principle, be used to set up the site on another hosting service – though I have no experience of doing that.

Of course these experiences only relate to one customer’s limited experience of one specific hosting service and may or may not apply generally. However, they do indicate some general points which Blog owners might find worth bearing in mind:

  • Don’t assume that your hosting service could regenerate your Blog if it became corrupted or was destroyed – find out what backup facilities they do or don’t provide.
  • Don’t assume that all the functions provided by your hosting service work – things may be temporarily out of action or may have been superseded by changes to the service over the years.
  • Remember that a backup of the website may be insufficient to regenerate or move the Blog – be clear about what additional infrastructure (such as a database) will be required.
  • If you want to be able to look at the Blog offline and independently of a hosting service, investigate other options such as creating a hardcopy book, or using a tool such as HTTrack (which is discussed in the following entry).

ST’s Alternative Approaches

About 6 weeks ago (on 6th March), Sara Thomson of the Digital Preservation Coalition kindly spent some time on the phone with me discussing the archiving of web sites. I wanted to find out if there were any solutions other than the ones I had stumbled across in my brief internet search some 16 months ago. Sara suggested 3 approaches which were new to me and described them as follows in a subsequent email:

  1. UK Web Archive (UKWA) ‘Save a UK Website’: https://beta.webarchive.org.uk/en/ukwa/info/nominate Related to this – two web curators from the British Library (Nicola Bingham and Helena Byrne) presented at a DPC event last year discussing the UKWA, including the Save a UK Website function. A video recording of their talk along with their slides (and the other talks from the day) are here: https://dpconline.org/events/past-events/web-social-media-archiving-for-community-individual-archives
  2. HTTrack: https://www.httrack.com/  I gave a brief overview of HTTrack at that same DPC event last year that I linked to above. I have also included my slides as an attachment here – the HTTrack demo starts on slide 15.
  3. Webrecorder: https://webrecorder.io/ by Rhizome. Their website is great and really informative, but let me know if you have any questions about how it works.

Shortly after this, I followed the link that Sara had provided to the UKWA nomination site and filled in the form for pwofc.com. On 14th March I got a response saying that the British Library would like to archive pwofc.com and requesting that I fill in an on-line licence form which I duly completed. On 16th March I decided to explore the contents of the UKWA service and found it collects ‘millions of websites each year and billions of individual assets (pages, images, videos, pdfs etc.)’. I started looking at some of the blogs. The first one I came across was called Thirteen days in May and was about a cycling tour – but it seemed to lack some of the photos that were supposed to be there. The next two I looked at, however, did seem to have their full complement of photos; and one of them (called A Common Reader) had a strangely coincidental entry about ‘Instapaper’ which provides what sounds to be a very useful service for saving web sites for later reading. It looks like the UKWA does an automated trawl of all the websites under its wing at least once a year, so I guess that, as a backup, it should never be more than a year out of date.

An hour after completing this exploration, I got an email confirming that the licence form had been submitted successfully and advising that the archiving of pwofc.com would proceed as soon as possible but that it may not be available to view in the archive for some time due to the many thousands of web sites being processed and the need to do quality assurance checks on each. Since then, I’ve been checking the archive every now and again, but pwofc.com hasn’t emerged yet. When it does, it’ll be interesting to see how faithfully it has been captured.

Regarding the other two suggestions that Sara made, I’ve decided to discount Webrecorder as that entails visiting every page and link in a website which would just take too much time and effort for pwofc.com. However, I’m going to have a go at using HTTrack, and I’m also going to try and get a backup of pwofc.com from my web hosting service. Having experienced all these various archiving solutions, there’ll be an opportunity to compare the various approaches and reach some conclusions.

A few insights and conclusions

The sort-out of my publications, reports and CSCW proceedings (broadly categorised as ‘things I had created and done’) confirmed that I have a particular interest in material I had created or had made significant contributions towards. It was undoubtedly rewarding to revisit the material – though I wouldn’t anticipate doing it again very often. In fact, it made me realise that just having the knowledge that all the material is available and easily accessible, is itself a very satisfying and reassuring thought. Of course, having a complete collection of work documents to draw on when assembling full sets of my publications and reports, was slightly unusual; most people might only have partial sets depending on what particular material they had saved in the course of their careers.

The items included in the category ‘things I had created and done’ are only a subset of all the work items I’ve kept over the years. I have previously digitised over 80 of my work book collection as described in the Electronic Bookshelf journey; I’ve created story boards for 30+ work books that I regarded as special in some way or other; and my PAW-PERS collection of memorabilia contains over 120 other items in the following additional categories:

Formal job documents (offer letters, job specs, pension info, pay slips etc.): I originally kept these for reference; but now, of course they have become very informative pieces of memorabilia.

Company information (brochures, newsletters etc.): Many of these are well presented documents providing detailed information about the organisations I worked for.

Recognition objects (certificates, long term service awards, contract win artefacts etc.): I didn’t keep the originals of certificates confirming I had completed in-company courses as they didn’t seem very significant; however I do value a certificate from my professional body and keep it framed on my study wall. I’ve kept the cut glass paperweight celebrating a contract win, and the cut glass bowl for long service, which are both in our crystal cabinet; though they are retained more because of their looks than as reminders of work. I also value the long service domino set (very nice in a large wooden box) which I chose deliberately because I knew I would want to keep it long term for both its utility and its looks.

People I worked with (humorous documents, social gatherings, leaving cards, etc.): These are generally mementos of the people I worked with and the activities we did together.

Associated activities (company sports and social clubs, trade unions, professional bodies etc.): These are mementos of my activities in organisations associated with my work, and they are surprisingly prominent in my collection. I guess such organisations have played a significant part in my working life over the years.

In thinking back about what I’ve done with all these different sets of work items, I was reminded of how sometimes particular items have corrected a fact that I had mis-remembered. For example, for several years, I believed that I was the instigator of the Alvey project I was involved in (Cosmos). However, in trawling through my documents to create one of the Electronic Story Boards, I discovered that it was a colleague who had been the instigator and I was a very ardent subsequent advocate. I guess that often we remember things in the way we would like them to be, not necessarily the way they actually were. Hence, having some documentation or other artefact can cast a truer light on the past. However, it must be remembered that the documents we have may only be a subset of all the relevant documents that were produced; and/or that their contents may just be reflecting the biases of the authors. Hence, whatever the nature of our ‘record’, be it memory, or a selection of the relevant items that you have, or all of the items that you have, or, indeed, all the relevant items that exist in the world, we should always remember that it may not be the whole story.

As with my non-work mementos, most of these work items have been digitised and the originals disposed of; though a small number, which I decided are special in some way or other, have been retained in physical form. In this respect, these work items are very similar to other types of memento. However, there is one very significant difference: many of these work items will not be recognisable by my wife and family. That’s because my work took me to a different place and a different life for a part of each day – as it does for very many people; hence, work mementos are likely to mean more to the individual than to family, relatives and friends. Consequently, I suspect that such collections are even less likely than other types of mementos to be retained and maintained by future generations of the family. I believe this to be almost certainly true for physical work mementos (I can’t see people hanging onto bulky books and papers which mean little to them). However, I’m less sure about digital collections which, in principle, are much less obtrusive and much easier to keep in the short term, but do rely on some care and attention as computers are replaced and technology advances. In fact, this uncertainty must apply to all informally-held digital collections – too little time has passed so far to be able to discern if such material is being passed down the generations. Interestingly, I do see the possibility of Artificial Intelligence playing a role in managing such material, and this could significantly affect how much of its digital history a family may have access to in the future.

In summary, this short review of my work mementos seems to have thrown up the following insights:

  • Categories of work mementos include: things the individual has created and done, work books, formal job documents, company information, recognition objects, people the individual has worked with, and associated activities.
  • While work mementos are similar to other types of mementos, they do provide reminders of a part of life that is often very personal to the individual and often separate from family life. In as much as work is often done with other people, it is almost like a parallel life with a separate family; hence, it generates a separate set of mementos.
  • Work concerns making, creating and doing things; and if individuals are in any way proud of what they have done, then they may well be keen to retain examples of what they achieved and to inspect it from time to time.
  • It is very satisfying and reassuring to know that examples of what you have produced at work, are safely stored away and accessible when you want them. Just being able to have those thoughts may be as rewarding as actually looking at the material.
  • Our ‘record’ of events is only as good as the material we have, be it memory or a few relevant artefacts, or lots of artefacts. We should always remain open to the possibility we don’t have all the facts.
  • Work mementos are probably less likely to be passed on down family lines than other types of mementos.
  • Physical work mementos are less likely to be passed on down family lines than digital work mementos.
  • Artificial intelligence may result in many more digital mementos of all types being passed on reliably down the generations.

The power of the shower

This morning I finished digitising the CSCW conference proceedings, including the creation of a bookmarked contents list for each one (rather a tedious process), and downloaded the PDFs to the Sidebooks app on my iPad. Although this was a largely mundane exercise, I was stimulated from time to time when I came across author and project names that I had become familiar with while I was working in the CSCW field in the early 1990s. Reflecting on this in the shower this morning, I remembered my conclusion a week or so ago that, for memorabilia, the journey was often better than the destination. Suddenly, in a deluge of shower illumination, I realised that it was the remembering that had been fulfilling; and that the act of remembering is an act of doing; and that any thoughts about memorabilia – even just pondering the fact that they are where you put them and can be accessed when you wish – are ‘doing’ acts. It is when items cease to stimulate any thought or interest that they become worthless. Conversely, while items of memorabilia still inspire some physical or thinking action, they still have some value for the individual.

I continued to think through the meaning of this insight and concluded that it has significant implications for why we keep things; and that it will necessitate the adjustment of parts of the OFC tutorial text (though I must add that I’m sure these ideas are not new – but I have the luxury of not having to trawl the huge literature to see what has been documented before: that is the job of academics who should be appropriately paid to do a very difficult, laborious and hugely important job). This experience has cemented my belief in the innovative power of the shower, and makes me wonder just how important ablutions have been to the development of modern civilisation over the centuries.

Proceeding with proceedings

The final stage in this sort out of work books/documents concerns the seven volumes of proceedings of conferences on Computer Supported Cooperative Work that appear in the picture below.

The first of these events was held in Austin, Texas, in 1986 and I was there to experience the excitement of a new field being born. It was a field which embraced the Cosmos project that I was participating in at the time, and a field in which I actively worked for the following five years. In 1989 I was to organise the first European CSCW conference, EC-CSCW89. Hence, I am particularly attached to the proceedings of both the 1986 and 1989 events, though the 1986 volume is in poor condition with the cover having come away, and the 1989 volume is spiral bound. So, I’ve decided to turn both volumes into hardback books and to incorporate some additional pages of related material in the process.

As for the remaining five volumes, although I attended all but one of the events they document, I am unlikely to want to look at them in the future. I have therefore decided to digitise them and include them in my electronic bookshelf collection, so that at least the covers and spines will be visible for decor, and I’ll have the comfort of knowing that the text is immediately accessible should I want to take a look at it.

Careering through time

A trawl through my CV to identify major pieces of work I’d produced prompted the discovery of some memorable material – and some I’d forgotten about.  The end result was 98 documents all neatly packaged as numbered PDFs and recorded in an index. However, while I expected them to be mostly reports I’d written, the documents turned out to be much more diverse than that. This was because some of my work assignments had been rather open-ended in subject matter and long term in timescale, so there was not necessarily one or more reports that could represent my efforts. Instead, I started looking for documents that would tell a story about my involvement and what had happened. Hence, of the 98 documents, only about half were substantive reports written entirely by myself or in conjunction with others. A further 20 were shorter documents written by myself. The remainder were short documents written with others (16); documents providing context for work I was involved in (17); summaries of what was going on or handover reports when I left an assignment (6); newsletters produced by myself (3); and, finally, two documents were hardcopies of special editions of HCI-related magazines which were on my bookshelf and which I was loath to part with.

It was hardly surprising that such a range of documents emerged, since the memorable aspects of work involve more than just one’s own efforts, and usually include what we do with others and what is going on about us. In fact, I had to be pretty selective in my choices since my work document collection includes most of the items that I received and produced; and the index to the collection makes it very easy and quick to list all the documents related to a particular topic or, in this case, assignment.

The process I went through for each element of my CV was first to search for any specific document I remembered and wanted to take a copy of. Then I produced a list of all the index entries related to that assignment, went through the list (sometimes including a hundred or more line items), and noted the reference number of any that I wanted to look at further. This was a very good test of my newly revamped filing system in which documents are no longer stored in a document management system but simply reside in Windows folders named with the appropriate reference number (see entry in Personal Document Management Journey). It proved very easy and quick to find documents and to open them up – which was a good job because, inevitably, I was having to look at several different documents before making a decision about which one to go for.
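For anyone wanting to automate the listing step, here is a minimal sketch in Python. It assumes the index has been exported as a CSV file; the column names (ref_no, assignment, title), the reference number format, the sample rows and the WorkFiles root folder are all invented for illustration and are not how my own index is laid out.

```python
import csv
import io
from pathlib import Path


def entries_for_assignment(index_file, assignment):
    """Return the index rows whose 'assignment' column matches, ignoring case."""
    reader = csv.DictReader(index_file)
    wanted = assignment.strip().lower()
    return [row for row in reader if row["assignment"].strip().lower() == wanted]


def folder_for(root, ref_no):
    """Build the path of the folder named after a reference number."""
    return Path(root) / ref_no


# Hypothetical index content: column names, reference numbers and titles
# are made up for this example.
SAMPLE_INDEX = """ref_no,assignment,title
REF-0001,Cosmos,Project overview
REF-0002,Cosmos,Progress summary
REF-0003,EC-CSCW89,Call for papers
"""

rows = entries_for_assignment(io.StringIO(SAMPLE_INDEX), "cosmos")
for row in rows:
    # Print each candidate with the folder in which it would be found.
    print(row["ref_no"], row["title"], folder_for("WorkFiles", row["ref_no"]))
```

With a real export, the io.StringIO object would be replaced by an opened file, and the printed list would be the starting point for noting which reference numbers to look at further.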

I had decided before the start of this process that I only wanted to keep electronic copies and that the only hardcopies I might keep would be the fourteen that were already on my bookcase. That is how it turned out, though I did throw out one of the fourteen – a rather thick spiral bound item which consisted largely of a user manual. None of these items are sturdy enough to stand up on their own – they are either folded papers, stapled papers, spiral bound documents, or magazines. So I acquired a large portfolio box from TK Maxx inscribed with ‘Around the World’ on the spine – a highly visible distinguishing feature. Four of the hardcopies were in a ring binder, and these too were placed in the portfolio box, together with a printout of the index, so that all this material is now in one place and not flopping around on the shelves.

Regarding the electronic versions, in some cases they were stored in my files as PDFs, so little further work was required. However, many were stored as either multi-page TIF files or as MS Office Word or PowerPoint files. These were converted to PDFs using the eCopy PDF PRO application. For the larger documents I created linked content lists in the form of sets of bookmarks. I then numbered each file and moved a copy of each into a special folder set up to be the master of this set of material. However, my preferred way of viewing such material is within the Sidebooks application on the iPad. Files can be transferred to Sidebooks using Dropbox and this example of system integration works brilliantly. It only took a few seconds to copy all the files (362 MB in total) and to place them into the Dropbox folder on my laptop. They showed up in the Sidebooks Dropbox area within a minute or two; and then each one was selected in turn and took just a few seconds – none more than 10 seconds – to download into Sidebooks where they can be displayed as either a text list of file names or as variously sized thumbnails as shown below.
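The numbering and copying step could be scripted along the following lines. This is only a sketch under assumptions: I did the work by hand, and the three-digit prefix, the file names and the folder names here are illustrative choices rather than the scheme I actually used. The demonstration runs against throwaway files in a temporary folder so that nothing real is touched.

```python
import shutil
import tempfile
from pathlib import Path


def copy_numbered(pdfs, master_dir):
    """Copy each file into master_dir, prefixing its name with a serial number."""
    master = Path(master_dir)
    master.mkdir(parents=True, exist_ok=True)
    copied = []
    for number, src in enumerate(pdfs, start=1):
        src = Path(src)
        # The "001 - name.pdf" pattern is an assumed naming convention.
        dest = master / f"{number:03d} - {src.name}"
        shutil.copy2(src, dest)  # copy2 also preserves file timestamps
        copied.append(dest)
    return copied


# Demonstration with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source"
    source.mkdir()
    for name in ("report.pdf", "newsletter.pdf"):
        (source / name).write_bytes(b"%PDF-1.4\n")  # stand-in content only

    copied = copy_numbered(sorted(source.glob("*.pdf")), Path(tmp) / "master")
    names = [p.name for p in copied]
    print(names)
```

Pointing master_dir at a folder inside the local Dropbox folder would combine the numbering with the transfer step, since Dropbox then takes care of the syncing.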

The exercise is now complete. I don’t know how often I’ll be looking at the documents – probably not very much – maybe never; but, just knowing they are there is a reassuring feeling – not having them there might generate a little nagging wish. Furthermore, going through all this material and being reminded of what I have done has been a very fulfilling experience. As I keep being reminded, in the matter of memorabilia the journey is often better than the destination.

Reports and the satisfaction of achievement

Part of a shelf in my bookcase is taken up with hardcopies of a few reports and documents that I wrote or had a hand in writing. I’ve kept these because the hardcopies were available and I feel they are substantial pieces of work (they fall into the category of ‘Items that the owner has written, produced, assembled or made a significant contribution to’ as documented in the paper produced in the Digital Age Artefacts journey). However, these are just items that I came across in the course of doing digital preservation work on my work file collection; I have produced many other reports and documents over my working career, some of which I feel are just as substantial.

In thinking through what I want to do with this material, I’ve concluded that I don’t really want the hardcopy except in particularly special cases. However, I would like to have electronic versions of all the major reports I’ve produced in order to be able to look through them from time to time, to see how my ideas developed, and to enjoy the satisfaction of achievement. So, my plan is to use my CV to guide me through the various companies I’ve worked for and assignments I’ve undertaken, and to search my work files for reports I may have produced in each one. I’ll take an electronic copy of some of those that I find and give each one a serial number in the file title. The files will be recorded in an index in chronological order, and stored on my iPad. At the end of the process, any hardcopies that I’ve decided to keep will be placed into a box file on the bookshelf.