A subjective halfway view

I’ve just acted as subject in our first investigation into the memorability and impact of information nuggets. The nugget material, in this case, was MindMaps of key points in nineteen esoteric-type books which explore perceived unresolved mysteries, from ancient Egyptology to modern secret societies. I discovered that I could remember almost none of the points presented to me and was unable to link any of them to a particular development in my thinking. My immediate reaction to this disappointing – but probably to be expected – finding was that these are not really nuggets of information at all, but just parts of a summary of each book.

However, on reflection, I’ve reversed that view. After all, when I was picking out the points as I read, I must have thought each of them to be significant – otherwise I wouldn’t have picked them out. So, how is a key point in a book different from a key point in, say, a five-page article? Well, there are some obvious differences: the book is a lot bigger and has a lot more stuff in it – most of which I’m not familiar with AT ALL. Unless you have a photographic or otherwise superb memory, you wouldn’t expect to remember everything in such a book after one quick, casual read. Of course, I have the books on my bookshelf and have the look of each one locked in my memory, along with some idea of what it’s about. However, that is because there are just a few hundred of them, they have rich content, and their covers and spines usually carry distinctive, memorable images. In contrast, the articles and documents in my work collection (which are due to be investigated next) are much more numerous, are hidden away in my computer (with just a few in my physical archive box), and all look very similar, with very few distinctive markings.

I guess I’ve expanded my thinking this morning about all this. However, I’m only the subject and we’re only halfway through the overall exercise. The interesting bit will be what the researcher concludes from it all.

Nugget Investigation Plans

This entry has been jointly authored by Peter Tolmie and Paul Wilson

A key part of the motivation for keeping texts is that they contain information – nuggets – that have some kind of future value. It’s rare for a whole document to be seen as having such value in its entirety. Nuggets are more often certain sentences or paragraphs. The question is: what happens to that value over the lifespan of an archive? Does the value of specific nuggets persist? Does it change? Does it grow or reduce in relevance? Does it become eroded to the point of obsolescence? And, given the same documents at some later point in time, would the same nuggets be identified, or would something else stand out as being more important?

To assess the use and impact of identifying nuggets, three separate investigations will be conducted using documents containing nuggets that were collected over the last 38 years. In all cases, investigations will be undertaken with reference to the individual who originally identified the nuggets. Two of the investigations will focus on the individual reflecting on his own use of nuggets. The third investigation will focus on extracting information about the nuggets and their use through discussion.

The investigations will be performed using two separate sets of material:

Set 1: A set of 19 MindMaps relating to esoteric books.

Set 2: Documents from the PAWDOC collection.

Approach

Investigation 1: The first investigation will be a written exercise using the set of MindMaps and will attempt to assess:

a) Whether individual nuggets can still be recalled;
b) Whether the sources of those individual nuggets can still be recalled;
c) How significant specific nuggets are considered to be by the individual concerned;
d) What other nuggets, if any, are associated with the one in question;
e) What specific concept(s) individual nuggets are believed to have contributed towards.

Investigation 2: The second investigation will also be a written exercise and will explore what nuggets, if any, can be identified by the individual concerned in unmarked versions of randomly selected documents from the PAWDOC collection where nuggets have previously been identified. From this exercise it is hoped to deduce:

i) Whether nuggets lose their vitality over time, with new concepts being derived and becoming established.
ii) Whether looking at the documents anew, within a distinct context, will lead to the identification of different nuggets with different relevance to the individual.

Note that the first two investigations are interleaved so as to maximise the amount of discussion possible in the concluding interview.

Investigation 3: This investigation will be conducted as an interview and will explore similar themes to those tackled in the prior investigations but in a more open-ended way, so that the reasoning involved in retaining documents for the sake of specific features can be examined in greater detail. The interview will also seek to examine in detail topics that have spanned all three investigations.

Method

Investigation 1: Nineteen nuggets, each from a separate MindMap, will be randomly selected from the MindMaps of books on esoteric subjects. Randomisation will be achieved by creating a template that divides an A4 page into 24 numbered areas of equal size. A random number generator will then be used to create nineteen separate numbers ranging from 1 to 24. The Researcher, Peter Tolmie, will generate the numbers and use each one in conjunction with one of the MindMaps and the template to identify the nugget(s) present in that location. If more than one nugget occurs in that location, the topmost one will be selected.
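The randomisation step can be sketched in a few lines of Python. This is a minimal illustration, not the Researcher’s actual tool: it assumes the 24 template areas form a grid of 4 columns by 6 rows, numbered left to right and top to bottom, and the function names are hypothetical. Picking the topmost nugget within an area is still done by eye.

```python
import random

TEMPLATE_AREAS = 24    # the A4 template is divided into 24 numbered areas
TEMPLATE_COLUMNS = 4   # assumption: a grid of 4 columns x 6 rows
NUM_MINDMAPS = 19

def draw_template_areas(num_mindmaps=NUM_MINDMAPS, areas=TEMPLATE_AREAS, seed=None):
    """Draw one area number (1..areas) for each MindMap, numbered 1..num_mindmaps."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable for the record
    return {mindmap: rng.randint(1, areas) for mindmap in range(1, num_mindmaps + 1)}

def area_to_cell(area, columns=TEMPLATE_COLUMNS):
    """Convert an area number into a (row, column) position on the template, counting from 1."""
    row, col = divmod(area - 1, columns)
    return row + 1, col + 1

if __name__ == "__main__":
    for mindmap, area in draw_template_areas(seed=2024).items():
        row, col = area_to_cell(area)
        print(f"MindMap {mindmap:2}: area {area:2} (row {row}, column {col})")
```

Each MindMap gets its own independent draw, so the same area number can come up for more than one MindMap; if nineteen distinct numbers were wanted instead, `rng.sample(range(1, 25), 19)` would provide them.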

The Researcher will assemble the nineteen nuggets and send them to the PAWDOC Owner and Subject, Paul Wilson, who will be asked to provide a written answer to the following questions in relation to each one:

  • Do you remember where this nugget came from?
  • Do you remember why you might have marked this out as a nugget?
  • How important do you consider this nugget to be now?
  • Do you remember what other nuggets were associated with this one?
  • How did you use this nugget and what other things did you develop on the back of it?

The Subject will then be sent the original MindMaps in which the nuggets appeared, and will be asked the following questions for each one:

  • Do you remember more about why you marked this as a nugget now?
  • Now you see the MindMap it came from and the other nuggets it was associated with, do you see it as more or less important?
  • Do you remember anything more now about how you used it or what other ideas it may have contributed to?

The Subject will return his responses to the Researcher, who will categorise them and place the results in an analysis spreadsheet. The overall analysis and specific instances regarding the Subject’s responses and reactions will be written up as the findings for Investigation 1.

Investigation 2: Nineteen separate documents, each containing nuggets, will be randomly selected from documents included in the PAWDOC collection between 1981 and 2011. Randomisation will be achieved by using a random number generator to create numbers relating to the 16,925 entries in the PAWDOC Index, sequenced in the order in which they were created. The Researcher will generate the numbers and use his copy of the Excel version of the Index to identify the first nineteen entries for which an electronic file exists (some entries just contain the information they relate to and have no associated documents – these are identifiable by the contents of the Movement Status field) and for which the associated electronic files are likely to contain highlighted nuggets (items such as Health & Safety booklets, for example, are unlikely to contain highlighted nuggets). He will send the list of numbers to the Subject, who will open the folder labelled with each number, take a copy of the first file that appears in it, and send the files back to the Researcher. The Subject will take as little notice of the file titles as possible by setting up the Windows File Explorer window to display only the beginning of each file name, so that only the Reference Number, which is always at the start of the file name, is visible. Should the Researcher deem any of these files to be unsuitable, he will send additional numbers to the Subject until a satisfactory set of nineteen documents containing nuggets has been obtained. Then, using a cropping tool or by obtaining a clean copy of the document from elsewhere, he will send clean, unmarked copies back to the Subject. The Subject will read the documents, mark up any text that he considers to be nuggets, and send the documents back to the Researcher.
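The selection rule (draw entry numbers at random, skip unsuitable entries, stop at nineteen) might look like this in Python. It is a sketch under stated assumptions: the `IndexEntry` record and its two flags are hypothetical stand-ins for what the Researcher would read from the Movement Status field and judge from each entry’s description.

```python
import random
from dataclasses import dataclass

@dataclass
class IndexEntry:
    number: int                # position in the creation-ordered Index (1..16925)
    has_electronic_file: bool  # hypothetical flag, read from the Movement Status field
    likely_has_nuggets: bool   # hypothetical flag, e.g. False for Health & Safety booklets

def select_entries(index, wanted=19, seed=None):
    """Visit entry numbers in random order and keep the first `wanted` suitable ones."""
    rng = random.Random(seed)
    # sample() over the whole range gives a random order with no number repeated
    order = rng.sample(range(1, len(index) + 1), len(index))
    chosen = []
    for number in order:
        entry = index[number - 1]  # the list is assumed to be held in Index order
        if entry.has_electronic_file and entry.likely_has_nuggets:
            chosen.append(entry.number)
            if len(chosen) == wanted:
                break
    return chosen  # may be shorter than `wanted` if too few entries qualify
```

Because the random order depends only on the seed, calling the function again with the same seed and a larger `wanted` extends the same list, which mirrors the Researcher sending additional numbers when a file turns out to be unsuitable.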

The Researcher will then make a comparison between the original nuggets identified and the new nuggets identified and record the results in the analysis spreadsheet. The overall analysis will be written up as the findings for Investigation 2.
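One simple way to record that comparison in the analysis spreadsheet is to treat each marked passage as an item in a set and note what was kept, dropped and added. The sketch below assumes each nugget has been given a hypothetical label such as `p3-s1` (page 3, sentence 1); in practice new markings rarely line up exactly with old ones, so the Researcher’s judgement of what counts as the same nugget would have to come first.

```python
def compare_nuggets(original, new):
    """Compare the nuggets marked originally with those marked on re-reading."""
    original, new = set(original), set(new)
    kept = original & new
    either = original | new
    return {
        "kept": sorted(kept),
        "dropped": sorted(original - new),  # marked then, not now
        "added": sorted(new - original),    # marked now, not then
        # Jaccard similarity: shared nuggets as a share of all nuggets marked on either occasion
        "overlap": len(kept) / len(either) if either else 1.0,
    }

if __name__ == "__main__":
    print(compare_nuggets({"p1-s2", "p3-s1", "p4-s5"}, {"p1-s2", "p5-s1"}))
```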

Investigation 3: Nineteen nuggets, each from a separate document, will be randomly selected from randomly chosen documents placed in the PAWDOC collection between 1981 and 2011. Randomisation will be achieved by using the same procedure employed in Investigation 2. The Researcher will take the first nineteen suitable documents that contain nuggets and randomly select one specific nugget from each document using a random number generator. The nineteen nuggets will be assembled together and presented to the Subject, who will be asked to answer a set of questions similar to the first set in Investigation 1. These questions will form the rough frame for the first part of an interview in which the various nuggets will be discussed.
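Picking one nugget from each document with a random number generator could be done as follows. This sketch assumes the marked nuggets of each document have already been listed in the order they appear; the function name and the document references used with it are hypothetical.

```python
import random

def pick_one_nugget_per_document(nuggets_by_document, seed=None):
    """Pick one marked nugget at random from each document that has any."""
    rng = random.Random(seed)
    picks = {}
    for reference, nuggets in nuggets_by_document.items():
        if not nuggets:
            continue  # nothing marked, so not a suitable document
        n = rng.randint(1, len(nuggets))  # nugget number, counting from the top of the document
        picks[reference] = nuggets[n - 1]
    return picks
```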

The Subject will then be shown the original documents in which the nuggets appeared, and will be asked the following questions for each one, which will form the second part of the interview mentioned above:

  • Do you remember more about why you marked this as a nugget now?
  • Now you see the document it came from and the other nuggets it was associated with, do you see it as more or less important?
  • Do you remember anything more now about how you used it or what other ideas it may have contributed to?
  • Are there things in the original document that you didn’t mark as a nugget at the time that you would mark as a nugget now? If so, why?

A third and final part of the interview will explore the responses from across all of the assessments in a more open-ended fashion to generate deeper insights and discussion.

The Researcher will then transcribe the interview, categorise the responses and place the results in the analysis spreadsheet. The overall analysis and specific instances regarding the Subject’s responses and reactions will be written up as the findings for Investigation 3.

Conclusions: The Researcher will use the findings from all three Investigations to write the overall conclusions of the investigation.

Nuggets about Nuggets across 38 years

To review what I’ve done on the topic of information nuggets, I’ve been trawling through the PAWDOC Index and files. The earliest example of sidelined text that I can find in my document collection dates from October 1981, when I was working at the National Computing Centre (NCC). I can’t remember why I started doing it, but it may well have been prompted by the method that NCC’s Chief Editor, Geoff Simons, used to construct his books. He explained to me that he read everything about a subject, identified key points and put them on Post-it notes which he stuck on the wall. When he was ready to write, he rearranged the Post-its into separate sections, and into sequence within the sections – and I saw examples of this in his office. At some point I started to employ this technique to construct the best practice books I wrote at NCC, but using the word processor on our new Zynar Office System to assemble and organise the key points. Sidelining text was an obvious way to identify key material to feed into that process.

Around 1994 I started talking with City University academics Clive Holtham and David Bawden with a view to undertaking a joint project on ‘The Paperless Office Worker’. A key strand of this work would involve me digitising my PAWDOC collection. Extracts from our emails in the early part of 1994 include the following:

Email from Clive Holtham: ‘…I don’t record each document as Paul does, but file at a quite detailed level. What I am conscious of is how much I forget about what I already have. The equivalent of underlining is important – we need to consider something more than keywords to store with each piece.’

Reply from Paul Wilson: ‘…I agree with needing to deal with the underlining problem. I have sidebars on most of my material – they are the information nuggets; but I don’t know how much use they would be out of context.’

Email from Paul Wilson: ‘…with reports, papers etc. I usually mark the nuggets of info within them – presumably these are the bits of information I really want.’

The collaboration with City University proved very productive: Clive Holtham introduced me to a product manager at Fujitsu who loaned me a scanner, and to the owner of a small company called DDS who loaned me the Paperclip document management software. Soon I was scanning my existing paper documents, and new ones as they arrived. In January 1997 I issued my third briefing note on these activities and included the following towards the end of the four-page document:

Despite the close relationship between filing and information use, contemporary filing systems provide little other than title and index fields to support the knowledge acquisition, synthesis and use process. Filing systems, it seems, are there just to store items and to aid their retrieval. Unfortunately, the personal knowledge acquisition, synthesis and use process is not supported adequately outside filing systems either. Some standalone packages do exist, but they are not intended to be used in a day to day manner for personal knowledge acquired in documents, electronic files and other artefacts. In fact, even the need for such support is not widely recognised.

It is not yet clear to me what support could be most beneficial. However the clues are littered throughout the practices of knowledge workers like myself. For example, whenever I read articles, papers and reports I always mark the good bits – the nuggets of information. These are key points which I particularly want to augment the knowledge in my brain. Sometimes when I have been researching a topic I collect together all the nuggets I can find, categorise them and reorder them, and synthesise a new view of the topic in question. Unfortunately, like the magnesium nodules on the floors of the deep ocean, huge numbers of nuggets now litter my filing system unseen and inaccessible. I hope they are in my brain and that they have been used to develop my current state of thinking – but I’m not so sure that their huge potential has been fully exploited.

For my filing activities to really start adding value I need tools which can record those nuggets as I consume and index each item, and which can enable me to reorganise those nuggets, add more nuggets, and synthesise new nuggets, in the process of actively developing my ideas. Such tools would, of course, maintain the links to the original source material stored in my files. And my files would become a combination of original source material and the representations of my developing thoughts and ideas.

Now that I am confident that I have the paper scanning and electronic file indexing activities under reasonable control, it seems high time to start addressing the critical area of information use and its role in knowledge management.
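To make the idea in that briefing note a little more concrete, here is a minimal Python sketch of such a nugget-base. Everything in it is an assumption rather than a description of an existing tool, and all the names are hypothetical: nuggets are recorded against the reference number of the document they came from, can be regrouped by topic, and synthesised nuggets keep links back to the nuggets (and hence the source documents) they were built from.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Nugget:
    text: str
    source_ref: str | None  # reference number of the source document; None if synthesised
    topics: set[str] = field(default_factory=set)
    derived_from: list[Nugget] = field(default_factory=list)  # nuggets this one was built from

class NuggetBase:
    """A hypothetical store of nuggets, kept alongside the document index."""

    def __init__(self):
        self.nuggets: list[Nugget] = []

    def record(self, text, source_ref, *topics):
        """Capture a nugget at the same time as indexing its source document."""
        nugget = Nugget(text, source_ref, set(topics))
        self.nuggets.append(nugget)
        return nugget

    def by_topic(self, topic):
        """Regroup nuggets by topic, whatever document they came from."""
        return [n for n in self.nuggets if topic in n.topics]

    def synthesise(self, text, sources, *topics):
        """Record a new nugget derived from existing ones, keeping the links."""
        nugget = Nugget(text, None, set(topics), list(sources))
        self.nuggets.append(nugget)
        return nugget

    def original_sources(self, nugget):
        """Follow derivation links back to the source documents' reference numbers."""
        if nugget.source_ref is not None:
            return {nugget.source_ref}
        refs = set()
        for parent in nugget.derived_from:
            refs |= self.original_sources(parent)
        return refs
```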

These are the earliest mentions of information nuggets I can find in the PAWDOC collection, and they give no indication of where I picked up the concept. In fact, I can find only one published mention of the term: in the April 1998 issue of the Lotus Notes-oriented magazine ‘Groupware and Communications Newsletter’, in which one Ted Howard-Jones gave a brief description of a service implemented by a major financial institution to capture competitive information. He wrote:

‘Called Report-It!, this service captures knowledge using a secure voice-mail system and delivers categorised information directly to the desktops of office-bound managers and competitive information professionals. These nuggets of professional information are disseminated via Notes.’

The use of the term Knowledge in this article, and in my briefing paper mentioned above, reflected the fact that, in the late 1980s, the term Knowledge Management started to become fashionable, and by the late 1990s it had become a holy grail for IT Professionals, Management Consultants, and Academics. The term first appears in the PAWDOC Index in 1990, and occurs in a further 137 Index entries from then to 2016. The company I worked for (Computer Sciences Corporation – CSC) was a global computer services organisation with tens of thousands of employees worldwide. Its involvement in Knowledge Management came from three angles: first, its clients started to ask about it and how to do it; second, its consultants and salesmen saw it as a potential source of revenue; and third, its employees and management began to think that they needed it internally to improve the effectiveness of the business. Hence, as a consultant in the UK arm of the business, I was aware of, or got involved in:

  • A number of initiatives to develop an offering or information for clients, including:
    • the development of a KM service by CSC Netherlands in 1990;
    • discussions with CSC UK Management Consultants who were developing KM propositions, in 1996-8;
    • the definition and design of a KM service by CSC UK personnel to address opportunities in a number of clients including ICI Paints, LUCAS Engineering, John Menzies and United Distillers, in 1996-7;
    • the publication of a CSC Research Services Foundation report on KM in 1998;
    • news of CSC’s Global Knowledge Management services, in 2001.
  • The development of such systems for clients (primarily using web pages on intranets), including:
    • a presentation to ICI Paints in 1992;
    • the development of systems for KM and for a web-based ‘Gazetteer’ of all KM, architecture and other organisational information, for the Nokia SCC – a new organisation being set up by CSC UK for Nokia, in 1997;
    • the design and population of a web-based KM system for DuPont Agriculture’s architectural components, in 1998.
  • The development and use of internal CSC solutions, including:
    • attendance at internal CSC workshops on developing an organisational learning infrastructure in 1996;
    • the development of an improvement process for CSC’s new application development organisation that was being designed and built from scratch, in 1998;
    • the design of a practical KM programme for CSC UK’s reorganised Consulting & Systems Integration unit, in 1999;
    • knowledge management work being done for BAE, in 1999;
    • an internal Community of Interest on the subject of Personal Knowledge Management, in 2001.

Of course, as I entered the Knowledge Management (KM) arena, my rather lowly personal filing perspective had to expand rapidly to accommodate both the high-level Management Consultancy notions of ‘Intellectual Capital’ and of the distinction between Knowledge and Information, and the practical need to derive benefits from an investment in KM by effectively sharing the knowledge that had been acquired. Indeed, one of my contributions to the internal Community of Interest mentioned in the last bullet point above seems to herald a change in my thinking. Its opening sentence reads:

‘Since collaborating with everyone in this shared space I’ve had my eyes opened to the concept that KM is all about enabling people to find things and work together, as opposed to the idea that KM is all about nailing down bits of knowledge and providing it to people. I realise that there is significant crossover between the two approaches – but nevertheless giving priority to one or the other will result in significantly different activities.’

During the 1990s I learnt a great deal about what people thought Knowledge Management was, and also about the powerful potential of the new web technology to support KM. However, throughout this period I don’t recall any specific conversations or documents about dealing with underlined, sidelined, or highlighted text. Indeed, by the time I completed the draft of the paper summarising my PAWDOC findings in June 2001, it seems that my ideas on the subject had advanced no further than those reported above. The paper was published in the journal Behaviour & Information Technology (BIT), and addressed the subject as follows in the section on ‘Areas of Investigation and Summary Findings’:

Q27. How can an electronic filing system be used to develop and use knowledge?

  • Include substantive information in the index entries, for example phone numbers, book references, and expense claim amounts.
  • Identify the nuggets of information (i.e. the valuable bits) when you first read a document.
  • Capture and structure the nuggets into the overall nugget-base at the same time as indexing the item.

Status: ideas formed

Q28. What is the best way to capture and structure information nuggets?

Probably by using a Concept Development tool. Some initial prototyping has been done using the Visual Concepts package and the eMindMaps package.

Status: ideas formed

Q29. Is it feasible and practical to capture and structure information nuggets as well as indexing items?

Status: not started

Q30. Is it worthwhile building and developing an information nugget base?

Status: not started

The Concept Development prototyping mentioned in the answer to Q28 above probably started in April 2001 when I acquired a free copy of the eMindMaps software, and by the end of 2001 I had started making MindMaps of books on esoteric topics such as the Egyptian pyramids and the origin of Atlantis. I have no record of my detailed intentions in doing this, but I guess I wanted to experience the process of recording all the nuggets I found in a book, and then to explore what could be done to integrate and exploit the material from several different MindMaps. In all, I made MindMaps of 19 books over a two-year period, but that’s as far as I got. At the end of 2001 I started a new job in bid management and my energies were increasingly taken up with managing very intensive bids, with documenting the bid process, and with operating a Lessons Learned programme. I had no time to pursue these information nugget ideas any further.

This concludes my review of my previous activities in the use of information nuggets. The questions I posed in the 2001 BIT paper remain largely unanswered, and the operation of the PAWDOC system has not provided any further insights on the subject since then. However, the existence of the sidelined documents, and of the 19 MindMaps, does provide an opportunity to undertake some rudimentary practical work to explore whether the information nuggets identified were memorable and of any use. Subsequent entries will outline the methods that will be used to undertake these investigations, and will report on the results.

Knowledge Nugget Endeavours

In 1981 I was working in the newly formed Office Systems team in the UK National Computing Centre, and I was interested in how the new technology could support the management of an individual’s office documents. So, a colleague and I decided to experiment with our own documents. This was the start of a still-running practical exploration of how to manage personal documents using digital technology.

It was my practice to highlight key text with a sideline as I read documents and, as my document collection grew, I began to wonder how I could make explicit use of this very specific information. No doubt the act of highlighting was in itself helping me to assimilate documents; but I wasn’t sure whether all the highlighted facts were being retained in my brain and used to develop new concepts.

During the 1990s, the trendy new topic of Knowledge Management emerged which provided a recognised arena in which I was able to explore these ideas. Sometime during this period, I latched onto the term ‘nugget of information’ (the first published mention of this in my filing index is in an article by one Ted Howard-Jones in the March 1998 issue of the Groupware and Communications Newsletter). My attempts to relate lowly personal filing to the Knowledge Management field eventually fizzled out in the face of much sexier concepts such as an organisation’s ‘intellectual capital’. However, in the early 2000s, I did make a specific attempt to see if I could use Concept Mapping software to capture nuggets, by applying it to 19 new age books on the pyramids and the like; but that is where my knowledge nugget endeavours ended.

Now that I’m trying to find a home for my document collection, and to identify the findings from its long term operation, it seems a timely moment to review this particular aspect, to do some practical work on the nuggets I’ve identified over the years, and to draw some conclusions on the topic.

Getting an HTTrack Copy

HTTrack is a free-to-use website copier. Its web site provides the following description: “It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site’s relative link-structure. Simply open a page of the ‘mirrored’ website in your browser, and you can browse the site from link to link, as if you were viewing it online.”

I downloaded and installed HTTrack very quickly and without any difficulty, then set about configuring the tool to mirror pwofc.com. This simply involved specifying a project name, the name of the web site to be copied, and a destination folder. The Options were more complicated and, for the most part, I just left them at their default settings before pressing ‘Finish’ on the final screen. There was an immediate glitch when I discovered that I had not provided the full web address (I’d specified pwofc.com instead of https://www.pwofc.com/ofc/); but having made that change, I pressed ‘Finish’ again and HTTrack got on with its mirroring. Some 2 hours, 23 minutes and 48 seconds later, HTTrack completed the job, having scanned 1,827 links and copied 1,538 files with a total size of 212 MB.

The mirroring had produced seven components: two folders (hts-cache and www.pwofc.com) and five files (index, external, hts-log, backblue and fade). The hts-cache folder is generated by HTTrack to enable future updates of the mirrored web site; the external file is a template page for displaying external links which have not been copied; backblue and fade are small gif images used in such templates; and the log file records what happened in the mirroring session. The remaining www.pwofc.com folder and index file contain the actual contents of the mirror.

On double-clicking the index file, the pwofc.com home page sprang to life in my browser, looking exactly the same as it does when I access it over the net. As I navigated around the site, the internal links all seemed to work and all the pictures were in place, though the search facility didn’t work. External links produced a standard HTTrack page headed by “Oops!… This page has not been retrieved by HTTrack Website Copier. Clic to the link below to go to the online location!” – and indeed clicking the link did take me to the correct location (I believe it is possible to specify that external links can also be copied by setting the ‘Limit’ option ‘maximum external depth’ to one, but my subsequent attempt to do so ended with errors after just two minutes, and I abandoned it). The only other noticeable difference was the speed with which one could navigate around the pages – it was just about instantaneous. From this cursory examination I was satisfied that the mirror had accurately captured most, if not all, of the website.

An inspection of the log file, however, identified that there had been one error – “Method Not Allowed (405) at link www.pwofc.com/ofc/xmlrpc.php (from www.pwofc.com/ofc/)”. According to the net, a PHP file ‘is a webpage that contains PHP (Hypertext Preprocessor) code. … The PHP code within the webpage is processed (parsed) by a PHP engine on the web server, which dynamically generates HTML’. Interestingly, I wasn’t aware of having any content with such characteristics but, on closer inspection of the files in my hosting folder, I found I had lots of them – probably hundreds. I tried to figure out what the file named in the error related to, but had no clue other than its rather striking creation date – 23/12/2016 at 00:00:00 – the same date as several of the other PHP files. I had not created any blog entries on that day, so my investigation ground to a halt. I don’t have the knowledge to explore this, and I’m not prepared to spend the time to find out. My guess is that the PHP files do the work of translating the base content stored in the SQL database into the structured web pages that appear on the screen. I’m just glad that there was only one error – and that its occurrence isn’t obviously noticeable in the locally produced web pages.

The log file also reported 574 warnings, which came in 287 pairs. A typical example pair is shown below:

19:31:13  Warning:  Moved Permanently for www.pwofc.com/ofc/?p=987
19:31:13  Warning:  File has moved from www.pwofc.com/ofc/?p=987 to https://www.pwofc.com/ofc/2017/06/29/an-ofc-model/

I tried to find a Help list of all the Warning and Error messages in the HTTrack documentation, but it seems that such a list doesn’t exist. Instead there is a Help forum which has several entries relating to such warning messages, but none that I could relate to the occurrences in my log. As far as I can see, all of the pages mentioned in the warnings (in the above instance the title of the page is ‘an-OFC-Model’) have been copied successfully, so I decided that it wasn’t worth spending any further time on it.
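For anyone wanting to check a log like this without reading it line by line, the sketch below tallies warnings and errors and collects the ‘File has moved’ redirects. It assumes each relevant log line has the shape shown in the example above (time, level, message); the function name is hypothetical, and any other kind of line in the log is simply skipped.

```python
import re
from collections import Counter

# Assumed line shape, based on the example above: "HH:MM:SS  Level:  message"
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(Warning|Error):\s+(.*)$")
MOVED = re.compile(r"File has moved from (\S+) to (\S+)")

def summarise_log(lines):
    """Count Warning/Error lines and map each moved address to its new location."""
    levels = Counter()
    redirects = {}
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # not a warning or error line
        _, level, message = match.groups()
        levels[level] += 1
        moved = MOVED.search(message)
        if moved:
            redirects[moved.group(1)] = moved.group(2)
    return levels, redirects
```

Run against the log (for example `with open("hts-log.txt", encoding="utf-8", errors="replace") as log: levels, redirects = summarise_log(log)`), a log made up of redirect pairs would show twice as many warnings as redirects.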

All in all, I judge my use of HTTrack to have been a success. It has delivered me a backup of my (relatively simple) site which I can actually see and navigate around, and which can be easily zipped up into a single file and stored.
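Zipping the mirror into a single file for storage can be done with Python’s standard library. This is a minimal sketch, assuming the mirror sits in its own folder; the function name and any paths passed to it are hypothetical, and the archive should be written outside the mirror folder so that it doesn’t try to include itself.

```python
import shutil
from pathlib import Path

def zip_mirror(mirror_dir, archive_base):
    """Zip a whole HTTrack mirror folder into one .zip file and return its path."""
    mirror = Path(mirror_dir)
    if not mirror.is_dir():
        raise NotADirectoryError(f"No mirror folder at {mirror}")
    # make_archive adds the ".zip" extension itself; root_dir makes the paths
    # inside the archive relative to the mirror folder (index.html, www.pwofc.com/...)
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(mirror))
```

A call such as `zip_mirror(r"C:\My Web Sites\pwofc", r"D:\Backups\pwofc-mirror")` (both paths hypothetical) would produce `pwofc-mirror.zip` containing the folders and files listed earlier.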

A Backup Hosting Story

In the last few days I’ve been exploring making backup copies of this pwofc Blog using the facilities provided by the hosting company that I employ – 123-Reg. It was an instructive experience.

When I first set up the Blog in 2012 I had deliberately decided to spend a minimal amount of time messing around with the web site and to focus my energies on generating the stuff I was reporting in it. Consequently, most of my interactions with the hosting service had involved paying my annual fees, and I had little familiarity with the control panel functions provided to manage the web site. In 2014, I had made some enquiries about getting a backup, and the support operation had provided a zip file which was placed in my own file area. Since then I had done nothing else – I think I had always sort of assumed that, if something went wrong with the Blog, the company would have copies which could be used to regenerate the site.

However, when I asked the 123-Reg support operation about backups a few days ago, I was told that the basic hosting package I pay for does NOT include the provision of backups – and the company no longer provides zip files on request: instead, facilities are provided to download individual files, to zip up collections of files, and to download and upload files using the file transfer protocol FTP. Of these various options, I would have preferred to just zip up all the files comprising pwofc.com and then to download the zip file. However, the zipping facility didn’t seem to work and, on reporting this to the 123-Reg Support operation, I was told that it was out of action at the moment… So, I decided to take the FTP route.

I duly downloaded the free-to-use FTP client, FileZilla, set it up with the destination host IP Address, Port No, Username and Password, and pressed ‘Connect’. After a few seconds a dialogue box opened advising that the host did not support the secure FTP service and asking if I wanted to continue to transfer the files ‘in clear over the internet’. Naturally I was a little concerned, closed the connection, and asked 123-Reg Support if a secure FTP transfer could be achieved. I was told that it could be and was given a link to a Help module which would explain how. This specified that a secure transfer requires Port 2203 to be used (it had previously been set to 21), so I made the change and pressed ‘Connect’ again. Nothing happened. A search of the net indicated that secure FTP requires a Port No of 22, so I changed 2203 to 22 and, bingo, I was in.

FileZilla displays the local file system in a box on the left of the screen, and the remote file system (the pwofc.com files in this case) in a box on the right. Transferring the pwofc files (which comprise a folder called ‘ofc’, a file called ‘index’, and a file called ‘.htaccess’) was simply a matter of highlighting them and dragging them over to a folder in the box on the left. The transfer itself took about 12 minutes for a total file size of 246 MB.

Of course, the copied files on my laptop are not sufficient to produce the web pages: a fully functioning web site also requires the SQL database which manages them. If you double-click the ‘index’ file, it just delivers a web page with some welcome text but no links to anything else. Hence, these backup files are only of use for uploading back to the original hosting web site, so that the blog can be resurrected if the original files have become corrupted or destroyed. I guess they could also, in principle, be used to set up the site on another hosting service – though I have no experience of doing that.

Admittedly, these findings relate only to one customer’s limited experience of one specific hosting service and may or may not apply generally. However, they do indicate some general points which Blog owners might find worth bearing in mind:

  • Don’t assume that your hosting service could regenerate your Blog if it became corrupted or was destroyed – find out what backup facilities they do or don’t provide.
  • Don’t assume that all the functions provided by your hosting service work – things may be temporarily out of action or may have been superseded by changes to the service over the years.
  • Remember that a backup of the website may be insufficient to regenerate or move the Blog – be clear about what additional infrastructure (such as a database) will be required.
  • If you want to be able to look at the Blog offline and independently of a hosting service, investigate other options such as creating a hardcopy book, or using a tool such as HTTrack (which is discussed in the following entry).

ST’s Alternative Approaches

About 6 weeks ago (on 6th March), Sara Thomson of the Digital Preservation Coalition kindly spent some time on the phone with me discussing the archiving of web sites. I wanted to find out if there were any other solutions than the ones I had stumbled across in my brief internet search some 16 months ago. Sara suggested three approaches which were new to me and described them as follows in a subsequent email:

  1. UK Web Archive (UKWA) ‘Save a UK Website’: https://beta.webarchive.org.uk/en/ukwa/info/nominate Related to this – two web curators from the British Library (Nicola Bingham and Helena Byrne) presented at a DPC event last year discussing the UKWA, including the Save a UK Website function. A video recording of their talk along with their slides (and the other talks from the day) are here: https://dpconline.org/events/past-events/web-social-media-archiving-for-community-individual-archives
  2. HTTrack: https://www.httrack.com/  I gave a brief overview of HTTrack at that same DPC event last year that I linked to above. I have also included my slides as an attachment here – the HTTrack demo starts on slide 15.
  3. Webrecorder: https://webrecorder.io/ by Rhizome. Their website is great and really informative, but let me know if you have any questions about how it works.

Shortly after this, I followed the link that Sara had provided to the UKWA nomination site and filled in the form for pwofc.com. On 14th March I got a response saying that the British Library would like to archive pwofc.com and requesting that I fill in an on-line licence form, which I duly completed. On 16th March I decided to explore the contents of the UKWA service and found it collects ‘millions of websites each year and billions of individual assets (pages, images, videos, pdfs etc.)’. I started looking at some of the blogs. The first one I came across was called Thirteen days in May and was about a cycling tour – but it seemed to lack some of the photos that were supposed to be there. The next two I looked at, however, did seem to have their full complement of photos; and one of them (called A Common Reader) had a strangely coincidental entry about ‘Instapaper’, which provides what sounds like a very useful service for saving web sites for later reading. It looks like the UKWA does an automated trawl of all the websites under its wing at least once a year, so I guess that, as a backup, it should never be more than a year out of date.

An hour after completing this exploration, I got an email confirming that the licence form had been submitted successfully and advising that the archiving of pwofc.com would proceed as soon as possible, but that it may not be available to view in the archive for some time due to the many thousands of web sites being processed and the need to do quality assurance checks on each. Since then, I’ve been checking the archive every now and again, but pwofc.com hasn’t emerged yet. When it does, it’ll be interesting to see how faithfully it has been captured.

Regarding the other two suggestions that Sara made, I’ve decided to discount Webrecorder, as that entails visiting every page and link in a website, which would just take too much time and effort for pwofc.com. However, I’m going to have a go at using HTTrack, and I’m also going to try to get a backup of pwofc.com from my web hosting service. Once I’ve tried all these various archiving solutions, there’ll be an opportunity to compare the approaches and reach some conclusions.

The PAWDOC Preservation story

In May 2018 the inaugural digital preservation work on the PAWDOC collection was completed. The story of the work that was done, and the lessons that were learnt, are documented in the following paper which can be downloaded from this site subject to Creative Commons conditions:

The Application of Preservation Planning Templates to a Personal Digital Collection

Instances of the populated preservation planning templates that were used to control the work are also provided:

A summary of the work done and the lessons learned has been published as a Blog Post on the Digital Preservation Coalition (DPC) website.

The preservation planning templates were updated as a result of insights gained in the work and these are available as embedded files in the above ‘Application of Preservation Planning Templates’ paper and also in the DPC website.

March: Long and Plans

It looks like the blog post describing the Digital Preservation work undertaken last year on the PAWDOC collection will be published next month on the DPC website. It will refer to the full paper describing the work in more detail, which will be published here within pwofc.com. At the same time, the preservation planning document templates will be replaced by updated versions in the DPC website. The publication of all these materials will be a fitting end to the preservation planning activities that are described in previous entries in this site. However, there will still be one thing to do before the topic can be considered complete, and that is to review the effectiveness of the Preservation Maintenance Plan template when an instance of it is used in the PAWDOC Preservation maintenance exercise scheduled for September 2021.

A few insights and conclusions

The sort-out of my publications, reports and CSCW proceedings (broadly categorised as ‘things I had created and done’) confirmed that I have a particular interest in material I had created or had made significant contributions towards. It was undoubtedly rewarding to revisit the material – though I wouldn’t anticipate doing it again very often. In fact, it made me realise that just knowing that all the material is available and easily accessible is itself a very satisfying and reassuring thought. Of course, having a complete collection of work documents to draw on when assembling full sets of my publications and reports was slightly unusual; most people might only have partial sets, depending on what particular material they had saved in the course of their careers.

The items included in the category ‘things I had created and done’ are only a subset of all the work items I’ve kept over the years. I have previously digitised over 80 books from my work book collection, as described in the Electronic Bookshelf journey; I’ve created story boards for 30+ work books that I regarded as special in some way or other; and my PAW-PERS collection of memorabilia contains over 120 other items in the following additional categories:

Formal job documents (offer letters, job specs, pension info, pay slips etc.): I originally kept these for reference; but now, of course, they have become very informative pieces of memorabilia.

Company information (brochures, newsletters etc.): Many of these are well presented documents providing detailed information about the organisations I worked for.

Recognition objects (certificates, long term service awards, contract win artefacts etc.): I didn’t keep the originals of certificates confirming I had completed in-company courses, as they didn’t seem very significant; however, I do value a certificate from my professional body and keep it framed on my study wall. I’ve kept the cut glass paperweight celebrating a contract win, and the cut glass bowl for long service, which are both in our crystal cabinet, though they are retained more because of their looks than as reminders of work. I also value the long service domino set (very nice in a large wooden box), which I chose deliberately because I knew I would want to keep it long term for both its utility and its looks.

People I worked with (humorous documents, social gatherings, leaving cards, etc.): These are generally mementos of the people I worked with and the activities we did together.

Associated activities (company sports and social clubs, trade unions, professional bodies etc.): These are mementos of my activities in organisations associated with my work, and they are surprisingly prominent in my collection. I guess such organisations have played a significant part in my working life over the years.

In thinking back about what I’ve done with all these different sets of work items, I was reminded of how particular items have sometimes corrected a fact that I had mis-remembered. For example, for several years I believed that I was the instigator of the Alvey project I was involved in (Cosmos). However, in trawling through my documents to create one of the Electronic Story Boards, I discovered that it was a colleague who had been the instigator, and that I had been a very ardent subsequent advocate. I guess that we often remember things in the way we would like them to be, not necessarily the way they actually were. Hence, having some documentation or other artefact can cast a truer light on the past. However, it must be remembered that the documents we have may only be a subset of all the relevant documents that were produced, and/or that their contents may just reflect the biases of the authors. Hence, whatever the nature of our ‘record’, be it memory, a selection of the relevant items that we have, all of the items that we have, or, indeed, all the relevant items that exist in the world, we should always remember that it may not be the whole story.

As with my non-work mementos, most of these work items have been digitised and the originals disposed of, though a small number, which I decided were special in some way or other, have been retained in physical form. In this respect, these work items are very similar to other types of memento. However, there is one very significant difference: many of these work items will not be recognisable by my wife and family. That’s because my work took me to a different place and a different life for a part of each day – as it does for very many people; hence, work mementos are likely to mean more to the individual than to family, relatives and friends. Consequently, I suspect that such collections are even less likely than other types of mementos to be retained and maintained by future generations of the family. I believe this to be almost certainly true for physical work mementos (I can’t see people hanging onto bulky books and papers which mean little to them). However, I’m less sure about digital collections which, in principle, are much less obtrusive and much easier to keep in the short term, but do rely on some care and attention as computers are replaced and technology advances. In fact, this uncertainty must apply to all informally-held digital collections – too little time has passed so far to be able to discern if such material is being passed down the generations. Interestingly, I do see the possibility of Artificial Intelligence playing a role in managing such material, and this could significantly affect how much of its digital history a family may have access to in the future.

In summary, this short review of my work mementos seems to have thrown up the following insights:

  • Categories of work mementos include: things the individual has created and done, work books, formal job documents, company information, recognition objects, people the individual has worked with, and associated activities.
  • While work mementos are similar to other types of mementos, they do provide reminders of a part of life that is often very personal to the individual and often separate from family life. Inasmuch as work is often done with other people, it is almost like a parallel life with a separate family; hence, it generates a separate set of mementos.
  • Work concerns making, creating and doing things; and if individuals are in any way proud of what they have done, then they may well be keen to retain examples of what they achieved and to inspect them from time to time.
  • It is very satisfying and reassuring to know that examples of what you have produced at work are safely stored away and accessible when you want them. Just being able to have those thoughts may be as rewarding as actually looking at the material.
  • Our ‘record’ of events is only as good as the material we have, be it memory or a few relevant artefacts, or lots of artefacts. We should always remain open to the possibility that we don’t have all the facts.
  • Work mementos are probably less likely to be passed on down family lines than other types of mementos.
  • Physical work mementos are less likely to be passed on down family lines than digital work mementos.
  • Artificial intelligence may result in many more digital mementos of all types being passed on reliably down the generations.