The UK Web Archive

It’s been over a year since I wrote about this journey, so I’ll start this entry with a short recap of where I’m up to. Back in March 2019, I decided I would explore three different ways of archiving this pwofc website: first, by using tools provided by the company I pay to host the site; second, by using a tool called HTTrack; and third, by submitting the site for inclusion in the British Library’s UK Web Archive (UKWA).

My experience with the hosting company’s tools was less than satisfactory, and is documented in a post on 28Apr2019 entitled ‘A Backup Hosting Story’. My use of HTTrack was much more rewarding: it produced a complete backup of the whole site, which could be navigated on my laptop screen with near-instantaneous movement between pages, and which could easily be zipped into a single file for archiving. This is written up in the 30Apr2019 post titled ‘Getting an HTTrack copy’.

I’ve had to wait till now to relate my experience of submitting the site to the British Library’s UK Web Archive (UKWA), because getting the site into the archive has been a little problematic. Here’s what happened: following a suggestion from Sara Thomson of the DPC, I filled in the form at https://beta.webarchive.org.uk/en/ukwa/info/nominate offering pwofc.com for archiving. Within about three weeks I received an email saying that the British Library would like to archive the site and requesting that I fill in the online licence form, which I duly completed. A couple of days later, on 16th March 2019, I got an email confirming that the licence form had been submitted successfully and advising that: “Your website may not be available to view in the public archive for some time as we archive many thousands of websites and perform quality assurance checks on each instance. Due to the high number of submissions we receive, regrettably we cannot inform you when individual websites will be available to view in the archive at http://www.webarchive.org.uk/ but please do check the archive regularly as new sites are added every day.”

From then on, I used the search facility at http://www.webarchive.org.uk/ every month or so to look for pwofc.com, but with no success. Over a year later, on 21st April 2020, I replied to the licence confirmation email and asked if it was normal to wait for over a year for a site to be archived or if something had gone wrong. The very prompt reply said, “Unfortunately there is a delay between the time we index our content and when it can be searched through the public interface. We aim to update our indexes as soon as possible and this is an issue we are trying to fix, please bear with us as we do have limited resources. Your site has been archived and it can be accessed through this link: https://www.webarchive.org.uk/wayback/archive/*/http://www.pwofc.com/”.

Sure enough, the link took me to a calendar of archiving activity, which showed that the site had been archived three times: twice on 01July2019 (both of which seemed to be complete and to work OK), and once on 13Mar2020 (which, when clicked, seemed to produce an endless cycle of loading). I reported this back to the Archivist, who scheduled some further runs and, when these too were unsuccessful, asked if I could supply a site map. I duly installed the Google XML Sitemaps plugin on my pwofc.com WordPress site, provided the Archivist with the site map URL, https://www.pwofc.com/ofc/sitemap.xml, and the archive crawler conducted some more runs. The 13th run of 2020, on 22nd June, seemed to have been successful: the archived site looked just as it should. I then set about doing a full check of the archived site against the current live site to ensure that all the images were present, and that the links were all in place and working. The findings are listed below:

  • External links not collected: Generally speaking, the UKWA archive has not included web pages external to pwofc.com. Instead, when such a link is selected in the archive, one of two messages is displayed: either “The url XXX could not be found in this collection” (where XXX is the URL of the external site), or “Available in Legal Deposit Library Reading Rooms only”. However, in at least two instances the link does actually open the live external web page. I don’t know what determines which of these results is produced.
  • Link doesn’t work: For one particular link (with the URL ‘http://www.dpconline.org/advice/case-notes’), which appears in two separate places in the archive, there is no response at all when the link is clicked.
  • Home link doesn’t work on linked internal pages: Links to internal pages within pwofc.com all work fine in the archive. However, the Home button on the pages displayed after selecting such links doesn’t produce any response.
  • Image with a link on it not displayed: The pwofc.com site has two instances of an image with a link overlaid on it. The archive displays the title of the image instead of the image itself.
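Most of these findings came from clicking through links by hand. For anyone repeating the exercise, a useful first step could be to list which links on a page are internal and which point off-site, since it was mostly the external ones that the archive failed to collect. The following is a minimal sketch using only Python’s standard library. The names `LinkCollector` and `classify_links` are my own, hypothetical, and it assumes you already have a page’s HTML as a string (for instance from an HTTrack copy); it does not fetch anything from the live site or the archive.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_html, page_url, site_host="pwofc.com"):
    """Split a page's web links into (internal, external) lists of absolute URLs.

    Non-web links such as mailto: are skipped. Any host equal to site_host,
    or a subdomain of it (e.g. www.pwofc.com), is assumed to be internal.
    """
    collector = LinkCollector()
    collector.feed(page_html)
    internal, external = [], []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links like "/ofc/about/"
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue
        host = parts.hostname or ""
        if host == site_host or host.endswith("." + site_host):
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


if __name__ == "__main__":
    sample = (
        '<p><a href="/ofc/about/">About</a> '
        '<a href="http://www.dpconline.org/advice/case-notes">DPC case notes</a> '
        '<a href="mailto:someone@example.com">Email</a></p>'
    )
    internal, external = classify_links(sample, "https://www.pwofc.com/ofc/")
    print("internal:", internal)
    print("external:", external)
```

The external list is the set of pages worth checking individually in the archive, where they may come back as “could not be found in this collection”.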

On the whole, the archive provides quite a faithful reproduction of the site. However, the fact that no content was collected for most external web pages, and that no link to the live external pages is provided either, is quite a serious shortcoming for a site like pwofc.com, which has at least 26 such links. Having said that, the archive aims to collect all the websites on its books at least once a year, and all the different versions appear to be accessible from a calendar of copies; so, should one be able to get on the UKWA roster, this would appear to be quite an effective way to back up or archive a blog.
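Because each capture is reached through a calendar page, a page-by-page check of a new capture could be driven from the same sitemap I gave the Archivist. The sketch below is a minimal, hypothetical Python example: it reads a sitemap document and builds one UKWA calendar link per page, using the URL pattern from the link the archive sent me. It assumes that pattern works for any individual page (I have only seen it used for the home page), and the function names are my own. The Google XML Sitemaps plugin may produce a sitemap index, in which case the `<loc>` entries are sub-sitemaps that would need reading in turn.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Calendar-style lookup prefix, taken from the link the UKWA sent me (assumed general).
UKWA_CALENDAR_PREFIX = "https://www.webarchive.org.uk/wayback/archive/*/"


def sitemap_locations(sitemap_xml):
    """Return every <loc> value in a sitemap (or sitemap index) document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def archive_lookup_urls(sitemap_xml):
    """Build one UKWA calendar URL per page listed in the sitemap."""
    return [UKWA_CALENDAR_PREFIX + url for url in sitemap_locations(sitemap_xml)]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://www.pwofc.com/ofc/</loc></url>"
        "<url><loc>https://www.pwofc.com/ofc/about/</loc></url>"
        "</urlset>"
    )
    for lookup in archive_lookup_urls(sample):
        print(lookup)
```

The output is simply a checklist of links to open one by one, which is a more systematic version of the manual comparison I did against the live site.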

A Story Board a Day Evaluation

Yesterday I started an evaluation of my Electronic Story Boards. It’s been over a year and a half since I first put them together, and since then I’ve looked at them occasionally, referred to them when I needed some specific information, and even forgotten that some information I knew I had was actually on one of them. However, I haven’t yet made a methodical assessment of how interesting, useful or effective they are. I’m going to try to do that by looking at a different story board every day, starting with No 1 and working my way through to the final one – No 35.

No 1 is the Levinson book on Pragmatics, and its story board effectively summarises my involvement in the Cosmos project. After looking at it, two words immediately came to mind: Rich and Personal. Rich – because that one single page is full of content, every element bringing back powerful memories; and Personal – because all the content is to do with me.

Later on yesterday, I took a look at the electronic version on the iPad. It was simple to find – all 35 story boards are represented as thumbnails on a single Sidebooks screen on the iPad. Selecting the Pragmatics Story Board brought up a full-screen image that looked exactly like the laminated version I’d been looking at on the side of my bookcase. It was just as rich and personal, and it also enabled me to click the arrows and bring up further pages of related material. But, interestingly, those further pages didn’t add a great deal to the experience. The sense of wonder and the powerful feelings I experienced were generated by the material on the main story board: the additional material didn’t really augment them. However, I thought, those supporting pages would certainly be useful if you were looking for specific detailed information.

That was my initial experience in this 35-day evaluation. I’ll make notes as I go, and summarise my conclusions in 5 or 6 weeks’ time.

New version 2.5 of the Maintenance Plan Template

A couple of days ago I completed an experiment to use the Maintenance Plan template, instead of the Scoping document, to undertake initial Digital Preservation work on a collection. It proved to be very successful. The collection is relatively small, with only 840 digital files in jpg, pdf or MS Office formats, so there were few complications and I was able to proceed through the Maintenance Plan process steps without any serious hold-ups. The whole exercise took just over a week, with the majority of the time being taken up by the inventory check of the digital files and of about 300 associated physical artefacts. I used the structure of the Maintenance Plan to document what I was doing and to keep a handle on where I was up to.

As a result of this exercise I’ve now added the following guidance to the beginning of the Maintenance Plan template, and equivalent text to the beginning of the Scoping document template:

If this is the first time that Digital Preservation work has been done on a collection

EITHER use the Scoping template to get started (best for large, complex collections)

OR use this Maintenance Plan template to get started (can be effective for smaller, simpler collections – retitle it to ‘Initial Digital Preservation work on the @@@ collection’ and ignore sections Schedule, 3, 4 and 7)

This concludes the interim testing and revision of the Maintenance Plan template. It has resulted in some substantial changes in the latest version 2.5 of the document (an equivalent version 2.5 of the SCOPING Document Template has also been produced). The final and most substantial test of the Maintenance Plan template will take place in September 2021, when the large and complex PAWDOC collection is due to undergo its first maintenance exercise.

More than a Maintenance Plan?

Yesterday I finished the maintenance work on my PAW-PERS collection and so now have a refined version of the Maintenance Plan template based on two real-world trials. However, before publishing it, I’m going to take the opportunity to see if it could be used to start every Preservation Planning project. I’m able to do this because I have one other collection which has, as yet, had no preservation work done on it. It is the memorabilia that my wife and I have accumulated since we were married, and it is called SP-PERS.

Each of the three collections that I have subjected to Digital Preservation (DP) measures so far has been through the process of creating a Scoping document, followed by the production and implementation of a DP Plan, and finally the creation of a DP Maintenance Plan specifying work to be done a number of years hence. However, my recent implementation of Maintenance Plans has led me to believe they might provide a structured, immediate starting point for any preservation planning project. They do not preclude Scoping documents etc. – indeed, they explicitly discuss the possible use of those other tools halfway through the process. So, the opportunity to try using the Maintenance Plan template as a way in to every DP project is too good to miss. I’m starting on it today.

First trial of the Maintenance Plan

Today I completed the first real trial of a Maintenance Plan, using the Plan I created for my Photos collection in 2015. It was one of the first Plans I’d put together, so it is slightly different from the current template (version 2.0, dated 2018). However, both have the same broad structure, so the exercise I’ve just completed does constitute a real test of the general approach.

Overall, it went well. In particular, having a step-by-step process to follow was very helpful, and I found it especially useful to write down a summary of what I’d done in each step. This helped me to check that I’d dealt with all aspects, and gave me a mechanism to actively finish work on one step and start on the next. I found this to be such an effective mechanism that I modified the current Maintenance Plan Template to include specific guidance to ‘create a document in which you will summarise the actions you take, and which will refer out to the detailed analysis documents’. It’s worth noting that I was able to include this document as another worksheet in the collection’s Index spreadsheet, along with the Maintenance Plan constructed in 2015 and the Maintenance Plan I have just constructed for 2025. Being able to have all these sub-documents together in one place makes life a whole lot easier.

The exercise also identified a significant shortcoming of the template: it includes no details about the collection’s contents and their location(s). Consequently, an additional ‘Contents & Location’ section has been added at the beginning of the template.

The Photos collection has certainly benefited from the exercise; and the experience has enabled me to make some useful modifications to the template. I intend to tackle the second test of the Maintenance Plan (for the PAW-PERS collection) in the next few weeks, and will then publish an updated version 2.5 of the Maintenance Plan template which will include all the refinements made in the course of these two trials.

Maintenance Plan Template Refinement

The final piece of work in this Digital Preservation journey is to test and refine the Maintenance Plan template. I’ll be doing this by implementing the following plans, drawn up in earlier stages of the journey:

I’m late in starting the PAW-PERS maintenance work because earlier this year I was focused on completing the ‘Sorties into the IT Hurricane’ book. Now that’s out of the way, I plan to complete the PAW-PERS and PHOTO maintenance during May and to use that experience to update the Preservation MAINTENANCE PLAN Template – v2.0, to version 2.5. The insights gained in the major maintenance exercise on the PAWDOC collection in Sep 2021 will be used to produce version 3.0 of the Maintenance Plan template. Updates to the other templates (SCOPING Document, and Project Plan DESCRIPTION and CHART) may also be made at that point if necessary. I shall offer the revised templates to the DPC for inclusion in their website. These will be the final activities in the Digital Preservation work being documented in this journey.

Self-publishing a Photobook

To get an idea of the possibilities for photobooks, just take a look at the Blurb bookstore; there’s a huge diversity of subject matter, and the books look great. It’s clear that anyone who has a passion can create a permanent record which will sit handsomely on a bookshelf for around the cost of a meal out or less. Furthermore, authors can elect to sell their book in the Blurb bookstore and/or through Amazon; and they can specify how much money they want to make on the sale of each copy. Blurb will keep track of sales and remit the income due to the author each month.

I’d already had a go back in 2012 – but with a service designed more for presenting photographs than discursive text. The result was pleasing but not brilliant. I’d heard there were more appropriate online printing operations, and I determined to try one out sometime. My opportunity came last summer when I decided that I might have more success finding a permanent repository for my work document collection if I had a book of memorable experiences based on the contents of the documents. I decided to use the Blurb service for no better reason than that I’d had a brief look at it a few years ago after seeing it get a good rating in a review of self-publishing services. There are many other such services available on the net today and I don’t know how they currently compare to Blurb. You should check them out.

I decided that my book would consist of one page write ups of particular events, each one accompanied by a page of images. I opted to create the text first in Microsoft Word and then to decide what images to include when I imported each piece of text into Blurb’s BookWright page layout package.

I started writing the text in September 2019. It was mostly done by the end of January 2020, at which point I downloaded the BookWright software. Although it took a bit of getting used to, it wasn’t too difficult, and I found the functionality quite good. There were a couple of minor problems: first, the software closed abruptly, without notice, five or six times – but each time it fired up again and opened the book’s contents successfully without having lost any data. Second, typing was sometimes slow to appear on screen. Exchanges with Blurb Support suggested it was due to a lack of virtual memory – which didn’t surprise me, because I was using Word, Excel, PowerPoint, FileMaker, and a PDF package all at once to create the contents while I was using BookWright. Closing some of these seemed to resolve the issue.

The biggest issue I faced was with the resolution of the images I was including. The Blurb Help files warn against grainy, blurry or pixelated images but, of course, you can only be absolutely sure you have avoided this pitfall when you get the printed book. BookWright itself provides a warning when it thinks an image will not be up to standard (which typically occurred when I was trying to expand an image to make it easily readable or to fill a page). I took notice of these warnings and either made the image smaller or found a way of increasing its resolution. I achieved the latter either by rescanning a physical document at a higher resolution, or by printing out an electronic document in high quality and then scanning it at a high resolution. Although these two approaches did seem to improve the quality of many of the images, they also substantially increased the file size of the book (about 4.2Gb at that point). A search on the net about the size of BookWright files reassured me that uploads of that size and more were not unusual – but I did discover that eBooks cannot be produced for files over 2Gb. I also discovered – rather too late in the day – the BookWright advice to use the lossless png format in preference to jpg. I guess this just highlights the fact that I really don’t know too much about image formats and resolutions. Nevertheless, most of the images seemed to turn out OK in the finished book. The key seems to be to keep image sizes below the threshold of the BookWright warning messages.
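In hindsight, the warnings make sense once you relate an image’s pixel count to the size it is printed at: stretching an image across a page spreads the same pixels over more inches, so its effective resolution drops. The sketch below shows that arithmetic in Python. The 300 pixels-per-inch target is a commonly quoted rule of thumb for print and is used here as an assumption; I don’t know the actual threshold BookWright uses for its warnings, and the function names are hypothetical.

```python
def effective_ppi(pixels, printed_inches):
    """Pixels per inch when an image `pixels` wide is printed `printed_inches` wide."""
    return pixels / printed_inches


def max_print_inches(pixels, target_ppi=300):
    """Largest printed width (in inches) that still meets target_ppi.

    300 ppi is a common rule of thumb for print, assumed here; it is not
    BookWright's documented warning threshold.
    """
    return pixels / target_ppi


def pixels_from_scan(original_inches, scan_dpi):
    """Pixel width obtained by scanning an original `original_inches` wide at scan_dpi."""
    return round(original_inches * scan_dpi)


if __name__ == "__main__":
    print(effective_ppi(1200, 4))  # same pixels over 4 inches
    print(effective_ppi(1200, 8))  # stretched to 8 inches: resolution halves
    print(max_print_inches(1200))
    # Rescanning a 4-inch-wide original at 600 dpi doubles the pixels available.
    print(max_print_inches(pixels_from_scan(4, 600)))
```

For example, a 1200-pixel-wide image printed 4 inches wide works out at 300 ppi, but enlarging it to 8 inches halves that to 150 ppi. Rescanning the original at a higher resolution, rather than simply enlarging the existing image, is what restores the pixels needed for the bigger size.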

I had 195 separate stories, so there was at least one image to find and import for each one – and, in some cases, several images. It was a long haul, and it took me until 19th March before I’d finished the first pass through in BookWright and could start the final edit.

I’d elected to group the stories into nineteen short stories, each story being labelled with an icon comprising its short story’s unique set of shapes, together with the page number of the next story. The idea was that readers of a particular short story could find the next instalment at the specified page number. The page numbers went into the Contents list, and into the icons, on 27th March, and then it was on to creating the dust jacket and doing final checks.

On 30th March, I was ready to submit the 4.75Gb file using Blurb’s Upload facility. First the system ‘rendered’ the file down to 492Mb, and then it did the Upload. The whole process took about 37 minutes. I was all set to order a copy, but found that the discount code I’d planned to use didn’t work. I searched the Blurb site and the net for 45 minutes and tried lots of codes – but none were current. I decided to wait – the full price of £103.59 was too much to ignore the possibility of a substantial reduction. It was worth the wait – on 1st April Blurb advertised a 41% discount code, so I paid the overall cost of £73.70 (which included a £2.99 PDF copy, £8.99 delivery, and 60p tax), and was told to expect delivery by 14th April.
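As a check on those figures: they are consistent if the 41% discount applied only to the £103.59 book price, with the PDF copy, delivery and tax then added at full price. That split is my assumption rather than something Blurb stated. The short Python sketch below (with a hypothetical `order_total` function) uses decimal arithmetic to avoid floating-point rounding errors with currency.

```python
from decimal import Decimal, ROUND_HALF_UP

PENNY = Decimal("0.01")


def order_total(book_price, discount_percent, extras):
    """Discount the book price only, round to the penny, then add undiscounted extras.

    Which items the discount covers is an assumption, not Blurb's documented rule.
    """
    discounted = book_price * (100 - discount_percent) / 100
    discounted = discounted.quantize(PENNY, rounding=ROUND_HALF_UP)
    return discounted + sum(extras, Decimal("0"))


if __name__ == "__main__":
    extras = [Decimal("2.99"), Decimal("8.99"), Decimal("0.60")]  # PDF copy, delivery, tax
    print(order_total(Decimal("103.59"), Decimal("41"), extras))  # 73.70
```

The discounted book comes to £61.12 (59% of £103.59), and adding the £12.58 of extras gives the £73.70 I paid.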

The book arrived around 9am on 7th April. It exceeded my expectations, with a bold glossy cover, glossy pages, clear text, and bright images. I spent the rest of the day checking each page and noting the corrections needed, and then the next two days making final changes. On the morning of 10th April, I did a final preview of the book and this turned up about a dozen further changes. At around 3.30pm I started the Upload process. The system took about 10 minutes to render the 5Gb file down to 496Mb, and a further 27 minutes to upload it.

Putting the book into the bookstore was not particularly difficult – but it did take a little time. There was a book description to write, categories to select, and keywords to specify. Then I had to decide how much profit I wanted to add onto the price of the book; and finally there was the specification of which pages I wanted people to see in the preview. I completed the whole business by around 5.30pm – glad to be able to take a break from the perishing book.

Overall, I’ve found it to be a very effective and satisfying experience. It has been a long and demanding exercise – but that was to be expected with a 438-page book of this nature. I elected to produce a photobook on 118gsm standard semi-matte high quality paper; however, I’ve no reason to suppose that the results wouldn’t be just as good for the other types of book and paper that Blurb offers. The BookWright software provides very flexible text options and layout capabilities, and seems to handle images very well; and the bookstore facility provides a ready-made distribution channel for the finished books.

However, there is one aspect that needs to be borne in mind. The price of a print-on-demand book is inevitably going to be greater than the price of mass-produced books in a physical bookshop. Blurb books give absolute control to the author – but may price the book out of the market. There are volume discounts to be had – but the demand for a bulk lot has to be created by the author. When authors get publishing deals they do, indeed, cede much power to the publishers; but, in return, the publishers establish markets for the books and keep their prices down. This trade-off becomes particularly apparent for large glossy books such as the one I have created. It is far less so for softback books with many fewer pages and lower quality paper, of which many examples can be found on the Blurb bookstore.

Of course, these price concerns are of little consequence if all you are trying to do is to exploit some of the artefacts that you possess and make them visible. My experience with Blurb – and the huge range of examples in the Blurb bookstore – shows that using a self-publishing service provides ample opportunity to use your creativity and artefacts to bring to life your memories, ideas and passions.

Oh, and the book I created? Well here’s the cover. Clicking it will take you to the Blurb bookstore where some of its contents can be previewed.

A subjective halfway view

I’ve just acted as subject in our first investigation into the memorability and impact of information nuggets. The nugget material, in this case, was mindmaps of key points in nineteen esoteric-type books which explore perceived unresolved mysteries, from ancient Egyptology to modern secret societies. I discovered that I could remember almost none of the points presented to me and was unable to link any of them to a particular development in my thinking. My immediate reaction to this disappointing – but probably to be expected – finding was that these are not actually nuggets of information but instead are just parts of a summary of each book.

However, on reflection, I’ve reversed that view. After all, when I was picking out the points as I read, I must have thought each of them to be significant – otherwise I wouldn’t have picked them out. So, how is a key point in a book different from a key point in, say, a five-page article? Well, there are some obvious differences: the book is a lot bigger and has a lot more stuff in it – most of which I’m not familiar with AT ALL. Unless you have a photographic or otherwise superb memory, you wouldn’t expect to remember everything in such a book after one quick casual read. Of course, I have the books on my bookshelf and have the look of each one locked in my memory, along with some idea of what it’s about. However, this is the case because there are just a few hundred of them, they have rich content, and their covers and spines usually have distinctively memorable images. In contrast, the articles and documents in my work collection (which are due to be investigated next) are much more numerous; are hidden away in my computer (with just a few in my physical archive box); and all look very similar, with very few distinctive markings.

I guess I’ve expanded my thinking this morning about all this. However, I’m only the subject, and we’re only halfway through the overall exercise. The interesting bit will be what the researcher concludes from it all.

The truth about truth

Maybe most people have already twigged this, but the BBC programme ‘The Capture’ has made me realise that we can no longer rely on videos for the truth. It illustrates how live camera feeds can be altered – dramatically. I believe sophisticated and moneyed organisations can do this today; and I think it will become easier as time goes by.
So, to the possibility of text being untrue, of people’s accounts and memories being untrue, and of photos being faked, we must add that videos may be false. Is there anything left? Well, perhaps just our own internal thoughts and memories – but no doubt our race will get round to manipulating those too.
So, I guess, we are back to a great truth that our enquirers and thinkers have known for hundreds of years: there is no substitute for diligence and multiplicity in our search for what is and what has been. Our modern technology has made us slack and gullible and persuaded us that we can nail down reality. In fact, reality has to be carefully investigated and checked and rechecked, and then still considered with a critical eye as we use it generously to develop our understanding and knowledge.

Power Booking

People in power a few hundred years ago just didn’t have access to up-to-date global information. These days such people have no such excuse, as large numbers of diligent writers research global issues and publish up-to-the-minute summaries on a wide range of topics around the world. There is no reason to be unaware of what humans have done, and continue to do, to each other; what effects we are having on the planet we live on and depend upon; what our universe might consist of; and what possible futures we might have within it. Even I, with just the few books I have read in the last few years, feel informed and broadened. If each world leader were to be given just ten or fifteen books to read at the start of their reign, perhaps they would act rather more in the interests of all of us than they currently appear to.