Four approaches and some decisions

I set out on this journey to find a way of archiving this pwofc.com web site. I’ve explored four different approaches, each with its own distinctive characteristics, as summarised below:

Hosting Backup Facilities: These no doubt differ between hosting operations, so my experience is limited to the hosting package that I use (which does not include a backup service).

Provides functions to download collections of files.
Creates a point-in-time replica of the content files of the web site.
The backup replica cannot be viewed on its own – it requires other facilities such as underlying database software to generate the web site.

HTTrack: This is a free software package released under the GNU General Public Licence. It downloads an internet web site such that it can be read locally in a browser.

Creates a point-in-time local replica which can be read offline with a browser, with PC response times.
It did not replicate the search facility available in my online web site.
It has complex configuration options and limited documentation and help – but these were not needed to undertake a simple mirror of the web site.
If the capture configuration specifies that external sites should not be captured, it provides URLs which can be clicked to go to those sites. (I did attempt to capture external web pages, but the first attempt failed after 2 minutes, and the second, when I thought I had specified that it should collect just 1 specific external web page, was collecting so much that I had to stop it running. I clearly didn’t configure it correctly…)
The files that HTTrack produces can be zipped up into a single file and archived.
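
As a rough illustration of that last point, here is a minimal Python sketch of zipping a mirror folder into a single dated file. It assumes the mirror already exists in a folder on disk; the function name archive_mirror, the folder paths and the "pwofc-mirror" label are all made up for this example and are not anything that HTTrack itself provides.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_mirror(mirror_dir: str, dest_dir: str, label: str = "pwofc-mirror") -> Path:
    """Zip a mirror folder into one dated file and return the path of that file.

    All names here are hypothetical: mirror_dir is wherever the mirror was
    saved, and dest_dir is wherever the archives are to be kept.
    """
    source = Path(mirror_dir)
    if not source.is_dir():
        raise NotADirectoryError(f"No mirror folder at {source}")

    # Keep the archives in a separate folder, outside the mirror itself.
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    # The date in the file name records when this point-in-time copy was zipped.
    base_name = target_dir / f"{label}-{date.today().isoformat()}"

    # make_archive appends ".zip" and returns the archive path as a string.
    return Path(shutil.make_archive(str(base_name), "zip", root_dir=source))
```

Calling archive_mirror("mirrors/pwofc", "archives") would, under those assumptions, leave one zip file in the archives folder with the date in its name, so that successive mirrors do not overwrite each other.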

UK Web Archive (UKWA): A British Library service that stores selected web sites permanently, captures updated versions on a yearly basis, and makes all copies freely available on the net.

Requires that a web site is proposed for inclusion in the UKWA, and that approval is given.
Creates dated replicas of a web site which can be selected and viewed online in a browser.
Does not include the contents of external pages, and, in some cases, does not even provide a clickable URL of the external page.
In the replica site, the Home link on a page that has been linked to doesn’t work.
In the replica site, two images with embedded links are not displayed and are replaced with just the text titles of the images.

Book: A copy of the web site printed on paper and bound in a book.

Creates a point-in-time replica of the web site on paper with some formatting adjustments to accommodate the different medium.
Produces a copy in a format which is very familiar to humans and which can be easily accommodated on a shelf in a house.
Cross referencing links work but are slower to follow than the digital equivalent.
May have better longevity than a digital equivalent.

As a result of these investigations, I’ve decided to:

Continue to use HTTrack to create mirrors of pwofc.com periodically.
Continue to create a book of pwofc.com every five years: the next one is due in 2022 and will be called ‘Feel the Join’ (as opposed to the 2017 version which was called ‘Touch the Join’).
Be thankful that the British Library is archiving pwofc.com.

I think that just about wraps up all I’m prepared to do on this subject, so this journey is now complete.
