Observing politics over the last few years, it does seem that women sometimes have a different perspective on some issues and how they are approached. It’s got me thinking that perhaps women and men ought to be equally represented in political systems. The easy way to achieve that would simply be to have two election contests for each constituency – one for the female representative and one for the male representative.
Dress fit
I’ve been pondering my last entry about easy-pull-on socks, and realised that, actually, balancing on one leg to put a sock on is really quite athletic. Perhaps it would be possible to put together a coherent fitness programme based around dressing and undressing. Specific designs of particular items of clothing would require the use of particular muscles and skills to put them on and take them off; and different designs would facilitate the exercise of different sets of muscles and different levels of difficulty.
Opening the channels
Since our initial phone conversation on 28th Feb, Peter Tolmie and I have Skyped twice more – we seem to have got into a pattern of speaking every four weeks or so. In our second conversation, Peter pointed out to me that my PAWDOC filing system was just another manifestation of my inclination to keep things – as amply demonstrated in the various journeys documented in pwofc.com. He asked me what I thought I’d learnt from all these experiences, and I recounted a few things that immediately came to mind. Afterwards, however, I began to think that there were a great many more learnings dotted around the website. So I duly trawled through pwofc.com and recorded in a spreadsheet anything that looked like a finding. For good measure, I used another worksheet in the same spreadsheet to list all the requirements and findings specified in the paper about PAWDOC that was published in Behaviour & Information Technology (BIT) in 2001. I’ve given the spreadsheet to Peter and it will provide a base set of information for our investigations going forward.
My re-assessment of the BIT paper reminded me that one of the things I was thinking about when I wrote it was how one could use the key points in the documents you read to develop one’s knowledge. This idea stemmed from my practice of putting a line next to key points – or nuggets as I termed them – in documents. I remembered that I’d made a start on this work some 17 years ago by recording in a Mind Mapping programme the nuggets I found in books about the Pyramids etc. Peter and I discussed the possibility of my revisiting this material in a ‘Nugget Management’ journey sometime.
In our last Skype call on 25th April, Peter asked if I could keep an auto-ethnographic log of my keeping activities to provide us with more base material to draw on in our analysis activities. I duly created a spreadsheet with the headings listed below and am now recording all instances in which I make a specific effort to store a physical or digital artefact. The word ‘specific’ is used to exclude general keeping of things like email messages in email folders; and the word ‘artefact’ is used to explicitly require that a whole integral item is kept not just information removed from it like the name of a species from a plant label.
- Ref No
- Date
- Item
- How the instance arose
- Reason for keeping
- Initial actions and decisions made
- Actions taken
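For anyone who would rather keep such a log as a plain CSV file than in a spreadsheet, a minimal Python sketch might look like the one below. The column headings are the ones listed above; the function name `append_entry` and the idea of using a CSV file are illustrative assumptions, not a description of the spreadsheet actually in use.

```python
import csv
from pathlib import Path

# Column headings copied from the list above.
HEADINGS = [
    "Ref No",
    "Date",
    "Item",
    "How the instance arose",
    "Reason for keeping",
    "Initial actions and decisions made",
    "Actions taken",
]


def append_entry(log_path, entry):
    """Append one keeping instance to the CSV log.

    The header row is written first if the file is new or empty.
    Headings missing from `entry` are left blank; unknown headings
    raise a ValueError (the csv.DictWriter default).
    """
    path = Path(log_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=HEADINGS)
        if is_new:
            writer.writeheader()
        writer.writerow(entry)
```

A call such as `append_entry("keeping_log.csv", {"Ref No": "1", "Item": "Plant label"})` (a made-up entry) would add one row and leave the remaining columns empty to be filled in later.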
Peter’s comment on my request for his views on my recording scheme was “This is great. It’s not how I would have done it myself, but that doesn’t matter at all. The main thing is that it works for you. Just different work practices because we come from different backgrounds. Nothing more.”; and I doubt that I, on my own, would have come up with the idea of a generalised keeping log. Herein are clues as to the sheer unique and precious value of collaboration with our fellows.
Easy to Pull on Socks – EPS
If you want to get going you want to be able to put your socks on quickly. You want to be able to stand on one leg and just have the sock glide over your toes and instep and slip around your heel like water going round a u-bend. Some socks have that soft pliable texture – and retain it through the washing machine; but an awful lot don’t. It would be great if sock suppliers could make socks with such a capability and sold them as ‘easy to pull on socks’. They may already be out there but I haven’t seen them. On the other hand, there are socks out there which have such characteristics but are not advertised as such. I’ve got an odd sock that does fit the bill and I’m going searching round the stores with it; but it would be so much easier if such socks were sold with an EPS label.
Getting an HTTrack Copy
HTTrack is a free-to-use website copier. Its web site provides the following description: “It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site’s relative link-structure. Simply open a page of the “mirrored” website in your browser, and you can browse the site from link to link, as if you were viewing it online.”
I downloaded and installed HTTrack very quickly and without any difficulty, and then set about configuring the tool to mirror pwofc.com. This involved simply specifying a project name, the name of the web site to be copied, and a destination folder. The Options were more complicated and, for the most part, I just left the default settings before pressing ‘Finish’ on the final screen. There was an immediate glitch when I discovered that I had not provided the full web address (I’d specified pwofc.com instead of https://www.pwofc.com/ofc/); but having made that change, I pressed ‘Finish’ again and HTTrack got on with its mirroring. Some 2 hours 23 minutes and 48 seconds later, HTTrack completed the job, having scanned 1827 links and having copied 1538 files with a total file size of 212 MB.
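The mirroring described here was set up through HTTrack's wizard screens, but HTTrack can also be run from the command line. The Python sketch below is only an illustration of how the same basic job might be scripted: it assumes the `httrack` executable is installed and on the PATH, uses only the `-O` option (the output folder), and checks for a full web address before starting. The function names and the address check are assumptions made for this example, and the check only catches a missing `https://` prefix, not a missing `/ofc/` path.

```python
import shutil
import subprocess
from urllib.parse import urlparse


def build_httrack_command(url, dest_dir):
    """Return the argument list for a basic HTTrack mirror of `url`.

    Raises ValueError if `url` is not a full http(s) address, which is
    one way the 'pwofc.com' glitch could be caught before starting.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"Give the full web address, including https://, not {url!r}"
        )
    # -O sets the folder in which HTTrack writes the mirror and its logs.
    return ["httrack", url, "-O", dest_dir]


def run_mirror(url, dest_dir):
    """Run the mirror, assuming the httrack executable is on the PATH."""
    if shutil.which("httrack") is None:
        raise RuntimeError("httrack executable not found on PATH")
    subprocess.run(build_httrack_command(url, dest_dir), check=True)
```

Building the command separately from running it means the address can be checked, and the command inspected, without starting a download that might run for a couple of hours.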
The mirroring had produced seven components: two folders (hts-cache and www.pwofc.com) and five files (index, external, hts-log, backblue and fade). The hts-cache folder is generated by HTTrack to enable future updates to the mirrored web site; the external file is a template page for displaying external links which have not been copied; backblue and fade are small gif images used in such templates; and the log file records what happened in the mirroring session. The remaining www.pwofc.com folder and index file contain the actual contents of the mirror.
On double clicking the Index file, the pwofc.com home page sprang to life in my browser looking exactly the same as it does when I access it over the net. As I navigated around the site the internal links all seemed to work and all the pictures were in place, though the search facility didn’t work. External links produced a standard HTTrack page headed by “Oops!… This page has not been retrieved by HTTrack Website Copier. Clic to the link below to go to the online location!” – and indeed clicking the link did take me to the correct location (I believe it is possible to specify that external links can also be copied by setting the ‘Limit’ option ‘maximum external depth’ to one, but my subsequent attempt to do so ended with errors after just two minutes; I abandoned the attempt). The only other noticeable difference was the speed with which one could navigate around the pages – it was just about instantaneous. From this cursory examination I was satisfied that the mirror had accurately captured most, if not all, of the website.
An inspection of the log file, however, identified that there had been one error – “Method Not Allowed (405) at link www.pwofc.com/ofc/xmlrpc.php (from www.pwofc.com/ofc/)”. According to the net, a PHP file ‘is a webpage that contains PHP (Hypertext Preprocessor) code. … The PHP code within the webpage is processed (parsed) by a PHP engine on the web server, which dynamically generates HTML’. Interestingly, I wasn’t aware of having any content with such characteristics, but, on closer inspection of the files in my hosting folder, I found I had lots of them – probably hundreds of them. I tried to figure out what the error file related to but had no clue other than its rather striking creation date – 23/12/2016 at 00:00:00 – the same date as several of the other PHP files. I had not created any blog entries on that day, so my investigation ground to a halt. I don’t have the knowledge to explore this, and I’m not prepared to spend the time to find out. My guess is that the PHP files do the work of translating the base content stored in the SQL database into the structured web pages that appear on the screen. I’m just glad that there was only one error – and that its occurrence isn’t obviously noticeable in the locally produced web pages.
The log file also reported 574 warnings which came in the form of 287 pairs. A typical example pair is shown below:
19:31:13 Warning: Moved Permanently for www.pwofc.com/ofc/?p=987
19:31:13 Warning: File has moved from www.pwofc.com/ofc/?p=987 to https://www.pwofc.com/ofc/2017/06/29/an-ofc-model/
I tried to find a Help list of all the Warning and Error messages in the HTTrack documentation but it seems that such a list doesn’t exist. Instead there is a Help forum which has several entries relating to such warning messages – but none that I could relate to the occurrences in my log. As far as I can see, all of the pages mentioned in the warnings (in the above instance the title of the page is ‘an-OFC-Model’) have been copied successfully, so I decided that it wasn’t worth spending any further time on it.
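‘Moved Permanently’ is the standard wording for an HTTP 301 redirect, so each pair appears to record a short address (such as ?p=987) being redirected to the page's full address. Rather than reading through hundreds of such lines by eye, the log could be summarised with a few lines of Python. The sketch below is a rough illustration that assumes each message sits on its own line of the hts-log file in the wording shown in the example pair; the function name and the regular expressions are made up for this example and are not part of HTTrack.

```python
import re

# Patterns based only on the wording of the example pair in the log.
WARNING_RE = re.compile(r"\bWarning:")
MOVED_RE = re.compile(r"Warning:\s*File has moved from\s+(\S+)\s+to\s+(\S+)")


def summarise_warnings(log_text):
    """Return (number of warning lines, {old address: new address})."""
    warning_count = 0
    redirects = {}
    for line in log_text.splitlines():
        if WARNING_RE.search(line):
            warning_count += 1
        match = MOVED_RE.search(line)
        if match:
            old_address, new_address = match.groups()
            redirects[old_address] = new_address
    return warning_count, redirects


sample = (
    "19:31:13 Warning: Moved Permanently for www.pwofc.com/ofc/?p=987\n"
    "19:31:13 Warning: File has moved from www.pwofc.com/ofc/?p=987 "
    "to https://www.pwofc.com/ofc/2017/06/29/an-ofc-model/\n"
)
count, redirects = summarise_warnings(sample)
print(count)      # 2
print(redirects)  # one entry: the ?p=987 address mapped to the an-ofc-model page
```

Run over the whole log, the redirect table would give a list of new addresses that could be checked against the files in the mirror to confirm that each moved page really was copied.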
All in all, I judge my use of HTTrack to have been a success. It has delivered me a backup of my (relatively simple) site which I can actually see and navigate around, and which can be easily zipped up into a single file and stored.
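The zipping step can be done by hand in a file manager, but for anyone who wants to repeat it after each mirroring session, a small Python sketch using the standard library might look like this. The function name and the folder names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def zip_mirror(mirror_dir, archive_stem):
    """Zip the whole mirror folder into <archive_stem>.zip.

    Returns the path of the archive that was created. The archive
    should be written outside `mirror_dir` so that it does not end up
    trying to include itself.
    """
    mirror = Path(mirror_dir)
    if not mirror.is_dir():
        raise NotADirectoryError(f"{mirror} is not a folder")
    # make_archive appends the .zip extension to archive_stem itself.
    return shutil.make_archive(str(archive_stem), "zip", root_dir=str(mirror))
```

For example, `zip_mirror("pwofc-mirror", "backups/pwofc-mirror-copy")` would produce backups/pwofc-mirror-copy.zip containing everything in the mirror folder, provided the backups folder already exists.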
A Backup Hosting Story
In the last few days I’ve been exploring making backup copies of this pwofc Blog using the facilities provided by the hosting company that I employ – 123-Reg. It was an instructive experience.
When I first set up the Blog in 2012 I had deliberately decided to spend a minimal amount of time messing around with the web site and to focus my energies on generating the stuff I was reporting in it. Consequently, most of my interactions with the hosting service had involved paying my annual fees, and I had little familiarity with the control panel functions provided to manage the web site. In 2014, I had made some enquiries about getting a backup, and the support operation had provided a zip file which was placed in my own file area. Since then I had done nothing else – I think I had always sort of assumed that, if something went wrong with the Blog, the company would have copies which could be used to regenerate the site.
However, when I asked the 123-Reg support operation about backups a few days ago, I was told that the basic hosting package I pay for does NOT include the provision of backups – and the company no longer provides zip files on request: instead, facilities are provided to download individual files, to zip up collections of files, and to download and upload files using the file transfer protocol FTP. Of these various options, I would have preferred to just zip up all the files comprising pwofc.com and then to download the zip file. However, the zipping facility didn’t seem to work and, on reporting this to the 123-Reg Support operation, I was told that it was out of action at the moment… So, I decided to take the FTP route.
I duly downloaded the free-to-use FTP client, FileZilla, set it up with the destination host IP Address, Port No, Username and Password, and pressed ‘Connect’. After a few seconds a dialogue box opened advising that the host did not support the secure FTP service and asking if I wanted to continue to transfer the files ‘in clear over the internet’. Naturally I was a little concerned, closed the connection, and asked 123-Reg Support if a secure FTP transfer could be achieved. I was told that it could be and was given a link to a Help module which would explain how. This specified that a secure transfer requires Port 2203 to be used (it had previously been set to 21), so I made the change and pressed ‘Connect’ again. Nothing happened. A search of the net indicated that secure FTP requires a Port No of 22, so I changed 2203 to 22 and, bingo, I was in.
FileZilla displays the local file system in a box on the left of the screen, and the remote file system (the pwofc.com files in this case) in a box on the right. Transferring the pwofc files (which comprise a folder called ‘ofc’, a file called ‘index’, and a file called ‘.htaccess’) was simply a matter of highlighting them and dragging them over to a folder in the box on the left. The transfer itself took about 12 minutes for a total file size of 246 MB.
Of course, the copied files on my laptop are not sufficient to produce the web pages: they also require the SQL database which manages them to deliver a fully functioning web site. If you double click the ‘Index’ file it just delivers a web page with some welcome text but no links to anything else. Hence, these backup files are only of use to download back to the original hosting web site for the blog to be resurrected if the original files have become corrupted or destroyed. I guess they could also, in principle, be used to set up the site on another hosting service – though I have no experience of doing that.
Of course, these observations reflect only one customer’s limited experience of one specific hosting service and may or may not apply generally. However, they do indicate some general points which Blog owners might find worth bearing in mind:
- Don’t assume that your hosting service could regenerate your Blog if it became corrupted or was destroyed – find out what backup facilities they do or don’t provide.
- Don’t assume that all the functions provided by your hosting service work – things may be temporarily out of action or may have been superseded by changes to the service over the years.
- Remember that a backup of the website may be insufficient to regenerate or move the Blog – be clear about what additional infrastructure (such as a database) will be required.
- If you want to be able to look at the Blog offline and independently of a hosting service, investigate other options such as creating a hardcopy book, or using a tool such as HTTrack (which is discussed in the following entry).
ST’s Alternative Approaches
About 6 weeks ago (on 6th March), Sara Thomson of the Digital Preservation Coalition kindly spent some time on the phone with me discussing the archiving of web sites. I wanted to find out if there were any solutions other than the ones I had stumbled across in my brief internet search some 16 months ago. Sara suggested 3 approaches which were new to me and described them as follows in a subsequent email:
- UK Web Archive (UKWA) ‘Save a UK Website’: https://beta.webarchive.org.uk/en/ukwa/info/nominate Related to this – two web curators from the British Library (Nicola Bingham and Helena Byrne) presented at a DPC event last year discussing the UKWA, including the Save a UK Website function. A video recording of their talk along with their slides (and the other talks from the day) are here: https://dpconline.org/events/past-events/web-social-media-archiving-for-community-individual-archives
- HTTrack: https://www.httrack.com/ I gave a brief overview of HTTrack at that same DPC event last year that I linked to above. I have also included my slides at an attachment here – the HTTrack demo starts on slide 15.
- Webrecorder: https://webrecorder.io/ by Rhizome. Their website is great and really informative, but let me know if you have any questions about how it works.
Shortly after this, I followed the link that Sara had provided to the UKWA nomination site and filled in the form for pwofc.com. On 14th March I got a response saying that the British Library would like to archive pwofc.com and requesting that I fill in an on-line licence form which I duly completed. On 16th March I decided to explore the contents of the UKWA service and found it collects ‘millions of websites each year and billions of individual assets (pages, images, videos, pdfs etc.)’. I started looking at some of the blogs. The first one I came across was called Thirteen days in May and was about a cycling tour – but it seemed to lack some of the photos that were supposed to be there. The next two I looked at, however, did seem to have their full complement of photos; and one of them (called A Common Reader) had a strangely coincidental entry about ‘Instapaper’ which provides what sounds to be a very useful service for saving web sites for later reading. It looks like the UKWA does an automated trawl of all the websites under its wing at least once a year, so I guess that, as a backup, it should never be more than a year out of date.
An hour after completing this exploration, I got an email confirming that the licence form had been submitted successfully and advising that the archiving of pwofc.com would proceed as soon as possible but that it may not be available to view in the archive for some time due to the many thousands of web sites being processed and the need to do quality assurance checks on each. Since then, I’ve been checking the archive every now and again, but pwofc.com hasn’t emerged yet. When it does, it’ll be interesting to see how faithfully it has been captured.
Regarding the other two suggestions that Sara made, I’ve decided to discount Webrecorder as that entails visiting every page and link in a website which would just take too much time and effort for pwofc.com. However, I’m going to have a go at using HTTrack, and I’m also going to try and get a backup of pwofc.com from my web hosting service. Having experienced all these various archiving solutions, there’ll be an opportunity to compare the various approaches and reach some conclusions.
The PAWDOC Preservation story
In May 2018 the inaugural digital preservation work on the PAWDOC collection was completed. The story of the work that was done, and the lessons that were learnt, are documented in the following paper which can be downloaded from this site subject to Creative Commons conditions:
The Application of Preservation Planning Templates to a Personal Digital Collection
Instances of the populated preservation planning templates that were used to control the work are also provided:
- PawdocDP SCOPING Document
- PawdocDP Preservation Project Plan DESCRIPTION
- PawdocDP Preservation Project Plan CHART
- PAWDOC Preservation MAINTENANCE PLAN
A summary of the work done and the lessons learned has been published as a Blog Post on the Digital Preservation Coalition (DPC) website.
The preservation planning templates were updated as a result of insights gained in the work and these are available as embedded files in the above ‘Application of Preservation Planning Templates’ paper and also in the DPC website.
Getting started with the Findings
Having initiated a preservation planning regime for the collection, and having moved it onto the Windows 10 platform, I’m feeling that the only remaining things I need to do with it are to find it a permanent home and to write up the findings of this lengthy experiment. I took a step forward on the latter activity earlier this week when I had a very interesting phone call with Peter Tolmie, a UK Ethnographer based in the School of Information Systems and New Media at the University of Siegen in Germany. I was given Peter’s name by Richard Harper when I asked if he knew of anyone who is knowledgeable about how professionals manage their documents and who would be interested in working on a wrap-up paper with me. An initial phone call with Peter last Thursday indicated that we have a great many common interests – I found it a very stimulating conversation indeed. I’ve sent Peter some documents describing the collection and we’ve agreed to talk again on 21st March.
Regarding the search for a home for the collection (which is documented in various posts in this Blog going back to 2015), my current efforts lie in conversations I’m having with Dr James Peters, the Archivist of the National Archive for the History of Computing at Manchester University, who has kindly agreed to help me in my search. In a phone call last month, James told me he was waiting for a response from someone he had emailed, but that, if there was no interest from that source, he could issue a note to a relevant mailing list on my behalf. If it is to be the mailing list route, I’m hoping to get James’ advice on what needs to go in the note.
March: Long and Plans
It looks like the blog post describing the Digital Preservation work undertaken last year on the PAWDOC collection will be published next month on the DPC website. It will refer to the full paper describing the work in more detail, which will be published here within pwofc.com. At the same time, the preservation planning document templates will be replaced by updated versions in the DPC website. The publication of all these materials will be a fitting end to the preservation planning activities that are described in previous entries in this site. However, there will still be one thing to do before the topic can be considered complete and that is to review the effectiveness of the Preservation Maintenance Plan template when an instance of it is used in the PAWDOC Preservation maintenance exercise scheduled for September 2021.