Getting an HTTrack Copy

HTTrack is a free-to-use website copier. Its web site provides the following description:  “It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site’s relative link-structure. Simply open a page of the “mirrored” website in your browser, and you can browse the site from link to link, as if you were viewing it online.”

I downloaded and installed HTTrack very quickly and without any difficulty, and then set about configuring the tool to mirror pwofc.com. This involved simply specifying a project name, the address of the web site to be copied, and a destination folder. The Options were more complicated and, for the most part, I just left the default settings before pressing ‘Finish’ on the final screen. There was an immediate glitch when I discovered that I had not provided the full web address (I’d specified pwofc.com instead of http://www.pwofc.com/ofc/); but having made that change, I pressed ‘Finish’ again and HTTrack got on with its mirroring. Some 2 hours, 23 minutes and 48 seconds later, HTTrack completed the job, having scanned 1827 links and copied 1538 files with a total size of 212 MB.

The mirroring had produced seven components: two folders (hts-cache and www.pwofc.com) and five files (index, external, hts-log, backblue and fade). The hts-cache folder is generated by HTTrack to enable future updates to the mirrored web site; the external file is a template page for displaying external links which have not been copied; backblue and fade are small GIF images used in such templates; and the hts-log file records what happened in the mirroring session. The remaining www.pwofc.com folder and index file contain the actual contents of the mirror.

On double-clicking the index file, the pwofc.com home page sprang to life in my browser looking exactly the same as it does when I access it over the net. As I navigated around the site, the internal links all seemed to work and all the pictures were in place, though the search facility didn’t work. External links produced a standard HTTrack page headed by “Oops!… This page has not been retrieved by HTTrack Website Copier. Clic to the link below to go to the online location!” – and indeed clicking the link did take me to the correct location. (I believe it is possible to have external links copied as well by setting the ‘Limit’ option ‘maximum external depth’ to one, but my subsequent attempt to do so ended with errors after just two minutes, so I abandoned it.) The only other noticeable difference was the speed with which one could navigate around the pages – it was just about instantaneous. From this cursory examination I was satisfied that the mirror had accurately captured most, if not all, of the website.

An inspection of the log file, however, identified that there had been one error – “Method Not Allowed (405) at link www.pwofc.com/ofc/xmlrpc.php (from www.pwofc.com/ofc/)”. According to the net, a PHP file ‘is a webpage that contains PHP (Hypertext Preprocessor) code. … The PHP code within the webpage is processed (parsed) by a PHP engine on the web server, which dynamically generates HTML’. Interestingly, I wasn’t aware of having any content with such characteristics, but, on closer inspection of the files in my hosting folder, I found I had lots of them – probably hundreds of them. I tried to figure out what the error file related to but had no clue other than its rather striking creation date – 23/12/2016 at 00:00:00 – the same date as several of the other PHP files. I had not created any blog entries on that day, so my investigation ground to a halt. I don’t have the knowledge to explore this, and I’m not prepared to spend the time to find out. My guess is that the PHP files do the work of translating the base content stored in the SQL database into the structured web pages that appear on the screen. I’m just glad that there was only one error – and that its occurrence isn’t obviously noticeable in the locally produced web pages.

The log file also reported 574 warnings, which came in 287 pairs. A typical pair is shown below:

19:31:13   Warning:   Moved Permanently for www.pwofc.com/ofc/?p=987
19:31:13   Warning:   File has moved from www.pwofc.com/ofc/?p=987 to http://www.pwofc.com/ofc/2017/06/29/an-ofc-model/

I tried to find a Help list of all the Warning and Error messages in the HTTrack documentation, but it seems that no such list exists. Instead there is a Help forum which has several entries relating to such warning messages – but none that I could relate to the occurrences in my log. As far as I can see, all of the pages mentioned in the warnings (in the above instance, the page ‘an-ofc-model’) have been copied successfully, so I decided that it wasn’t worth spending any further time on them.
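Checking 574 warnings by eye is tedious, so a short script could tally the log instead. The sketch below is a minimal Python example that assumes every log line has the shape of the excerpt above (a time, then ‘Warning:’ or ‘Error:’, then the message); the function name summarise_log is hypothetical, not part of HTTrack. It counts warnings and errors and collects each ‘File has moved from X to Y’ pair, which, on the evidence of the excerpt, appears to record an old ?p= style address being redirected to the post's permanent address.

```python
import re
from collections import Counter

# Assumed line shape, based on the excerpt above:
#   "<hh:mm:ss> <Warning|Error>: <message>"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+(Warning|Error):\s+(.+)$")
MOVED_RE = re.compile(r"File has moved from (\S+) to (\S+)")


def summarise_log(text):
    """Tally Warning/Error lines and map old -> new addresses from 'moved' warnings."""
    counts = Counter()
    redirects = {}
    for raw_line in text.splitlines():
        match = LINE_RE.match(raw_line.strip())
        if match is None:
            continue  # ignore lines that do not follow the assumed format
        _time, level, message = match.groups()
        counts[level] += 1
        moved = MOVED_RE.search(message)
        if moved:
            redirects[moved.group(1)] = moved.group(2)
    return counts, redirects
```

To run it on a real mirror, something like summarise_log(Path("hts-log.txt").read_text(errors="replace")) should work, assuming the log is saved as hts-log.txt in the project folder; half the warning count should then match the number of redirects found.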

All in all, I judge my use of HTTrack to have been a success. It has delivered me a backup of my (relatively simple) site which I can actually see and navigate around, and which can be easily zipped up into a single file and stored.
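Zipping the mirror can be done in any archive tool; as a scripted alternative, here is a minimal Python sketch using the standard library's shutil.make_archive. The function name zip_mirror and any folder or archive names are hypothetical, and the archive should be written outside the mirror folder so that it does not try to include itself.

```python
import shutil
from pathlib import Path


def zip_mirror(mirror_dir, archive_base):
    """Pack a whole mirror folder (hts-cache, www.pwofc.com, index, ...) into one .zip.

    archive_base is the archive path without the ".zip" extension.
    Returns the path of the archive that was written.
    """
    mirror = Path(mirror_dir)
    if not mirror.is_dir():
        raise FileNotFoundError(f"No mirror folder at {mirror}")
    # make_archive appends ".zip" and stores paths relative to root_dir,
    # so the zip unpacks into the same layout as the mirror folder.
    return shutil.make_archive(str(archive_base), "zip", root_dir=str(mirror))
```

Because the paths inside the zip are relative, unpacking it anywhere and double-clicking the index file should give the same browsable copy.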

A Backup Hosting Story

In the last few days I’ve been exploring making backup copies of this pwofc Blog using the facilities provided by the hosting company that I employ – 123-Reg. It was an instructive experience.

When I first set up the Blog in 2012 I had deliberately decided to spend a minimal amount of time messing around with the web site and to focus my energies on generating the stuff I was reporting in it. Consequently, most of my interactions with the hosting service had involved paying my annual fees, and I had little familiarity with the control panel functions provided to manage the web site. In 2014, I had made some enquiries about getting a backup, and the support operation had provided a zip file which was placed in my own file area. Since then I had done nothing else – I think I had always sort of assumed that, if something went wrong with the Blog, the company would have copies which could be used to regenerate the site.

However, when I asked the 123-Reg support operation about backups a few days ago, I was told that the basic hosting package I pay for does NOT include the provision of backups – and the company no longer provides zip files on request: instead, facilities are provided to download individual files, to zip up collections of files, and to download and upload files using the file transfer protocol FTP. Of these various options, I would have preferred to just zip up all the files comprising pwofc.com and then to download the zip file. However, the zipping facility didn’t seem to work and, on reporting this to the 123-Reg Support operation, I was told that it was out of action at the moment… So, I decided to take the FTP route.

I duly downloaded the free-to-use FTP client FileZilla, set it up with the destination host IP address, port number, username and password, and pressed ‘Connect’. After a few seconds a dialogue box opened advising that the host did not support the secure FTP service and asking if I wanted to continue to transfer the files ‘in clear over the internet’. Naturally I was a little concerned, so I closed the connection and asked 123-Reg Support if a secure FTP transfer could be achieved. I was told that it could and was given a link to a Help module which would explain how. This specified that a secure transfer requires port 2203 to be used (it had previously been set to 21), so I made the change and pressed ‘Connect’ again. Nothing happened. A search of the net indicated that secure FTP (SFTP, which runs over SSH) normally uses port 22, so I changed 2203 to 22 and, bingo, I was in.

FileZilla displays the local file system in a box on the left of the screen, and the remote file system (the pwofc.com files in this case) in a box on the right. Transferring the pwofc files (a folder called ‘ofc’, a file called ‘index’ and a file called ‘.htaccess’) was simply a matter of highlighting them and dragging them over to a folder in the box on the left. The transfer itself took about 12 minutes for a total size of 246 MB.
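The two copying exercises can be compared with a little arithmetic. This Python sketch assumes that ‘Mb’ in both accounts means megabytes; it works out roughly 25 kB/s for the HTTrack mirror (212 MB in 8628 seconds) against roughly 340 kB/s for the SFTP download (246 MB in 720 seconds), about 14 times faster. The comparison is not like-for-like: HTTrack had to request and process each of the 1827 links it scanned, whereas FileZilla simply copied the raw files from the hosting folder.

```python
def average_rate_mb_per_s(size_mb, hours=0, minutes=0, seconds=0):
    """Average transfer rate in MB/s for a job of size_mb megabytes."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return size_mb / elapsed


# Figures from the two accounts; "Mb" is assumed to mean megabytes.
httrack_rate = average_rate_mb_per_s(212, hours=2, minutes=23, seconds=48)  # 8628 s
ftp_rate = average_rate_mb_per_s(246, minutes=12)                           # 720 s

print(f"HTTrack mirror: {httrack_rate * 1000:.1f} kB/s")
print(f"SFTP download:  {ftp_rate * 1000:.1f} kB/s")
print(f"SFTP was roughly {ftp_rate / httrack_rate:.0f} times faster")
```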

The copied files on my laptop are not, of course, sufficient on their own to produce the web pages: they also require the SQL database which manages them to deliver a fully functioning web site. If you double-click the ‘index’ file it just delivers a web page with some welcome text but no links to anything else. Hence, these backup files are only of use for uploading back to the original hosting web site so that the blog can be resurrected if the original files have become corrupted or destroyed. I guess they could also, in principle, be used to set up the site on another hosting service – though I have no experience of doing that.

These observations relate only to one customer’s limited experience of one specific hosting service and may or may not apply generally. However, they do indicate some general points which Blog owners might find worth bearing in mind:

  • Don’t assume that your hosting service could regenerate your Blog if it became corrupted or was destroyed – find out what backup facilities they do or don’t provide.
  • Don’t assume that all the functions provided by your hosting service work – things may be temporarily out of action or may have been superseded by changes to the service over the years.
  • Remember that a backup of the website may be insufficient to regenerate or move the Blog – be clear about what additional infrastructure (such as a database) will be required.
  • If you want to be able to look at the Blog offline and independently of a hosting service, investigate other options such as creating a hardcopy book, or using a tool such as HTTrack (which is discussed in the following entry).

ST’s Alternative Approaches

About 6 weeks ago (on 6th March), Sara Thomson of the Digital Preservation Coalition kindly spent some time on the phone with me discussing the archiving of web sites. I wanted to find out if there were any other solutions besides the ones I had stumbled across in my brief internet search some 16 months ago. Sara suggested 3 approaches which were new to me, and described them as follows in a subsequent email:

  1. UK Web Archive (UKWA) ‘Save a UK Website’: https://beta.webarchive.org.uk/en/ukwa/info/nominate Related to this – two web curators from the British Library (Nicola Bingham and Helena Byrne) presented at a DPC event last year discussing the UKWA, including the Save a UK Website function. A video recording of their talk along with their slides (and the other talks from the day) are here: https://dpconline.org/events/past-events/web-social-media-archiving-for-community-individual-archives
  2. HTTrack: https://www.httrack.com/  I gave a brief overview of HTTrack at that same DPC event last year that I linked to above. I have also included my slides at an attachment here – the HTTrack demo starts on slide 15.
  3. Webrecorder: https://webrecorder.io/ by Rhizome. Their website is great and really informative, but let me know if you have any questions about how it works.

Shortly after this, I followed the link that Sara had provided to the UKWA nomination site and filled in the form for pwofc.com. On 14th March I got a response saying that the British Library would like to archive pwofc.com and requesting that I fill in an online licence form, which I duly completed. On 16th March I decided to explore the contents of the UKWA service and found that it collects ‘millions of websites each year and billions of individual assets (pages, images, videos, pdfs etc.)’. I started looking at some of the blogs. The first one I came across was called Thirteen days in May and was about a cycling tour – but it seemed to lack some of the photos that were supposed to be there. The next two I looked at, however, did seem to have their full complement of photos; and one of them (called A Common Reader) had a strangely coincidental entry about ‘Instapaper’, which provides what sounds like a very useful service for saving web pages for later reading. It looks as though the UKWA does an automated trawl of all the websites under its wing at least once a year, so I guess that, as a backup, it should never be more than a year out of date.

An hour after completing this exploration, I got an email confirming that the licence form had been submitted successfully and advising that the archiving of pwofc.com would proceed as soon as possible, but that it might not be available to view in the archive for some time due to the many thousands of web sites being processed and the need to do quality assurance checks on each. Since then, I’ve been checking the archive every now and again, but pwofc.com hasn’t emerged yet. When it does, it’ll be interesting to see how faithfully it has been captured.

Regarding the other two suggestions that Sara made, I’ve decided to discount Webrecorder as that entails visiting every page and link in a website which would just take too much time and effort for pwofc.com. However, I’m going to have a go at using HTTrack, and I’m also going to try and get a backup of pwofc.com from my web hosting service. Having experienced all these various archiving solutions, there’ll be an opportunity to compare the various approaches and reach some conclusions.

Dust Jacket design and production

This week I put the finishing touches to ‘Touch the Join’; the end papers were glued to the boards and I completed the dust jacket. The volume now awaits a place on the top shelf of my cabinet when its current contents are digitised in the Electronic Story Book journey.

The creation of the dust jacket was particularly interesting. The thickness of the book meant that there was considerably more space on the spine to do things that perhaps wouldn’t work at all in a narrower area. I decided to use the space to illustrate the title, so I took lots of photos of my fingers touching the bare leather spine and embedded the title in one of them, with the result shown below.

The other two pictures on the spine are taken from within the book itself, along with the other sixteen images that appear on the front and back of the jacket – all assembled in a PowerPoint slide. Fingers have been superimposed on another two of them to reiterate the message of the title. The front and back of the dust jacket are shown below.

The inside flaps gave me the opportunity to set out the following rationale for producing the book:

“In the years before retiring in 2012, I had accumulated a number of projects that I didn’t have time to work on. Things like the analysis of why I kept certain documents after scanning and not others; and the comparison of my incoming communications between 1981 and 2011; and the Roundsheet. As retirement approached, I began to realise that I could undertake all of these and more under a collective banner; and that, for some of them, I might be able to find collaborators, academic or otherwise, to advise me or to work with me. I thought of these as prospective journeys of discovery, unfettered by organisations or money.

To provide a structured framework within which to work, I decided to record a journal for each journey; and to set up a website to provide an open record of my activities for prospective collaborators to see what I was doing. Consequently, in April 2012, pwofc.com opened for business.   

One of the journeys I embarked on was digital preservation work on my lifetime collection of work documents, to ensure that its contents could be accessed in the future. The collection includes some self-contained web sites, so I investigated the best way of storing web sites long term. However, there appeared to be no simple solutions. The industry-standard WARC methodology seemed far too complex for my needs, so I stuck to the approach I had always taken – keeping all the files together in a zip file.

However, it did get me thinking about how to preserve pwofc.com; and I suddenly realised that a more tangible way of doing it would be to simply put it on a bookshelf. I had started attending bookbinding classes in Bedford in 2017, and I had already created, printed and bound a book of my own (Sounds for Alexa), so I knew it would be possible. I realised, also, that it would be an interesting opportunity to compare the features of a web blog and a book.

That’s how this tome came about. I wonder what its future life will be? My guess is that it’ll last longer than its electronic counterpart.”

Printing the dust jacket was quite demanding. Its length (60 cm) required a custom print size to be set up in the Canon MG3550 printer driver software, but the driver does not permit Borderless Printing with custom print sizes, so it was not possible to avoid a 0.5 cm blank edge all round the print. Furthermore, the height required (22 cm) was exactly the widest the Canon MG3550 printer was physically capable of handling, but the driver software only permitted a custom size of 21.59 cm, so a further 0.41 cm of blank space was introduced on the top or bottom edge. I produced four separate test prints and each time tried to get an equal amount of blank space on the top and bottom edges by moving the image up and down, but I wasn’t able to achieve it, so I ended up with a larger space along the top edge and a smaller space along the bottom edge. I didn’t think this would look very good, so I decided to fill the blank edges with gold by masking them with tape and applying the ‘King Gold’ version of Pebeo’s Gedeo gilding wax with a small paint brush whose hairs had been cut down to a length of about 5 mm.
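The margin arithmetic above can be checked in a few lines. This Python sketch assumes that the 0.5 cm margin applies inside the 21.59 cm custom page height, which matches the figures given: 0.5 + 0.5 + 0.41 = 1.41 cm of blank space to share between top and bottom, or about 0.7 cm per edge had it been possible to split it evenly. Incidentally, 21.59 cm is exactly 8.5 inches, the width of US Letter paper, which may (speculatively) explain the driver's limit.

```python
# Figures from the text; the margin model is an assumption (see above).
SHEET_HEIGHT_CM = 22.0   # jacket height = widest paper the MG3550 can handle
DRIVER_MAX_CM = 21.59    # largest custom height the driver accepts
BORDER_CM = 0.5          # margin imposed when borderless printing is unavailable


def vertical_blank(sheet_cm, driver_cm, border_cm):
    """Return (printable height, total top+bottom blank, blank per edge if split evenly), in cm."""
    printable = driver_cm - 2 * border_cm
    total_blank = sheet_cm - printable
    return printable, total_blank, total_blank / 2


printable, total_blank, per_edge = vertical_blank(SHEET_HEIGHT_CM, DRIVER_MAX_CM, BORDER_CM)
print(f"printable height: {printable:.2f} cm")
print(f"blank to share:   {total_blank:.2f} cm (ideally {per_edge:.3f} cm top and bottom)")
print(f"driver limit in inches: {DRIVER_MAX_CM / 2.54:.2f}")
```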

A further complication arose when trying to wrap the dust jacket around the book with the spine in the appropriate place. Because of the size of the book and the length of the dust jacket it was necessary to handle the print quite a lot to get it in the right place, and I found that some ink was coming off on my fingers. Despite experiencing this on the test prints and then being super careful with the final master version, some ink smears still found their way onto parts of the jacket. I’ve decided I’ll live with these for the time being. Perhaps, at my leisure in the future, I’ll have another go and leave the print for a few weeks in the hope that the ink fixes more securely.

Finally, covering the jacket with a sheet of transparent plastic (probably polypropylene) was relatively straightforward – just cut it to size with several centimetres of overlap all round and then fold it over the top and bottom edges of the dust jacket. However, there’s an issue with this material that I haven’t yet found a way to resolve: the plastic attracts all the dust and hairs already lying on the surface on which you cut and fold it. I guess a dedicated workbench which I could keep immaculately clean would do the job – but I don’t have one and have to make do with whatever area I can find that’s large enough to take an expanse of the 80 cm wide roll. Consequently, the outside of the cover had lots of bits on it, which I tried to remove using a damp cloth. However, there may also be bits on the inside of the plastic. Luckily, once the covered dust jacket is on the book, such bits are not immediately obvious to the casual eye.

So, that’s the whole story of the book of the blog. Perhaps there’ll be an accompanying volume in a few more years.

Just the Dust Jacket left to do

After about a dozen bookbinding classes, the 9cm stack of loose paper has been transformed into a tightly knit, disciplined, battalion of messengers. The metamorphosis involved 2-up stitching, attaching the end bands and hollow, securing the tapes to the boards, paring the leather and gluing it to the boards, and finally printing the gold lettered title on the spine. The photos below illustrate some of these intermediate stages.

Aside from the small matter of gluing down the end papers, there only remains the dust jacket to create, print and fit – a blank canvas which I’m looking forward to designing. Several people in my bookbinding class can’t understand why anyone would want to put a cover on a nice leather-bound book, but, for me, there are two good reasons for doing so: first, my bookshelves are full of brightly coloured dust jacket spines in good condition, and I don’t think plain spines look good among the rest of the books; and, secondly, personalising a book with a dust jacket design and adding descriptive text on the inside flaps is a great opportunity to explain my relationship to the artefact and what it means to me – particularly for books I have created myself.

Book vs Blog

Now that the content of the book has been put to bed and the focus has turned to bookbinding activities, it seems a good moment to reflect on whether this attempt to replicate a web site in book form has worked or not. First, though, it’s important to be clear about the following differences between the pwofc.com site and most other web sites:

  • there are no adverts
  • all the material is static – the content doesn’t change or move while being viewed.

Having said that, there are several standard web site/blog features in pwofc.com which the physical book may, or may not, have been able to replicate. They include:

  1. Selectable Sections
  2. Links between sections
  3. Links to background in-site material
  4. Links to external web sites
  5. Enlargement of text and images
  6. Categorisation changes
  7. Addition at will
  8. Updating at will
  9. Correction at will
  10. Device display variability
  11. Copying capability
  12. Visibility
  13. Accessibility
  14. Storage capability

Here’s how each of these features was dealt with in the physical book:

1. Selectable Sections

Blog feature: The Blog content was divided into 22 separate topics which appeared permanently as a list down the right hand side of the screen. Whatever content was displayed in the main part of the screen, any topic could be selected and traversed to from the list on the right.

Book capability: The Book has no equivalent functionality with such a combination of immediacy and accuracy; however, it does enable the pages to be flicked through at will; and the contents list at the front allows the page number of a specific topic to be identified and turned to.

2. Links between sections

Blog feature: At any point in the Blog content a link could be inserted to any other Blog Post (though not to specific text within that Post). The links were indicated by specific text being coloured blue.

Book capability: The same text is coloured blue in the Book. In order to provide an equivalent linking capability, the date of the Post being linked to and the page number it is on are included in brackets immediately after the blue text.

3. Links to background in-site material

Blog feature: At any point in the Blog content a link could be inserted to additional material held as a background file in the web site. The file could be of any type that could be displayed – an image, a Word document, a spreadsheet, a PowerPoint presentation etc. The links were indicated by specific text being coloured blue.

Book capability: The same text is coloured blue in the Book; and the content concerned is included as an Appendix at the back of the book. To provide an equivalent linking capability, the number of the Appendix, its name, and its page number are included in brackets immediately after the blue text.

4. Links to external web sites

Blog feature: At any point in the Blog content a link could be inserted to a page in another web site. Sometimes the full web address was included in the Post, and at other times some descriptive text was provided. In both cases, however, the text was coloured blue and the HTTP link was attached to it, allowing the target web page to be visited immediately, provided it still existed on its web server.

Book capability: The same text is coloured blue in the Book. Where the full web address appears in the Post, no further text is added in the Book. However, where descriptive text is provided in the Post, the full web address is spelled out in brackets in the form ‘see http.xxxxx’. To visit the page concerned, a reader would have to type the address into a browser.

5. Enlargement of text and images

Blog feature: Browsers provide functionality to enlarge both text and images. This is of particular use to people who have poor eyesight; and to those wishing to see greater detail in some of the images included with the text.

Book capability: Books have no such integral functionality. Readers have to use glasses or magnifying glasses to see enlarged text or images. I don’t know for sure whether greater detail and clarity can be achieved with browser magnification or with magnifying glasses on print; however, a comparison of the screen and printed versions of one of the images (on page 713 of the Book) indicates that much definition is lost in the printing process.

6. Categorisation changes

Blog feature: Current topics in the Blog are listed under the heading ‘Journeys in Progress’, whilst completed topics are moved under the heading ‘Journeys KCompleted’. (The K at the beginning of ‘KCompleted’ is simply there to ensure that ‘Journeys KCompleted’ comes lower down the alphabet than ‘Journeys in Progress’ and therefore appears underneath it; I wasn’t prepared to spend further time figuring out how to achieve this properly in WordPress/HTML.)

Book capability: The Book reflects the status of the web site at a particular point in time and therefore doesn’t need to have this capability. However, this really glosses over a key, fundamental difference between a Blog and a Book. The Blog is a dynamic entity – it can keep changing; whereas a Book has fixed contents. Of course, a Book’s contents can be added to by handwriting in additional material; and the contents of a Book can be read in different orders if appropriate signposting is provided. For example, this particular book could be read in the order that the Contents are listed, or in the order of the entries shown in the Timeline section – though this latter approach would be rather laborious since it would involve a lot of leafing through the Book. Overall, however, a Book simply does not have the Blog’s ability to be changed.
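The ‘KCompleted’ trick in item 6 depends on how the headings are sorted. This Python sketch shows that in plain character-code order a capital K still sorts before a lower-case i, so the trick only works if the headings are ordered case-insensitively, as MySQL collations whose names end in _ci (commonly used by WordPress) do; that WordPress sorts these particular headings that way is an assumption.

```python
headings = ["Journeys KCompleted", "Journeys in Progress"]

# Case-sensitive (code-point) order: "K" (U+004B) sorts before "i" (U+0069),
# so the completed list would still come first.
print(sorted(headings))  # ['Journeys KCompleted', 'Journeys in Progress']

# Case-insensitive order compares "k" with "i", so the completed list moves below.
print(sorted(headings, key=str.casefold))  # ['Journeys in Progress', 'Journeys KCompleted']

# Without the K, "c" sorts before "i" even case-insensitively, hence the trick.
print(sorted(["Journeys Completed", "Journeys in Progress"], key=str.casefold))
# ['Journeys Completed', 'Journeys in Progress']
```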

7. Addition at will

Blog feature: New Topics, new Posts within a Topic, and new material within a Post can be added to a Blog at will. In some circumstances this may be considered advantageous. However, it also means that readers cannot be sure that what they have already read is the latest material. There is no feature to highlight what is new.

Book capability: As described in item 6 above, a Book simply does not possess the Blog’s ability to be changed. However, readers can be secure in the knowledge that once they have read the Book they know what it contains and have finished what they set out to do.

8. Updating at will

Blog feature: The contents of a Post can be updated at will, though, as described in 7 above, this may leave readers feeling uncertain about the contents. There is no feature to highlight what has changed.

Book capability: As described in item 7 above, a Book simply does not possess the Blog’s ability to be changed; however, at least readers know that once they have read the Book they know what it contains and have finished what they set out to do.

9. Correction at will

Blog feature: Corrections of typos, poor grammar and factual errors can be made to the contents of a Post at will. There is no feature to highlight what has changed, though this is perhaps only of concern for the correction of factual errors – readers will not be interested in corrections to typos or poor grammar.

Book capability: Although corrections can be made by hand on the Book’s pages, the handwriting is likely to detract from the book’s appearance.  As described in item 7 above, a Book simply does not possess the Blog’s ability to be changed. However, at least readers know that once they have read the Book they know what it contains and have finished what they set out to do.

10. Device display variability

Blog feature: The Blog may be read on a variety of devices including a large screen, a laptop screen, a tablet and a mobile phone. Not only are the screen sizes of these devices different, but they are likely to be using different browser software to display the pages. These differences mean that a Blog may appear significantly different from one device to another. For this particular Blog, the list of topics down the right-hand side is moved to the bottom of narrower screens, which makes it significantly more difficult for users to navigate the material. Furthermore, users who are not familiar with the site and its contents may simply not be aware that the list of topics exists, and so may feel lost in a morass of text without any signposts.

Book capability: There is no such variability with the Book. It is what it is. What you see is what you get. Everyone who reads it gets the same physical experience. From this perspective the Book is considerably more reliable than the Blog.

11. Copying capability

Blog feature: All parts of the Blog can be copied and then pasted into other applications such as a Word document. There are limits as to how much can be copied at once – only the material in a single screen can be copied in one go. However, multiple screens can be copied separately and then stitched together in the receiving application.

Book capability: The Book’s pages can be copied and/or scanned individually or in pairs – though the way the book is assembled will probably preclude the pages being laid flat on the copy/scan platen which could result in a slightly blurred image towards the edge of the spine.

12. Visibility

Blog feature: The Blog is invisible in the huge black hole of the internet. It only becomes visible when people put it in their browser bookmarks, receive notifications of new entries, or see references to it in other electronic or paper documents.

Book capability: The Book will be very visible on a bookshelf in the house where it will reside – more so because of its unusually large size – but it will only be visible to a very few people.

13. Accessibility

Blog feature: The Blog is accessible from all over the world provided that its web address is known or that individuals can find the address by using a search engine such as Google. However, this may not be so easy for a small scale web site with a title containing a very commonly used phrase – Order From Chaos (though it’s easier for those inquisitive enough to try the initials OFC).

Book capability: The Book will be immediately accessible to only those in the house where it resides (though this is an extreme case because only one copy of the book will be printed; normally, books have larger print runs and therefore would be accessible to more people). If other people get to know about the Book and want to read it, they would have to request its loan from the owner and make arrangements to obtain it.

14. Storage capability

Blog feature: The Blog takes up no physical space in its own right, and, being of a relatively small digital size, takes up negligible electronic space.  However, a fee has to be paid every year to the organisation that hosts it, and the owner has to have a certain amount of technical knowledge to maintain it in its storage facility (to add new material, update versions of WordPress and its Plug-ins, and to review comments). A copy of the Blog can be obtained from the hosting site in the form of a large zip file. However, I’ve no idea if it would be possible to reconstitute this into a viable web site in a different computing environment, some years downstream.

Book capability: The Book takes up an appreciable amount of bookshelf space – more than usual due to its very large size. However, other than making space for it on the bookshelf and placing it there, there is nothing further to do to store it – and it will remain there intact for many years. Moving it to another bookshelf or other storage facility will not be difficult.

 

Given all the above comparisons, it seems that there is no clear answer to the question of whether the Book has been able to successfully replicate the Blog. The two entities are clearly different animals – the Blog is a dynamic vehicle accessed on a variety of devices, whilst the Book provides a point-in-time snapshot in a standard, well understood format. The Book probably presents the material in a broadly comparable way, even if cross-referencing is slower and more cumbersome. The Blog is hugely more widely accessible and visible, but is much more complicated to store. Regarding longevity, instinct says that the Book’s chances are much better than the Blog’s over the coming decades.

Bookfold experiences

This morning I finished printing the 52 sixteen-page sections (four A4 pages printed landscape and double-sided and then folded in half) and what a pile they make – just over 9 cm.

Unfortunately, the 100 gsm paper I had hoped to use would have let the text on the reverse of each page show through. Instead, I ended up with 130 gsm paper, which is normally sold in large sheets but which George Davidson’s supplier kindly cut down to A4. I got 250 sheets for £15, which is a really excellent price and which gave me a cushion of 42 spare pages in case things went wrong in the course of printing – which, of course, they always do. In this case, I had four hiccups:

  • It seems that images in PNG format upset the printing of documents that use the Bookfold page setup: they cause adjacent text to be printed on the other half of the page. I wasn’t aware of this problem and could only confirm that it was the cause when I replaced the PNG image with the same image in JPG format. The first time it happened I had to reprint all four pages. After that I was careful to check in Print Preview mode and was able to fix two other instances without wasting any paper.
  • I used about two and a half Canon 3550 inkjet cartridges in the course of the print run, and because the printer can’t give an accurate indication of when the ink is about to run out, I elected just to print until the quality deteriorated. This happened twice, and on those occasions I lost at least two sheets of paper.
  • One of the Appendices was a document with a contents page in which the page numbers had been automatically generated. No problem had been apparent when I edited this page; however, when it printed, it produced extra lines reading ‘Error! Bookmark not defined’ for the last four items on the contents list. This had a knock-on effect on all subsequent pages and extended this section onto a seventeenth page. Fixing the problem was simply a matter of removing all the page numbers from the Contents list and reprinting – however, I lost four pages in the process.
  • The final cause of paper wastage was typical human error: I decided I would print a later section while trying to fix one of the problems already mentioned; and the distraction of trying to find a solution caused me to lose track of where I was up to and to print the same section twice – another four pages down the swanee.

Anyway, despite these problemettes I still ended up with 12 spare pages; but it is a salutary reminder that it is essential to have a good supply of spares (paper and ink) when embarking on a substantial print run.
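The size of that cushion follows directly from the numbers above: each section uses four A4 sheets, so 52 sections need 208 of the 250 sheets bought. A minimal sketch of the sum, with a hypothetical function name, which could be reused to size the spares for a future print run:

```python
def spare_sheets(sections, sheets_per_section, sheets_bought):
    """Sheets left over once every section has been printed exactly once."""
    needed = sections * sheets_per_section
    return sheets_bought - needed


# 52 sections of four A4 sheets each (16 pages once folded), 250 sheets bought
print(spare_sheets(52, 4, 250))  # 42
```

Anything lost to reprints, ink running out or human error comes straight out of that figure.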

In the course of this exercise I’ve learnt a lot more about the Bookfold Page Setup in Microsoft Word and how to manage its printing. As already mentioned, with Bookfold selected, Word enables you to create text on pages which are half the width of a landscape A4 page. It is possible to create all the pages in a single file and to use Page Setup to specify how many pages each section/booklet should have (each section/booklet is sewn separately into the book’s textblock). However, I prefer to have my sections in separate files because a) I haven’t been able to get the printer I use to do duplex printing successfully when using the Bookfold Setup – the reverse pages are printed upside down (the solution is described below); and b) I find it easier to manage the edit and print processes in small chunks, despite the need to ensure continuity of text and page numbers from one file to another.
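Behind the Bookfold setup is the standard arrangement for nested sheets folded in half: each sheet carries four pages, and the numbers on it pair up from the outside in. The sketch below computes that layout for one section. It is an illustration of the usual scheme, assuming the four sheets are nested and folded together; I haven’t confirmed that Word arranges pages exactly this way internally, and the function name is hypothetical.

```python
def booklet_imposition(pages_per_section=16):
    """Page numbers printed on each sheet of one folded section.

    Returns a list of (front, back) tuples, outermost sheet first. Each side
    is a (left half, right half) pair as seen looking at that side, with the
    sheet turned over from left to right to see the back.
    """
    if pages_per_section % 4 != 0:
        raise ValueError("a folded section needs a multiple of four pages")
    n = pages_per_section
    sheets = []
    for i in range(n // 4):
        front = (n - 2 * i, 2 * i + 1)     # outer side of the sheet
        back = (2 * i + 2, n - 2 * i - 1)  # inner side of the sheet
        sheets.append((front, back))
    return sheets


for number, (front, back) in enumerate(booklet_imposition(16), start=1):
    print(f"sheet {number}: front {front}, back {back}")
```

For a sixteen-page section the outermost sheet carries pages 16 and 1 on its front and 2 and 15 on its back, while the innermost sheet carries the centre spread, pages 8 and 9.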

To print with the Bookfold Page Setup I’ve been using the standard settings that come up (Print All, A4 etc.) with the exception of specifying the following settings in the print dialogue boxes:

  • Manual duplex
  • Preview before printing
  • Orientation – Landscape
  • Print quality – High

On selecting ‘Print’, this arrangement results in a preview window being displayed which allows you to view the front side of each of the four pages. If there appears to be a problem, this is the point to Cancel out and take whatever remedial action is required. However, if all looks good, selecting ‘OK’ will result in the front sides of the four pages being printed. This is the point at which you need to enact the manual duplex procedure: take the four pages out of the printer and place the top page on one side at the bottom of a new pile. Take the next page and place it on top of the new pile. Do the same for each of the third and fourth pages. Then place the new pile, facing in the same direction, into the page feeder tray and press OK on the dialogue screen shown below.

When the reverse sides of the pages have been printed take them from the printer and place the top page on one side at the bottom of a new pile. Take the next page and place it on top of the new pile. Do the same for each of the third and fourth pages. If you take the new pile and fold it over you should find that the 16 pages are in the correct order. I’m constantly amazed that this does actually work – but it really does.
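In terms of sheet order alone, the shuffle described above simply reverses the pile. A minimal model of it follows, assuming sheets can be treated as labels in a list; the function name is hypothetical, and the model ignores which way up or round each sheet lies, which matters just as much in practice.

```python
def restack(pile):
    """Re-pile sheets the way the manual duplex steps describe.

    `pile` lists the sheets from top to bottom. The top sheet is set down
    first, so it ends up at the bottom of the new pile; every later sheet
    goes on top. The new pile is returned, again listed top to bottom.
    """
    new_pile = []
    for sheet in pile:             # work down from the top of the old pile
        new_pile.insert(0, sheet)  # each sheet goes on top of the new pile
    return new_pile


printed_fronts = ["sheet 1", "sheet 2", "sheet 3", "sheet 4"]
print(restack(printed_fronts))  # ['sheet 4', 'sheet 3', 'sheet 2', 'sheet 1']
```

In this model, doing the shuffle twice puts a pile back in its original order.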

Specifying ‘Preview Before Printing’ provides a valuable opportunity to check that all is well before committing to the print run. Unfortunately the Preview only displays the front sides of the pages, so that a problem on the reverse of the pages could waste a lot of paper. However, this can be avoided by checking the Preview of the reverse sides before setting the print run going. If a problem is spotted, the print can be cancelled and the problem fixed. Then, with the problem-free front pages in the paper feed tray, the whole print run can be started again but, this time, the front side print should be cancelled in the main Print screen. However, the ‘remove the printout’ dialogue box will still be present and pressing OK will result in the Preview and Print screens for the reverse pages being displayed. Accepting these print options will result in the reverse pages being printed on the back of the problem-free front pages.

Each of the sixteen-page sections took about 10 minutes to print, provided no problems were encountered. After each section was produced it was carefully folded and the crease pressed in. Now the bookbinding work starts with the pricking out of the holes for the thread which will sew the sections together. It’s going to be fascinating to see how such a large number of pages can be turned into a viable book.

The slog of the blog book

I’m pushing ahead with the book of the blog. Having established a cut-off date of the end of 2017, I made sure that I cleared away two of my long-standing journeys (OFC and Roundsheet) by the deadline, and ended up with about 350 pages of blog posts. That’s when the grind really started and I had to go through all of them, separating them into 16-page sections ready for bookbinding. As I went through them, I made sure that the background documents accessed from links in the blog were reproduced in full in an Appendix. This was a major exercise which eventually produced a further 465 pages – all of which in their turn had to be separated into 16-page sections.
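Taking the approximate page counts above at face value, and assuming the posts and the Appendix were split into sections separately (so each has its own part-filled last section), the number of sections works out as follows. The names are hypothetical.

```python
import math

PAGES_PER_SECTION = 16


def sections_needed(page_count, pages_per_section=PAGES_PER_SECTION):
    """Number of sections needed; a part-filled last section still counts."""
    return math.ceil(page_count / pages_per_section)


blog_sections = sections_needed(350)      # about 350 pages of posts
appendix_sections = sections_needed(465)  # 465 pages of Appendix documents
print(blog_sections, appendix_sections, blog_sections + appendix_sections)  # 22 30 52
```

That gives 52 sections in all, before the final index and timeline section is added.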

I now have 52 separate sixteen-page sections, and another final section which is growing as I edit each section one last time and assemble the index and the timeline (a list of post titles in date order). In this final edit I’m also ensuring that the cross-post links and the links to Appendix documents are all consistently formatted and include the correct page number of the material they point to elsewhere in the book. I decided to do this because it is the effortless ability to jump between links, and the absence of any particular space constraints, that distinguishes electronic systems from paper books – and I have taken advantage of both features extensively in the blog. So, when I decided to reproduce the blog in book form, I was determined to try to match those capabilities to the greatest extent I could. Hence, ALL the background documents have been included; and every cross reference includes a page number that goes straight to the relevant content. The only links that don’t have a page number reference are those to material elsewhere on the net which is produced by other people – I rationalised that a blog book should only include material produced by the owner of the blog.

The inclusion of linking page numbers and the creation of the index and timeline are making the final edit a slow process which may take a couple of weeks. In the meantime, I’ve been thinking about the type of paper I should use to print the book. Having assembled all the text, I can see that, if I used the same paper as I used for the ‘Sounds for Alexa’ book, the text block would be 5.5 times the thickness of the Sounds book – some 8.25 cm – a huge tome. The Sounds book was printed on 125 gsm paper, so I tried looking on the net for some thinner bookbinding paper but had no success – specialist A4 bookbinding papers sold in packs, as opposed to single sheets, seem to be few and far between, and I didn’t come across any that were thinner than 125 gsm. I discussed this with George Davidson, my tutor on the Bookbinding course at the Bedford Arts and Crafts Centre, and he said he would investigate a 100 gsm paper with one of his regular suppliers and suggested that it might be feasible to buy paper in larger sheets and cut them down to A4. In the meantime, I will continue to plough through the final editing of the 50+ sections.

A cursory tour of web archiving

Web archiving isn’t a simple proposition because not only do web sites keep changing, but they also have links to other sites. So, I guess I should have expected that my search for web archiving tools would come up with a disparate array of answers. It seems that the gold-plated solution is to pay a service such as Smarsh or PageFreezer to periodically take a snapshot of a website and to store it in their cloud. The period is user-definable and can be anything from every few hours to every month or year. Smarsh was advertising its basic service at $129 a month at the time of writing.

A more basic, do-it-yourself facility is the Unix WGET command-line tool, for which a downloadable Windows version is available. This enables all sorts of functions to be specified, including downloading parts or all of a site, scheduling downloads, etc. However, as you might expect with a Unix tool, it requires the user to input programming-type commands and to be aware of a large number of specifiable options.
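To give a flavour of those commands, here is a minimal sketch that drives wget from Python. The options used (--mirror, --convert-links, --adjust-extension, --page-requisites, --no-parent, --directory-prefix) are standard GNU Wget options, but the choice of options, the function names and the destination folder are assumptions for illustration; it is a sketch, untested against any particular site.

```python
import shutil
import subprocess


def wget_mirror_command(url, destination):
    """Build a wget command line that mirrors `url` into `destination`."""
    return [
        "wget",
        "--mirror",            # recursive download, unlimited depth, timestamping
        "--convert-links",     # rewrite links so the copy can be browsed locally
        "--adjust-extension",  # save HTML pages with an .html extension
        "--page-requisites",   # also fetch images, CSS etc. needed to show pages
        "--no-parent",         # never ascend above the starting directory
        f"--directory-prefix={destination}",
        url,
    ]


def mirror(url, destination):
    """Run wget, failing loudly if it is missing or reports an error."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget is not installed or not on the PATH")
    subprocess.run(wget_mirror_command(url, destination), check=True)
```

A call such as `mirror("http://www.pwofc.com/ofc/", "pwofc-mirror")` would then copy the site into a local folder, much as HTTrack does through its wizard.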

More limited services such as Archive.is are available to capture, save and download individual pages – and some of these are free to use.

Regarding formats in which web archives can be saved, the Library of Congress’ preferred format is the ISO WARC (Web ARChive) file format. However, I was unable to find any tools or services which purport to store files in this format: it sounds as though WARC is being used in the background by large institutions that are trying to preserve large volumes of web content. Interestingly, the web hosting service I use for this blog actually offers backups in various forms of zip file; and indeed, it is zip files that I have used in the past to store web sites that are included in my document collection.

Based on this very quick and certainly incomplete tour of the topic of Web Archiving, I’ve decided I won’t be trying to do anything fancy or different in the way I use technology to archive my old web sites. The zip format has worked well up to now and I see no reason to change that approach. As for a non-technological solution to web archiving, the notion of creating and binding a physical book of the first five years of this OFC web site is becoming more and more attractive. There’s something very solid and immutable about a book on a bookshelf. I’m definitely going to do that, and have set the end of 2017 as the cut-off date for its contents – I’m busy trying to make sure that the Journeys are all at appropriate stages by the 31st December.
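The zip approach is easy to script. A minimal sketch using Python’s standard shutil and zipfile modules, assuming the site’s files sit in a local folder (the function, folder and archive names are hypothetical): it zips the folder and then reads the archive back, using testzip() to check every member against its stored checksum.

```python
import shutil
import zipfile


def archive_site(site_dir, archive_base):
    """Zip the contents of `site_dir` and confirm the archive reads back cleanly.

    Returns the path of the new .zip file and the list of names inside it.
    """
    archive_path = shutil.make_archive(archive_base, "zip", root_dir=site_dir)
    with zipfile.ZipFile(archive_path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise RuntimeError(f"corrupt entry in archive: {bad}")
        names = zf.namelist()
    return archive_path, names
```

This only shows that the archive itself is intact; it says nothing about whether the site could be made to run again in a different computing environment.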

The Printing Solution

Pwofc.com was born 5 years ago and, as it has covered more topics and grown in size, the likelihood of being able to reconstitute it should some disaster occur seems to be becoming increasingly remote. So, when I started to go systematically through every entry in the blog to tease out OFC insights, it occurred to me that I could, at the same time, copy the contents into a Word document which could subsequently be printed and bound into a hardcopy book in just the same way as the Sounds for Alexa book has been produced. That’s what I did, and I now have a 227-page document containing the main contents of the site. I now need to add in the 40 Appendix documents which have links from the main text. The final book may well have around 400 pages or more – but that shouldn’t present a bookbinding problem.

I haven’t established yet whether there is a standard website archiving solution which makes it easy to reconstitute and access a site; however, even if there is one, I think I shall feel more comfortable knowing that I actually have all the content in a single backed-up file. I shall feel even more comfortable when I have the book of pwofc.com in my bookcase.