Final Planning underway

Since about last April, I’ve been planning various aspects of the project to preserve my PAWDOC document collection.  This has included:

  • Deciding what to do with zip files
  • Analysing problem files identified by the DROID tool
  • Figuring out how to deal with files that won’t open
  • Investigating all the physical disks associated with the collection including backup disks

All of this work has now been completed, and a clear plan identified for each individual item that requires some preservation work.

In parallel, I have been exploring the possibility of moving the collection’s documents out of the Document Management System in which it currently resides (Fish), to standard Windows application files held in Windows Explorer folders. This has included detailed planning of the structure of the target files, and of the process that would have to be undertaken to achieve the transformation. The Fish supplier has recently told me that a utility to undertake this move is now available, and I have confirmed that I want to go ahead with this approach. We are now entering a phase of detailed testing and further planning to verify that this is a viable and sensible way forward. Should no significant obstacles be identified, I anticipate being ready to undertake the move out of the Fish system sometime in January 2018.

Since the bulk of the planning work has now been completed, it has been possible to assemble a draft Preservation Project Plan CHART which itemises each piece of work that will be required. Using this as a base, and incorporating the outcome of the work on the utility with the Fish supplier, I shall start to assemble the overall Preservation Project Plan Description document, and to allocate timescales and effort to each task on the plan.

Dealing with Disks

One very specific aspect of digital preservation is ensuring that the contents of physical disks can be accessed in the future. I found I had four types of challenge in this area: 1) old 5.25 and 3.5 disks that I no longer have the equipment to read; 2) a CD with a protected video on it that couldn’t be copied; 3) two CDs with protected data on them that couldn’t be copied; and 4) about 120 CDs and DVDs containing backups taken over a 20 year period. My experiences with each of these challenges are described below:

1)  Old 5.25 and 3.5 disks: I looked around the net for services that read old disks and eventually decided to go with LuxSoft after making a quick phone call to reassure myself that this was a bona fide operation and that the price would be acceptable. I duly followed the instructions on the website to number and wrap each disk, before dispatching a package of 17 disks in all (14 x 5.25, 2 x 3.5, 1 x CD). Within a week I’d received a zip file by email of the contents of those disks that had been read, and an invoice for what I consider to be a very reasonable £51.50. The two 3.5 disks and the CD presented no problems and I was provided with the contents. The 5.25 disks included eight which had been produced on Apple II computers in the mid 1980s and these LuxSoft had been unable to read. I was advised that there are services around that can deal with such disks but that they are very expensive; and that perhaps my best bet would be to ask the people at Bletchley Park (of Enigma fame) who apparently maintain a lot of old machines and might be willing to help. However, since these disks were not part of my PAWDOC collection and I didn’t believe there was anything particularly special on them, I decided to do nothing further with them and consigned them to the loft with a note attached saying they could be used for displays etc. or destroyed. Of the six 5.25 disks that were read, most of the material was either in formats which could be read by Notepad or Excel, or in a format that LuxSoft had been able to convert to MS Word, and this was sufficient for me to establish that there was nothing of great import on them. However, one of the 5.25 disks (dating from 1990) contained a ReadMe file explaining that the other three files were self-extracting zip files – one to run a communication package called TEAMterm; one to run a TEAMterm tutorial; and one to produce the TEAMterm manual. Since this particular disk was part of the PAWDOC collection (none of the other 5.25 disks were), I asked LuxSoft to do further work to actually run the self-extracting zips and to provide me with whatever contents and screen shots could be obtained. I was duly provided with about 30 files which included the manual in Word format and several screen shots giving an idea of what the programme was like when it was running. LuxSoft charged a further £25 for this additional piece of work, and I was very pleased with the help I’d been given and the amount I’d been charged.

2) CD with Protected Video files: This CD contained files in VOB format and had been produced for me from the original VHS tape back in 2010. The inbuilt protection prevented me from copying them onto my laptop and converting them to an MP4 file. After searching the net, I found a company called Digital Converters, based in the outbuildings of Newby Hall in North Yorkshire, which charged a flat rate of £10.99 + postage to convert a VHS tape and provide the resulting MP4 file in the cloud ready to be downloaded. It worked like a dream: I created the order online, paid the money, sent the tape off, and a few days later I downloaded my MP4 file.

3) CDs with protected data: I’d been advised that one way to preserve the contents of disks is to create an image of them – a sector-by-sector copy of the source medium stored in a single file in ISO image file format. This seemed to be the best way to preserve these two application installation disks which had resisted all my attempts to copy and zip their contents. After reading reviews on the net, I decided to use the AnyBurn software which is free and which is portable (i.e. it doesn’t need to be installed on your machine – you just double click it when you want to use it). This proved extremely easy to use and it duly produced image files of the two CDs in question in the space of a few minutes.
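For the technically curious, the sketch below illustrates what a sector-by-sector copy amounts to. It is not what AnyBurn does internally, just a minimal, hypothetical Python equivalent; it assumes a Windows machine, a data disc in drive D:, and permission to read the raw volume. The drive letter and output filename are placeholders.

```python
# Minimal sketch (not AnyBurn itself): make a sector-by-sector copy of a data CD
# into a single .iso file. Assumes Windows, a data disc in drive D:, and
# permission to read the raw volume (running as administrator may be needed).
RAW_DRIVE = r"\\.\D:"          # Windows raw-volume path for drive D: (an assumption)
OUTPUT = "disc_image.iso"      # placeholder output filename
CHUNK = 2048 * 1024            # read in multiples of the 2048-byte CD sector size

with open(RAW_DRIVE, "rb") as disc, open(OUTPUT, "wb") as image:
    while True:
        block = disc.read(CHUNK)
        if not block:
            break
        image.write(block)
```

A purpose-built imaging tool handles awkward discs and read errors far better than this, which is why I used one; the sketch is only meant to show the basic idea.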

4) Backup CDs and DVDs: The files on these disks were all accessible, so I had a choice of either creating zip files or creating ISO image files. I chose to create zips for two reasons: first, I wanted to minimise the size of the resulting file and I believe that the ISO format is uncompressed; and, second, on some of the disks I only needed to preserve part of the contents and I wasn’t sure if that can be done when creating a disk image.
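The selective zipping can be scripted rather than done by hand. The sketch below is a minimal, hypothetical example using Python’s standard zipfile module rather than a record of what I actually ran; the drive letter, folder names and archive name are placeholders.

```python
# Minimal sketch: zip selected folders from a backup disc into one compressed
# archive. The drive letter, folder names and archive name are placeholders.
import zipfile
from pathlib import Path

def zip_selected(disc_root, wanted_folders, archive_path):
    disc_root = Path(disc_root)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for folder in wanted_folders:
            for item in (disc_root / folder).rglob("*"):
                if item.is_file():
                    # store paths relative to the disc root so the folder structure survives
                    archive.write(item, item.relative_to(disc_root))

if __name__ == "__main__":
    zip_selected("E:/", ["Documents", "Photos"], "backup_2001.zip")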

Having been through each of these four exercises, I can draw some general conclusions:

  • The way to preserve disks is to copy their contents onto other types of computer storage.
  • The storage capacities of old disk formats are much smaller than those of contemporary computer storage. For example, none of the 5.25 disks contained files totalling more than 2 MB; the CDs hold up to about 700 MB; and even the DVDs hold no more than 4.7 GB. In an era when 1 TB hard disks are commonplace, these file sizes aren’t a problem.
  • There are three stages in preserving disk contents: first, just getting the contents from the disk onto other storage technology; second, being able to read the files; and third, should the contents include executables, being able to actually run the programs.
  • The decision about whether you want to achieve stages 2 or 3 will depend on whether you think the contents, and what they will be used for, merit the extra effort and cost involved. In the case of the 5.25 disk containing the TEAMterm software described above, providing a capability to run the application would have involved finding an emulator to run on my current platform and getting the programme to work on it. I judged that not to be worth the effort, given the purpose for which the disk’s contents were being preserved (to be a record of the artefacts received by an individual working through that stage of the development of computer technology).

Listening to New Stuff with Alexa

Back in February, I reported on my attempts to get Alexa to play the albums in our music collection. I’d found the following:

Coverage: about 80% of our albums were present in the Amazon Music Unlimited library.

Specifying Discs and tracks: for albums consisting of more than one disc, there appears to be no way of specifying that Alexa should start playing Disc 2 as opposed to Disc 1; and, similarly, there’s no way of getting Alexa to play a particular track number.

Voice Recognition: Alexa couldn’t recognise about 10% of the Artist/Title combinations even though I had checked that they were actually available in Amazon’s Music Unlimited library.

Since then I’ve been using Alexa and Amazon Music Unlimited to listen to newly issued albums reviewed in the Guardian/Observer newspapers, and now have a further substantial set of experience to compare with my original findings. The first thing to say is that being able to listen to complete albums, as opposed to just samples of each track from Amazon on my laptop (as I had been doing previously), is, obviously, a far more rewarding experience; and being able to listen to a range of new releases from start to finish, regardless of whether or not they suit one’s innate preferences, is a real luxury. Most I will never listen to again – and some I have cut short because I really didn’t like them; but there are a few which I’ve really liked and have made a note of at the back of our ‘Sounds for Alexa’ book. At least I now feel a bit more in touch with what sort of music is being produced these days.

Now, to get back to the topics I covered in my earlier findings; below are my further observations on each of the points:

Coverage: Since last February I’ve checked out eleven lots of review sections comprising write-ups of 121 albums. Fourteen of these albums were issued in CD format only, and all the other 107 albums were available in Amazon in MP3 format. All but nine of these 107 were advertised as being available for streaming or available to ‘Listen with your Echo’ (the latter being the Alexa device); and of these nine, six did actually play through the Echo device.  Of the three that didn’t, two would play only samples (Bob Dylan’s ‘Triplicate’, and The Unthanks’ ‘The songs and poems of Molly Drake’); and for the other one (Vecchi Requiem by Graindelavoix/Schmetzer) Alexa repeated “Vecchi Requiem” perfectly but said she was unable to find any album by that name. Given that only three items were actually unavailable, I conclude that a lot of the new albums that are being issued in digital format are available in the Amazon Music Unlimited service.

Specifying Discs and tracks: It still appears to be the case that it’s not possible to specify that Alexa play the 2nd disc in a two-disc album, nor to play a particular track number. To get round the multiple discs problem, a number of people on Reddit suggest creating a playlist in which the two discs are listed separately. As for the track number, Alexa will step through the tracks if you keep saying ‘next track’; but, if you really do want a particular track played, the best way to achieve that is to use the name of the track when requesting it – both of the following worked for me: ‘Play Kashmir by Led Zeppelin’ and ‘Play Cromwell by Darren Hayman’.

Voice Recognition: Of the 121 albums I checked out, Amazon claimed that 98 of them were available to play through the Echo, whereas, in fact, I could only get 85 of them to play. For eleven of the other thirteen albums, Alexa just couldn’t understand what I was requesting; and in the remaining two cases, Alexa a) insisted on playing “Rock with the Hot 8 Brass Band” instead of “On the Spot” by the Hot 8 Brass Band, and b) played Mozart‘s Gran Partita by the London Philharmonic instead of by the London Symphony Orchestra. Turning to the 85 albums that did play through the Echo, it was significant that only 59 of them played at the first time of asking. For the other 26, I had to repeat the request at least twice and as many as six times (these details are included in this Recognition Analysis spreadsheet). Naturally I was trying out all sorts of combinations of all or part of the particular album title and artist. After much trial and error I have taken to first asking for both the album title and the artist (play me X by Y); then, if that doesn’t work, asking for the album title on its own (or even just part of the album title – for example, 1729 for the album title “Carnevale 1729”); and finally, as a last resort, just asking for the artist. This strategy proved successful in all but 3 of the 26 instances that didn’t play at the first time of asking. These figures indicate that Alexa’s voice recognition capabilities haven’t improved much since my last write-up in February. This view is reinforced by my (undocumented) experiences of trying to get Alexa to tell me about various golf, rugby and cricket events. Her responses have usually been either about a completely different event or just that she doesn’t know. Perhaps I’m not asking the questions in the right way… At least Alexa is usually able to provide a weather forecast at the first time of asking. In her defence, I should mention that my son seems to have no trouble in adding all sorts of outlandish things to our Alexa shopping bag (which, I should add, we don’t use – Alexa just provides it if you want to put things into it).
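For anyone who prefers percentages, the headline figures quoted above work out roughly as follows (a simple calculation based on the counts in this post, nothing more):

```python
# Rough success rates derived from the counts quoted above.
albums_reviewed = 121
claimed_playable = 98      # advertised as available to play through the Echo
actually_played = 85
played_first_time = 59

print(f"Claimed playable (of reviewed):  {claimed_playable / albums_reviewed:.0%}")   # ~81%
print(f"Played at all (of claimed):      {actually_played / claimed_playable:.0%}")   # ~87%
print(f"Played first time (of played):   {played_first_time / actually_played:.0%}")  # ~69%
```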

From this summary of my recent experiences with Alexa, it seems that little has changed. Whilst Alexa’s voice recognition capabilities don’t seem to have improved much, the usefulness of the device, compared with having stacks of CDs around, is undiminished. So much so, in fact, that we have replaced our last remaining CD player, which was in the conservatory, with another Echo device; and we’ve upgraded to Amazon Music Unlimited for 10 devices at £9.99 a month.

There are undoubtedly many other uses that we could be putting Alexa to – the weekly email from Amazon always suggests several new things that one can ask her or get her to do. We haven’t really followed any of them up. Perhaps I’ll get round to printing out the email each week and putting it next to the Echo as a prompt. Or maybe I won’t – we’ll see. One thing’s for sure: what with all our CDs in the loft, and no stand-alone CD player, Alexa is going to be with us for the indefinite future.

The Verdict

Back in April, I asked 6 friends to pass rapid judgement on my latest attempt to define the Roundsheet application.  I asked them to give the document a quick scan and to provide answers to the questions below with either Yes or No, and to feel free to add comments or suggestions if they wished.

  1. Were you able to understand what was being described?
  2. Do the Roundsheet concepts make sense to you?
  3. Do you think a Roundsheet application would give users something they haven’t got already?
  4. Is it worth pursuing this idea any further?

I received 5 replies. Not everyone was able to answer either Yes or No to each question – some answers were ‘partly’ or ‘sort of’ or ‘don’t know’ or ‘possibly’; so I’m going to classify all such in-between responses as ‘not sure’. Applying this rule of thumb, the bare numerical results were:

Question                         Yes   Not Sure   No
Able to understand?               3        1       1
Concepts make sense?              3        1       1
Something users haven’t got?      2        3       0
Worth pursuing?                   2        2       1

 Some of the comments were interesting:

“one would think that many spreadsheets have pie chart functionalities: your concept is really about how to use those functionalities”

“whether this could be a protected tool? I think you would have difficulty”

“still wondering why the roundsheet format is any better than the tabular format which apparently could be used instead?”

“may be of interest to those who get frustrated by the complexities of linking Microsoft Excel and PowerPoint applications – I have come across a few of those in my career!”

“I feel there is a core there that could be extremely useful”

However, overall, there’s no overwhelming consensus that this is a winning idea, and it’s unlikely that time spent trying to promote its development into a product would be rewarded; so I think this is the time to put this journey to bed. I have enjoyed the intellectual challenge it has given me; and I have the satisfaction of knowing that I took the ideas as far as I could and finished the job. I could always bring the topic out of retirement should someone come along with a serious interest in taking it forward.

Thank you to all of you who, over the years, have taken the time to wade through and pass comment on the various specification documents.

Hikes through the preservation hinterland

I’ve just finished dealing with two particular digital preservation challenges that exist within the document collection I’m currently working on. The first involved two Lotus Notes files; and the second concerned some Windows Help files. My experience with these issues illustrates a) how just a few files can take a lot of work to resolve, and b) that there’s often an answer out there to seemingly impossible preservation problems provided you are prepared to look diligently enough.

I really didn’t believe I was going to find a way to unlock the Lotus Notes files since Notes is a major and very expensive piece of software that I don’t possess; and, in any case, it applies sophisticated time-limited password and encryption controls. Despite being aware of these issues, I thought I’d take a quick look on the net to see if I could find any relevant advice. It was time well spent: I discovered that it’s possible to download a local evaluation copy of Notes for 90 days, and that, because it doesn’t run on a server, this sometimes enables old Lotus Notes files to be opened. I duly downloaded the software and installed it; and then, despite the mysteries of Notes access controls, found I had access to the whole of one of the files (which contained conference-type material) and to parts of the other (which contained sent messages). I still had the username and expired password from the time the files were created and I think this may have helped to access the latter – though I’m not sure about that. Anyway, in both cases, I was able to print out the material to PDF files. I had to manually reorder the conference-type material and to reinstate a few hundred links in it, but that was it – job done!

The Windows Help files were a lot more demanding. Microsoft stopped supporting the WinHelp system (.HLP files) in 2006 in favour of its replacement, Compiled HTML Help (.CHM files). Although Microsoft did issue a WinHelp viewer for Windows 7 in 2009, WinHelp is essentially an obsolete format – it isn’t supported in Windows 10. I’m still running a Windows 7 system so am still able to view the HLP files – but they need to be converted now if they are to remain accessible in the future.

There is much material on the net about how to convert HLP files into CHM files but, as someone with no knowledge at all of how files in either of these systems are constructed, I didn’t find it easy to understand. I soon realised that converting from one to the other was going to be a challenge. However, I did eventually find a web site which offered clear practical advice that I could follow (http://www.help-info.de/en/Help_Info_WinHelp/hw_converting.htm), and I duly downloaded the recommended HLP decompiler and the Microsoft HTML Help Workshop software. The process to be followed went something like this:

  • Decompile the HLP file into its component parts (consisting of a help project file with the extension .hpj, along with one or more .rtf documents, an optional .cnt contents file, and any image files – .bmp, .wmf, or .shg – that are used within the Help file).
  • Convert the various HLP files into HTML Help files using a wizard in the HTML Help Workshop tool (the new files consist of a project file with the extension .hhp, one or more HTML files, a .hhc contents file, an optional .hhk index file, and any image files that are used within the Help file).
  • Set parameters in the hhp file to specify a standard Window name and size; and to have a search capability created when the files are compiled into a single CHM file.
  • Reconstruct the Table of Contents using the original HLP file as a guide (in many cases no Table of Contents information came through the conversion process – and, even when some did, it had lost its numbering). Where the contents had to be created from scratch, each new content item had to be linked to the specific HTML file to be displayed when that content item is selected.
  • Re-insert spacings in headings: The conversion process also loses the spacing in headings in the base material, resulting in headings that look like this, ‘9.1Revised System’, instead of like this, ‘9.1  Revised System’. To rectify this problem, the spacings have to be manually re-inserted into each HTML file of base material (one way this might be scripted is sketched after this list).
  • Compile the revised files into a single CHM file.
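As noted against the spacing step above, re-inserting the lost spaces by hand is tedious, and the pattern is regular enough that it could in principle be scripted. The sketch below is a purely hypothetical illustration of that idea; it assumes the damaged headings always consist of a dotted section number run straight into a capitalised word, and the folder name and file encoding are placeholders that would need checking against the real decompiled files before letting it loose on them.

```python
# Hypothetical sketch: re-insert the space lost between a section number and its
# heading text, e.g. "9.1Revised System" -> "9.1  Revised System".
# Assumes headings are always a dotted section number run straight into a capital
# letter; the folder name and encoding are placeholders that would need checking.
import re
from pathlib import Path

HEADING = re.compile(r"\b(\d+(?:\.\d+)*)([A-Z])")

def fix_headings(html_folder):
    for page in Path(html_folder).glob("*.htm*"):
        text = page.read_text(encoding="latin-1")      # lossless byte round-trip
        fixed = HEADING.sub(r"\1  \2", text)
        if fixed != text:
            page.write_text(fixed, encoding="latin-1")

if __name__ == "__main__":
    fix_headings("decompiled_help")    # placeholder folder of converted HTML files
```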

The first HLP file I tried this process out on contained just a single Help document with some 130 pages. It took a bit of figuring out, but I eventually got the hang of it. However, the second HLP item was in fact made up of 86 separate HLP files all stitched together to present a unified Table of Contents in a single window in which the base material was also displayed. Many of these 86 separate files had 50 or more pages, and some had many more than that; and each page had to be represented separately in the Table of Contents. It was a very long, tortuous job converting all 86 HLP files and ensuring that each one had a correct Table of Contents (I didn’t attempt to re-introduce the spacing in the headings – that would have been a torture too far). However, that was not the end of it; the files then had to be stitched together into a single overall file that combined all the individual Tables of Contents and that displayed all the base material. This involved inserting a heading for each document in the master file, and inserting a linking command to call up the Table of Contents for that particular document. Oh, and I should also mention that the HTML Help Workshop software was very prone to crashing – not a little irritating – I soon learnt to save regularly…

This overall task must have taken at least 30 or 40 hours – but I did get there in the end. The new CHM file works fine and is perfectly usable, despite three of the documents being displayed in separate windows instead of the single main window (although I spent some time on this issue, I was unable to eliminate the problem). Of course, the lack of spacing in the headings is immediately noticeable – but that’s just cosmetics!

No doubt there are specialists out there who would have made a quicker and better job of these conversion activities. However, if you can’t find such people or you haven’t got the money to throw at them, the experiences recounted above show that, with the help of the net, it’s worth having a go yourself at what you may consider to be your most difficult digital preservation challenges.

Binding Sounds – Part 2

When I last wrote about creating the Sounds for Alexa book, I’d finished sewing the text block. The next step was to glue the mull (thin gauze) to the spine, then to glue a strip of Kraft paper down the spine on top of the mull, and finally to glue on the blue and white end bands, as shown below.

Next, the colourful end papers I had selected were folded in two and glued with a thin, 3 mm wide line of PVA to the inside of the text block – one at the front and one at the back. Cardboard was then cut to the size of the text block plus a 4 mm overlap all the way round, and glued to the mull and the tapes as shown in the picture below.

I’d elected to have a leather cover (as opposed to cloth), which is longer lasting and has a more luxurious appearance (see the picture below). However, the downside of leather is that it is thicker and less pliable, and so it produces unsightly bulges when it is turned over the edges of the cardboard frame, and noticeable ridges at its edges. To solve this problem, the leather is pared down to a much reduced thickness at the points where it is to be turned and along its edges – in effect all but the central area, as shown in the picture below.

Paring leather is a task easier said than done. Doing it manually involves using a very sharp blade and shaving off thin layers over and over again until the requisite thinness is achieved. This is a very messy process which produces lots of small fragments of leather which get everywhere. It took me a couple of hours to complete the task, and the particular tool I was using caused me to lose the feeling in my thumbs and forefingers. No doubt it gets easier as one becomes more proficient; however, if I ever use leather again, I shall investigate using a paring service, which I’m told uses machines to do the same job much faster and to a better quality than I could hope to achieve.

Once the paring was done, the leather was glued to the cardboard frame, the end papers were stuck down to the front and back boards, and the dust jacket was printed and cut to size and folded around the book. Finally a protective plastic cover was fitted to the dust jacket. The completed book is shown below.

Sounds for Alexa has now taken its place next to Alexa in our kitchen diner, as shown in the picture below. We will see whether it gets put to use as originally envisaged over the coming months.

Scoping Document Finalised

Back in February, work started on the draft Scoping Document for the digital preservation actions required on the PAWDOC collection. Having spent some months actually doing bits of the work identified in the document, and refining it with the insights gained in the process, I have now completed the final version of the Scoping Document. It includes the following list of things that have to be done before a Project Plan can be produced:

  • Decide what document management system or alternative, and any associated databases, are to be used going forward.
  • Decide if Filemaker is to be retained as the platform for the Index or if it is to be replaced going forward.
  • Establish the future platform strategy.
  • Research and understand the actions required to:
    • make any moves planned from one piece of software to another; or from one platform to another;
    • be able to open those documents that don’t currently open;
    • promote the long term accessibility and survivability of all categories of document in the collection;
    • mitigate the risk of the collection’s CDs and DVDs becoming unreadable;
    • mitigate the risk of the electronic part of the collection being separated from the physical part.

Unfortunately Jan Hutar and Ross Spencer have decided they are unable to make any further substantial contributions to the project due to time pressures and other reasons. However, I continue to hope that they will remain associated with the work and be prepared to answer questions by email as needed. Their input to the early part of the work has been invaluable in getting the project to the point where I am actively investigating the practicalities of moving the electronic documents out of the Fish document management system into flat files in a Windows directory. The Fish supplier has a utility which will perform such a transformation, but much will depend on whether it can be customised to produce the file title format required and how much it will cost.

Alongside this activity, work continues on files that can’t be opened and on issues identified by the DROID analysis. Given the position that the project is in at present, I would anticipate being able to complete the project plan sometime in the next 9 to 12 months.

Cover Art

The Sounds for Alexa book is nearly finished now with its leather cover on and only the end papers to stick down. I’ll describe these final stages in a subsequent entry when it’s completed. In the meantime, with the outside dimensions of the book fixed, I’ve been able to get on with the cover.

The dimensions of a book cover pose a bit of a problem since the cover is much longer than the normal paper sizes that you can buy to print on. This particular cover will need to be 21 cm high and some 53 cm long. I was able to solve that problem by remembering the roll of surplus wallpaper lining paper that I had stored away in a poster tube for our grandchildren to draw on at some point in the future. It turned out to be sufficiently strong for a book cover but pliable enough to go through the printer – a delicate balance I’d fallen foul of before when trying to print things for weddings. Setting up the printer wasn’t a problem – you just set a custom page length (of up to a maximum of 676 mm in the case of my printer) and the printer will chug away and print the length you desire.

I decided to create the cover in PowerPoint and started off by setting the page size to be about a centimetre larger in each dimension than I actually needed, because the printer usually leaves a blank border of at least half a centimetre around the edge of the page. I figured that if I made the picture a little bit bigger than I needed, I’d be able to cut off these blank edges to get the exact size required.

Ever since deciding on the book’s sub-title – ‘A listing of Su and Paul’s digitised LPs, Cassettes, Tapes and CDs for use in the marriage of Alexa to Aye Fon’ – I’d had a picture in my mind of a wedding ceremony between our Amazon Echo and my iPhone surrounded by the turntable, ghetto blaster, and laptop, and all the digitised LPs, singles, cassettes and CDs which I now retain in the loft for proof of ownership purposes. I took a look around our house and decided our patio would be the place to take this photo, and took some experimental shots to see where things should go. This transpired to be very important because the title down the spine turned out to fall across a part of the photo containing a brick wall in shade, and consequently the black of the title was lost in the black of the shade.

I did some more experimentation and finally fixed upon an appropriate angle to take the shot, and waited for a sunny day. I knew it was going to be quite a big job laying out all the technology, LPs, cassettes and CDs on the patio, and I wanted to minimise the time they were left in the sun in case they got heatstroke, so I enlisted my son and his wife to help with the photo-shoot and get things laid out and back inside as quickly as possible. We did the shoot on the 2nd July, which turned out to be particularly hot, so we really did need to work as quickly as possible. However, we managed it: we got several shots, and got all the stuff back in the house and packed up in its boxes ready to go back in the loft. Quite a palaver – too much of a palaver to have to do it again – so the pictures had to be right.

Well, the pictures were OK and I did manage to get a cover – but, in the heat of the day and the moment, I made some errors which I guess a professional photographer would have picked up on straight away. I thought I would be able to enlarge and move the picture in PowerPoint to get the exact position for the text for the spine. However, this proved to be very difficult without cutting off a substantial part of the key elements of the photo. What I should have done is take the picture from further away. My experimentation had not been detailed enough and my photography had been inexperienced and rushed. Such are the differences between the amateur and the professional.

Nevertheless, I was able to choose a photo which minimised the problems and in which I was able to place the spine title in what seemed to be a fairly readable position. For the inside flap texts I lifted some extracts from previous posts in this blog, and then I was all set to print out a copy and see how it looked and fitted. The result was OK but there were some issues with the position of the title down the spine (it still ran into some shaded areas where the black text wasn’t as clear as I would have liked); and the text on the back flap was a little too far away from the edge of the flap. I made the spine text smaller and adjusted the position of the back flap text accordingly. However, a more intractable problem was that the picture simply wasn’t high enough; there would have to be a few millimetres of white space at the top and bottom of the cover because the print had been produced with a border. I started to explore the print options and eventually came to the conclusion that borderless printing – which is what I needed to get the height I required – is unavailable for anything other than the smaller Photo Paper sizes. My printer simply does not support Photo Paper of 220 x 570 mm (which is what I required), and does not support borderless printing for anything other than Photo Paper.

I compromised. I abandoned the quest to enable borderless printing and elected instead to go for High Resolution Paper and High Quality. The result still had a few millimetres of white border at the top and bottom of the cover, but the title was now clear and the print quality was noticeably better. I decided to quit while I was ahead. So my final print settings were:

  • Media Type: High Resolution Paper
  • Print Quality: High
  • Page Size: Width 200mm, Height 570mm
  • Printer Paper Size: Width 215.9mm, Height 570mm
  • Orientation: Landscape

The image I printed is shown below.

The Printing Solution

Pwofc.com was born 5 years ago and, as it has covered more topics and grown in size, the likelihood of being able to reconstitute it should some disaster occur seems to be becoming increasingly remote. So, when I started to systematically go through every entry in the blog to tease out OFC insights, it occurred to me that I could, at the same time, copy the contents into a Word document which could subsequently be printed and bound into a hardcopy book in just the same way as the Sounds for Alexa book has been produced. That’s what I did, and I now have a 227 page document containing the main contents of the site. I now need to add in the 40 Appendix documents which have links from the main text. The final book may well have around 400 pages or more – but that shouldn’t present a bookbinding problem.

I haven’t established yet whether there is a standard website archiving solution which makes it easy to reconstitute and access a site; however, even if there is one, I think I shall feel more comfortable knowing that I actually have all the content in a single backed-up file. I shall feel even more comfortable when I have the book of pwofc.com in my bookcase.

DROID explorations and DMS alternatives

Things have started to move in our efforts to perform digital preservation on the PAWDOC collection. I’ve been running the National Archives’ DROID tool across the 190,000 files, and Ross’s automated analysis of the results has turned up a number of issues, including several hundred duplicates which we are investigating. Among other things, DROID identifies file types and versions, and this has helped another strand of our investigations: trying to gain access to about three hundred files which can no longer be opened. 150 of these are old PowerPoint files from the early 90s which neither the Microsoft viewer nor the earliest version of OpenOffice can open. However, the Zamzar online service, to which you upload a file and specify what format you want it to become, successfully converted all of the examples I submitted into a version of PowerPoint I can open. Zamzar can’t deal with every problem file, especially those for which I no longer have the relevant application, for example, MS Project and iThink, though it did convert Visio drawings into PDF. We’re continuing to work through these files with the intention of getting a clear decision about what to do with each one so that specific actions can be included in our eventual preservation project plan.
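Ross’s analysis is his own work, but purely as an illustration of the kind of processing involved, the sketch below shows one way of listing potential duplicates from a DROID export by grouping on the content hash. It assumes the profile was exported to CSV with hash generation switched on, and that the hash and path columns are called ‘HASH’ and ‘FILE_PATH’; the column names vary with DROID’s export settings, so they may need adjusting.

```python
# Illustrative sketch: list potential duplicates in a DROID CSV export by grouping
# rows on their content hash. The column names "HASH" and "FILE_PATH" are
# assumptions and may need adjusting to match the actual export.
import csv
from collections import defaultdict

def find_duplicates(droid_csv):
    by_hash = defaultdict(list)
    with open(droid_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("HASH"):               # folders and unhashed rows have no hash
                by_hash[row["HASH"]].append(row["FILE_PATH"])
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

if __name__ == "__main__":
    for file_hash, paths in find_duplicates("droid_export.csv").items():
        print(file_hash, "-", len(paths), "copies")
        for path in paths:
            print("   ", path)
```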

Another substantial investigation underway is to try and identify a suitable alternative to the document management system (DMS) that controls the collection’s files. The future of the current DMS is uncertain, and it is too complex to reinstall on upgraded hardware without expensive consultancy support. Jan’s exploration of alternative DMSs and preservation repositories highlighted the fact that, while there are several free-to-use public domain systems available, they all require multiple components and appear to be relatively complex to install, configure, and maintain. This observation has prompted me to be a lot clearer about the immediate requirements for the collection. It is hoped to find a long term owner, perhaps working in the field of modern history, and it’s possible that that person or organisation may require more sophisticated search and access control functions. However, until that eventual owner is found, only a minimal level of single user functionality is needed, and minimal system management and cost demands are essential. In light of this greater clarity, we are now also considering a low tech, low cost alternative which would involve inserting the Index reference number into the title of every file and storing all the files in the standard Windows folder system. After identifying a required reference number in the Index, files would be accessed by putting that number into the folder system’s standard search facility. As well as looking at the pros and cons of such a solution, we are also investigating the feasibility of getting the necessary information out of the current DMS and into the titles of all the document files. A further challenge that would have to be overcome is that the current DMS stores multi-page documents as a series of separate TIF files. If we were to move to the low tech Windows folder system solution, it would first be necessary to combine the files making up a single document into one single file. This would need to be an automated process as there are too many documents to contemplate doing it manually.
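On the multi-page TIF question: combining the separate page files of a document into one multi-page TIFF is certainly automatable. The sketch below is purely illustrative (it is not the process we have settled on) and uses the Pillow imaging library; it assumes each document’s page files sit together in their own folder and sort into page order by filename, and the folder and file names are placeholders.

```python
# Illustrative sketch: merge the single-page TIF files of one document into a
# single multi-page TIFF. Assumes the pages sit in one folder and sort into page
# order by filename; requires the Pillow library (pip install Pillow).
# Folder and file names are placeholders.
from pathlib import Path
from PIL import Image

def combine_tiffs(page_folder, output_file):
    pages = sorted(Path(page_folder).glob("*.tif"))
    images = [Image.open(p) for p in pages]
    images[0].save(output_file, save_all=True, append_images=images[1:])

if __name__ == "__main__":
    combine_tiffs("doc_12345_pages", "doc_12345.tif")
```

Whatever tool is eventually used, the same basic decisions apply: the page order has to be reliable, and the combined files need names that carry the Index reference number so they can be found again.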

All these activities and more are required in order to be able to assemble a project plan with unambiguous tasks of known duration. We are continuing to work towards this goal.