If you have picture rails in your house, they can be used to hang Christmas cards on a line. However, picture hooks are designed to be pulled directly downwards, and a sideways force just pulls them along. What’s needed are some weighty Christmas-themed framed pictures which could be brought out each year with the other Christmas decorations and hung on the Xmas card line picture hooks to keep them in place. The pictures could be permanent or the frames could have slots for different things each year such as photos of last year’s Christmas party, or the best of last year’s cards.
Web archiving isn’t a simple proposition because not only do web sites keep changing, but they also have links to other sites. So, I guess I should have expected that my search for web archiving tools would come up with a disparate array of answers. It seems that the gold-plated solution is to pay a service such as Smarsh or PageFreezer to periodically take a snapshot of a website and to store it in their cloud. The period is user-definable and can be anything from every few hours to every month or year. Smarsh was advertising its basic service at $129 a month at the time of writing.
A more basic, do-it-yourself facility is the Unix Wget command-line tool, for which a downloadable Windows version is available. This enables all sorts of functions to be specified, including downloading part or all of a site, the scheduling of downloads etc. However, as you might expect with a Unix tool, it requires the user to input programming-type commands and to be aware of a large number of specifiable options.
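To give a flavour of what those programming-type commands look like, here is a minimal sketch in Python that simply assembles a Wget command line for copying a whole site. The URL and folder name are placeholders, and the selection of options is only one plausible combination for an offline copy; it is an assumption for illustration, not something tried out for this post.

```python
def build_wget_command(url, target_dir):
    """Return a Wget command line (as a list of arguments) for mirroring one site."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamp checking
        "--convert-links",     # rewrite links so the saved copy works offline
        "--page-requisites",   # also fetch the images, CSS etc. that pages need
        "--adjust-extension",  # save pages with an .html extension where appropriate
        "--no-parent",         # never climb above the starting directory
        f"--directory-prefix={target_dir}",  # folder to save everything into
        url,
    ]


# Placeholder URL and folder name, chosen only for illustration.
command = build_wget_command("https://example.com/", "site-archive")
print(" ".join(command))
```

The sketch only prints the command; to actually run it, the list could be handed to Python's `subprocess.run`, or the printed line pasted into a command prompt on a machine with Wget installed.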
More limited services such as Archive.is are available to capture, save and download individual pages – and some of these are free to use.
Regarding formats in which web archives can be saved, the Library of Congress’ preferred format is the ISO WARC (Web ARChive) file format. However, I was unable to find any tools or services which purport to store files in this format: it sounds like WARC is being used in the background by large institutions who are trying to preserve large volumes of web content. Interestingly, the web hosting service I use for this blog actually offers backups in various forms of zip files; and indeed, it is zip files that I have used in the past to store web sites that are included in my document collection.
Based on this very quick and certainly incomplete tour of the topic of Web Archiving, I’ve decided I won’t be trying to do anything fancy or different in the way I use technology to archive my old web sites. The zip format has worked well up to now and I see no reason to change that approach. As for a non-technological solution to web archiving, the notion of creating and binding a physical book of the first five years of this OFC web site is becoming more and more attractive. There’s something very solid and immutable about a book on a bookshelf. I’m definitely going to do that, and have set the end of 2017 as the cut-off date for its contents – I’m busy trying to make sure that the Journeys are all at appropriate stages by the 31st December.
Since about last April, I’ve been planning various aspects of the project to preserve my PAWDOC document collection. This has included:
- Deciding what to do with zip files
- Analysing problem files identified by the DROID tool
- Figuring out how to deal with files that won’t open
- Investigating all the physical disks associated with the collection including backup disks
All of this work has now been completed, and a clear plan identified for each individual item that requires some preservation work.
In parallel, I have been exploring the possibility of moving the collection’s documents out of the Document Management System they currently reside in (Fish), to standard Windows application files residing in Windows Explorer folders. This has included detailed planning of the structure of the target files, and of the process that would have to be undertaken to achieve the transformation. The Fish supplier has recently told me that a utility to undertake this move is now available, and I have confirmed that I want to go ahead with this approach. We are now entering a phase of detailed testing and further planning to verify that this is a viable and sensible way forward. Should no significant obstacles be identified, I anticipate being ready to undertake the move out of the Fish system sometime in January 2018.
Since the bulk of the planning work has now been completed, it has been possible to assemble a draft Preservation Project Plan CHART which itemises each piece of work that will be required. Using this as a base, and incorporating the outcome of the work on the utility with the Fish supplier, I shall start to assemble the overall Preservation Project Plan Description document, and to allocate timescales and effort to each task on the plan.
It was the combination of my mother saying that she was finding it increasingly difficult to write legibly in over a hundred Christmas cards, and the presence of Alexa in our house, that made me think that we need SPARDS – spoken Christmas cards. They would have recording and playback capabilities so that you could just take one out of the pack you’ve just bought and start talking to the person that’s next on your Christmas Spard list. How nice for the person receiving it, to just make a cup of tea, put it on the mantelpiece, and sit back and relax to listen to your recorded missive and greetings for Christmas and the New Year.
One very specific aspect of digital Preservation is ensuring that the contents of physical disks can be accessed in the future. I found I had four types of challenges in this area: 1) old 5.25 and 3.5 disks that I no longer have the equipment to read; 2) a CD with a protected video on it that couldn’t be copied; 3) two CDs with protected data on them that couldn’t be copied; and 4) about 120 CDs and DVDs containing backups taken over a 20 year period. My experiences with each of these challenges are described below:
1) Old 5.25 and 3.5 disks: I looked around the net for services that read old disks and I eventually decided to go with Luxsoft after making a quick phone call to reassure myself that this was a bona fide operation and the price would be acceptable. I duly followed the instructions on the website to number and wrap each disk, before dispatching a package of 17 disks in all (14 x 5.25, 2 x 3.5, 1 x CD). Within a week I’d received a zip file by email of the contents of those disks that had been read and an invoice for what I consider to be a very reasonable £51.50. The two 3.5 disks and 1 CD presented no problems and I was provided with the contents. The 5.25 disks included eight which had been produced on Apple II computers in the mid 1980s and these Luxsoft had been unable to read. I was advised that there are services around that can deal with such disks but that they are very expensive; and that perhaps my best bet would be to ask the people at Bletchley Park (of Enigma fame) who apparently maintain a lot of old machines and might be willing to help. However, since these disks were not part of my PAWDOC collection and I didn’t believe there was anything particularly special on them, I decided to do nothing further with them and consigned them to the loft with a note attached saying they could be used for displays etc. or destroyed. Of the six 5.25 disks that were read, most of the material was either in formats which could be read by Notepad or Excel, or in a format that Luxsoft had been able to convert to MS Word, and this was sufficient for me to establish that there was nothing of great import on them. However, one of the 5.25 disks (dating from 1990) contained a ReadMe file explaining that the other three files were self-extracting zip files – one to run a communication package called TEAMterm; one to run a TEAMterm tutorial; and one to produce the TEAMterm manual.
Since this particular disk was part of the PAWDOC collection (none of the other 5.25 disks were), I asked Luxsoft to do further work to actually run the self-extracting zips and to provide me with whatever contents and screen shots that could be obtained. I was duly provided with about 30 files which included the manual in Word format and several screen shots giving an idea of what the programme was like when it was running. Luxsoft charged a further £25 for this additional piece of work, and I was very pleased with the help I’d been given and the amount I’d been charged.
2) CD with Protected Video files: This CD contained files in VOB format and had been produced for me from the original VHS tape back in 2010. The inbuilt protection prevented me from copying them onto my laptop and converting them to an MP4 file. After searching the net, I found a company called Digital Converters based in the outbuildings of Newby Hall in North Yorkshire which charged a flat rate of £10.99 + postage to convert a VHS tape and to provide the resulting MP4 file in the cloud ready to be downloaded. It worked like a dream: I created the order online, paid the money, sent the tape off, and a few days later I downloaded my MP4 file.
3) CDs with protected data: I’d been advised that one way to preserve the contents of disks is to create an image of them – a sector-by-sector copy of the source medium stored in a single file in ISO image file format. This seemed to be the best way to preserve these two application installation disks which had resisted all my attempts to copy and zip their contents. After reading reviews on the net, I decided to use the AnyBurn software which is free and which is portable (i.e. it doesn’t need to be installed on your machine – you just double click it when you want to use it). This proved extremely easy to use and it duly produced image files of the two CDs in question in the space of a few minutes.
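For anyone wondering what a sector-by-sector copy amounts to, the sketch below reads a source in fixed-size blocks and writes them unchanged into a single image file, returning a checksum so the image can be verified later. It is only an illustration of the idea and not how AnyBurn works internally. The function name is made up for this example, the path of a CD drive is platform dependent and usually needs suitable permissions, and a copy-protected disc may well refuse to be read this way.

```python
import hashlib

SECTOR_SIZE = 2048  # bytes in one data sector of a standard CD-ROM


def copy_image(source_path, image_path, sectors_per_read=512):
    """Copy source_path to image_path block by block.

    Returns (bytes_copied, sha256_hex) so the image can be checked later.
    """
    digest = hashlib.sha256()
    total = 0
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(SECTOR_SIZE * sectors_per_read)
            if not block:  # an empty read means the end of the source
                break
            dst.write(block)
            digest.update(block)
            total += len(block)
    return total, digest.hexdigest()
```

Keeping the checksum alongside the image file gives a simple way of confirming, years later, that the image has not been corrupted.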
4) Backup CDs and DVDs: The files on these disks were all accessible, so I had a choice of either creating zip files or creating ISO image files. I chose to create zips for two reasons: first, I wanted to minimise the size of the resulting file and I believe that the ISO format is uncompressed; and, second, on some of the disks I only needed to preserve part of the contents and I wasn’t sure if that can be done when creating a disk image.
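The selective zipping described here was done by hand, but it could be scripted along the following lines. This is a minimal sketch using Python's standard `zipfile` module; the function name and the idea of passing in a list of wanted folders are assumptions made for the illustration.

```python
import zipfile
from pathlib import Path


def zip_selected(disc_root, wanted, zip_path):
    """Zip only the files and folders named in `wanted` (relative to disc_root).

    Returns the number of files written to the archive.
    """
    disc_root = Path(disc_root)
    count = 0
    # ZIP_DEFLATED compresses the contents, unlike a plain ISO image.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in wanted:
            item = disc_root / name
            if item.is_file():
                files = [item]
            else:
                # Walk the folder and pick up every file beneath it.
                files = sorted(p for p in item.rglob("*") if p.is_file())
            for f in files:
                # Store paths relative to the disc so the folder layout is kept.
                zf.write(f, arcname=f.relative_to(disc_root).as_posix())
                count += 1
    return count
```

The zip file should be written somewhere outside the folders being archived, otherwise it may end up trying to include itself.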
Having been through each of these 4 exercises, there are some general conclusions that can be drawn:
- The way to preserve disks is to copy their contents onto other types of computer storage.
- The file size capacities of old disk formats are much smaller than the capacities of contemporary computer storage formats. For example, none of the 5.25 disks contained files totalling more than 2 MB; the CDs contain up to about 700 MB; and even the DVDs contain no more than 4.7 GB. In an era where 1 TB hard disks are commonplace, these file sizes aren’t a problem.
- There are three stages in preserving disk contents; first, just getting the contents from the disk onto other storage technology; second, being able to read the files; and third, should the contents include executables, being able to actually run the programs.
- The decision about whether you want to achieve stages 2 or 3 will depend on whether you think the contents, and what they will be used for, merit the extra effort and cost involved. In the case of the 5.25 disk containing TEAMterm software described above, providing a capability to run the application would have involved finding an emulator to run on my current platform and getting the programme to work on it. I judged that to be not worth the effort for the purpose that the disk’s contents were being preserved for (to be a record of the artefacts received by an individual working through that stage of the development of computer technology).
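The point about capacities can be checked with a little arithmetic. The sketch below takes the nominal maximum sizes quoted above and works out a deliberately pessimistic total, assuming (unrealistically) that every one of the roughly 120 backup discs is full, and comparing that with a 1 TB hard disk.

```python
# Nominal maximum capacities quoted in the text, in megabytes.
CD_MB = 700
DVD_MB = 4700             # 4.7 GB
HARD_DISK_MB = 1_000_000  # 1 TB

DISC_COUNT = 120  # "about 120 CDs and DVDs"


def worst_case_total_mb(disc_count, per_disc_mb):
    """Total size if every disc were completely full."""
    return disc_count * per_disc_mb


all_dvd = worst_case_total_mb(DISC_COUNT, DVD_MB)
all_cd = worst_case_total_mb(DISC_COUNT, CD_MB)

print(f"If all were full DVDs: {all_dvd / 1000:.0f} GB")  # 564 GB
print(f"If all were full CDs:  {all_cd / 1000:.0f} GB")   # 84 GB
print(f"Share of a 1 TB disk (all DVDs): {all_dvd / HARD_DISK_MB:.0%}")  # 56%
```

Even in that worst case the whole lot would fit on a single 1 TB disk, and since the discs are a mixture of CDs and DVDs the real total will be a good deal lower.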
Back in February, I reported on my attempts to get Alexa to play the albums in our music collection. I’d found the following:
Coverage: about 80% of our albums were present in the Amazon Music Unlimited library.
Specifying Discs and tracks: for albums consisting of more than one disc, there appears to be no way of specifying that Alexa should start playing Disc 2 as opposed to Disc 1; and, similarly, there’s no way of getting Alexa to play a particular track number.
Voice Recognition: Alexa couldn’t recognise about 10% of the Artist/Title combinations even though I had checked that they were actually available in Amazon’s Music Unlimited library.
Since then I’ve been using Alexa and Amazon Music Unlimited to listen to newly issued albums reviewed in the Guardian/Observer newspapers, and now have a further substantial set of experience to compare with my original findings. The first thing to say is that being able to listen to complete albums, as opposed to just samples of each track from Amazon on my laptop (as I have been doing previously), is, obviously, a far more rewarding experience; and to be able to listen to a range of new releases from start to finish, regardless of whether or not they suit one’s innate preferences, is a real luxury. Most I will never listen to again – and some I have cut short because I really didn’t like them; but there are a few which I’ve really liked and have made a note of at the back of our ‘Sounds for Alexa’ book. At least I now feel a bit more in touch with what sort of music is being produced these days.
Now, to get back to the topics I covered in my earlier findings; below are my further observations on each of the points:
Coverage: Since last February I’ve checked out eleven lots of review sections comprising write-ups of 121 albums. Fourteen of these albums were issued in CD format only, and all the other 107 albums were available in Amazon in MP3 format. All but nine of these 107 were advertised as being available for streaming or available to ‘Listen with your Echo’ (the latter being the Alexa device); and of these nine, six did actually play through the Echo device. Of the three that didn’t, two would play only samples (Bob Dylan’s ‘Triplicate’, and The Unthanks’ ‘The songs and poems of Molly Drake’); and for the other one (Vecchi Requiem by Graindelavoix/Schmetzer) Alexa repeated “Vecchi Requiem” perfectly but said she was unable to find any album by that name. Given that only three items were actually unavailable, I conclude that a lot of the new albums that are being issued in digital format are available in the Amazon Music Unlimited service.
Specifying Discs and tracks: It still appears to be the case that it’s not possible to specify that Alexa play the 2nd disc in a two-disc album, nor to play a particular track number. To get round the multiple discs problem, a number of people on the Reddit noticeboard suggest creating a playlist in which the two discs are listed separately. As for the track number, Alexa will step through the tracks if you keep saying ‘next track’; but, if you really do want a particular track played, the best way to achieve that is to use the name of the track when requesting it – both of the following worked for me: ‘Play Kashmir by Led Zeppelin’ and ‘Play Cromwell by Darren Hayman’.
Voice Recognition: Of the 121 albums I checked out, Amazon claimed that 98 of them were available to play through the Echo, whereas, in fact, I could only get 85 of them to play. For eleven of the other thirteen albums, Alexa just couldn’t understand what I was requesting; and in the remaining two cases, Alexa a) insisted on playing “Rock with the Hot 8 Brass Band” instead of “On the spot” by the Hot 8 Brass Band, and b) played Mozart’s Gran Partita by the London Philharmonic instead of by the London Symphony Orchestra. Turning to the 85 albums that did play through the Echo, it was significant that only 59 of them played at the first time of asking. For the other 26, I had to repeat the request at least twice and as many as six times (these details are included in this Recognition Analysis spreadsheet). Naturally I was trying out all sorts of combinations of all or part of the particular album title and artist. After much trial and error I have taken to first asking for both the album title and the artist (play me X by Y); then, if that doesn’t work, asking for the album title on its own (or even just parts of the album title – for example, 1729 for the album title “Carnevale 1729”); and finally, as a last resort, just asking for the Artist. This strategy proved successful in all but 3 of the 26 instances that didn’t play at the first time of asking. These figures indicate that Alexa’s voice recognition capabilities haven’t improved much since my last write-up in February. This view is reinforced by my (undocumented) experiences of trying to get Alexa to tell me about various golf, rugby and cricket events. Her responses have usually been either about a completely different event or just that she doesn’t know. Perhaps I’m not asking the questions in the right way… at least Alexa is usually able to provide a weather forecast at the first time of asking.
In her defence, I should mention that my son seems to have no trouble in adding all sorts of outlandish things to our Alexa shopping bag (which, I should add, we don’t use – Alexa just provides it if you want to put things into it).
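Going back to the voice recognition figures for a moment, they are easier to compare as percentages. The sketch below simply recomputes them from the counts given above; the variable names are made up for the illustration, and the two sanity checks confirm that the counts add up as described.

```python
claimed = 98         # albums Amazon said would play through the Echo
played = 85          # albums that actually played
first_time = 59      # of those, played at the first time of asking
needed_repeats = 26  # of those, needed two or more requests
not_understood = 11  # requests Alexa couldn't understand
wrong_album = 2      # requests that produced a different recording

# Sanity checks: the counts should add up as described in the text.
assert played + not_understood + wrong_album == claimed
assert first_time + needed_repeats == played


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(played, claimed))      # 86.7 - share of claimed albums that played at all
print(pct(first_time, played))   # 69.4 - share of played albums that worked first time
print(pct(first_time, claimed))  # 60.2 - share of claimed albums that worked first time
```

So roughly six in ten of the albums that were supposed to be available played at the first request.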
From this summary of my recent experiences with Alexa, it seems that little has changed. Whilst Alexa’s voice recognition capabilities don’t seem to have improved much, the usefulness of the device compared with having stacks of CDs around, is undiminished. So much so, in fact, that we have replaced our last remaining CD player, which was in the conservatory, with another Echo device; and we’ve upgraded to Amazon Music Unlimited for 10 devices at £9.99 a month.
There are undoubtedly many other uses that we could be putting Alexa to – the weekly email from Amazon always suggests several new things that one can ask her or get her to do. We haven’t really followed any of them up. Perhaps I’ll get to printing out the email each week and putting it next to the Echo as a prompt. Or maybe I won’t – we’ll see. One thing’s for sure: what with all our CDs in the loft, and no stand-alone CD player, Alexa is going to be with us for the indefinite future.
The inspiration for this thread for Ideas came from a paper-based Ideas Book which I set up in 1972. It didn’t really get many entries and some of them were more reflection than specific ideas; and it’s lain dormant for many years. So, I’ve just scanned and destroyed the physical Ideas Book; however, for completeness, I’ve recorded below some of the items from it (suitably summarised where necessary).
Throwback 1 – 08Jan1972 – The idea of an ideas book
I guess the first idea to go into this Ideas Book must be the idea of having an Ideas Book. Basically, I think that, although thought is of paramount importance, thought without action is a great waste, both of time and – yes – ideas! So, in future, if, sorry – when, I get some crazy idea – something completely original and far out (like establishing a World Tune Library with a cataloguing system based on every possible combination of notes over, say, a two minute period; when a new tune is sent in to the library it is played into an analogue/digital computer, and this would then produce a ‘catalogue number’ – it would be most interesting to see just how many more tunes were available at any one time), then I will write it down and it will be on record to act upon, elaborate, or even just to read over and laugh! Something quite amusing about this Ideas Book is that maybe the only idea I ever put into it that ever gets acted on will be this first idea to have an Ideas Book….
Throwback 2 – 11Jan1972 – A light design
A light-cum-ceiling decoration system could be constructed out of hollow cylinders made of stiff white paper of varying lengths – width 32cm with 2cm of that used for overlap and the varying height resulting in holes of about 10cm width at various different levels. A reflector could be made by covering a sheet of stiff card with bacon foil (to which the vertical cylinders could be secured) which could be fixed to the ceiling using aerofix.
Throwback 3 – 20May1972 – The soundproofing stubber
The vast quantities of cigarette stubs that are wasted could be used as sound proofing material by manufacturing attractive boxes which have a stubber through which the tip would be released into an inner, cheap and recyclable, container which could be removed and sent to a sound proofing company. Profits could be made from the sale of decorative external boxes and from the sale of sound proofing material made from used tips. Stubbing would be cleaner and more efficient; and there would be a reduction in cigarette tip pollution.
Throwback 4 – 21May1972 – Investigating the warping point
Our experience of the world tells us that there is a causal factor for everything. When I look at the stars I think about who or what put them there, because logic informs me that there must be an answer. As I think about this question I assimilate all the relevant information I have until there is maximum capacity thought but an inability to provide an answer. The result is a split second of total confusion. It would be interesting to see what electroencephalograph readings appear when this point – the warping point – is reached. I wonder if other people have the same experience, and, if so, would the measures vary depending on the level of comprehension that different people have of the question? Would the measures change over time as people increase their comprehension of the question?
Throwback 5 – 01Jan1984 – Simultaneous phoning and computing
It would be useful to have a unit that would interface between a Type 96A jack plug and a home’s telephones/computers. The unit would enable, at the very least, the simultaneous use of the telephone and the use of the computer over the networks.
Throwback 6 – 01Jan1984 – Game designer CBT – and the potential for progs
A computer-based training program could help children design the logic of a computer game i.e. the design specification prior to programming. A program like that could sell for £1 a time. With the right kit at home and a link to the networks, you could design, build and test such a CBT program in the space of 24 hours at home and be selling it immediately over the networks. If it was a novel and good enough idea the mass network market would soon provide 25,000 purchasers; so you could have made £25,000 within 48 hours of first having had the idea.
Throwback 7 – 28Dec1996 – Crucial pursuits
There are five crucial pursuits for members of the human race:
- Making other individual humans feel good through love, tenderness, intimacy, caring, understanding, and good deeds.
- Creating the conditions for other humans to have better lives.
- Learning and understanding about the world and universe about us and about our fellow humans and the way we live.
- Learning and understanding the origins of humankind and the important findings, discoveries, secrets and developments that humankind has made and encountered.
- Learning and understanding the origins, secrets, and meaning of the universe and its relevance to ourselves and humankind.
Sometimes you hear about people who are always invited to events but never host any themselves. Similarly, some people don’t respond to communications or Xmas cards; and it’s not uncommon for presents sent to growing child relatives to remain unacknowledged, with no thanks given. In all such situations the giver begins to feel a little aggrieved with the situation, but perhaps feels it inappropriate to raise the matter directly with the individual concerned. To assist all those in such circumstances, it might help if there was an unobtrusive but clear way of signifying dissatisfaction. Perhaps a code could be attached to the bottom of an address or invitation in the same vein as SWALK (sealed with a loving kiss). I suggest NORNOMOT (pronounced nora-no-mo) standing for No Reciprocity (or Response) No More Of This. Maybe the greeting card manufacturers could create special NORNOMOT cards which include pictures of a Last Chance Saloon.
I’ve just finished dealing with two particular digital preservation challenges that exist within the document collection I’m currently working on. The first involved two Lotus Notes files; and the second concerned some Windows Help files. My experience with these issues illustrates a) how just a few files can take a lot of work to resolve, and b) that there’s often an answer out there to seemingly impossible preservation problems provided you are prepared to look diligently enough.
I really didn’t believe I was going to find a way to unlock the Lotus Notes files since Notes is a major and very expensive piece of software that I don’t possess; and, in any case, it applies sophisticated time-limited password and encryption controls for its use. Despite being aware of these issues, I thought I’d take a quick look on the net to see if I could find any relevant advice. It was time well spent; I discovered that it’s possible to download a local evaluation copy of Notes for 90 days, and that, because it doesn’t run on a server, this sometimes enables old Lotus Notes files to be opened. I duly downloaded the software and installed it; and then, regardless of the mysteries of Notes access controls, had access to the whole of one of the files (which contained conference-type material) and to parts of the other (which contained sent messages). I still had the username and expired password from the time the files were created and I think this may have helped to access the latter – though I’m not sure about that. Anyway, in both cases, I was able to print out the material to PDF files. I had to manually reorder the conference-type material and to reinstate a few hundred links in it, but that was it – job done!
The Windows Help files were a lot more demanding. Microsoft stopped supporting the WinHelp system (.HLP files) in 2006 in favour of its replacement, Compiled HTML Help (.CHM files). Although Microsoft did issue a WinHelp viewer for Windows 7 in 2009, WinHelp is essentially an obsolete format – it isn’t supported in Windows 10. I’m still running a Windows 7 system so am still able to view the HLP files – but they have to be converted now if they are ever to be accessed again in the future.
There is much material on the net about how to convert HLP files into CHM files, but, as someone with no knowledge at all about how files in either of these systems are constructed, I didn’t find it easy to understand. I soon realised that converting from one to the other was going to be a challenge. However, I did eventually find a web site which offered clear practical advice which I could follow (http://www.help-info.de/en/Help_Info_WinHelp/hw_converting.htm), and I duly downloaded the recommended HLP decompiler; and the Microsoft HTML Help Workshop software. The process to be followed went something like this:
- Decompile the HLP file into its component parts (consisting of a help project file with the extension .hpj, along with one or more .rtf documents, an optional .cnt contents file, and any image files – .bmp, .wmf, or .shg – that are used within the Help file).
- Convert the various HLP files into HTML Help files using a wizard in the HTML Help Workshop tool (the new files consist of a project file with the extension .hhp, one or more HTML files, a .hhc contents file, an optional .hhk index file, and any image files that are used within the Help file).
- Set parameters in the hhp file to specify a standard Window name and size; and to have a search capability created when the files are compiled into a single CHM file.
- Reconstruct the Table of Contents using the original HLP file as a guide (in many cases no Table of Contents information comes through the conversion process – and, even when some did, it had lost its numbering). Where the contents had to be created from scratch, each new content item created had to be linked to the specific HTML file to be displayed when that content item is selected.
- Re-insert spacings in headings: The conversion process also loses the spacing in headings in the base material resulting in headings that look like this, ‘9.1Revised System’ instead of like this ‘9.1 Revised System’. To rectify this problem, the spacings have to be manually re-inserted into each HTML file of base material.
- Compile the revised files into a single CHM file.
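The manual re-spacing of headings described in the list above is the kind of step that might be partly automated. The sketch below is a rough illustration in Python using a regular expression. It assumes that headings in the converted HTML sit inside `<h1>` to `<h6>` tags and begin with a section number followed directly by a capital letter; both are assumptions about the converter's output rather than anything confirmed, so real files would need checking first.

```python
import re

# A section number such as "9" or "9.1.2" followed directly by a capital letter.
HEADING_NUMBER = re.compile(r"^(\s*\d+(?:\.\d+)*)(?=[A-Z])")

# Heading elements <h1> to <h6> (assumed to be how headings are marked up).
HEADING_TAG = re.compile(r"(<h[1-6][^>]*>)(.*?)(</h[1-6]>)",
                         re.IGNORECASE | re.DOTALL)


def respace_heading(text):
    """Turn '9.1Revised System' into '9.1 Revised System'."""
    return HEADING_NUMBER.sub(r"\1 ", text, count=1)


def respace_html(html):
    """Apply respace_heading to the text inside every heading element."""
    return HEADING_TAG.sub(
        lambda m: m.group(1) + respace_heading(m.group(2)) + m.group(3),
        html,
    )


print(respace_heading("9.1Revised System"))  # 9.1 Revised System
```

Headings that already have their space are left alone, but a heading that genuinely starts with something like ‘3D’ would be wrongly split, so the results would still need a quick look over by eye.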
The first HLP file I tried this out on contained just a single Help document with some 130 pages. It took a bit of figuring out, but I eventually got the hang of it. However, the second HLP item was in fact made up of 86 separate HLP files all stitched together to present a unified Table of Contents in a single window in which the base material was also displayed. Many of these 86 separate files had 50 or more pages, and some had many more than that; and each page had to be represented separately in the Table of Contents. It was a very long, tortuous job converting all 86 HLP files and ensuring that each one had a correct Table of Contents (I didn’t attempt to re-introduce the spacing in the headings – that would have been a torture too far). However, that was not the end of it; the files then had to be stitched together in a single overall file that combined all the individual Tables of Contents and that displayed all the base material. This involved inserting a heading for each document in the master file, and inserting a linking command to call up the Table of Contents for that particular document. Oh, and I should also mention that the HTML Help Workshop software was very prone to crashing – not a little irritating – I soon learnt to save regularly…
This overall task must have taken at least 30 or 40 hours – but I did get there in the end. The new CHM file works fine and is perfectly usable, despite three of the documents being displayed in separate windows instead of the single main window (although I spent some time on this issue I was unable to eliminate the problem). Of course, the lack of spacing in the headers is immediately noticeable – but that’s just cosmetics!
No doubt there are specialists out there who would have made a quicker and better job of these conversion activities. However, if you can’t find such people or you haven’t got the money to throw at them, the experiences recounted above show that, with the help of the net, it’s worth having a go yourself at what you may consider to be your most difficult digital preservation challenges.
I’ve been reading an increasing number of reports about how much time people are spending on their mobiles and of the many negative effects of such usage. Perhaps it’s time, therefore, for the emergence of a new breed of app explicitly designed to minimise one’s usage of the mobile. It would be capable of taking a whole variety of steps to reduce the amount of email you get; to summarise incoming communications for you; and to ask searching questions of you about new apps you want to load and new contacts you want to add. It would measure and report your usage of the mobile, and advise on ways that you can cut down the amount of time you are spending on it or reorganise your usage patterns so as to improve your quality of life.