{"id":1327,"date":"2017-12-11T17:45:38","date_gmt":"2017-12-11T17:45:38","guid":{"rendered":"http:\/\/www.pwofc.com\/ofc\/?p=1327"},"modified":"2017-12-11T17:45:38","modified_gmt":"2017-12-11T17:45:38","slug":"a-cursory-tour-of-web-archiving","status":"publish","type":"post","link":"https:\/\/www.pwofc.com\/ofc\/2017\/12\/11\/a-cursory-tour-of-web-archiving\/","title":{"rendered":"A cursory tour of web archiving"},"content":{"rendered":"<p>Web archiving isn&#8217;t a simple proposition because not only do web sites keep changing, but they also have links to other sites. So, I guess I should have expected that my search for web archiving tools would come up with a disparate array of answers. It seems that the gold-plated solution is to pay a service such as Smarsh or PageFreezer to take periodic snapshots of a website and store them in its cloud. The interval is user-definable and can be anything from every few hours to once a month or once a year. Smarsh was advertising its basic service at $129 a month at the time of writing.<\/p>\n<p>A more basic, do-it-yourself option is the Unix wget command-line tool, for which a downloadable Windows version is available. It can be told to do all sorts of things, including downloading part or all of a site, and it can be run at set times using an operating-system scheduler such as cron or Windows Task Scheduler. However, as you might expect of a Unix tool, it requires the user to type programming-style commands and to be familiar with a large number of options.<\/p>\n<p>More limited services, such as Archive.is, can capture, save and download individual pages, and some of these are free to use.<\/p>\n<p>As for the format in which web archives can be saved, the Library of Congress&#8217; preferred format is the ISO WARC (Web ARChive) file format. 
However, I was unable to find any tools or services that claim to store files in this format; it seems that WARC is used mainly behind the scenes by large institutions trying to preserve large volumes of web content. Interestingly, the web hosting service I use for this blog offers backups in various forms of zip file; and indeed, zip files are what I have used in the past to store the web sites included in my document collection.<\/p>\n<p>Based on this very quick and certainly incomplete tour of web archiving, I&#8217;ve decided not to do anything fancy or different in the way I use technology to archive my old web sites. The zip format has worked well up to now and I see no reason to change that approach. As for a non-technological solution to web archiving, the idea of creating and binding a physical book of the first five years of this OFC web site is becoming more and more attractive. There&#8217;s something very solid and immutable about a book on a bookshelf. I&#8217;m definitely going to do that, and have set the end of 2017 as the cut-off date for its contents; I&#8217;m busy trying to make sure that the Journeys are all at appropriate stages by 31st December.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Web archiving isn&#8217;t a simple proposition because not only do web sites keep changing, but they also have links to other sites. 
So, I guess I should have expected that my search for web archiving tools would come up with &hellip; <a href=\"https:\/\/www.pwofc.com\/ofc\/2017\/12\/11\/a-cursory-tour-of-web-archiving\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[28],"tags":[],"class_list":["post-1327","post","type-post","status-publish","format-standard","hentry","category-blog-archiving"],"_links":{"self":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/1327","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/comments?post=1327"}],"version-history":[{"count":1,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/1327\/revisions"}],"predecessor-version":[{"id":1328,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/posts\/1327\/revisions\/1328"}],"wp:attachment":[{"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/media?parent=1327"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/categories?post=1327"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pwofc.com\/ofc\/wp-json\/wp\/v2\/tags?post=1327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}