The Web is an immensely large and constantly changing information landscape that fundamentally resists the idea of archiving it all. At the same time, the Web itself is a site of ceaseless breakdown and repair in the form of broken links, failed business models, hijacked domains, obsolescence and general neglect. Web archiving works in varying measures to stem this tide of loss: to save what can be saved, so that it can become part of the historical record.
Not surprisingly, the production of Web archives has required the development of new tools, protocols, standards, collaborative networks and expertise. Today, the practice of Web archiving cannot be done without the help of automated agents that retrieve selected content, discover new related content, and give archivists a sense of the dimensions of the things we call web pages, websites and domains that they care for.
To better understand the work practices of Web archiving, we conducted 20 semi-structured ethnographic interviews with practicing archivists, researchers and technologists who worked with Web archives. These interviews provided a unique glimpse into how archivists and their automated collaborators do the maintenance work of Web archiving. In this paper we describe some of our findings and their implications for how we work with a Web that is constantly under construction.
I am a software developer with two decades of experience bridging the worlds of libraries and archives with the World Wide Web. I have worked in academia, startups, corporations and the government. I work best in agile, highly collaborative teams that want to help make the world a better place.