Saving our digital heritage

This article was originally published on Salon on July 19, 2010.

The Library of Congress and other preservation-minded organizations ponder how we preserve what we’re creating

They’re trying to save the news. Among other things.

No, this isn’t yet another thumb-sucking cogitation about the future of journalism, at least not the kind we typically see these days. Rather, this is about a different issue: How do we save journalism (and other media) that’s already been created — including the all too ephemeral information that we’re creating online?

This week in Washington, DC, the Library of Congress is gathering its “Digital Preservation Partners” for a three-day session — one of a number of such meetings the library has been holding under a broad initiative called the “National Digital Information Infrastructure and Preservation Program.” Its multi-year mission is:

to develop a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations.

It’s what my technology friends call a non-trivial task, for all kinds of technical, social and legal reasons. But it’s about as important for our future as anything I can imagine. We are creating vast amounts of information, and a lot of it is not just worth preserving but downright essential to save.

My role this week, and at a workshop I joined last year, is to be thinking about the news. My mind almost explodes when I consider the issues.

Even when there were relatively few community information sources — mostly newspapers — we had preservation issues. I started my newspaper career at a small weekly that has long since closed down. While I’m sure someone, somewhere, has a printed copy of the issues, the journalism is nowhere to be found online. And what happens when a newspaper with some printed archives and some online shuts down? Sometimes those archives go dark, too.

Even newspaper archives that exist online tend to live behind paywalls that prevent most people from using them. This greedy policy, which I’ve discussed before, has helped ensure that newspapers are less relevant in their communities than they should be.

A newspaper company I worked for deleted years’ worth of my blogging, twice. Once was when it changed publishing platforms. The second time was after I left the company. With some technical help I recovered and republished most of it myself.

TV and radio broadcasters have tended to save tapes or digital archives, though huge gaps have emerged in the record. Remember, storage used to be expensive.

The rise of citizen media has complicated everything. Now we have vast new sources of information, some useful and some not. (Kind of like traditional media, no?) Who has the obligation, if there is one, to save this material?

Well, we have the wonderful Brewster Kahle and his team at the Internet Archive to thank that a bunch of it still exists (including my old blogging that we recovered, no thanks to the newspaper company that killed it). The reality, however, is that much of the Web — not to mention many if not most of the great BBS conversations of earlier times — is lost.

After last year’s digital preservation meeting I suggested that we needed better ways to do our own archiving of blogs and other social media. I still believe the Library of Congress, Internet Archive and other preservation-minded folks should help the rest of us with this task.

A social question arises about people who don’t want to save what they’ve done. Do they have a right to delete it? The Archive will take things down on request. But once you’ve put something up publicly, isn’t it public?

It’s not just a social question, but a legal one, now that judges are ordering newspapers to delete archived stories. It’s a legal issue as well because copyright laws are constantly getting in the way of reasonable use of published material. The entertainment industry has taken us down a troubling path in this regard, and things are only getting worse.

And then there’s the entire question of material we create spontaneously, using databases that provide individualized experiences when we seek information. This isn’t just about search queries but about many kinds of community information sources; what you and I see when we visit Everyblock may well differ based on what we type into the text box. The only people archiving this stuff are the ones who own the databases; will the rest of us ever have a look? Privacy interests say that we should not reveal it, but historians in the next century and beyond would find this absolutely crucial to their understanding of our times.

Happily, smarter people than yours truly are working on all of this. I’ll be filing some reports from the Washington meetings, to let you know what they’re thinking.