Monday, August 3, 2009

Ok, it's official!

From now on, please visit my new WordPress blog for further posts. Thank you! The address is: http://www.easydigitalpreservation.wordpress.com

Wednesday, July 29, 2009

Sorry for the Silence

Shortly after a great start with this blog, I've gone a bit quiet! That's because I've decided to move everything over to a WordPress blog... I'm still getting things straightened out over there, but I will post the link to my blog's new home very soon!

Monday, July 20, 2009

ISO Standards

ISO is the commonly used name for the International Organization for Standardization. This is an international, non-governmental organization that creates standards based on a consensus of international committee members.

One ISO standard that is relevant to digital preservation practices is the OAIS reference model (Open Archival Information System, ISO 14721).

Additionally, there is a working group attempting to create an ISO standard for digital repository certification, which I think is an excellent idea. The group maintains a wiki with information about its regular remote meetings and the documentation it is creating and collecting to assist in writing the standard. A useful glossary of digital preservation terms can also be found on the wiki.

Thinking about Metadata

Effective retrieval of documents designated for digital preservation will rely on assigning good metadata to each document, data set, image, audio file, etc., that is to be preserved in a given repository. There are a few issues related to this process.

First, we must ask what the standards are for assigning the metadata. Many metadata standards already exist for different kinds of materials, but they are generally type-specific. Examples include Dublin Core for web resources and MARC for bibliographic records.
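To make that concrete, here is a minimal sketch of what a Dublin Core description might look like, written as a Python dictionary. The element names come from the fifteen-element Dublin Core set; the values (and the URL) are invented for illustration:

    # A minimal Dublin Core description as a Python dictionary.
    # The keys are real Dublin Core elements; the values are made up.
    dublin_core_record = {
        "title": "Easy Digital Preservation, sample post",
        "creator": "Jane Blogger",
        "date": "2009-07-20",
        "format": "text/html",
        "identifier": "http://example.org/posts/12",
        "subject": "digital preservation; metadata",
    }

    # Even a simple record shows the problem: nothing forces anyone to
    # fill in these fields, let alone fill them in consistently.
    for element, value in dublin_core_record.items():
        print(f"dc:{element} = {value}")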

Second, will the kind of file-specific metadata that is assigned independently of a storage system hold up in a mixed-storage preservation environment? Or should a new model be applied on top of the existing metadata? An additional trouble with file-specific metadata is that unless you are creating library records, nobody is required to assign it. In many cases, people create as much or as little as they want, and that sometimes means none at all. So even when standards are in place, following them is optional, subjective, and spotty.

In digital preservation, there are new considerations for the information that needs to be included in metadata. This includes the digital history of each item: What software was used, and which version of that software? What platform? What other formats has this item existed in before its current state? Has it been migrated from other formats? How many times, and by whom?
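As a rough illustration, the answers to those questions could be captured in a record like the sketch below. The field names are my own invention rather than an existing schema, though real preservation-metadata standards such as PREMIS define fields covering similar ground:

    # An invented (non-standard) sketch of preservation metadata for one item.
    preservation_history = {
        "current_format": "PDF/A-1b",
        "creating_software": {"name": "OpenOffice.org Writer", "version": "3.1"},
        "platform": "Windows XP SP3",
        "previous_formats": ["application/msword"],
        "migrations": [
            {
                "from": "application/msword",
                "to": "PDF/A-1b",
                "date": "2009-06-01",
                "performed_by": "repository staff",
            }
        ],
    }

    # How many times has this item been migrated, and by whom?
    for m in preservation_history["migrations"]:
        print(f"migrated {m['from']} -> {m['to']} on {m['date']} by {m['performed_by']}")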

So now, not only is there the underlying metadata that clarifies the "aboutness" of the item, but new metadata related to digital preservation practices will need to be created. I'd like to assume that this second layer of metadata, unlike the file-specific metadata, will face some stricter regulations and will be required and consistently updated. Keep an eye out because I will discuss some of the existing metadata models for digital repositories and preservation practices in later, more specific posts.

But at the moment, we are left with two levels of metadata and no real standards for either. We have models to follow, but each owner of a digital repository can decide which to use and how closely to follow it.

This could be disastrous for federated digital repository searching in the future.


Wednesday, July 15, 2009

Cloud Computing

Let’s talk about cloud computing.

At its simplest, things that are in the “cloud” are things that float around in a sort of digital airspace and don’t exist on your computer. They exist on remote servers that can be accessed from many computers. For this reason, the cloud is a good metaphor for the Internet. For most of us, keeping things in the cloud is a convenient, logical way to make life simple.

You can access things in the cloud from anywhere that is connected to the Internet… depending on the service and its security (private cloud or public cloud). It’s kind of like your email or Facebook account. You have lots of stuff stored in these accounts that is specific to you, but you can log in from anywhere. And it will always look the same and have all your stuff in it. Your stuff is always just… there.

Getting a bit more technical, your stuff is actually physically stored somewhere as bits on servers that are run by whoever is providing the service. For example, some institutions have servers dedicated to an institutionally-based digital repository. These servers might live on the campus and will store everything that is added to the repository. But the whole repository will not exist on the specific machine that you might use to access documents stored there. Your computer will connect to the remote server to access the repository.
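In code terms, “accessing the repository” just means your machine asking that remote server for the bits. Here is a minimal sketch in Python; the repository address below is hypothetical, invented for illustration:

    import urllib.request

    # Hypothetical address of an item in an institutional repository.
    # Nothing lives on this machine until we ask the remote server for it.
    ITEM_URL = "http://repository.example.edu/items/1234/thesis.pdf"

    with urllib.request.urlopen(ITEM_URL) as response:
        data = response.read()  # the bits travel from the campus server to you

    # Save a local copy only if you want one; the authoritative copy
    # stays on the remote server.
    with open("thesis.pdf", "wb") as f:
        f.write(data)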

What makes this fun for digital preservationists is that cloud computing can really increase the scale and sharing of preservation duties. Maureen Pennock, the web archive preservation project manager at the British Library, recognizes this in her blog: “This minimises costs for all concerned, addresses the skills shortage, and produces a more efficient, sustainable and reliable preservation infrastructure.”

In the future - and as we are seeing with DuraCloud - all the tech work behind storing and retrieving data may be provided as part of a single repository product. (This type of service, by the way, is referred to as IaaS: Infrastructure as a Service.) This would be excellent news for the many institutions that don’t have the means or skills to set up a repository themselves.
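To give a feel for what that hand-off looks like in practice, here is a deliberately toy sketch of an IaaS-style storage interface. Everything in it - the StorageService class, its methods, the account name and URL - is invented for illustration and is not DuraCloud’s actual interface:

    # A toy IaaS-style storage interface (invented for illustration;
    # not the API of DuraCloud or any real product).
    class StorageService:
        def __init__(self, provider_url: str, account: str):
            self.provider_url = provider_url
            self.account = account
            self._objects = {}  # stand-in for the provider's remote servers

        def store(self, object_id: str, data: bytes) -> None:
            # The provider worries about servers, replication, and backups.
            self._objects[object_id] = data

        def retrieve(self, object_id: str) -> bytes:
            return self._objects[object_id]

    # The institution's side of the deal shrinks to a couple of calls.
    service = StorageService("https://storage.example.com", account="my-university")
    service.store("etd/2009/thesis-1234", b"%PDF-1.4 ...")
    print(len(service.retrieve("etd/2009/thesis-1234")), "bytes stored remotely")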

Cloud computing offers huge potential as an off-site alternative to the out-of-the-box repository products that most institutions currently must use. External organizations can do the tech work, while the institutions focus on the non-technical side of repository maintenance.


Tuesday, July 14, 2009

DuraSpace and DuraCloud

Here's some news: the projects behind the digital repository platforms DSpace and Fedora have joined forces and will now live under a bigger umbrella called DuraSpace. The new organization will still allow Fedora and DSpace to function fully and independently, but there will be new joint projects aimed at advancing digital preservation technology and addressing a larger group of stakeholders.

The May 9, 2009 press release that announced the DuraSpace partnership emphasizes the first new project in its portfolio: DuraCloud. DuraCloud is currently in a year-long pilot phase and has the advantage of being backed as an NDIIPP project. What makes it special is that it seems to be the first repository project to use cloud technology to store data. Institutions will be relieved of a huge economic and technological burden if they no longer have to store the data themselves. The Library of Congress announcement states that "Duracloud will let an institution provide data storage and access without having to maintain its own dedicated technical infrastructure." In other words, the servers (and knowledgeable techies) come with the DuraCloud product.

This means that the duties of institutions whose repositories are supported by cloud storage will be narrowed to making the repository data standardized and accessible, which is probably a better use of their time and funding. DuraSpace and DuraCloud will maintain the open-source, non-profit legacies of DSpace and Fedora, which makes this new organization and its first project even more appealing to institutions on tightened budgets.