DO NOT USE, SEE CURRENT ResourceSync SPECIFICATIONS

ResourceSync Framework Specification - Archives - Beta Draft

8 May 2013

This version:
http://www.openarchives.org/rs/0.6/archives
Latest version:
http://www.openarchives.org/rs/archives
Previous version:
http://www.openarchives.org/rs/0.5/resourcesync
Editors:
Martin Klein, Robert Sanderson, Herbert Van de Sompel - Los Alamos National Laboratory
Simeon Warner - Cornell University
Bernhard Haslhofer - University of Vienna
Michael Nelson - Old Dominion University
Carl Lagoze - University of Michigan

Abstract

The ResourceSync specifications describe a synchronization framework for the web consisting of various capabilities that allow third-party systems to remain synchronized with a server's evolving resources. This ResourceSync Archives specification describes additional capabilities that extend the core specification to provide historical information about a set of resources.

This specification is one of several documents comprising the ResourceSync Framework Specifications.

Status of this Document

This specification is a beta draft released for public comment. Feedback is most welcome on the ResourceSync Google Group.

In this draft the Archive capabilities are described separately from the core specification.

Table of Contents

1. Introduction
    1.1 Motivating Examples
    1.2 Notational Conventions
2. Change List Archive
3. Change Dump Archive
4. Resource List Archive
5. Resource Dump Archive
6. Advertising Archive Capabilities
7. References

Appendices

A. XML Element Overview
B. Acknowledgements
C. Change Log

1. Introduction

The ResourceSync specifications introduce a range of easy-to-implement capabilities that a server may support in order to enable remote systems to remain more tightly in step with its evolving resources. They also describe how a server can advertise the capabilities it supports. Remote systems can inspect this information to determine how best to remain aligned with the evolving data.

This ResourceSync Archives specification adds to the framework capabilities that allow a server to provide historical data based on Archives of the core capabilities (Change Lists, Change Dumps, Resource Lists, and Resource Dumps). Like all other capabilities, Archives are implemented using the document formats introduced by the Sitemap protocol. Each Archive capability is optional and may be implemented independently of any other Archive capability. Archives need not be implemented in order to support synchronization with ResourceSync, but may facilitate certain use cases.

For example, a Change List Archive allows a server to list a timestamped set of historical Change Lists, thus allowing description of changes over an extended period without placing additional requirements on the generation and rotation of the current Change List. A Resource Dump Archive allows a server to list a timestamped set of historical Resource Dumps, providing snapshots of the server's resources at different times. A remote server may select an appropriate historical Resource Dump to synchronize with a past state of the server's resources.

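This document is structured as follows: Section 2 describes the Change List Archive, Section 3 the Change Dump Archive, Section 4 the Resource List Archive, and Section 5 the Resource Dump Archive. Section 6 describes how Archive capabilities are advertised in a Capability List.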

1.1. Motivating Examples

Many projects and services have synchronization needs and have implemented ad hoc solutions. ResourceSync provides a standard synchronization method that will reduce implementation effort and facilitate easier reuse of resources. Archive capabilities allow historical data to be described within the same framework as current synchronization information. This section describes motivating examples for the Archive capabilities.

The way in which a ResourceSync Source generates Change Lists will be determined by the particular technical configuration of the Source, the frequency of changes, and the intended use. While Change Lists that use the Sitemap index format and a set of Sitemaps may have a very large number of entries, it may be convenient to rotate individual lists of changes frequently and avoid generating a very large Change List. Change List Archives add flexibility while retaining the ability for a Source to make available a complete change history enabling incremental synchronization from any past state. A Source with very frequent changes might create separate Sitemap files as part of a Change List at hourly intervals, and perhaps each month (about 720 hours) start a new Change List while archiving the old one. If all the resource states were recorded in addition to the change information, then Change Dumps and a Change Dump Archive could be used to optimize download of the changed resources.

Many services provide snapshots of historical content either as stable reference points, or to permit the evolution of the service's resources to be studied in situations where describing all updates would be difficult. Examples include Wikipedia Snapshots and Nature Linked Data Snapshots. The Resource Dump Archive capability provides the opportunity to describe such snapshots in a consistent and machine-navigable way.

Resource List Snapshots provide the ability for servers to describe the state of their resources at particular points in time. This would allow clients to investigate changes expressed in the metadata or to compare the current state with historical state.

1.2. Notational Conventions

This specification uses the terms "resource", "representation", "request", "response", "content negotiation", "client", and "server" as described in Architecture of the World Wide Web [Web Architecture].

Throughout this document, the following namespace prefix bindings are used:

Prefix     Namespace URI                                 Description
(default)  http://www.sitemaps.org/schemas/sitemap/0.9   Sitemap XML elements defined in the Sitemap protocol
rs         http://www.openarchives.org/rs/terms/         Namespace for elements and attributes introduced by ResourceSync

Table 1.1: Namespace prefix bindings used in this document

2. Change List Archive

A Change List (ResourceSync Core: Change List) describes the changes in a Source's resources over a certain period of time. The Source determines the length of that time interval. If a Source wishes to offer Change Lists covering prior temporal intervals, it can provide a Change List Archive. A Change List Archive provides a list of pointers to individual Change Lists which would usually represent consecutive lists of changes.

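A Change List Archive is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and, as illustrated in Example 2.1, the following structure: an <rs:ln> child element with the relation type resourcesync that points to the Capability List, an <rs:md> child element whose capability attribute has the value changelist-archive, and one <url> child element per Change List, with the URI of the Change List in <loc> and its associated datetime in <lastmod>.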

The pointers in a Change List Archive must be in chronological order. The associated datetime can be used by Destinations to determine whether new changes are available to be processed. By downloading the Change Lists, a Destination may inspect the from and until attributes of the top-level <rs:md> element of each to determine whether the Change Lists are consecutive and without any time gaps.

Example 2.1 shows a Change List Archive that points to three Change Lists created on consecutive days. To ease navigation for Destinations, each Change List referred to in the example below will have a top-level <rs:ln> element with the relation type up that points to the Change List Archive.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="resourcesync"
         href="http://example.com/dataset1/capabilitylist.xml"/>
  <rs:md capability="changelist-archive"
         modified="2013-01-03T09:00:00Z"/>
  <url>
      <loc>http://example.com/changelist1.xml</loc>
      <lastmod>2013-01-01T09:00:00Z</lastmod>
  </url>
  <url>
      <loc>http://example.com/changelist2.xml</loc>
      <lastmod>2013-01-02T09:00:00Z</lastmod>
  </url>
  <url>
      <loc>http://example.com/changelist3.xml</loc>
      <lastmod>2013-01-03T09:00:00Z</lastmod>
  </url>
</urlset>

Example 2.1: A Change List Archive
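
The following is a minimal sketch, not part of the specification, of how a Destination might read a Change List Archive such as Example 2.1, check that its pointers are in chronological order, and pick out the Change Lists that are newer than the last one it processed. The function names (parse_dt, read_archive, is_chronological, newer_than) and the last_seen value are hypothetical, and the sketch assumes that every <url> entry carries a <lastmod> with a full datetime ending in "Z" or a numeric offset.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def parse_dt(text):
    # Assumption: a full W3C datetime with "Z" or a numeric UTC offset.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def read_archive(xml_text, capability):
    """Return [(uri, datetime), ...] in document order."""
    root = ET.fromstring(xml_text)
    md = root.find(RS + "md")  # top-level <rs:md> of the <urlset>
    if md is None or md.get("capability") != capability:
        raise ValueError("not a %s document" % capability)
    entries = []
    for url in root.findall(SM + "url"):
        loc = url.findtext(SM + "loc")
        lastmod = url.findtext(SM + "lastmod")
        if loc is None or lastmod is None:
            raise ValueError("entry without <loc> or <lastmod>")
        entries.append((loc.strip(), parse_dt(lastmod)))
    return entries


def is_chronological(entries):
    # True when no entry is older than the one before it.
    return all(a[1] <= b[1] for a, b in zip(entries, entries[1:]))


def newer_than(entries, since):
    # URIs of the documents whose datetime is later than `since`.
    return [uri for uri, dt in entries if dt > since]


# The <url> entries of Example 2.1, abbreviated.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist-archive" modified="2013-01-03T09:00:00Z"/>
  <url><loc>http://example.com/changelist1.xml</loc>
       <lastmod>2013-01-01T09:00:00Z</lastmod></url>
  <url><loc>http://example.com/changelist2.xml</loc>
       <lastmod>2013-01-02T09:00:00Z</lastmod></url>
  <url><loc>http://example.com/changelist3.xml</loc>
       <lastmod>2013-01-03T09:00:00Z</lastmod></url>
</urlset>"""

entries = read_archive(SAMPLE, "changelist-archive")
last_seen = datetime(2013, 1, 1, 9, 0, tzinfo=timezone.utc)  # hypothetical
todo = newer_than(entries, last_seen)
```

With the hypothetical last_seen value above, todo contains the URIs of the second and third Change Lists. The same read_archive function could be applied to the other Archive documents by passing a different capability value.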

3. Change Dump Archive

If a Source decides to offer Change Dumps of prior temporal intervals, it may provide a Change Dump Archive. A Change Dump Archive points to a number of Change Dumps.

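A Change Dump Archive is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and, as illustrated in Example 3.1, the following structure: an <rs:ln> child element with the relation type resourcesync that points to the Capability List, an <rs:md> child element whose capability attribute has the value changedump-archive, and one <url> child element per Change Dump, with the URI of the Change Dump in <loc> and its associated datetime in <lastmod>.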

The pointers in a Change Dump Archive must be in chronological order. The associated datetime can be used by Destinations to determine whether new changes have to be processed. By downloading the Change Dumps, a Destination may inspect the from and until attributes of the top-level <rs:md> element of each to determine whether the Change Dumps are consecutive and without any time gaps.

An example of a Change Dump Archive is shown in Example 3.1 below. It points to three Change Dumps that were created in consecutive weeks. It is recommended that each Change Dump referred to have a top-level <rs:ln> element with the relation type up that points to the Change Dump Archive.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="resourcesync"
         href="http://example.com/dataset1/capabilitylist.xml"/>
  <rs:md capability="changedump-archive"
         modified="2013-01-03T09:00:00Z"/>
  <url>
      <loc>http://example.com/changedump-w1.xml</loc>
      <lastmod>2012-12-20T09:00:00Z</lastmod>
  </url>
  <url>
      <loc>http://example.com/changedump-w2.xml</loc>
      <lastmod>2012-12-27T09:00:00Z</lastmod>
  </url>
  <url>
      <loc>http://example.com/changedump-w3.xml</loc>
      <lastmod>2013-01-03T09:00:00Z</lastmod>
  </url>
</urlset>

Example 3.1: A Change Dump Archive
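
The check for time gaps described above can be sketched as follows. The sketch assumes that a Destination has already downloaded each Change Dump listed in the Archive and collected the from and until values of its top-level <rs:md> element, in Archive order. The function name find_gaps and the interval values are hypothetical; the third interval deliberately starts a day late to illustrate a gap.

```python
from datetime import datetime


def parse_dt(text):
    # Assumption: a full W3C datetime with "Z" or a numeric UTC offset.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def find_gaps(intervals):
    """intervals: [(from, until), ...] as strings, in Archive order.

    Returns a list of (previous_until, next_from) datetime pairs for
    every place where a document starts later than its predecessor ended.
    """
    parsed = [(parse_dt(start), parse_dt(end)) for start, end in intervals]
    gaps = []
    for (_, prev_until), (next_from, _) in zip(parsed, parsed[1:]):
        if next_from > prev_until:
            gaps.append((prev_until, next_from))
    return gaps


# Hypothetical from/until values for three weekly Change Dumps.
intervals = [
    ("2012-12-13T09:00:00Z", "2012-12-20T09:00:00Z"),
    ("2012-12-20T09:00:00Z", "2012-12-27T09:00:00Z"),
    ("2012-12-28T09:00:00Z", "2013-01-03T09:00:00Z"),  # starts a day late
]
gaps = find_gaps(intervals)
```

An empty result indicates that the documents are consecutive. The same check applies to the Change Lists of a Change List Archive.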

4. Resource List Archive

As part of the regular update of its Resource List, a Source might maintain old Resource Lists to provide historical snapshot views of its content. Such Resource List Archives provide an easy way for a Destination to compare the states of the resources at different times.

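A Resource List Archive is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and, as illustrated in Example 4.1, the following structure: an <rs:ln> child element with the relation type resourcesync that points to the Capability List, an <rs:md> child element whose capability attribute has the value resourcelist-archive, and one <url> child element per Resource List, with the URI of the Resource List in <loc> and its associated datetime in <lastmod>.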

Example 4.1 shows a Resource List Archive that points to the current Resource List http://example.com/resourcelist3.xml and two Resource Lists created in the two previous months. It is recommended that the Resource List documents referred to have a navigational top-level <rs:ln> element with the relation type up that points to the Resource List Archive.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="resourcesync"
         href="http://example.com/dataset1/capabilitylist.xml"/>
  <rs:md capability="resourcelist-archive"
         modified="2013-01-03T09:00:00Z"/>
  <url>
      <loc>http://example.com/resourcelist1.xml</loc>
      <lastmod>2012-11-03T09:00:00Z</lastmod>
  </url>
  <url>
      <loc>http://example.com/resourcelist2.xml</loc>
      <lastmod>2012-12-03T09:00:00Z</lastmod>
  </url>
  <url>
      <loc>http://example.com/resourcelist3.xml</loc>
      <lastmod>2013-01-03T09:00:00Z</lastmod>
  </url>
</urlset>

Example 4.1: A Resource List Archive
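
As an illustration of the comparison that a Resource List Archive enables, the sketch below compares two Resource Lists taken from different points in time. It is not part of the specification. It assumes that each Resource List is a <urlset> whose <url> entries carry a <loc> and, optionally, a <lastmod>, and it treats any difference in the <lastmod> text as an update, which is a simplification. The function names and the sample resource URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def snapshot(xml_text):
    """Map each resource URI in a Resource List to its <lastmod> text."""
    root = ET.fromstring(xml_text)
    state = {}
    for url in root.findall(SM + "url"):
        loc = url.findtext(SM + "loc")
        if loc is None:
            continue  # skip entries without a URI
        state[loc.strip()] = (url.findtext(SM + "lastmod") or "").strip()
    return state


def compare(old, new):
    """Return (created, deleted, updated) URI lists between two snapshots."""
    created = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    updated = sorted(u for u in set(old) & set(new) if old[u] != new[u])
    return created, deleted, updated


def resource_list(entries):
    # Helper that builds a small sample Resource List for this sketch.
    body = "".join(
        "<url><loc>%s</loc><lastmod>%s</lastmod></url>" % e for e in entries
    )
    return (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        + body + "</urlset>"
    )


# Hypothetical content of two archived Resource Lists.
november = snapshot(resource_list([
    ("http://example.com/res1", "2012-11-01T10:00:00Z"),
    ("http://example.com/res2", "2012-11-02T10:00:00Z"),
]))
december = snapshot(resource_list([
    ("http://example.com/res1", "2012-12-20T10:00:00Z"),
    ("http://example.com/res3", "2012-12-15T10:00:00Z"),
]))
created, deleted, updated = compare(november, december)
```

In this hypothetical case res3 is reported as created, res2 as deleted, and res1 as updated.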

5. Resource Dump Archive

As part of the regular maintenance of its data, a Source might maintain old Resource Dumps. For a Destination that wishes to compare or archive versions of the data over time, access to these Resource Dumps allows the packaged historical data to be downloaded all at once, rather than requiring the Source to support access to the individual resource versions, and for the Destination to collect them one at a time.

A Resource Dump Archive points not only to the current Resource Dump but also to previously created and published Resource Dumps. Each of these Resource Dumps represents a snapshot of the Source's data at a certain point in time, namely the creation time of the Resource Dump.

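A Resource Dump Archive is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and, as illustrated in Example 5.1, the following structure: an <rs:ln> child element with the relation type resourcesync that points to the Capability List, an <rs:md> child element whose capability attribute has the value resourcedump-archive, and one <url> child element per Resource Dump, with the URI of the Resource Dump in <loc> and its associated datetime in <lastmod>.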

Example 5.1 shows a Resource Dump Archive that points to the current Resource Dump http://example.com/resourcedump3.xml and two Resource Dumps created in the two previous months. It is recommended that the Resource Dump documents referred to in Example 5.1 have a navigational top-level <rs:ln> element with the relation type up that points to the Resource Dump Archive.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="resourcesync"
         href="http://example.com/dataset1/capabilitylist.xml"/>
  <rs:md capability="resourcedump-archive"
         modified="2013-01-03T09:00:00Z"/>
  <url>
      <loc>http://example.com/resourcedump1.xml</loc>
      <lastmod>2012-11-03T09:00:00Z</lastmod>
  </url>
  <url>
      <loc>http://example.com/resourcedump2.xml</loc>
      <lastmod>2012-12-03T09:00:00Z</lastmod>
  </url>
  <url>
      <loc>http://example.com/resourcedump3.xml</loc>
      <lastmod>2013-01-03T09:00:00Z</lastmod>
  </url>
</urlset>

Example 5.1: A Resource Dump Archive
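
A Destination that wants to synchronize with a past state of the Source's resources needs to pick the appropriate snapshot from the Archive. The sketch below shows one possible selection rule, namely the latest Resource Dump created at or before a target time. The rule, the function name select_snapshot, and the target values are assumptions made for illustration; the entries are the three pointers of Example 5.1.

```python
from datetime import datetime, timezone


def select_snapshot(entries, target):
    """Return the URI of the latest entry dated at or before `target`.

    entries: [(uri, datetime), ...] as read from a Resource Dump Archive.
    Returns None when every snapshot is later than `target`.
    """
    best = None
    for uri, created in entries:
        if created <= target and (best is None or created >= best[1]):
            best = (uri, created)
    return best[0] if best else None


def utc(year, month, day, hour=0):
    return datetime(year, month, day, hour, tzinfo=timezone.utc)


# The pointers of Example 5.1 with their <lastmod> datetimes.
entries = [
    ("http://example.com/resourcedump1.xml", utc(2012, 11, 3, 9)),
    ("http://example.com/resourcedump2.xml", utc(2012, 12, 3, 9)),
    ("http://example.com/resourcedump3.xml", utc(2013, 1, 3, 9)),
]

christmas = select_snapshot(entries, utc(2012, 12, 25))  # hypothetical target
too_early = select_snapshot(entries, utc(2012, 1, 1))    # before any snapshot
```

For the first target the December Resource Dump is selected; for the second no snapshot qualifies and None is returned.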

6. Advertising Archive Capabilities

In order to make use of the capabilities that a Source provides, a Destination must first determine which capabilities are supported, and the URIs of the corresponding capability documents. The Archive capabilities described in this specification may be added to a Capability List in the same manner as other ResourceSync capabilities (see ResourceSync Core: Capability List).

The four additional Archive capabilities described in this specification that a Source can point to are: changelist-archive, changedump-archive, resourcelist-archive, and resourcedump-archive. These values have been shown in the <rs:md capability="..."> attributes in Example 2.1, Example 3.1, Example 4.1, and Example 5.1. A Capability List may contain only one entry per capability.

Within a Capability List it is expected that all capabilities listed describe the same set of resources. Under this assumption, Destinations can select from the capabilities offered the best one to serve their synchronization goal.

The Capability List is based on the <urlset> format and is described in detail in ResourceSync Core: Capability List. Example 6.1 shows a Capability List where the Source offers eight capabilities: a Resource List, a Resource Dump, a Change List, a Change Dump, a Resource List Archive, a Resource Dump Archive, a Change List Archive, and a Change Dump Archive.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="describedby"
         href="http://example.com/dataset1/info_about_source.xml"/>
  <rs:md capability="capabilitylist"
         modified="2013-01-02T14:00:00Z"/>
  <url>
      <loc>http://example.com/dataset1/resourcelist.xml</loc>
      <rs:md capability="resourcelist"/>
  </url>
  <url>
      <loc>http://example.com/dataset1/resourcedump.xml</loc>
      <rs:md capability="resourcedump"/>
  </url>
  <url>
      <loc>http://example.com/dataset1/changelist.xml</loc>
      <rs:md capability="changelist"/>
  </url>
  <url>
      <loc>http://example.com/dataset1/changedump.xml</loc>
      <rs:md capability="changedump"/>
  </url>
  <url>
      <loc>http://example.com/dataset1/resourcelist-archive.xml</loc>
      <rs:md capability="resourcelist-archive"/>
  </url>
  <url>
      <loc>http://example.com/dataset1/resourcedump-archive.xml</loc>
      <rs:md capability="resourcedump-archive"/>
  </url>
  <url>
      <loc>http://example.com/dataset1/changelist-archive.xml</loc>
      <rs:md capability="changelist-archive"/>
  </url>
  <url>
      <loc>http://example.com/dataset1/changedump-archive.xml</loc>
      <rs:md capability="changedump-archive"/>
  </url>
</urlset>

Example 6.1: A Capability List which includes Archive capabilities
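
The following sketch, which is not part of the specification, shows how a Destination might read a Capability List such as Example 6.1 into a mapping from capability name to document URI, rejecting a list that contains more than one entry for the same capability. The function name read_capabilities is hypothetical, and the sample is an abbreviated version of Example 6.1.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def read_capabilities(xml_text):
    """Return {capability: uri} for the entries of a Capability List."""
    root = ET.fromstring(xml_text)
    capabilities = {}
    for url in root.findall(SM + "url"):
        loc = url.findtext(SM + "loc")
        md = url.find(RS + "md")
        if loc is None or md is None or md.get("capability") is None:
            continue  # not a capability entry this sketch understands
        name = md.get("capability")
        if name in capabilities:
            # Only one entry per capability is allowed.
            raise ValueError("duplicate capability: " + name)
        capabilities[name] = loc.strip()
    return capabilities


# Abbreviated version of Example 6.1.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist" modified="2013-01-02T14:00:00Z"/>
  <url><loc>http://example.com/dataset1/changelist.xml</loc>
       <rs:md capability="changelist"/></url>
  <url><loc>http://example.com/dataset1/changelist-archive.xml</loc>
       <rs:md capability="changelist-archive"/></url>
  <url><loc>http://example.com/dataset1/resourcedump-archive.xml</loc>
       <rs:md capability="resourcedump-archive"/></url>
</urlset>"""

capabilities = read_capabilities(SAMPLE)
archives = sorted(n for n in capabilities if n.endswith("-archive"))
```

Here archives lists the Archive capabilities that the Source advertises, and capabilities gives the URI of each corresponding Archive document.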

The provision of Archive capabilities and their inclusion in a Capability List does not change how a Source would expose the Capability List or the discovery of the Capability List document (see ResourceSync Core: Discovery).

7. References

[Web Architecture]
Architecture of the World Wide Web, Volume One, I. Jacobs and N. Walsh, Editors, World Wide Web Consortium, 15 January 2004.
[Sitemaps]
Sitemaps XML format and protocol, sitemaps.org, 27 February 2008.
[W3C Datetime]
Date and Time Formats, Misha Wolf, Charles Wicksteed, 15 September 1997.
[Memento Internet Draft]
Memento Internet Draft, H. Van de Sompel, M. L. Nelson, R. D. Sanderson, May 2012.

A. XML Element Overview

This specification adds the following values for the capability attribute.

Capability Attribute Value            Section
<urlset>
    <rs:md capability="...">
        changelist-archive            2. Change List Archive
        changedump-archive            3. Change Dump Archive
        resourcelist-archive          4. Resource List Archive
        resourcedump-archive          5. Resource Dump Archive

Table A.1: Additional values for the capability attribute of the <rs:md> child element of the <urlset> element used to indicate Archive Capabilities

B. Acknowledgements

This specification is the collaborative work of NISO and the Open Archives Initiative. Funding for ResourceSync is provided by the Alfred P. Sloan Foundation. UK participation is supported by Jisc.

The names of individual contributors will be listed here when the final specification is released.

C. Change Log

Date        Editor                             Description
2012-08-13  martin, herbert, simeon, bernhard  first alpha spec draft
2013-02-01  martin, herbert, rob, simeon       beta spec draft
2013-02-06  simeon, herbert, martin            typo fixes
2013-05-06  simeon                             separated archives portion for 0.6
2013-05-08  martin, simeon                     minor fixes

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.