Open Archives Initiative ResourceSync Framework Specification
The ResourceSync specifications describe a synchronization framework for the web consisting of various capabilities that allow third party systems to remain synchronized with a server's evolving resources. This ResourceSync Archives specification describes additional capabilities that extend the core specification to provide historical information about a set of resources.
This specification is one of several documents comprising the ResourceSync Framework Specifications.
This specification is a beta draft released for public comment. Feedback is most welcome on the ResourceSync Google Group.
1. Introduction
1.1 Motivating Examples
1.2 Notational Conventions
2. Resource List Archives
3. Resource Dump Archives
4. Change List Archives
5. Change Dump Archives
6. Advertising Archive Capabilities
7. References
A. Acknowledgements
B. Change Log
The ResourceSync specifications introduce a range of easy to implement capabilities that a server may support in order to enable remote systems to remain more tightly in step with its evolving resources. They also describe how a server can advertise the capabilities it supports. Remote systems can inspect this information to determine how best to remain aligned with the evolving data.
This ResourceSync Archives specification adds to the framework capabilities that allow a server to provide historical data based on Archives of the core capabilities (Resource Lists, Resource Dumps, Change Lists, and Change Dumps). Like all other capabilities, Archives are implemented using the document formats introduced by the Sitemap protocol. Each Archive capability is optional and may be implemented independently of any other Archive capability. Archives need not be implemented in order to support synchronization with ResourceSync, but may facilitate certain use cases.
For example, a Change List Archive allows a server to list a timestamped set of historical Change Lists, thus allowing description of changes over an extended period without placing additional requirements on the generation and rotation of the current Change List. A Resource Dump Archive allows a server to list a timestamped set of historical Resource Dumps, providing snapshots of the server's resources at different times. A remote system may select an appropriate historical Resource Dump to synchronize with a past state of the server's resources.
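This document is structured as follows: Sections 2 through 5 describe Resource List Archives, Resource Dump Archives, Change List Archives, and Change Dump Archives, and Section 6 describes how a server advertises its Archive capabilities.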
Many projects and services have synchronization needs and have implemented ad hoc solutions. ResourceSync provides a standard synchronization method that will reduce implementation effort and facilitate easier reuse of resources. Archive capabilities allow historical data to be described within the same framework as current synchronization information. This section describes motivating examples for the Archive capabilities.
The way in which a ResourceSync Source generates Change Lists will be determined by the particular technical configuration of the Source, the frequency of changes, and the intended use. While Change Lists that use the Sitemap index format and a set of Sitemaps may have a very large number of entries, it may be convenient to rotate individual lists of changes frequently and avoid generating a very large Change List. Change List Archives add flexibility while retaining the ability for a Source to make available a complete change history enabling incremental synchronization from any past state. A Source with very frequent changes might create separate Sitemap files as part of a Change List at hourly intervals, and perhaps each month (about 720 hours) start a new Change List while archiving the old one. If all the resource states were recorded in addition to the change information, then Change Dumps and a Change Dump Archive could be used to optimize download of the changed resources.
Many services provide snapshots of historical content either as stable reference points, or to permit the evolution of the service's resources to be studied in situations where describing all updates would be difficult. Examples include Wikipedia Snapshots and Nature Linked Data Snapshots. The Resource Dump Archive capability provides the opportunity to describe such snapshots in a consistent and machine-navigable way.
Resource List Archives give servers the ability to describe the state of their resources at particular points in time. This allows clients to investigate changes expressed in the metadata or to compare the current state with a historical state.
This specification uses the terms "resource", "representation", "request", "response", "content negotiation", "client", and "server" as described in [Architecture of the World Wide Web].
Throughout this document, the following namespace prefix bindings are used:
Prefix | Namespace URI | Description |
---|---|---|
(default) | http://www.sitemaps.org/schemas/sitemap/0.9 | Sitemap XML elements defined in the Sitemap protocol |
rs | http://www.openarchives.org/rs/terms/ | Namespace for elements and attributes introduced by ResourceSync |
As part of the regular update of its Resource List, a Source might maintain old Resource Lists to provide historical snapshot views of its content. Such Resource List Archives provide an easy way for a Destination to compare the states of the resources at different times.
A Resource List Archive is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:
- The <rs:md> child element of <urlset> must have a capability attribute that has a value of resourcelist-archive, and it must have a from attribute that conveys the time the Resource List Archive was created.
- The <rs:ln> child element of <urlset> points to the Capability List with the relation type resourcesync (see Section 6).
- One <url> child element of <urlset> per Resource List. This element does not have attributes, but uses child elements to convey information about the Resource List. The <url> element has the following child elements:
  - The <loc> child element provides the URI of the Resource List.
  - The <lastmod> child element conveys the last modification time of the Resource List with the URI provided in <loc>, expressed as a W3C Datetime; the use of a complete date and time expressed in UTC using the format YYYY-MM-DDThh:mm:ss[.s]Z is recommended.
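The recommended timestamp form can be read and written with a standard date library. The following Python sketch is illustrative only and not part of the specification; the function names parse_w3c_utc and format_w3c_utc are assumptions, and only the recommended UTC form ending in Z is accepted.

```python
from datetime import datetime, timezone


def parse_w3c_utc(value: str) -> datetime:
    """Parse YYYY-MM-DDThh:mm:ss[.s]Z into a timezone-aware UTC datetime."""
    if not value.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in 'Z': {value!r}")
    body = value[:-1]
    # The fractional-seconds part is optional in the recommended format.
    fmt = "%Y-%m-%dT%H:%M:%S.%f" if "." in body else "%Y-%m-%dT%H:%M:%S"
    return datetime.strptime(body, fmt).replace(tzinfo=timezone.utc)


def format_w3c_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime in the recommended form (whole seconds)."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


created = parse_w3c_utc("2013-01-03T09:00:00Z")
print(format_w3c_utc(created))
```

Parsing into timezone-aware values lets lastmod times be compared directly, whatever local time zone the Destination runs in.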
Example 2.1 shows a Resource List Archive that points to the current Resource List http://example.com/resourcelist3.xml and two Resource Lists created in the two previous months. It is recommended that the Resource List documents referred to have a navigational top-level <rs:ln> element with the relation type up that points to the Resource List Archive.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:ln rel="resourcesync"
href="http://example.com/dataset1/capabilitylist.xml"/>
<rs:md capability="resourcelist-archive"
from="2013-01-03T09:00:00Z"/>
<url>
<loc>http://example.com/resourcelist1.xml</loc>
<lastmod>2012-11-03T09:00:00Z</lastmod>
</url>
<url>
<loc>http://example.com/resourcelist2.xml</loc>
<lastmod>2012-12-03T09:00:00Z</lastmod>
</url>
<url>
<loc>http://example.com/resourcelist3.xml</loc>
<lastmod>2013-01-03T09:00:00Z</lastmod>
</url>
</urlset>
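The structural requirements of this section can be checked mechanically by a Destination. The sketch below, using Python's standard xml.etree.ElementTree module, is one possible approach and not a reference implementation; the function name read_archive and the shape of its return value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"
NS = {"sm": SITEMAP_NS, "rs": RS_NS}


def read_archive(xml_text, expected_capability):
    """Check the required structure of an Archive and return its entries."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        raise ValueError("root element is not <urlset>")

    md = root.find("rs:md", NS)
    if md is None or md.get("capability") != expected_capability:
        raise ValueError(f"missing <rs:md capability='{expected_capability}'>")
    if md.get("from") is None:
        raise ValueError("<rs:md> has no 'from' attribute")

    # The link back to the Capability List uses the relation type resourcesync.
    links = [ln.get("href") for ln in root.findall("rs:ln", NS)
             if ln.get("rel") == "resourcesync"]
    if not links:
        raise ValueError("no <rs:ln rel='resourcesync'> link found")

    entries = [(url.findtext("sm:loc", namespaces=NS),
                url.findtext("sm:lastmod", namespaces=NS))
               for url in root.findall("sm:url", NS)]
    return {"from": md.get("from"), "capability_list": links[0], "entries": entries}


# Abbreviated copy of Example 2.1, used here only as sample input.
EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="resourcesync" href="http://example.com/dataset1/capabilitylist.xml"/>
  <rs:md capability="resourcelist-archive" from="2013-01-03T09:00:00Z"/>
  <url>
    <loc>http://example.com/resourcelist1.xml</loc>
    <lastmod>2012-11-03T09:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/resourcelist3.xml</loc>
    <lastmod>2013-01-03T09:00:00Z</lastmod>
  </url>
</urlset>"""

archive = read_archive(EXAMPLE, "resourcelist-archive")
for loc, lastmod in archive["entries"]:
    print(lastmod, loc)
```

Because the other Archive documents share the same overall layout, the same function could be called with a different capability value, although Change List and Change Dump Archives only recommend rather than require the from attribute.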
As part of the regular maintenance of its data, a Source might maintain old Resource Dumps. For a Destination that wishes to compare or archive versions of the data over time, access to these Resource Dumps allows the packaged historical data to be downloaded all at once, rather than requiring the Source to support access to the individual resource versions, and for the Destination to collect them one at a time.
A Resource Dump Archive not only points to the current Resource Dump but also to previously created and published Resource Dumps. Each of these Resource Dumps represents a snapshot of the Source's data at a certain point in time - the creation time of the Resource Dump.
A Resource Dump Archive is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:
- The <rs:md> child element of <urlset> must have a capability attribute that has a value of resourcedump-archive, and it must have a from attribute that conveys the time the Resource Dump Archive was created.
- The <rs:ln> child element of <urlset> points to the Capability List with the relation type resourcesync (see Section 6).
- One <url> child element of <urlset> per Resource Dump. This element does not have attributes, but uses child elements to convey information about the Resource Dump. The <url> element has the following child elements:
  - The <loc> child element provides the URI of the Resource Dump.
  - The <lastmod> child element has the semantics described in Section 2.
Example 3.1 shows a Resource Dump Archive that points to the current Resource Dump http://example.com/resourcedump3.xml and two Resource Dumps created in the two previous months. It is recommended that the Resource Dump documents referred to in Example 3.1 have a navigational top-level <rs:ln> element with the relation type up that points to the Resource Dump Archive.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:ln rel="resourcesync"
href="http://example.com/dataset1/capabilitylist.xml"/>
<rs:md capability="resourcedump-archive"
from="2013-01-03T09:00:00Z"/>
<url>
<loc>http://example.com/resourcedump1.xml</loc>
<lastmod>2012-11-03T09:00:00Z</lastmod>
</url>
<url>
<loc>http://example.com/resourcedump2.xml</loc>
<lastmod>2012-12-03T09:00:00Z</lastmod>
</url>
<url>
<loc>http://example.com/resourcedump3.xml</loc>
<lastmod>2013-01-03T09:00:00Z</lastmod>
</url>
</urlset>
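A Destination that wants to synchronize with a past state can pick the most recent Resource Dump that is not later than the time of interest. The Python sketch below illustrates that selection over the entries of Example 3.1; the names parse_utc and select_snapshot are hypothetical, and the sketch assumes that each lastmod value is a reasonable stand-in for the time of the snapshot.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse a YYYY-MM-DDThh:mm:ssZ timestamp into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def select_snapshot(entries, target):
    """Return the URI of the latest dump whose lastmod is not after target.

    entries is a list of (loc, lastmod) pairs; None is returned when every
    dump is newer than the requested time.
    """
    candidates = []
    for loc, lastmod in entries:
        modified = parse_utc(lastmod)
        if modified <= target:
            candidates.append((modified, loc))
    if not candidates:
        return None
    return max(candidates)[1]


# (loc, lastmod) pairs copied from Example 3.1.
DUMPS = [
    ("http://example.com/resourcedump1.xml", "2012-11-03T09:00:00Z"),
    ("http://example.com/resourcedump2.xml", "2012-12-03T09:00:00Z"),
    ("http://example.com/resourcedump3.xml", "2013-01-03T09:00:00Z"),
]

mid_december = datetime(2012, 12, 15, tzinfo=timezone.utc)
print(select_snapshot(DUMPS, mid_december))
```

For a target time in mid-December 2012 the sketch selects resourcedump2.xml, the last dump published before that date.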
A Change List (ResourceSync Core: Change List) describes the changes in a Source's resources over a certain period of time. The Source determines the length of that time interval. If a Source wishes to offer Change Lists covering prior temporal intervals, it can provide a Change List Archive. A Change List Archive provides a list of pointers to individual Change Lists which would usually represent consecutive lists of changes.
A Change List Archive is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:
- The <rs:md> child element of <urlset> must have a capability attribute that has a value of changelist-archive. It also should have both the from and the until attributes to convey the temporal interval covered by the collection of Change Lists in the Change List Archive. The complete set of changes from all of the Change Lists combined should include all changes that occurred in the specified interval.
- The <rs:ln> child element of <urlset> points to the Capability List with the relation type resourcesync (see Section 6).
- One <url> child element of <urlset> per Change List. This element does not have attributes, but uses child elements to convey information about the Change List. The <url> element has the following child elements:
  - The <loc> child element provides the URI of the Change List.
  - The <lastmod> child element has the semantics described in Section 2.
The pointers in a Change List Archive must be in chronological order. The associated datetime can be used by Destinations to determine if new changes are available to be processed. By downloading the Change Lists, a Destination may inspect the from and until attributes of each Change List's top-level <rs:md> element to determine whether the Change Lists are consecutive and without any time gaps.
Example 4.1 shows a Change List Archive that points to three Change Lists created on consecutive days. To ease navigation for Destinations, the Change Lists referred to in the example below will have a top-level <rs:ln> element with the relation type up that points to the Change List Archive.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:ln rel="resourcesync"
href="http://example.com/dataset1/capabilitylist.xml"/>
<rs:md capability="changelist-archive"
from="2013-01-01T09:00:00Z"
until="2013-01-03T09:00:00Z"/>
<url>
<loc>http://example.com/changelist1.xml</loc>
<lastmod>2013-01-01T09:00:00Z</lastmod>
</url>
<url>
<loc>http://example.com/changelist2.xml</loc>
<lastmod>2013-01-02T09:00:00Z</lastmod>
</url>
<url>
<loc>http://example.com/changelist3.xml</loc>
<lastmod>2013-01-03T09:00:00Z</lastmod>
</url>
</urlset>
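A Destination can use the lastmod values in a Change List Archive both to confirm the required chronological order and to decide which Change Lists it still has to process. The Python sketch below works on (loc, lastmod) pairs taken from Example 4.1; the function names and the idea of tracking a last_processed time are illustrative assumptions, not something this specification prescribes.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse a YYYY-MM-DDThh:mm:ssZ timestamp into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_chronological(entries):
    """True when lastmod values never decrease from one entry to the next."""
    times = [parse_utc(lastmod) for _, lastmod in entries]
    return all(earlier <= later for earlier, later in zip(times, times[1:]))


def unprocessed(entries, last_processed):
    """Return the URIs of Change Lists modified after last_processed."""
    return [loc for loc, lastmod in entries if parse_utc(lastmod) > last_processed]


# (loc, lastmod) pairs copied from Example 4.1.
CHANGE_LISTS = [
    ("http://example.com/changelist1.xml", "2013-01-01T09:00:00Z"),
    ("http://example.com/changelist2.xml", "2013-01-02T09:00:00Z"),
    ("http://example.com/changelist3.xml", "2013-01-03T09:00:00Z"),
]

# Hypothetical state: the Destination has already handled changelist1.xml.
last_processed = parse_utc("2013-01-01T09:00:00Z")

if not is_chronological(CHANGE_LISTS):
    raise ValueError("Change List Archive entries are out of order")
for uri in unprocessed(CHANGE_LISTS, last_processed):
    print("to process:", uri)
```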
If a Source decides to offer Change Dumps of prior temporal intervals, it may provide a Change Dump Archive. A Change Dump Archive points to a number of Change Dumps.
A Change Dump Archive is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:
- The <rs:md> child element of <urlset> must have a capability attribute that has a value of changedump-archive. It also should have both the from and the until attributes to convey the temporal interval covered by the collection of Change Dumps in the Change Dump Archive. The complete set of changes from all of the Change Dumps combined should include all changes that occurred in the specified interval.
- The <rs:ln> child element of <urlset> points to the Capability List with the relation type resourcesync (see Section 6).
- One <url> child element of <urlset> per Change Dump. This element does not have attributes, but uses child elements to convey information about the Change Dump. The <url> element has the following child elements:
  - The <loc> child element provides the URI of the Change Dump.
  - The <lastmod> child element has the semantics described in Section 2.
The pointers in a Change Dump Archive must be in chronological order. The associated datetime can be used by Destinations to determine if new changes have to be processed. By downloading the Change Dumps, a Destination may inspect the from and until attributes of each Change Dump's top-level <rs:md> element to determine whether the Change Dumps are consecutive and without any time gaps.
An example of a Change Dump Archive is shown in Example 5.1 below. It points to three Change Dumps that were created in consecutive weeks. It is recommended that each Change Dump referred to have a top-level <rs:ln> element with the relation type up that points to the Change Dump Archive.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:ln rel="resourcesync"
href="http://example.com/dataset1/capabilitylist.xml"/>
<rs:md capability="changedump-archive"
from="2012-12-20T09:00:00Z"
until="2013-01-03T09:00:00Z"/>
<url>
<loc>http://example.com/changedump-w1.xml</loc>
<lastmod>2012-12-20T09:00:00Z</lastmod>
</url>
<url>
<loc>http://example.com/changedump-w2.xml</loc>
<lastmod>2012-12-27T09:00:00Z</lastmod>
</url>
<url>
<loc>http://example.com/changedump-w3.xml</loc>
<lastmod>2013-01-03T09:00:00Z</lastmod>
</url>
</urlset>
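Once the from and until values of the individual Change Dumps have been read, checking for time gaps is a matter of comparing adjacent intervals. The Python sketch below shows one way to do this; the interval values are hypothetical, since Example 5.1 lists only the lastmod of each Change Dump, and the name find_gaps is an assumption.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse a YYYY-MM-DDThh:mm:ssZ timestamp into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def find_gaps(intervals):
    """Return (gap_start, gap_end) pairs not covered by any interval.

    intervals is a list of (from, until) timestamp strings, one pair per
    Change Dump, in any order.
    """
    parsed = sorted((parse_utc(start), parse_utc(end)) for start, end in intervals)
    gaps = []
    if not parsed:
        return gaps
    covered_until = parsed[0][1]
    for start, end in parsed[1:]:
        if start > covered_until:
            gaps.append((covered_until, start))
        # Track the furthest point covered so far, which also handles overlaps.
        covered_until = max(covered_until, end)
    return gaps


# Hypothetical from/until values for three weekly Change Dumps.
CONSECUTIVE = [
    ("2012-12-13T09:00:00Z", "2012-12-20T09:00:00Z"),
    ("2012-12-20T09:00:00Z", "2012-12-27T09:00:00Z"),
    ("2012-12-27T09:00:00Z", "2013-01-03T09:00:00Z"),
]
# The same dumps, but with the second one starting a day late.
WITH_GAP = [CONSECUTIVE[0],
            ("2012-12-21T09:00:00Z", "2012-12-27T09:00:00Z"),
            CONSECUTIVE[2]]

print(len(find_gaps(CONSECUTIVE)), "gap(s) in the consecutive set")
print(len(find_gaps(WITH_GAP)), "gap(s) in the set with a late start")
```

The same check applies unchanged to the from and until values of Change Lists in a Change List Archive.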
In order to make use of the capabilities that a Source provides, a Destination must first determine which capabilities are supported, and the URIs of the corresponding capability documents. The Archive capabilities described in this specification may be added to a Capability List in the same manner as other ResourceSync capabilities (see ResourceSync Core: Capability List).
The four additional Archive capabilities described in this specification that a Source can point to are: resourcelist-archive, resourcedump-archive, changelist-archive, and changedump-archive. These values have been shown in the capability attribute of the <rs:md capability="..."> elements in Example 2.1, Example 3.1, Example 4.1, and Example 5.1. A Capability List may contain only one entry per capability.
A resource that is covered by one capability listed in a Capability List must also be covered by all other capabilities that are enumerated in that Capability List. With this understanding, Destinations can select from the capabilities offered the best one to serve their synchronization goal for the particular set of resources.
The Capability List is based on the <urlset> format and is described in detail in ResourceSync Core: Capability List. Example 6.1 shows a Capability List where the Source offers eight capabilities: a Resource List, a Resource Dump, a Change List, a Change Dump, a Resource List Archive, a Resource Dump Archive, a Change List Archive, and a Change Dump Archive.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:ln rel="describedby"
href="http://example.com/info_about_set1_of_resources.xml"
type="application/xml"/>
<rs:ln rel="resourcesync"
href="http://example.com/resourcesync_description.xml"/>
<rs:md capability="capabilitylist"/>
<url>
<loc>http://example.com/dataset1/resourcelist.xml</loc>
<rs:md capability="resourcelist"/>
</url>
<url>
<loc>http://example.com/dataset1/resourcedump.xml</loc>
<rs:md capability="resourcedump"/>
</url>
<url>
<loc>http://example.com/dataset1/changelist.xml</loc>
<rs:md capability="changelist"/>
</url>
<url>
<loc>http://example.com/dataset1/changedump.xml</loc>
<rs:md capability="changedump"/>
</url>
<url>
<loc>http://example.com/dataset1/resourcelist-archive.xml</loc>
<rs:md capability="resourcelist-archive"/>
</url>
<url>
<loc>http://example.com/dataset1/resourcedump-archive.xml</loc>
<rs:md capability="resourcedump-archive"/>
</url>
<url>
<loc>http://example.com/dataset1/changelist-archive.xml</loc>
<rs:md capability="changelist-archive"/>
</url>
<url>
<loc>http://example.com/dataset1/changedump-archive.xml</loc>
<rs:md capability="changedump-archive"/>
</url>
</urlset>
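A Destination typically starts from the Capability List and looks up the URI for each capability it intends to use. The Python sketch below, based on the standard xml.etree.ElementTree module, maps capability names to URIs and rejects a Capability List that lists the same capability twice; the function names and the abbreviated sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def read_capabilities(xml_text):
    """Map each capability name in a Capability List to its document URI."""
    root = ET.fromstring(xml_text)
    capabilities = {}
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        name = md.get("capability") if md is not None else None
        if name is None:
            continue  # entry without a capability declaration; ignored here
        if name in capabilities:
            # A Capability List may contain only one entry per capability.
            raise ValueError(f"capability listed more than once: {name}")
        capabilities[name] = url.findtext("sm:loc", namespaces=NS)
    return capabilities


def archive_capabilities(capabilities):
    """Keep only the four Archive capabilities defined in this specification."""
    return {name: uri for name, uri in capabilities.items()
            if name.endswith("-archive")}


# Abbreviated version of Example 6.1, used here only as sample input.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="capabilitylist"/>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist-archive.xml</loc>
    <rs:md capability="changelist-archive"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump-archive.xml</loc>
    <rs:md capability="resourcedump-archive"/>
  </url>
</urlset>"""

offered = read_capabilities(SAMPLE)
for name, uri in sorted(archive_capabilities(offered).items()):
    print(name, uri)
```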
The provision of Archive capabilities and their inclusion in one or more Capability Lists does not change how a Source would expose a ResourceSync Description (see ResourceSync Core: ResourceSync Description), or the discovery of the ResourceSync Description document (see ResourceSync Core: Discovery).
This specification is the collaborative work of NISO and the Open Archives Initiative. Funding for ResourceSync is provided by the Alfred P. Sloan Foundation. UK participation is supported by Jisc.
We also thank numerous individual contributors including: Martin Haye (California Digital Library), Richard Jones (Cottage Labs), Graham Klyne (University of Oxford), Stuart Lewis (University of Edinburgh), Peter Murray (Lyrasis), David Rosenthal (LOCKSS), Shlomo Sanders (Ex Libris, Inc.), Ed Summers (Library of Congress), Paul Walk (UKOLN), Vincent Wehren (Microsoft), Zhiwu Xie (Virginia Tech), and Jeff Young (Online Computer Library Center).
Date | Editor | Description |
---|---|---|
2013-06-07 | simeon | version 0.9 |
2013-05-06 | simeon | separated archives portion for version 0.6 |
2013-02-01 | martin, herbert, rob, simeon | beta spec draft |
2012-08-13 | martin, herbert, simeon, bernhard | first alpha spec draft |
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.