Editors
The OAI Executive:
Carl Lagoze <lagoze@cs.cornell.edu>
-- Cornell University - Computer Science
Herbert Van de Sompel <herbertv@lanl.gov>
-- Los Alamos National Laboratory - Research Library
From the OAI Technical Committee:
Michael Nelson <m.l.nelson@larc.nasa.gov>
-- NASA - Langley Research Center
Simeon Warner <simeon@cs.cornell.edu>
-- Cornell University - Computer Science
This document is one part of the Implementation Guidelines that accompany the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
One possible type of data provider is an aggregator that maintains a repository which disseminates metadata that originate from other data providers (repositories that themselves support OAI-PMH). The relationship of an aggregator and originating repository may, in fact, be recursive. For example, aggregator "A1" may harvest records from repositories "R1" and "R2", and aggregator "A2" may, in turn, harvest records from aggregator "A1".
In the case of such redistribution, the aggregator may include information about the provenance of the metadata record in the about container. The following XML schema defines a simple format for provenance information. The schema defines a provenance container holding a chain of nested originDescription elements that identify the provenance of the metadata record, i.e. the chain of originating repositories. The expectation is that each aggregator will add its own originDescription as the outermost element of the chain, nesting the originDescription it harvested inside it.
Each originDescription contains the following information:
baseURL - the baseURL of the originating repository from which the metadata record was harvested.
identifier - the unique identifier of the item in the originating repository from which the metadata record was disseminated.
datestamp - the datestamp of the metadata record disseminated by the originating repository.
metadataNamespace - the XML namespace URI of the metadata format of the record harvested from the originating repository.
originDescription - an optional originDescription block, namely the one that was obtained along with the metadata record when it was harvested. A set of nested originDescription blocks will describe provenance over a sequence of harvests.
Each originDescription must also have the following two attributes which relate to the act of harvesting and any subsequent processing:
harvestDate - the responseDate of the OAI-PMH response that resulted in the record being harvested from the originating repository.
altered - a boolean value which must be true if the harvested record was altered before being disseminated again.
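Because the schema nests each earlier originDescription inside the later one, adding the latest harvest to the chain means creating a new outermost originDescription and placing the previously received one inside it, as its optional last child. The sketch below does this with Python's standard xml.etree.ElementTree. The function add_origin and its parameter names are hypothetical helpers for illustration, not part of these guidelines, and whether altered is true remains the aggregator's own decision.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"
# Serialize the provenance namespace as the default namespace.
ET.register_namespace("", PROV_NS)


def q(local: str) -> str:
    """Return the Clark-notation name {namespace}local for a provenance element."""
    return f"{{{PROV_NS}}}{local}"


def add_origin(previous, base_url, identifier, datestamp,
               metadata_namespace, harvest_date, altered):
    """Build a provenance element describing the latest harvest (hypothetical helper).

    previous: the provenance element found in the harvested record's
    about container, or None if the harvested record carried none.
    """
    provenance = ET.Element(q("provenance"))
    # Both attributes are required by the schema; the boolean is written
    # in its lexical form "true" or "false".
    origin = ET.SubElement(provenance, q("originDescription"), {
        "harvestDate": harvest_date,
        "altered": "true" if altered else "false",
    })
    # Child elements must appear in the order of the schema's sequence.
    for local, text in (("baseURL", base_url),
                        ("identifier", identifier),
                        ("datestamp", datestamp),
                        ("metadataNamespace", metadata_namespace)):
        ET.SubElement(origin, q(local)).text = text
    if previous is not None:
        earlier = previous.find(q("originDescription"))
        if earlier is not None:
            # The previously received chain becomes the optional last child.
            origin.append(earlier)
    return provenance
```

Calling add_origin once with previous=None and then again with the result reproduces a two-level chain; ET.tostring(result, encoding="unicode") serializes it with the provenance namespace declared as the default.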
Note that the formats (granularity) of the datestamp and responseDate values must be preserved when they are included in the datestamp element and the harvestDate attribute, respectively. They must not be changed to match the granularity of the local repository.
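In practice the preservation rule amounts to copying both strings verbatim: an aggregator may validate them, but should not parse them into a date object and re-serialize them at its own granularity. A minimal sketch, assuming the two UTC forms that OAI-PMH 2.0 allows, YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ; the helper names granularity and provenance_dates are hypothetical.

```python
import re
from datetime import datetime

# The two UTC forms allowed by OAI-PMH 2.0 (day and seconds granularity).
FORMATS = {
    "day": (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"), "%Y-%m-%d"),
    "seconds": (re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"),
                "%Y-%m-%dT%H:%M:%SZ"),
}


def granularity(value: str) -> str:
    """Return "day" or "seconds" for a valid UTC date string, else raise ValueError."""
    for name, (pattern, fmt) in FORMATS.items():
        if pattern.fullmatch(value):
            datetime.strptime(value, fmt)  # rejects impossible dates such as 2001-02-30
            return name
    raise ValueError(f"not an OAI-PMH UTC date: {value!r}")


def provenance_dates(datestamp: str, response_date: str) -> tuple:
    """Return (datestamp, harvestDate) exactly as received, after validation.

    Widening "2001-01-01" to "2001-01-01T00:00:00Z" to match a local
    seconds-granularity repository is precisely what must not happen.
    """
    granularity(datestamp)
    granularity(response_date)
    return datestamp, response_date
```

Validation only checks the strings; the values returned are the input strings themselves, so no granularity is gained or lost on the way into the provenance container.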
<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://www.openarchives.org/OAI/2.0/provenance"
        xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:provenance="http://www.openarchives.org/OAI/2.0/provenance"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <annotation>
    <documentation>
    Schema for the description of the provenance of metadata that is
    re-exposed by an OAI repository, i.e. metadata that has previously
    been harvested before being exposed by the repository.
    See: http://www.openarchives.org/OAI/2.0/guidelines-branding.htm
    Validated with http://www.w3.org/2001/03/webdata/xsv on 16May2002
    Simeon Warner - $Date: 2002/05/16 19:48:39 $
    </documentation>
  </annotation>
  <element name="provenance">
    <complexType>
      <sequence>
        <element name="originDescription" type="provenance:originDescriptionType"/>
      </sequence>
    </complexType>
  </element>
  <complexType name="originDescriptionType">
    <sequence>
      <element name="baseURL" type="anyURI"/>
      <element name="identifier" type="anyURI"/>
      <element name="datestamp" type="provenance:UTCdatetimeType"/>
      <element name="metadataNamespace" type="anyURI"/>
      <element name="originDescription" minOccurs="0" type="provenance:originDescriptionType"/>
    </sequence>
    <attribute name="harvestDate" type="provenance:UTCdatetimeType" use="required"/>
    <attribute name="altered" type="boolean" use="required"/>
  </complexType>
  <simpleType name="UTCdatetimeType">
    <union memberTypes="date dateTime"/>
  </simpleType>
</schema>
This schema is available at http://www.openarchives.org/OAI/2.0/provenance.xsd
The following example shows the use of this provenance container in an about part that would be associated with a metadata record. The example shows a two-element provenance chain: the record was originally harvested from a repository with baseURL http://some.oa.org and then disseminated by the harvesting repository, with baseURL http://the.oa.org, from which it was harvested a second time. The metadataNamespace elements indicate that the metadata format has not been changed. The altered attributes indicate that the metadata was not altered between the first harvest and the following dissemination, but was altered between the second harvest and the following dissemination.
<about>
  <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance
      http://www.openarchives.org/OAI/2.0/provenance.xsd">
    <originDescription harvestDate="2002-02-02T14:10:02Z" altered="true">
      <baseURL>http://the.oa.org</baseURL>
      <identifier>oai:r2.org:klik001</identifier>
      <datestamp>2002-01-01</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      <originDescription harvestDate="2002-01-01T11:10:01Z" altered="false">
        <baseURL>http://some.oa.org</baseURL>
        <identifier>oai:r2.org:klik001</identifier>
        <datestamp>2001-01-01</datestamp>
        <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
      </originDescription>
    </originDescription>
  </provenance>
</about>
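A harvester that receives such an about container can recover the provenance chain by following the nested originDescription elements from the outside in, which lists the most recent harvest first. A minimal sketch using Python's standard xml.etree.ElementTree; the function name provenance_chain and the keys of the returned dictionaries are hypothetical choices for illustration. Applied to the example above, it yields the entry for http://the.oa.org first and the entry for http://some.oa.org second.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the provenance namespace.
PROV = "{http://www.openarchives.org/OAI/2.0/provenance}"


def provenance_chain(about_xml: str):
    """Yield one dict per originDescription, most recent harvest first."""
    about = ET.fromstring(about_xml)
    node = about.find(f"{PROV}provenance/{PROV}originDescription")
    while node is not None:
        yield {
            "baseURL": node.findtext(f"{PROV}baseURL"),
            "identifier": node.findtext(f"{PROV}identifier"),
            "datestamp": node.findtext(f"{PROV}datestamp"),
            "harvestDate": node.get("harvestDate"),
            # The boolean type allows "true"/"false" as well as "1"/"0".
            "altered": node.get("altered") in ("true", "1"),
        }
        # Step to the description of the previous harvest, if any.
        node = node.find(f"{PROV}originDescription")
```

Because provenance_chain is a generator, a harvester that only needs the immediate source of a record can stop after the first item.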
The following example shows a sequence of requests and responses leading to a response which contains a provenance container.
Consider a request from crosswalker.oa.org:
http://odd.oa.org?verb=GetRecord&identifier=oai:odd.oa.org:z1x2y3&metadataPrefix=odd_fmt
and the following response from odd.oa.org:
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
    http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" metadataPrefix="odd_fmt"
      identifier="oai:odd.oa.org:z1x2y3">http://odd.oa.org</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:odd.oa.org:z1x2y3</identifier>
        <datestamp>1999-08-07T06:05:04Z</datestamp>
      </header>
      <metadata>
        <md:odd_fmt ...>
          ...metadata record in odd_fmt...
        </md:odd_fmt>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
Imagine that crosswalker.oa.org cross-walks the metadata from odd_fmt into oai_marc and then re-exposes the new metadata record with a new identifier.
A request from getmarc.oa.org:
http://crosswalker.oa.org?verb=GetRecord&identifier=oai:crosswalker.oa.org:a9b8c7&metadataPrefix=oai_marc
might then yield the following response from crosswalker.oa.org:
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
    http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_marc"
      identifier="oai:crosswalker.oa.org:a9b8c7">http://crosswalker.oa.org</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:crosswalker.oa.org:a9b8c7</identifier>
        <datestamp>2002-02-09T01:15:24Z</datestamp>
      </header>
      <metadata>
        <marc:oai_marc ...>
          ...metadata record in oai_marc...
        </marc:oai_marc>
      </metadata>
      <about>
        <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance
            http://www.openarchives.org/OAI/2.0/provenance.xsd">
          <originDescription harvestDate="2002-02-08T08:55:46Z" altered="true">
            <baseURL>http://odd.oa.org</baseURL>
            <identifier>oai:odd.oa.org:z1x2y3</identifier>
            <datestamp>1999-08-07T06:05:04Z</datestamp>
            <metadataNamespace>http://odd.oa.org/odd_fmt</metadataNamespace>
          </originDescription>
        </provenance>
      </about>
    </record>
  </GetRecord>
</OAI-PMH>
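On the aggregator side, the values for a new originDescription come directly from the harvested response: baseURL from the content of the request element, identifier and datestamp from the record header, and harvestDate from responseDate, all copied as strings. A minimal sketch with Python's xml.etree.ElementTree, assuming a standard OAI-PMH 2.0 response whose root element is OAI-PMH in the namespace http://www.openarchives.org/OAI/2.0/. The helper name origin_fields is hypothetical; metadata_namespace is passed in because the response identifies the format by its metadataPrefix, so the namespace is assumed to have been looked up from the source repository's ListMetadataFormats response. The altered flag is left to the aggregator.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the OAI-PMH 2.0 namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def origin_fields(response_xml: str, metadata_namespace: str) -> dict:
    """Collect originDescription values from a GetRecord response (hypothetical helper)."""
    root = ET.fromstring(response_xml)
    header = root.find(f"{OAI}GetRecord/{OAI}record/{OAI}header")
    if header is None:
        raise ValueError("not a GetRecord response containing a record")
    return {
        # The content of the request element is the base URL that was queried.
        "baseURL": root.findtext(f"{OAI}request", default="").strip(),
        "identifier": header.findtext(f"{OAI}identifier"),
        # datestamp and responseDate are copied verbatim to keep their granularity.
        "datestamp": header.findtext(f"{OAI}datestamp"),
        "metadataNamespace": metadata_namespace,
        "harvestDate": root.findtext(f"{OAI}responseDate"),
    }
```

Handing these values, together with the aggregator's altered flag, to whatever code writes the provenance container keeps the datestamp and harvestDate exactly as the originating repository expressed them.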
Support for the development of the OAI-PMH and for other Open Archives Initiative activities comes from the Digital Library Federation, the Coalition for Networked Information, and from the National Science Foundation through Grant No. IIS-9817416. Individuals who have played a significant role in the development of OAI-PMH version 2.0 are acknowledged in the protocol document.
2002-06-14: Release of this document, combined with the release of OAI-PMH version 2.0.
2002-07-02: Corrected to follow OAI-PMH version 2.0 datestamp and oai-identifier specifications and to change the identifier in the example where the metadata is altered.