Implementation Guidelines for the Open Archives Initiative Protocol for Metadata Harvesting

- XML schema to hold provenance information in the "about" part of a record

Protocol Version 2.0 of 2002-06-14
Document Version 2002/12/10T11:00:00Z
http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm

Editors

The OAI Executive:
Carl Lagoze <lagoze@cs.cornell.edu> -- Cornell University - Computer Science
Herbert Van de Sompel <herbertv@lanl.gov> -- Los Alamos National Laboratory - Research Library

From the OAI Technical Committee:
Michael Nelson <m.l.nelson@larc.nasa.gov> -- NASA - Langley Research Center
Simeon Warner <simeon@cs.cornell.edu> -- Cornell University - Computer Science

This document is one part of the Implementation Guidelines that accompany the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

XML schema to hold provenance information in the "about" part of a record

One possible type of data provider is an aggregator: a repository that disseminates metadata originating from other data providers (repositories that themselves support OAI-PMH). The relationship between an aggregator and an originating repository may, in fact, be recursive. For example, aggregator "A1" may harvest records from repositories "R1" and "R2", and aggregator "A2" may, in turn, harvest records from aggregator "A1".

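In the case of such redistribution, the aggregator may include information about the provenance of the metadata record in the about container. The following XML schema defines a simple format for provenance information. The schema defines a provenance container holding an originDescription element that identifies the provenance of the metadata record, i.e. the chain of originating repositories. The expectation is that each aggregator will add an originDescription for the latest harvest and nest inside it any originDescription that came with the harvested record. Each originDescription contains the following information:

- baseURL: the baseURL of the originating repository from which the metadata record was harvested.
- identifier: the identifier of the item in the originating repository from which the metadata record was disseminated.
- datestamp: the datestamp of the metadata record as disseminated by the originating repository.
- metadataNamespace: the XML namespace URI of the metadata format of the record harvested from the originating repository.
- originDescription: an optional nested originDescription, namely the one obtained when the metadata record was harvested. A set of nested originDescription elements describes provenance over a sequence of harvests.

Each originDescription must also have the following two attributes which relate to the act of harvesting and any subsequent processing:

- harvestDate: the responseDate of the OAI-PMH response from which the record was harvested from the originating repository.
- altered: a boolean value that is true if the harvested record was altered before being disseminated again, and false otherwise.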

Note that the formats (granularity) of the datestamp and responseDate values must be preserved when they are included in the datestamp and harvestDate elements respectively. They must not be changed to match the granularity of the local repository.
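
As a rough illustration of the rules above, the following Python sketch builds a provenance container for a record that is about to be re-exposed. The helper name build_provenance and its parameters are hypothetical and not part of OAI-PMH; only the standard library is used. Date strings are copied through unchanged so that their granularity is preserved, and any originDescription that came with the harvested record is nested inside the new one.

```python
import copy
import xml.etree.ElementTree as ET

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"


def build_provenance(base_url, identifier, datestamp, metadata_namespace,
                     harvest_date, altered, previous=None):
    """Return a <provenance> element describing the latest harvest.

    Hypothetical helper. `datestamp` and `harvest_date` are strings taken
    verbatim from the harvested response, so their granularity is kept.
    `previous` is the <provenance> element that came with the harvested
    record, if any.
    """
    provenance = ET.Element(f"{{{PROV_NS}}}provenance")
    origin = ET.SubElement(
        provenance,
        f"{{{PROV_NS}}}originDescription",
        {"harvestDate": harvest_date,
         "altered": "true" if altered else "false"},
    )
    # Child elements must appear in the order the schema declares them.
    for tag, text in (("baseURL", base_url),
                      ("identifier", identifier),
                      ("datestamp", datestamp),
                      ("metadataNamespace", metadata_namespace)):
        ET.SubElement(origin, f"{{{PROV_NS}}}{tag}").text = text
    # Nest the earlier chain, if there is one, as the last child.
    if previous is not None:
        earlier = previous.find(f"{{{PROV_NS}}}originDescription")
        if earlier is not None:
            origin.append(copy.deepcopy(earlier))
    return provenance
```

The returned element is intended to be placed inside the about container of the outgoing record; how it is serialized is left to the surrounding repository code.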

<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://www.openarchives.org/OAI/2.0/provenance" 
  xmlns="http://www.w3.org/2001/XMLSchema" 
  xmlns:provenance="http://www.openarchives.org/OAI/2.0/provenance" 
  elementFormDefault="qualified" attributeFormDefault="unqualified">

  <annotation>
    <documentation>
      Schema for the description of the provenance of metadata that is 
      re-exposed by an OAI repository, i.e. metadata that has previously 
      been harvested before being exposed by the repository.
      See: http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm
      Validated with http://www.w3.org/2001/03/webdata/xsv on 16May2002
      Simeon Warner - $Date: 2002/05/16 19:48:39 $
    </documentation>
  </annotation>

  <element name="provenance">
    <complexType>
      <sequence>
        <element name="originDescription"
                 type="provenance:originDescriptionType"/>
      </sequence>
    </complexType>
  </element>

  <complexType name="originDescriptionType">
    <sequence>
      <element name="baseURL" type="anyURI"/>
      <element name="identifier" type="anyURI"/>
      <element name="datestamp" type="provenance:UTCdatetimeType"/>
      <element name="metadataNamespace" type="anyURI"/>
      <element name="originDescription" minOccurs="0" 
               type="provenance:originDescriptionType"/>
    </sequence>
    <attribute name="harvestDate" type="provenance:UTCdatetimeType" use="required"/>
    <attribute name="altered" type="boolean" use="required"/>
  </complexType>

  <simpleType name="UTCdatetimeType">
    <union memberTypes="date dateTime"/>
  </simpleType>

</schema>
This schema is available at http://www.openarchives.org/OAI/2.0/provenance.xsd
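
The UTCdatetimeType in the schema is a union of the XML Schema date and dateTime types, which accepts more lexical forms than OAI-PMH 2.0 itself uses for datestamps (YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ). A minimal sketch of a stricter check, which an aggregator might run before copying a value into datestamp or harvestDate, is given below. The function name granularity is hypothetical, and the check looks only at the pattern of the string, not at whether the date is a valid calendar date.

```python
import re

# The two datestamp forms used by OAI-PMH 2.0: day granularity, and
# seconds granularity in UTC with a trailing "Z".
_DAY = re.compile(r"\d{4}-\d{2}-\d{2}")
_SECONDS = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def granularity(value):
    """Return "day", "seconds", or None if value matches neither form."""
    if _DAY.fullmatch(value):
        return "day"
    if _SECONDS.fullmatch(value):
        return "seconds"
    return None
```

Because the function reports the granularity rather than converting the value, the original string can be stored as it was received.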

Examples

The following example shows the use of this provenance container in an about part that would be associated with a metadata record. The example shows a two-element provenance chain. The record was originally harvested from a repository with baseURL http://some.oa.org (the inner originDescription). It was then harvested from, and subsequently disseminated by, a repository with baseURL http://the.oa.org (the outer originDescription). The metadataNamespace elements are identical, which indicates that the metadata format has not been changed. The altered attributes indicate that the metadata was not altered between the first harvest and the following dissemination (altered="false" on the inner originDescription), but was altered between the second harvest and the following dissemination (altered="true" on the outer originDescription).

<about>
<provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance
                      http://www.openarchives.org/OAI/2.0/provenance.xsd">

<originDescription harvestDate="2002-02-02T14:10:02Z" altered="true">
  <baseURL>http://the.oa.org</baseURL>
  <identifier>oai:r2.org:klik001</identifier>
  <datestamp>2002-01-01</datestamp>
  <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
  <originDescription harvestDate="2002-01-01T11:10:01Z" altered="false">
    <baseURL>http://some.oa.org</baseURL>
    <identifier>oai:r2.org:klik001</identifier>
    <datestamp>2001-01-01</datestamp>
    <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
  </originDescription>
</originDescription>

</provenance>
</about>
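
A harvester that receives such a container can recover the chain by following the nested originDescription elements from the outside in. The following Python sketch shows one way to do this with the standard library; the function name provenance_chain and the dictionary keys it returns are illustrative assumptions, not part of the guideline.

```python
import xml.etree.ElementTree as ET

PROV_NS = "{http://www.openarchives.org/OAI/2.0/provenance}"


def provenance_chain(provenance):
    """Yield one dict per originDescription, most recent harvest first.

    `provenance` is an ElementTree element for the <provenance> container.
    """
    origin = provenance.find(PROV_NS + "originDescription")
    while origin is not None:
        yield {
            "baseURL": origin.findtext(PROV_NS + "baseURL"),
            "identifier": origin.findtext(PROV_NS + "identifier"),
            "datestamp": origin.findtext(PROV_NS + "datestamp"),
            "metadataNamespace": origin.findtext(PROV_NS + "metadataNamespace"),
            "harvestDate": origin.get("harvestDate"),
            # The XML Schema boolean type allows "true"/"1" and "false"/"0".
            "altered": origin.get("altered") in ("true", "1"),
        }
        # Step into the nested description of the previous harvest, if any.
        origin = origin.find(PROV_NS + "originDescription")
```

Applied to the example above, the generator yields the description for http://the.oa.org first and the description for http://some.oa.org second.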

The following example shows a sequence of requests and responses leading to a response which contains a provenance container.

Consider a request from crosswalker.oa.org:
http://odd.oa.org?verb=GetRecord&identifier=oai:odd.oa.org:z1x2y3
                 &metadataPrefix=odd_fmt
and the following response from odd.oa.org:
<?xml version="1.0" encoding="UTF-8"?> 
<GetRecord xmlns="http://www.openarchives.org/OAI/2.0/OAI-PMH" 
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/OAI-PMH 
           http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"> 
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" metadataPrefix="odd_fmt" 
           identifier="oai:odd.oa.org:z1x2y3">http://odd.oa.org</request>
  <record> 
    <header>
      <identifier>oai:odd.oa.org:z1x2y3</identifier> 
      <datestamp>1999-08-07T06:05:04Z</datestamp>
    </header>
    <metadata>
      <md:odd_fmt ...>
        ...metadata record in odd_fmt...
      </md:odd_fmt> 
    </metadata>
  </record>
</GetRecord>
Imagine that crosswalker.oa.org cross-walks the metadata from odd_fmt into oai_marc and then re-exposes the new metadata record with a new identifier.

A request from getmarc.oa.org:
http://crosswalker.oa.org?verb=GetRecord
                         &identifier=oai:crosswalker.oa.org:a9b8c7
                         &metadataPrefix=oai_marc
might then yield the following response from crosswalker.oa.org:
<?xml version="1.0" encoding="UTF-8"?> 
<GetRecord xmlns="http://www.openarchives.org/OAI/2.0/OAI-PMH" 
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/OAI-PMH 
           http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"> 
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord" metadataPrefix="oai_marc"
           identifier="oai:crosswalker.oa.org:a9b8c7">http://crosswalker.oa.org</request>
  <record> 
    <header>
      <identifier>oai:crosswalker.oa.org:a9b8c7</identifier> 
      <datestamp>2002-02-09T01:15:24Z</datestamp>
    </header>
    <metadata>
      <marc:oai_marc ...>
        ...metadata record in oai_marc...
      </marc:oai_marc> 
    </metadata>
    <about>
      <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance
                            http://www.openarchives.org/OAI/2.0/provenance.xsd">
        <originDescription harvestDate="2002-02-08T08:55:46Z" altered="true">
          <baseURL>http://odd.oa.org</baseURL>
          <identifier>oai:odd.oa.org:z1x2y3</identifier>
          <datestamp>1999-08-07T06:05:04Z</datestamp>
          <metadataNamespace>http://odd.oa.org/odd_fmt</metadataNamespace>
        </originDescription>
      </provenance>
    </about>
  </record>
</GetRecord>

Acknowledgements

Support for the development of the OAI-PMH and for other Open Archives Initiative activities comes from the Digital Library Federation, the Coalition for Networked Information, and from the National Science Foundation through Grant No. IIS-9817416. Individuals who have played a significant role in the development of OAI-PMH version 2.0 are acknowledged in the protocol document.

Document History

2002-06-14: Release of this document, combined with the release of OAI-PMH version 2.0.
2002-07-02: Corrected to follow OAI-PMH version 2.0 datestamp and oai-identifier specifications and to change the identifier in the example where the metadata is altered.