The Open Archives Initiative Protocol for Metadata Harvesting
Changes from OAI-PMH 1.1 to OAI-PMH 2.0
Protocol Version 2.0 of 2002-06-14
Document Version 2002/06/09T16:43:00Z
http://www.openarchives.org/OAI/migration.htm
Editors
The OAI Executive:
Carl Lagoze <lagoze@cs.cornell.edu>
-- Cornell University - Computer Science
Herbert Van de Sompel <hvdsomp@yahoo.com>
-- Los Alamos National Laboratory - Research Library
From the OAI Technical Committee:
Michael Nelson <m.l.nelson@larc.nasa.gov>
-- NASA - Langley Research Center
Simeon Warner <simeon@cs.cornell.edu>
-- Cornell University - Computer Science
Table of Contents
0. Introduction to this Annotation Document
1. Introduction
2. Definitions and Concepts
2.1. Harvester
2.2. Repository
2.3. Item
2.4. Unique Identifier
2.5. Record
2.6. Set
2.7. Selective Harvesting
2.7.1 Selective Harvesting and Datestamps
2.7.2 Selective Harvesting and Sets
3. Protocol Features
3.1. HTTP Embedding of OAI-PMH requests
3.1.1. HTTP Request Format
3.1.2. HTTP Response Format
3.1.3. Response Compression
3.2. XML Response Format
3.2.1. XML Schema for Validating Responses to OAI-PMH Requests
3.3. UTCdatetime
3.3.1. UTCdatetime in Protocol Requests
3.3.2. UTCdatetime in Protocol Responses
3.4. metadataPrefix and Metadata Schema
3.5. Flow Control
3.5.1 Idempotency of resumptionTokens
3.6. Error and Exception Conditions
4. Protocol Requests and Responses
4.1. GetRecord
4.2. Identify
4.3. ListIdentifiers
4.4. ListMetadataFormats
4.5. ListRecords
4.6. ListSets
5. Dublin Core
6. Implementation Guidelines
Acknowledgements
Document History
0. Introduction to this Annotation Document
This document is intended as an accompanying document to the
specification of the Open Archives
Initiative Protocol for Metadata Harvesting (referred to as
the OAI-PMH in the remainder of this document). The purpose is to assist
implementers migrating from OAI-PMH 1.1 to OAI-PMH 2.0. This document is
expressly not intended as a standalone document, nor is it intended for
use by new implementers who have no concerns with migration issues.
The organization of the document, except for this section, exactly parallels
the structure of the OAI-PMH specification, thus making it easy for readers to
move back and forth between the two documents. Each section is organized
as two lists:
- textual changes - describing changes that were made to the
specification for the sake of clarity or changes in terminology.
- functional changes - describing changes that were made to the
protocol rules.
For the sake of brevity, the respective list only appears if necessary.
1. Introduction
Textual Changes
- Language from RFC 2119 clarifying required and optional aspects of the
protocol is introduced.
- Separation of non-core protocol features into a separate Implementation
Guidelines is introduced.
- Minor wording changes.
2. Definitions and Concepts
2.1 Harvester
Textual Changes
- This is a new section introducing a new term.
2.2 Repository
Textual Changes
- This corresponds to section 2.1 in OAI-PMH 1.1.
- The terminological differentiation between resources, items,
and records is introduced.
2.3 Item
Textual Changes
- This is a new section introducing a new term.
2.4 Unique Identifier
Textual Changes
- This section corresponds to section 2.3 in OAI-PMH 1.1.
- Clarification that the Unique Identifier identifies an item and that all
disseminations of records in various metadata formats share the same unique
identifier.
- Clarification of the use of identifiers in the protocol within responses
and requests.
- References to and the description of the optional oai-identifier format for
unique identifiers have been moved to an accompanying guidelines document,
and the syntax of that identifier is changed.
- Additional wording changes.
2.5 Record
Textual Changes
- This section corresponds to section 2.2 in OAI-PMH 1.1.
- Clarification of the distinction between the notion of a record and
its XML-formatted dissemination, which is the result of a harvesting
request.
- Clarification that the unambiguous identification of a record is the
combination of its identifier, datestamp, and metadataPrefix.
- Introduction of a new use of the about container in a record: a
data structure to express the provenance of a metadata record.
This use is shown in an example.
Functional Changes
- Loosen the restriction on the tags of sets to allow any character safe in
the query component of a URL.
- The record header must indicate the set membership of the item.
2.6 Set
Textual Changes
- This section corresponds to section 2.5 in OAI-PMH 1.1.
- The text has been clarified to indicate that set organizations need not be
hierarchical, but can be a simple, flat list.
- The text has been changed to clarify the fact that a setTag is simply a
component of the colon-separated list that is a setSpec.
- Numerous clarifying textual changes.
Functional Changes
- If a repository supports sets, the header of a record must indicate
the set membership of the item from which that record was disseminated.
- Set membership of items must be returned as part of disseminated records.
- Each set in a repository's set organization may include a setDescription
for community-specific XML-encoded data about the set.
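The relationship between a setSpec and its setTag components can be illustrated with a short Python sketch. The function names are hypothetical and the example values are invented; the only rule taken from the text above is that a setSpec is a colon-separated list of setTags.

```python
def split_set_spec(set_spec):
    """Split a setSpec into its colon-separated setTag components."""
    tags = set_spec.split(":")
    if any(tag == "" for tag in tags):
        raise ValueError(f"empty setTag in setSpec: {set_spec!r}")
    return tags


def ancestor_set_specs(set_spec):
    """Return the setSpec of every enclosing set, outermost first.

    A flat (non-hierarchical) setSpec has no ancestors.
    """
    tags = split_set_spec(set_spec)
    return [":".join(tags[:i]) for i in range(1, len(tags))]
```

A flat set organization simply yields single-element tag lists, so the same code serves both hierarchical and flat repositories.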
2.7 Selective Harvesting
Textual Changes
- This is a new section clarifying selective harvesting as a distinct
concept and the criteria that are available in OAI-PMH.
2.7.1 Selective Harvesting and Datestamps
Textual Changes
- This section roughly corresponds to section 2.4 in OAI-PMH 1.1. It
clarifies the intended purpose of datestamps and their relationship
to selective harvesting.
- Numerous wording changes to clarify how record modification, deletion, and
creation affect datestamps.
Functional Changes
- A deleted status is now returned by ListIdentifiers, in addition to
GetRecord and ListRecords.
In all cases "deleted" means the withdrawal of availability of the
respective record.
- Repositories must express the nature of their support of the
deleted status as part of the Identify response.
- Repositories may support datestamp-based selective harvesting at
the granularity of seconds. Repositories must support datestamp-based
selective harvesting at the granularity of days. Support for seconds
granularity is indicated in the response to Identify.
2.7.2 Selective Harvesting and Sets
Textual Changes
- This is a new section that clarifies the intended purpose of sets and
their relationship to selective harvesting.
3.0 Protocol Features
3.1 HTTP embedding of OAI-PMH requests
Textual Changes
3.1.1 HTTP Request Format
Textual Changes
Functional Changes
- Repositories must support both the GET and POST methods.
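Since a repository must now accept either method, a harvester can encode the same request in two ways. The following Python sketch builds both forms without sending them. The base URL and function names are hypothetical, and the use of the application/x-www-form-urlencoded content type for POST is an assumption drawn from the 2.0 specification rather than from this document.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(base_url, verb, args):
    """Encode the verb and its arguments in the query string of a GET."""
    query = urlencode({"verb": verb, **args})
    return Request(f"{base_url}?{query}", method="GET")


def build_post(base_url, verb, args):
    """Encode the same key=value pairs in the body of a POST."""
    body = urlencode({"verb": verb, **args}).encode("ascii")
    return Request(
        base_url,
        data=body,
        method="POST",
        # Assumed content type for form-encoded arguments.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Arguments are passed as a dictionary rather than as keyword arguments because from is a reserved word in Python. urlencode also escapes special characters such as the colons in an identifier.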
3.1.1.1 Encoding an OAI-PMH request in an HTTP GET
Textual Changes
3.1.1.2 Encoding an OAI-PMH request in an HTTP POST
Textual Changes
3.1.1.3 Encoding of special characters in keyword arguments of OAI-PMH requests
Textual Changes
3.1.2 HTTP Response Format
Textual Changes
- Editing changes to remove duplication.
3.1.2.1 Content-Type
Textual Changes
3.1.2.2 Status-Code
Textual Changes
Functional Changes
3.1.3 Response Compression
Textual Changes
Functional Changes
3.2. XML Response Format
Textual Changes
- This is a new section containing information that was formerly in
Section 3.1.2.1 "Content Type".
Functional Changes
- The responseDate included in every OAI response must be in UTC.
- The requestURL in Version 1.x is replaced by a <request> tag that encodes
the originating protocol request and its arguments.
- Error conditions are reported within the XML response.
- Character references, rather than entity references, must be used within
the XML of responses.
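For a harvester, the practical effect of the new <request> tag is that the request arguments arrive as XML attributes instead of as a single URL string. The Python sketch below reads the response envelope under some assumptions. The namespace URI, the attribute-per-argument layout, and the sample response come from the 2.0 specification as the editor understands it, not from this document, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # assumed 2.0 namespace

# Invented sample response for illustration only.
SAMPLE_ENVELOPE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="GetRecord" identifier="oai:example.org:1"
           metadataPrefix="oai_dc">http://example.org/oai</request>
</OAI-PMH>"""


def read_envelope(xml_text):
    """Return the responseDate, base URL, and echoed request arguments."""
    root = ET.fromstring(xml_text)
    request = root.find(f"{{{OAI_NS}}}request")
    return {
        "responseDate": root.findtext(f"{{{OAI_NS}}}responseDate"),
        "baseURL": (request.text or "").strip(),
        "arguments": dict(request.attrib),
    }
```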
3.2.1 XML Schema for Validating Responses to OAI-PMH Requests
Textual Changes
Functional Changes
3.3 UTCdatetime
Textual Changes
- The name of the section is changed from "Dates and Times".
- Major changes to explain new functionality.
- The section is divided into sub-sections to clarify request and response
handling of dates and times.
Functional Changes
3.3.1 UTCdatetime in Protocol Requests
Textual Changes
Functional Changes
- Repositories may support datestamp-based selective harvesting at
the granularity of seconds. Repositories must support
datestamp-based selective harvesting at the granularity of days.
Support for seconds granularity is indicated in the response to Identify.
- A request by a harvester with finer granularity
than that supported by a repository must produce an error.
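The granularity rule above can be sketched on the repository side as a small check applied to each from or until value. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, and badArgument is the error code the editor recalls from the 2.0 specification, since this document says only that an error must be produced.

```python
import re

DAY = "YYYY-MM-DD"
SECONDS = "YYYY-MM-DDThh:mm:ssZ"

_PATTERNS = {
    DAY: re.compile(r"\d{4}-\d{2}-\d{2}"),
    SECONDS: re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"),
}


def granularity_of(value):
    """Return DAY or SECONDS for a well-formed value, else None."""
    for name, pattern in _PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


def check_datestamp_argument(value, repository_granularity):
    """Return None if the value is acceptable, else an error code."""
    found = granularity_of(value)
    if found is None:
        return "badArgument"  # assumed error code; not well-formed
    if found == SECONDS and repository_granularity == DAY:
        return "badArgument"  # finer than the repository supports
    return None
```

A day-granularity value is accepted by every repository, because support for days is mandatory.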
3.3.2 UTCdatetime in Protocol Responses
Textual Changes
Functional Changes
- Datestamps in responses to ListIdentifiers, GetRecord and ListRecords are
expressed in UTC and must be expressed in the finest granularity supported
by the repository.
- The responseDate included with every protocol response is expressed in UTC.
This is encoded using the "Complete date plus hours, minutes, and seconds"
variant of ISO 8601. This format is YYYY-MM-DDThh:mm:ssZ.
- A resumptionToken may include an optional attribute expirationDate.
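Producing the YYYY-MM-DDThh:mm:ssZ form requires converting a local time to UTC before formatting it. A minimal Python sketch follows; the function name and its seconds_granularity parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utcdatetime(moment, seconds_granularity=True):
    """Format a timezone-aware datetime as a UTCdatetime string."""
    if moment.tzinfo is None:
        # A naive datetime carries no offset, so it cannot be converted safely.
        raise ValueError("naive datetime: attach a timezone first")
    utc = moment.astimezone(timezone.utc)
    pattern = "%Y-%m-%dT%H:%M:%SZ" if seconds_granularity else "%Y-%m-%d"
    return utc.strftime(pattern)
```

For example, noon on 2002-06-14 at a UTC-4 offset formats as 2002-06-14T16:00:00Z at seconds granularity and as 2002-06-14 at day granularity.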
3.4 metadataPrefix and Metadata Schema
Textual Changes
- Numerous textual changes to reflect modified functionality.
Functional Changes
- The character restrictions of a metadataPrefix are relaxed to allow any
characters that are safe in a query component of a URI.
- A metadataPrefix must be included as an argument to the ListIdentifiers
request as a means of requesting headers of records that correspond to the
metadata format identified by the value of the metadataPrefix argument.
- The URL of the XML schema for the required Dublin Core metadata format is
changed to http://www.openarchives.org/OAI/2.0/oai_dc/oai_dc.xsd.
The corresponding XML namespace URI is
http://www.openarchives.org/OAI/2.0/oai_dc/.
- The protocol reserves the metadataPrefix 'all' for future use.
Implementations should not use this metadataPrefix.
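A repository might guard the metadataPrefix values it advertises with a check such as the Python sketch below. The character class is an assumption: it takes "safe in a query component of a URI" to mean the unreserved characters of RFC 2396, which should be confirmed against the 2.0 specification. The function name is hypothetical.

```python
import re

# Assumed reading of "safe in a query component": RFC 2396 unreserved
# characters (letters, digits, and - _ . ! ~ * ' ( ) ).
_PREFIX = re.compile(r"[A-Za-z0-9\-_.!~*'()]+")

# Reserved by the protocol for future use.
RESERVED_PREFIXES = {"all"}


def is_usable_metadata_prefix(prefix):
    """True if the prefix is well-formed and not reserved."""
    return bool(_PREFIX.fullmatch(prefix)) and prefix not in RESERVED_PREFIXES
```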
3.5 Flow Control
Textual Changes
- Numerous textual changes to reflect modified functionality and to
improve clarity.
Functional Changes
- resumptionTokens may be accompanied by three optional attributes to aid
processing by a harvester: expirationDate, completeListSize, and cursor.
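A harvester can read the three optional attributes as in the following Python sketch. The namespace URI and the sample response are assumptions based on the 2.0 specification, the token value is invented, and the function name is hypothetical. Because the attributes are optional, each is returned as None when absent.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # assumed 2.0 namespace

# Invented sample response for illustration only.
SAMPLE_PARTIAL_LIST = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="ListIdentifiers"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2002-05-01</datestamp>
    </header>
    <resumptionToken expirationDate="2002-06-01T23:20:00Z"
                     completeListSize="6" cursor="0">token-1</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""


def read_resumption_token(xml_text):
    """Return the token and its optional attributes, or None if absent."""
    token = ET.fromstring(xml_text).find(f".//{{{OAI_NS}}}resumptionToken")
    if token is None:
        return None
    size = token.get("completeListSize")
    cursor = token.get("cursor")
    return {
        "token": (token.text or "").strip(),
        "expirationDate": token.get("expirationDate"),
        "completeListSize": int(size) if size is not None else None,
        "cursor": int(cursor) if cursor is not None else None,
    }
```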
3.5.1 Idempotency of resumptionTokens
Textual Changes
- This is a new section defining new functionality.
Functional Changes
- The behavior of a resumptionToken within the time range defined by its
expirationDate must be idempotent modulo changes in the underlying content
of the repository.
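Idempotency is what lets a harvester reissue a request with the same resumptionToken after a transport failure instead of restarting the whole list. The Python sketch below shows such a loop. The harvest function and the fetch_page callable are hypothetical, and fetch_page stands in for whatever HTTP and XML handling a real harvester uses. Sending only the verb and the resumptionToken on follow-up requests reflects the editor's reading of the 2.0 specification.

```python
def harvest(fetch_page, first_args, max_retries=2):
    """Yield every item of a list request, following resumptionTokens.

    fetch_page(args) must return (items, next_token_or_None) and may
    raise OSError on a transport failure.
    """
    args = dict(first_args)
    while True:
        for attempt in range(max_retries + 1):
            try:
                items, token = fetch_page(args)
                break
            except OSError:
                # Reissuing the identical request relies on the token
                # being idempotent until its expirationDate.
                if attempt == max_retries:
                    raise
        yield from items
        if not token:
            return
        args = {"verb": first_args["verb"], "resumptionToken": token}
```

A fuller harvester would also stop retrying once the expirationDate has passed, since the idempotency guarantee no longer applies after that point.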
3.6 Error and Exception Conditions
Textual Changes
- This is a new section describing new functionality.
Functional Changes
- Repositories must indicate OAI-PMH errors, distinguished from HTTP
status-codes, by including one or more error elements in the response.
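Because protocol errors now travel inside the XML body, a harvester has to inspect the body even when the HTTP exchange itself succeeded. The Python sketch below collects the error elements. The namespace URI, the code attribute, and the badVerb example come from the editor's reading of the 2.0 specification, not from this document, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # assumed 2.0 namespace

# Invented sample response for illustration only.
SAMPLE_ERROR = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request>http://example.org/oai</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>"""


def read_errors(xml_text):
    """Return a list of (code, message) pairs; empty if no errors."""
    root = ET.fromstring(xml_text)
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall(f"{{{OAI_NS}}}error")
    ]
```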
4. Protocol Requests and Responses
Textual Changes
- Numerous editing changes.
4.1. GetRecord
Textual Changes
- Clarification that the required identifier argument is linked to the item
from which a record is requested.
- Removal of text explaining restriction on the character set in the
metadataPrefix argument (text is included in the appropriate section).
- The examples are changed to reflect changes in the protocol specification.
- Appropriate OAI-PMH error codes are listed.
Functional Changes
- The record header must indicate the set membership of the item.
- There is no longer an XML schema unique to this request. All responses are
validated via a single schema, described in Section 3.2.1.
4.2. Identify
Textual Changes
Functional Changes
- The Identify response must return the repository's level of deleted item
support.
- The Identify response must return the harvesting granularity supported by
the repository.
- The Identify response must return the compression encodings, other than
Identity, supported by the repository.
- The Identify response must return the guaranteed lower limit of all
datestamps of items in the repository.
- There is no longer an XML schema unique to this request. All responses are
validated via a single schema, described in Section 3.2.1.
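A migrating harvester will typically read these new Identify fields once and adapt its later requests accordingly. The Python sketch below does this under stated assumptions: the element names (deletedRecord, granularity, compression, earliestDatestamp), the namespace URI, and the sample values follow the 2.0 specification as the editor understands it, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # assumed 2.0 namespace

# Invented sample response for illustration only.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="Identify">http://example.org/oai</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@example.org</adminEmail>
    <earliestDatestamp>1990-02-01T12:00:00Z</earliestDatestamp>
    <deletedRecord>transient</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
    <compression>deflate</compression>
  </Identify>
</OAI-PMH>"""


def read_capabilities(xml_text):
    """Extract the Identify fields introduced in version 2.0."""
    identify = ET.fromstring(xml_text).find(f"{{{OAI_NS}}}Identify")

    def text(name):
        return identify.findtext(f"{{{OAI_NS}}}{name}")

    return {
        "deletedRecord": text("deletedRecord"),
        "granularity": text("granularity"),
        "earliestDatestamp": text("earliestDatestamp"),
        # Zero or more encodings may be listed.
        "compression": [
            c.text for c in identify.findall(f"{{{OAI_NS}}}compression")
        ],
    }
```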
4.3. ListIdentifiers
Textual Changes
Functional Changes
- The purpose of ListIdentifiers is reframed as an abbreviated version of
ListRecords. As such, it returns only the headers of records, omitting the
metadata part.
- The use of a metadataPrefix argument is required to specify that headers
should be returned only if the metadata format matching the supplied
metadataPrefix is available, or has been deleted.
- The header must indicate the set membership of the item.
- The from and until argument values are expressed in UTC.
- There is no longer an XML schema unique to this request. All responses are
validated via a single schema, described in Section 3.2.1.
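Code that consumed a 1.1 ListIdentifiers response as a plain list of identifiers needs to handle full headers in 2.0, including set membership and the deleted status. The Python sketch below parses them. The header layout (identifier, datestamp, setSpec, and a status attribute), the namespace URI, and the sample data are assumptions based on the 2.0 specification, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"  # assumed 2.0 namespace

# Invented sample response for illustration only.
SAMPLE_HEADERS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-06-01T19:20:30Z</responseDate>
  <request verb="ListIdentifiers"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2002-05-01</datestamp>
      <setSpec>physics:hep</setSpec>
      <setSpec>math</setSpec>
    </header>
    <header status="deleted">
      <identifier>oai:example.org:2</identifier>
      <datestamp>2002-05-03</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def read_headers(xml_text):
    """Return one dictionary per header in the response."""
    root = ET.fromstring(xml_text)
    headers = []
    for header in root.iter(f"{{{OAI_NS}}}header"):
        headers.append({
            "identifier": header.findtext(f"{{{OAI_NS}}}identifier"),
            "datestamp": header.findtext(f"{{{OAI_NS}}}datestamp"),
            "setSpecs": [
                s.text for s in header.findall(f"{{{OAI_NS}}}setSpec")
            ],
            "deleted": header.get("status") == "deleted",
        })
    return headers
```

The same function would apply to the headers inside GetRecord and ListRecords responses, since those records carry the same header part.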
4.4. ListMetadataFormats
Textual Changes
Functional Changes
4.5. ListRecords
Textual Changes
Functional Changes
- The record header must indicate the set membership of the item
corresponding to the record.
- There is no longer an XML schema unique to this request. All responses are
validated via a single schema, described in Section 3.2.1.
4.6. ListSets
Textual Changes
Functional Changes
- Each set in a repository's set organization may include a setDescription
for community-specific XML-encoded data about the set.
- There is no longer an XML schema unique to this request. All responses are
validated via a single schema, described in Section 3.2.1.
5. Dublin Core
Textual Changes
- This is a new section replacing Appendix 1 in OAI-PMH version 1.1.
- XML schemas for metadata formats other than the required Dublin Core
are moved to the Implementation Guidelines document.
Functional Changes
- The XML schema for Dublin Core is based on (imports) an XML schema
for the DC elements supported by DCMI.
6. Implementation Guidelines
Textual Changes
- This is a new section providing a link to the Implementation
Guidelines document.
- Contents of Appendix 2 in OAI-PMH version 1.1 are moved to the
Implementation Guidelines document.
Acknowledgements
Support for the development of the OAI-PMH and for other Open Archives
Initiative activities comes from the Digital
Library Federation, the Coalition for
Networked Information, and from the National Science Foundation through Grant No. IIS-9817416.
Document History
2002-06-14: Release of this document, combined with the release of OAI-PMH
version 2.0.