Editors
The OAI Executive:
Carl Lagoze <lagoze@cs.cornell.edu> -- Cornell University - Computer Science
Herbert Van de Sompel <herbertv@lanl.gov> -- Los Alamos National Laboratory - Research Library
From the OAI Technical Committee:
Michael Nelson <m.l.nelson@larc.nasa.gov> -- NASA - Langley Research Center
Simeon Warner <simeon@cs.cornell.edu> -- Cornell University - Computer Science
This document is one part of the Implementation Guidelines that accompany the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
The OAI identifier format is intended to provide persistent resource identifiers for items in repositories that implement OAI-PMH. This is just one possible format that may be used for identifiers within OAI-PMH.
oai-identifiers are Uniform Resource Names (URNs) in the sense of RFC1737; they are resource identifiers and not resource locators (URLs). Note that here the resource is the metadata (the items) and not the underlying object or "stuff" that the metadata describes. Correspondence between an oai-identifier and any identifier that the object described by the metadata may have is outside the scope of this specification and of the OAI-PMH. Adherence to standards and accord with existing schemes is discussed at the end of this document.
The oai-identifier syntax is a restriction of the "general, absolute URI" syntax, <scheme>:<scheme-specific-part>, defined in RFC 2396. The following description uses the same notational conventions as RFC 2396, and the same definitions of digit, alpha, alphanum, reserved, unreserved and uric.
oai-identifier       = scheme ":" namespace-identifier ":" local-identifier
scheme               = "oai"
namespace-identifier = domainname-word "." domainname
domainname           = domainname-word [ "." domainname ]
domainname-word      = alpha *( alphanum | "-" )
local-identifier     = 1*uric
Any uric elements are permitted in the local-identifier. Since characters in the reserved set do not have any special meaning in the local-identifier component, they are permitted unescaped. All characters not included in the unreserved and reserved sets must be escaped (using the same encoding as OAI-PMH requests). Characters in the unreserved and reserved sets must not be escaped. An oai-identifier should never be unescaped; the sole purpose of permitting escaped characters is to allow repositories to map any internal identifier to the local-identifier part of an oai-identifier.
The following definitions are copied from RFC 2396 for convenience:
uric       = reserved | unreserved | escaped
reserved   = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
unreserved = alphanum | mark
mark       = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
To avoid the possibility of inconsistently generated escaped characters in an oai-identifier, the hex digits must use uppercase for the letters A through F. This is a further restriction on RFC 2396. Thus, escaped and hex are defined as follows:
escaped = "%" hex hex
hex     = digit | "A" | "B" | "C" | "D" | "E" | "F"
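The escaping rules above can be sketched in Python as follows. This is only an illustrative sketch, not part of the specification: the function names `make_local_identifier` and `make_oai_identifier` are hypothetical, and the use of UTF-8 for characters outside ASCII is an assumption. It relies on the standard library's `urllib.parse.quote`, which leaves alphanumerics and the listed safe characters untouched and writes escapes with uppercase hex digits.

```python
from urllib.parse import quote

# Character sets copied from the RFC 2396 definitions quoted above.
RESERVED = ";/?:@&=+$,"
MARK = "-_.!~*'()"


def make_local_identifier(internal_id: str) -> str:
    """Map an internal identifier to a local-identifier (hypothetical helper).

    Characters in the reserved and unreserved sets are left as they are;
    everything else (including a literal '%') becomes %XX with uppercase hex.
    UTF-8 for non-ASCII characters is an assumption of this sketch.
    """
    if not internal_id:
        raise ValueError("local-identifier needs at least one character")
    return quote(internal_id, safe=RESERVED + MARK, encoding="utf-8")


def make_oai_identifier(namespace_identifier: str, internal_id: str) -> str:
    """Assemble scheme, namespace-identifier and local-identifier."""
    return f"oai:{namespace_identifier}:{make_local_identifier(internal_id)}"


print(make_oai_identifier("wibble.org", "ab cd"))
print(make_oai_identifier("wibble.org", "ab?cd"))
```

The escaping is applied exactly once, when the oai-identifier is created; after that the identifier is treated as an opaque string and never unescaped.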
Organizations must choose namespace-identifier values which correspond to a domain-name that they have registered, and are committed to maintaining. Note that since the oai-identifier is case-sensitive, a particular capitalization style must be selected and used consistently. A single domain name should not be used with variant capitalizations.
Domain name registration is used to avoid the need for any additional registration service for oai-identifiers. Domain name based identifiers guarantee global uniqueness without the need for OAI registration as required with the earlier, v1.0/1.1 specification.
Two oai-identifiers are equivalent if they are identical strings. All three parts of the oai-identifier are case sensitive. Any escaped elements must be left escaped; there is no ambiguity because it is permissible (and required) only to escape characters that cannot be included directly.
An oai-identifier scheme was introduced in OAI-PMH v1.0 and remained unchanged in OAI-PMH v1.1. This scheme has been widely adopted and existing identifiers may continue to be used by referring to the old schema: http://www.openarchives.org/OAI/1.1/oai-identifier.xsd.
To use this new oai-identifier scheme, repositories must make the following changes:
1. Update the Identify response to refer to the new schema.
2. Choose a namespace-identifier to replace the repository-identifier.
3. Ensure that the local-identifier components of any identifiers exposed use the restricted character set (uric) of this specification. This may mean that internal identifiers need to be escaped to create the local-identifier component. The characters <space> and # were used with the earlier oai-identifier scheme and may no longer be used in the local-identifier component.
When used as an argument in an OAI-PMH request, an oai-identifier must be correctly encoded. This means that the colon (:) separators and the percent (%) characters of escaped characters in the local-identifier part must be URL encoded. For example, the oai-identifier
oai:an.oai.org:ab%3Ccd
would be encoded as
identifier=oai%3Aan.oai.org%3Aab%253Ccd
in an OAI-PMH request.
This means that characters in some internal identifier that an oai-identifier is derived from may be URL encoded twice -- once to make the oai-identifier, and a second time to express the oai-identifier in a URL. The URL will be decoded once to recover the oai-identifier.
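The request encoding described above can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper name `encode_identifier_argument` is hypothetical, and passing `safe=""` to `urllib.parse.quote` is one way (not the only way) to make sure the colon separators and the percent signs are both URL encoded.

```python
from urllib.parse import quote, unquote


def encode_identifier_argument(oai_identifier: str) -> str:
    """Build the identifier argument of an OAI-PMH request (hypothetical helper).

    safe="" forces ':' to become %3A and '%' to become %25, so an escape
    already present in the local-identifier ends up encoded a second time.
    """
    return "identifier=" + quote(oai_identifier, safe="")


argument = encode_identifier_argument("oai:an.oai.org:ab%3Ccd")
print(argument)  # identifier=oai%3Aan.oai.org%3Aab%253Ccd

# Decoding the URL value exactly once recovers the oai-identifier,
# with its own escape (%3C) still in place.
recovered = unquote(argument.split("=", 1)[1])
print(recovered)  # oai:an.oai.org:ab%3Ccd
```

Decoding a second time would turn `%3C` into `<` and yield a string that is no longer the oai-identifier, which is why the URL is decoded only once.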
The following are valid oai-identifiers:
oai:arXiv.org:hep-th/9901001
oai:foo.org:some-local-id-53
oai:FOO.ORG:some-local-id-53    ;not the same as above,
                                ;should not use foo.org _and_ FOO.ORG
oai:foo.org:some-local-id-54
oai:foo.org:Some-Local-Id-54    ;not the same as above, distinct identifier
oai:wibble.org:ab%20cd          ;space in internal id correctly escaped
oai:wibble.org:ab?cd            ;question mark should not be escaped
The following are not valid oai-identifiers:
something:arXiv.org:hep-th/9901001    ;bad scheme
oai:999:abc123                        ;namespace-identifier must not start with digit
oai:wibble:abc123                     ;namespace-identifier must be domain name
oai:wibble.org:ab cd                  ;space not permitted (must be escaped as %20)
oai:wibble.org:ab#cd                  ;# not permitted
oai:wibble.org:ab<cd                  ;< not permitted
oai:wibble.org:ab%3ccd                ;< must be escaped as %3C not %3c
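The grammar and the examples above lend themselves to a mechanical check. The following Python sketch transcribes the grammar into a regular expression; the names `OAI_IDENTIFIER` and `parse_oai_identifier` are hypothetical, and the expression follows the grammar given in this document rather than the pattern in the XML schema.

```python
import re
from typing import Optional, Tuple

# uric: a reserved or unreserved character, or an escape with uppercase hex.
_URIC = r"(?:[A-Za-z0-9;/?:@&=+$,\-_.!~*'()]|%[0-9A-F]{2})"
# domainname-word: alpha followed by alphanum or "-".
_WORD = r"[A-Za-z][A-Za-z0-9-]*"

OAI_IDENTIFIER = re.compile(
    rf"oai:(?P<namespace>{_WORD}(?:\.{_WORD})+):(?P<local>{_URIC}+)"
)


def parse_oai_identifier(value: str) -> Optional[Tuple[str, str]]:
    """Return (namespace-identifier, local-identifier), or None if invalid.

    The namespace-identifier cannot contain ':', so the second colon always
    ends it; later colons belong to the local-identifier.
    """
    match = OAI_IDENTIFIER.fullmatch(value)
    if match is None:
        return None
    return match.group("namespace"), match.group("local")


print(parse_oai_identifier("oai:arXiv.org:hep-th/9901001"))
print(parse_oai_identifier("oai:wibble.org:ab%3ccd"))  # None: lowercase hex

# Equivalence is plain string identity, so these two are distinct identifiers.
print("oai:foo.org:some-local-id-53" == "oai:FOO.ORG:some-local-id-53")
```

No normalization step is needed before comparing identifiers: because escapes are never decoded and case is significant, comparing the strings directly is sufficient.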
description container
The following XML schema (oai-identifier.xsd) defines the format of a description container in the Identify response so that repositories may expose their compliance with the oai-identifier format.
<schema targetNamespace="http://www.openarchives.org/OAI/2.0/oai-identifier"
        xmlns:oai-identifier="http://www.openarchives.org/OAI/2.0/oai-identifier"
        xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
  <annotation>
    <documentation>
      Schema for description section of Identify reply of OAI-PMH v2.0.
      For repositories that comply with the oai format for unique identifiers
      for items records.
      See: http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
      Validated with http://www.w3.org/2001/03/webdata/xsv on 16May2002
      Simeon Warner $Date: 2002/06/21 20:14:34 $
    </documentation>
  </annotation>
  <element name="oai-identifier" type="oai-identifier:oai-identifierType"/>
  <complexType name="oai-identifierType">
    <sequence>
      <element name="scheme" minOccurs="1" maxOccurs="1" type="string" fixed="oai"/>
      <element name="repositoryIdentifier" minOccurs="1" maxOccurs="1" type="oai-identifier:repositoryIdentifierType"/>
      <element name="delimiter" minOccurs="1" maxOccurs="1" type="string" fixed=":"/>
      <element name="sampleIdentifier" minOccurs="1" maxOccurs="1" type="oai-identifier:sampleIdentifierType"/>
    </sequence>
  </complexType>
  <simpleType name="repositoryIdentifierType">
    <restriction base="string">
      <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"/>
    </restriction>
  </simpleType>
  <simpleType name="sampleIdentifierType">
    <restriction base="string">
      <pattern value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&amp;=\+$,%]+"/>
      <!--meta ., \, ?, *, +, {, } (, ), [ or ] -->
    </restriction>
  </simpleType>
</schema>
This Schema is available at http://www.openarchives.org/OAI/2.0/oai-identifier.xsd
The following examples are excerpts from Identify responses which may contain zero or more <description> containers.
<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
                                      http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
    <scheme>oai</scheme>
    <repositoryIdentifier>bespa.org</repositoryIdentifier>
    <delimiter>:</delimiter>
    <sampleIdentifier>oai:bespa.org:medi99-123</sampleIdentifier>
  </oai-identifier>
</description>
<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
                                      http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
    <scheme>oai</scheme>
    <repositoryIdentifier>oai-stuff.foo.org</repositoryIdentifier>
    <delimiter>:</delimiter>
    <sampleIdentifier>oai:oai-stuff.foo.org:5324</sampleIdentifier>
  </oai-identifier>
</description>
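A harvester might read such a description container along the following lines. This is a sketch only, using Python's standard `xml.etree.ElementTree`; the function name `read_oai_identifier_description`, the prefix `oid`, and the consistency check at the end are assumptions made for illustration, not requirements stated by the schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Namespace of the oai-identifier description; "oid" is an arbitrary prefix.
NS = {"oid": "http://www.openarchives.org/OAI/2.0/oai-identifier"}
FIELDS = ("scheme", "repositoryIdentifier", "delimiter", "sampleIdentifier")


def read_oai_identifier_description(xml_text: str) -> Optional[Dict[str, str]]:
    """Extract the four child elements of an oai-identifier description.

    Returns None when the <description> holds some other kind of content.
    """
    description = ET.fromstring(xml_text)
    container = description.find("oid:oai-identifier", NS)
    if container is None:
        return None
    return {
        name: container.findtext(f"oid:{name}", default="", namespaces=NS)
        for name in FIELDS
    }


SAMPLE_DESCRIPTION = """
<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
    <scheme>oai</scheme>
    <repositoryIdentifier>bespa.org</repositoryIdentifier>
    <delimiter>:</delimiter>
    <sampleIdentifier>oai:bespa.org:medi99-123</sampleIdentifier>
  </oai-identifier>
</description>
"""

info = read_oai_identifier_description(SAMPLE_DESCRIPTION)
print(info)

# Illustrative sanity check: the sample should start with
# scheme + delimiter + repositoryIdentifier + delimiter.
prefix = info["scheme"] + info["delimiter"] + info["repositoryIdentifier"] + info["delimiter"]
print(info["sampleIdentifier"].startswith(prefix))
```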
The following two sections describe how the oai-identifier meets the requirements for URN schemes outlined in RFC1737.
oai-identifiers should have global scope in the sense that two equivalent oai-identifiers should have the same meaning everywhere (i.e. they identify the same metadata item).
The same oai-identifier should never be assigned to different metadata items. To be useful for deduplication, the same metadata item should not have more than one oai-identifier. Note that this does not imply that there will not be more than one metadata item (and hence oai-identifier) describing the same underlying resource.
oai-identifiers will be permanent. That is, oai-identifiers must remain globally unique and items should retain the same oai-identifier. (This is considerably weaker than RFC1737.)
oai-identifiers should not be limited by the syntax. Separation into two parts, a namespace-identifier and a local-identifier, assures scalability in the same way as other URI schemes.
This scheme for oai-identifiers does not accommodate existing oai-identifiers created for use with OAI-PMH versions 1.0 and 1.1. Repositories wishing to use the earlier scheme may still do so; see "Backwards compatibility".
The oai-identifier scheme is designed around a model of namespace-identifier and local-identifier. While the syntax of local-identifier is undefined and may be used for some possible extensions, the rest of the syntax is not. A more complex scheme could be supported by extension of the namespace-identifier syntax or by the creation of a new URI scheme (OAI-PMH allows arbitrary URIs as identifiers). (This is considerably weaker than RFC1737.)
oai-identifiers are intended to serve as identifiers for metadata items within repositories. It is not intended that oai-identifiers be used outside the context of a set of interacting repositories and harvesters.
With knowledge of the repository that an oai-identifier was obtained from, it will be possible to obtain the status of the item and to disseminate metadata from it (provided the OAI-PMH interface is operational). No general resolution scheme is proposed or imagined. Any such scheme would involve an additional registration database. (This is considerably weaker than RFC1737.)
oai-identifiers are not designed for human use; they are designed to be used only with the OAI-PMH. As such, presentation in text, electronic mail etc. is not important. This makes the encoding requirements considerably simpler than those described in RFC1737:
oai-identifier
.
oai-identifiers
.
oai-identifiers should be able to be transported unmodified over common Internet protocols (e.g. HTTP) and using common encoding standards (e.g. XML, RDF).
oai-identifiers should be easy to parse.
oai-identifiers should be short so that transmitting them and managing them within computer programs is convenient.
Support for the development of the OAI-PMH and for other Open Archives Initiative activities comes from the Digital Library Federation, the Coalition for Networked Information, and from the National Science Foundation through Grant No. IIS-9817416. Individuals who have played a significant role in the development of OAI-PMH version 2.0 are acknowledged in the protocol document.
2002-06-14: Release of this document, combined with the release of OAI-PMH version 2.0.
2002-06-21: Added type definitions to scheme and delimiter elements in schema.