the Santa Fe Convention: the core document


The Santa Fe Convention is discontinued. Please use the Open Archives Initiative Protocol for Metadata Harvesting instead.

This core document presents a step-by-step approach for making your e-print archive or your service comply with the Santa Fe Convention. To clarify the Santa Fe guidelines, some underlying concepts are introduced first. Technical details are given in separate documents, which are linked within the text and listed at the end of this document.

Outline

Introductory concepts

For the data provider: how to make your e-print archive comply with the Santa Fe Convention?

Step 1: Choose a unique identifier for your e-print archive
Step 2: Use unique persistent identifiers for records in the archive
Step 3: Implement the Open Archives Metadata Set
Step 4: Implement and document other metadata formats supported by your e-print archive
Step 5: Implement the Dienst harvesting interface
Step 6: Let the Open Archives initiative know that your e-print archive is open

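For the service provider: how to make your services comply with the Santa Fe Convention?

Step 1: Retain the original identifiers in your services
Step 2: Comply with the usage restrictions specified by the data providers
Step 3: Let the Open Archives initiative know that you have developed a service based on open archives data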

Introductory concepts

Archives and open e-print archives

We consider the following to be crucial components of an e-print archive:

A submission mechanism;
A long-term storage system.

In addition, we consider it crucial that an e-print archive be open, incorporating a mechanism that enables third parties to collect data from the archive. Such a mechanism allows third parties to create end-user services that support the discovery, presentation and analysis of data in the archive. We recognize that most e-print archives will also provide end-user services. However, we consider that facilitating the broad dissemination of archive data through third party services is a crucial feature of an e-print archive.

Managed e-print archives

We also assume that e-print archives are managed. This means that they have some form of policy with regard to the submission of documents and also a policy with regard to the preservation and retention of documents.

Data providers and service providers

Consistent with the objective of the Santa Fe Convention and the identification of the crucial functions of an e-print archive, we make a distinction between data providers and service providers.

A data provider is the manager of an e-print archive, acting on behalf of the authors submitting documents to the archive. As pointed out above, the data provider of an open archive will, at least, provide a submission mechanism, a long-term storage system and a mechanism that enables third parties to collect data from the archive.
A service provider is a third party, creating end-user services based on data stored in e-print archives. For instance, a service provider could implement a search engine for mathematical e-prints stored in archives worldwide.

Data in an e-print archive

An archive may store metadata that describes full content without storing the full content itself. In this case, we consider the metadata alone to be a record. If full content is stored, however, we assume that associated metadata will always be stored in the archive as well, together with a mechanism that ties metadata and content together. In this case, we consider the combination of metadata and full content to be a record.

In this convention, therefore, we define an archive as a collection of records. These records have the following properties:

A record in an e-print archive contains, at least, metadata that describes full content;
A record in an e-print archive may also contain full content such as a research paper, a dataset, software, etc., or a bundle of these.
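The two properties of a record can be captured in a small data structure. The following is a minimal sketch, assuming a hypothetical Record class in which metadata is mandatory and full content (here just a list of file names) is optional; it is an illustration, not a structure prescribed by the convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """A record: metadata describing full content, plus optional full content."""
    identifier: str                      # unique within the archive
    metadata: dict                       # required: describes the full content
    content: Optional[list[str]] = None  # optional: e.g. a paper, a dataset, software, or a bundle

    def __post_init__(self) -> None:
        # A record without metadata does not satisfy the first property.
        if not self.metadata:
            raise ValueError("a record must contain metadata that describes full content")

    def has_full_content(self) -> bool:
        return bool(self.content)
```

A metadata-only record such as Record("MEDICINE/19991104/012", {"title": "..."}) is still a valid record; adding a list of files turns it into a record with full content.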

For the data provider: how to make your e-print archive comply with the Santa Fe Convention?

Step 1: Choose a unique identifier for your e-print archive

To support interoperability, each archive should have a unique archive identifier. This identifier refers to the authority managing the archive or to the archive initiative. Choose an identifier that consists of alphanumeric characters [a-z, A-Z, 0-9]. To make sure that your archive identifier does not coincide with that of another archive, check the list of existing identifiers at http://www.openarchives.org/sfc/sfc_archives.htm. Formally, the case of characters in the archive identifier is significant; however, identifiers should be selected to be distinct regardless of case.

When setting up an open archive for the University of Spa in Belgium, one could choose BESPA as the archive identifier.

The existing NCSTRL initiative could choose NCSTRL, while the RePEc initiative could opt for RePEc.
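The two rules for archive identifiers (alphanumeric characters only, and distinct from existing identifiers regardless of case) are easy to check mechanically. This is a minimal sketch; the function name check_archive_id is hypothetical, and the list of existing identifiers is assumed to have been copied by hand from the registry page, since no machine-readable form of that list is described here.

```python
import re

# Archive identifiers may use only the characters [a-zA-Z0-9].
_ARCHIVE_ID = re.compile(r"[A-Za-z0-9]+")


def check_archive_id(candidate: str, existing: list[str]) -> None:
    """Raise ValueError if candidate is malformed or clashes with an existing identifier.

    Case is formally significant, but a candidate that differs from an
    existing identifier only in case is still rejected, as recommended.
    """
    if not _ARCHIVE_ID.fullmatch(candidate):
        raise ValueError(f"{candidate!r} must consist of [a-z, A-Z, 0-9] only")
    taken = {name.casefold() for name in existing}
    if candidate.casefold() in taken:
        raise ValueError(f"{candidate!r} coincides (ignoring case) with an existing identifier")
```

With a hypothetical registry list ["arXiv", "NCSTRL", "RePEc"], the candidate BESPA passes, while repec (a case variant) and BE-SPA (contains a hyphen) are rejected.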

Step 2: Use unique persistent identifiers for records in the archive

Records in your archive should have unique persistent identifiers. It is up to you to make sound decisions on their structure and generation.

When you combine the unique archive identifier with the unique record identifier of a record in your archive, the result is a full identifier that will never coincide with the full identifier of a record in another archive. This makes the job of service providers a lot easier. Choose a printable, non-alphanumeric character as a separator between the archive identifier and the record identifier.

The archive of the University of Spa will assign meaningful identifiers to metadata submitted to the archive. Let us assume that those identifiers consist of the faculty of the first author followed by a sequence number that contains date information. Hence, a record identifier might be MEDICINE/19991104/012. If this archive chooses a hyphen to delimit the archive identifier from the record identifier, the full identifier for the record would be: BESPA-MEDICINE/19991104/012.

arXiv.org uses record identifiers that start with the name of a sub-archive followed by a date-sensitive sequence number. For instance, physics/9811004 and hep-th/9909044 are valid identifiers in this archive. In this case, using a colon as delimiter, the full identifiers would become arXiv:physics/9811004 and arXiv:hep-th/9909044.
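Because archive identifiers are purely alphanumeric, a full identifier can always be split unambiguously: the first non-alphanumeric character is the separator, even when the record identifier itself contains hyphens or slashes (as in hep-th/9909044). The following minimal sketch, with hypothetical function names, builds and splits full identifiers under that assumption.

```python
def full_identifier(archive_id: str, record_id: str, separator: str) -> str:
    """Join an archive identifier and a record identifier with a one-character separator."""
    if len(separator) != 1 or separator.isalnum() or not separator.isprintable():
        raise ValueError("separator must be a single printable, non-alphanumeric character")
    return f"{archive_id}{separator}{record_id}"


def split_full_identifier(full_id: str) -> tuple[str, str, str]:
    """Split a full identifier into (archive_id, separator, record_id).

    Archive identifiers contain only [a-zA-Z0-9], so the first character
    outside that set must be the separator.
    """
    for i, ch in enumerate(full_id):
        if not (ch.isascii() and ch.isalnum()):
            if i == 0:
                raise ValueError(f"{full_id!r} has no archive identifier")
            return full_id[:i], ch, full_id[i + 1:]
    raise ValueError(f"no separator found in {full_id!r}")
```

For the examples above, full_identifier("BESPA", "MEDICINE/19991104/012", "-") yields BESPA-MEDICINE/19991104/012, and split_full_identifier("arXiv:hep-th/9909044") yields ("arXiv", ":", "hep-th/9909044").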

The identifier of a record in your archive - either the full identifier or the record identifier (without its leading archive identifier) - will be the crucial key for extracting the metadata for a record. In some archives, it may also be the key to get to the full content of the record. In other archives, other metadata elements within a record will point to the full content.
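Since either form of the identifier may serve as the key for extracting metadata, an archive can accept both. The following is a minimal sketch using a hypothetical in-memory MetadataStore; a real archive would query its own storage system. It tries an exact match on the record identifier first and only then strips the archive prefix, so the lookup is deterministic.

```python
class MetadataStore:
    """Hypothetical in-memory store mapping record identifiers to metadata."""

    def __init__(self, archive_id: str, separator: str):
        self.prefix = archive_id + separator
        self._records: dict[str, dict] = {}

    def add(self, record_id: str, metadata: dict) -> None:
        self._records[record_id] = metadata

    def get(self, identifier: str) -> dict:
        """Look up metadata by record identifier or by full identifier."""
        if identifier in self._records:          # bare record identifier
            return self._records[identifier]
        if identifier.startswith(self.prefix):   # full identifier: strip the archive prefix
            return self._records[identifier[len(self.prefix):]]
        raise KeyError(identifier)
```

With store = MetadataStore("arXiv", ":") and a record added under hep-th/9909044, both store.get("hep-th/9909044") and store.get("arXiv:hep-th/9909044") return the same metadata.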

Step 3: Implement the Open Archives Metadata Set

We recognize that archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle. However, interoperability depends on a shared format for exchanging metadata and therefore archives should implement the basic Open Archives Metadata Set (oams) which is described at http://www.openarchives.org/sfc/sfc_oams.htm.

Step 4: Implement and document other metadata formats supported by your e-print archive

Service providers will be able to provide more powerful services for users if a metadata format that is richer than the basic oams can be harvested from your archive. We encourage data providers to provide access to the full richness of metadata available to support discovery and retrieval of records in their archive, preferably by adopting an exchange format already used by other e-print archives. To help you determine whether an existing format can serve your needs, we maintain a list of such metadata formats at http://www.openarchives.org/sfc/sfc_metadata.htm.

If no existing format suits your needs, you must take on the tough task of compiling your own format. For compliance with the harvesting interface presented in Step 5, you will also have to define an XML representation for it. It is important that you document your metadata format fully and make the documentation available to service providers. Again, it will make their jobs a lot easier. If you inform the Open Archives initiative about the alternative metadata format for your archive, we will include it on the OA list of metadata formats at http://www.openarchives.org/sfc/sfc_metadata.htm, so that others can benefit from your work.
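Defining an XML representation for a home-grown format mainly means deciding how each metadata element, including repeatable ones such as multiple authors, maps to XML elements. The following minimal sketch uses Python's standard xml.etree.ElementTree; the namespace URI and the element names (record, title, author) are hypothetical placeholders, not part of any registered format.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for a home-grown format; a real format would publish its own.
NS = "http://example.org/bespa-format"


def to_xml(record_id: str, fields: dict[str, list[str]]) -> str:
    """Render one record's metadata in a hypothetical XML exchange format.

    Each metadata element becomes a child element; repeatable elements
    (e.g. several authors) become repeated children, in order.
    """
    ET.register_namespace("", NS)  # serialize NS as the default namespace
    root = ET.Element(f"{{{NS}}}record", {"id": record_id})
    for name, values in fields.items():
        for value in values:
            ET.SubElement(root, f"{{{NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

Whatever structure you choose, documenting it (element names, which elements repeat, how dates are written) is what lets service providers parse it reliably.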

The formats that are in use or under implementation by the archives contributing to the compilation of this convention are - at the time of writing - the following: MARC, ReDIF, Dublin Core, REFER, RFC 1807, Open Archives Metadata Set.

Step 5: Implement the Dienst harvesting interface

Once your archive has identifiers and supports one or more metadata formats, the next step turns it into an open archive by making sure that service providers can access data from your archive. Because we anticipate the creation of many open archives and because we want services across those archives to be built in the short to medium term, we strongly recommend that all archives implement the same harvesting interface. Such a harvesting interface must allow third parties to write software that collects data selectively from the open archives. We propose a harvesting interface that complies with the Open Archives Subset of the Dienst Protocol, which is described in detail at http://www.openarchives.org/sfc/sfc_dienst.htm. To support a better understanding of the recommendations in this convention, we provide a brief description of the protocol subset here.

The Dienst protocol has an HTTP-based implementation. Its Open Archives subset defines a communication procedure, as well as the syntax of the corresponding requests and responses, that allows service providers to harvest metadata selectively from open archives that comply with the Santa Fe Convention. The procedure consists of three actions:

Action 1: an archive can be polled to obtain the following information about the archive:
- The logical partitions into which records are grouped within the archive;
- The metadata formats that are supported for delivery of archive metadata in response to a harvesting request.

The Open Archives Subset of the Dienst protocol defines how such polling requests should be sent and the syntax an archive uses to respond to them. It does not define the set of valid responses to the metadata format request. In the Open Archives context, however, the list of valid formats and their identifiers is available at http://www.openarchives.org/sfc/sfc_metadata.htm.

Archives should support several criteria to divide their content into logical partitions that are recognized by the Open Archives Dienst Subset. We especially recommend subject-oriented partitioning as well as partitioning based on author affiliation.
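From the harvester's side, the outcome of Action 1 is a description of the archive's partitions and supported formats, which the harvester uses to pick a format for Action 3. The following minimal sketch models only that information; it does not reproduce the Dienst request or response syntax, which is specified in the Open Archives Dienst Subset. The class, the function and the format identifiers used in the example are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class ArchiveDescription:
    """What a harvester learns from Action 1 (a toy model, not Dienst syntax)."""
    partitions: dict[str, list[str]]  # partitioning criterion -> partition names
    metadata_formats: list[str]       # identifiers of formats the archive can deliver


def choose_format(desc: ArchiveDescription, preferred: list[str]) -> str:
    """Return the harvester's most preferred format that the archive supports."""
    for fmt in preferred:
        if fmt in desc.metadata_formats:
            return fmt
    raise LookupError("archive supports none of the preferred formats")
```

An archive following the partitioning recommendation above might report {"subject": [...], "affiliation": [...]}; a harvester that prefers a richer format falls back to the basic set only when nothing richer is offered.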

Action 2: a list of identifiers for records in an archive can be requested.
We expect these identifiers to be the unique persistent identifiers of the records in the archive. The Open Archives Dienst Subset defines the syntax to request:

A list of identifiers for all records in the archive;
A list of identifiers for the records in a partition of the archive;
A list of identifiers for records that have become available in the archive after a specified date;
A list of identifiers for records that have become available in an archive partition after a specified date.

The Open Archives Dienst Subset also defines the way in which the list of identifiers will be returned.
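The four request variants of Action 2 reduce to two optional filters, a partition and a date. The following minimal sketch applies them to hypothetical in-memory archive contents; it assumes "after a specified date" means strictly after, whereas the exact semantics and the request syntax are governed by the Open Archives Dienst Subset.

```python
from datetime import date
from typing import Optional

# Hypothetical archive contents: record id -> (partitions, date it became available)
RECORDS = {
    "MEDICINE/19991104/012": ({"medicine"}, date(1999, 11, 4)),
    "PHYSICS/19991201/003": ({"physics"}, date(1999, 12, 1)),
    "MEDICINE/20000110/001": ({"medicine"}, date(2000, 1, 10)),
}


def list_identifiers(partition: Optional[str] = None, since: Optional[date] = None) -> list[str]:
    """Cover the four request variants of Action 2.

    No arguments: all records. partition only: records in that partition.
    since only: records that became available after that date.
    Both: records in that partition that became available after that date.
    """
    return sorted(
        rid
        for rid, (parts, available) in RECORDS.items()
        if (partition is None or partition in parts)
        and (since is None or available > since)
    )
```

A harvester doing incremental updates would typically pass the date of its previous harvest as since, optionally restricted to the partitions it cares about.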

Action 3: given a list of identifiers (obtained by Action 2) and a supported metadata format (identified through Action 1), a request to harvest metadata can be sent.
The Open Archives Dienst Subset defines the syntax for the harvesting request. In the subset, exchange of metadata is always in XML, whichever metadata set is chosen. A list of formats and details on their interpretation and XML representation is available at http://www.openarchives.org/sfc/sfc_metadata.htm. An archive should respond to a request for metadata in a specified format by returning data rendered according to the corresponding exchange format.
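An archive answering Action 3 checks that the requested format is supported and returns the metadata for each requested identifier as XML. The following minimal sketch shows that logic with hypothetical stored metadata; the records/record envelope and the format identifier oams used here are placeholders, since the actual response syntax is defined by the Open Archives Dienst Subset and the per-format XML by the documents on the OA list of metadata formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata held by the archive, already mapped to the fields of one format.
METADATA = {
    "MEDICINE/19991104/012": {"title": "On Spa Waters", "date": "1999-11-04"},
    "MEDICINE/20000110/001": {"title": "Mineral Springs", "date": "2000-01-10"},
}
SUPPORTED_FORMATS = {"oams"}  # placeholder identifier


def harvest(identifiers: list[str], fmt: str) -> str:
    """Answer a harvesting request: metadata for each identifier, always as XML."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported metadata format {fmt!r}")
    root = ET.Element("records", {"format": fmt})
    for rid in identifiers:
        rec = ET.SubElement(root, "record", {"id": rid})
        for name, value in METADATA[rid].items():
            ET.SubElement(rec, name).text = value
    return ET.tostring(root, encoding="unicode")
```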

Step 6: Let the Open Archives initiative know that your e-print archive is open

The last step is to join the community of open e-print archives by informing the Open Archives initiative of the details of your open archive. Use the template we provide for that purpose at http://www.openarchives.org/sfc/data_provider_template.htm. Fill in the appropriate information, install the completed form on your archive server and send an e-mail to openarchives@openarchives.org to inform us of its URL. We will include your archive in the list of Santa Fe compliant archives that we maintain at http://www.openarchives.org/sfc/sfc_archives.htm, thus making your archive conveniently visible to service providers. The form also allows you to inform service providers of any restrictions that apply to the use of your data.

You may also want to insert the Open Archives logo in the main entry point of your archive to indicate that you have joined the initiative.

For the service provider: how to make your services comply with the Santa Fe Convention?

As you can see from the recommendations we make to data providers, our aim is to make it easy for you - the service provider - to create services based on open archive data. In return, we ask you to comply with the following:

Step 1: Retain the original identifiers in your services

When you create services based on records originating from open archives, keep the original full identifiers associated with the data as a means of indicating the provenance of records.
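Keeping full identifiers also keeps a merged index collision-free: two archives may use the same record identifier, but their full identifiers differ. The following minimal sketch, with a hypothetical build_index function and hypothetical input data, merges harvested records into one index keyed by full identifier and stores the originating archive identifier alongside each entry as provenance.

```python
def build_index(harvests: dict[tuple[str, str], dict[str, dict]]) -> dict[str, dict]:
    """Merge harvested records into one index keyed by original full identifier.

    harvests maps (archive_id, separator) to {record_id: metadata}. The
    archive identifier is also stored with each entry to record its provenance.
    """
    index: dict[str, dict] = {}
    for (archive_id, separator), records in harvests.items():
        for record_id, metadata in records.items():
            full_id = f"{archive_id}{separator}{record_id}"
            index[full_id] = {**metadata, "source": archive_id}
    return index
```

If two archives both happened to use the record identifier 1999/001, the index would hold two distinct entries, for example NCSTRL:1999/001 and RePEc:1999/001, instead of one overwriting the other.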

Step 2: Comply with the usage restrictions specified by the data providers

With regard to the usage of data from the open archives, respect the restrictions that data providers mention in the form describing their archive. For each open archive, this form is accessible via http://www.openarchives.org/sfc/sfc_archives.htm.

Step 3: Let the Open Archives initiative know that you have developed a service based on open archives data

Inform the Open Archives initiative, and thus the data providers, about the data that you are harvesting and the use you make of it. To facilitate this, we provide a template at http://www.openarchives.org/sfc/service_provider_template.htm, which we encourage you to fill out. Install it on your server and inform us of its URL by sending an e-mail to openarchives@openarchives.org. By doing so, you join the community of Open Archives. We will list all information on service providers at http://www.openarchives.org/sfc/sfc_services.htm.

You may also want to insert the Open Archives logo in the main entry point of your service to indicate that you have joined the initiative.

Supporting information is available at:

the Santa Fe Convention
the Open Archives Metadata Set
the Open Archives Dienst Subset
the list of Open Archives data providers, including their unique archive identifiers
the list of Open Archives service providers
the list of metadata formats used in the Open Archives context
the template to be used by data providers to register as a Santa Fe compliant archive
the template to be used by service providers to register as a Santa Fe compliant service

	__________________

	get in touch with the Open Archives initiative by contacting openarchives@openarchives.org
	last updated January 20th 2001