The Santa Fe Convention: the core document

The Santa Fe Convention is discontinued. Please use the Open Archives Initiative Protocol for Metadata Harvesting instead.

This core document presents a step by step approach for making your e-print archive or your service comply with the Santa Fe Convention. To clarify the Santa Fe guidelines, some underlying concepts are introduced first. Technical details are in separate documents, to which links are provided within the text and in a list at the end of this document.
We consider the following to be crucial components of an e-print archive:
In addition, we consider it crucial that an e-print archive be open, incorporating a mechanism that enables third parties to collect data from the archive. Such a mechanism allows third parties to create end-user services that support the discovery, presentation and analysis of data in the archive. We recognize that most e-print archives will also provide end-user services. However, we consider that facilitating the broad dissemination of archive data through third party services is a crucial feature of an e-print archive.
We also assume that e-print archives are managed. This means that they have some form of policy with regard to the submission of documents and also a policy with regard to the preservation and retention of documents.
Consistent with the objective of the Santa Fe Convention and the identification of the crucial functions of an e-print archive, we make a distinction between data providers and service providers.
An archive may store metadata that describes full content without storing the full content itself. In this case, we consider the metadata as a record. However, we assume that if full content is stored, there will always be associated metadata stored in the archive as well as a mechanism to tie metadata and content together. In this case we consider the combination of metadata and full content as a record.
In this convention, therefore, we define an archive as a collection of records. These records have the following properties:
To support interoperability, each archive should have a unique archive identifier. This identifier refers to the authority managing the archive or to the archive initiative. Choose an identifier that consists only of alphanumeric characters [a-z, A-Z, 0-9]. To make sure that your archive identifier does not coincide with that of another archive, check the list of existing identifiers at http://www.openarchives.org/sfc/sfc_archives.htm. Formally, the case of characters in the archive identifier is significant; however, identifiers should be selected to be distinct regardless of case.
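The two rules above (alphanumeric characters only, and distinct from existing identifiers regardless of case) can be checked mechanically. The following is a minimal sketch, not part of the convention: the function names are hypothetical, and the list of registered identifiers is assumed to have been copied by hand from the list of existing identifiers.

```python
import re

# Only the characters a-z, A-Z and 0-9 are allowed in an archive identifier.
_ARCHIVE_ID = re.compile(r"[A-Za-z0-9]+")


def is_valid_archive_id(candidate: str) -> bool:
    """Return True if the candidate is non-empty and purely alphanumeric."""
    return _ARCHIVE_ID.fullmatch(candidate) is not None


def is_available(candidate: str, registered: list[str]) -> bool:
    """Return True if the candidate is valid and differs from every
    registered identifier even when case is ignored."""
    taken = {name.lower() for name in registered}
    return is_valid_archive_id(candidate) and candidate.lower() not in taken


# Illustrative values only; consult the real list before choosing.
registered_ids = ["arXiv", "CogPrints"]
print(is_available("ARXIV", registered_ids))      # differs only in case
print(is_available("myArchive", registered_ids))
```

Comparing lower-cased forms is one simple way to honour the recommendation that identifiers be distinct regardless of case, even though case is formally significant.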
Records in your archive should have unique persistent identifiers. It is up to you to make sound decisions on their structure and generation.
When you combine a unique archive identifier and a unique record identifier for a record in your archive, the result is a full identifier for that record, which will never coincide with the full identifier of a record in another archive. This makes the job of service providers a lot easier. Choose a printable, non-alphanumeric character as a separator to delimit the archive identifier from the record identifier.
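As an illustration of how a full identifier can be built and taken apart again, here is a minimal sketch. The colon separator, the function names and the sample identifiers are assumptions made for the example; the convention only requires a printable, non-alphanumeric separator.

```python
import re

SEPARATOR = ":"  # assumption: any printable, non-alphanumeric character would do


def _check_separator(sep: str) -> None:
    """Reject separators that are not a single printable, non-alphanumeric character."""
    if len(sep) != 1 or not sep.isprintable() or sep.isalnum() or sep.isspace():
        raise ValueError(f"unsuitable separator: {sep!r}")


def make_full_id(archive_id: str, record_id: str, sep: str = SEPARATOR) -> str:
    """Join an archive identifier and a record identifier into a full identifier."""
    _check_separator(sep)
    if re.fullmatch(r"[A-Za-z0-9]+", archive_id) is None:
        raise ValueError(f"archive identifier must be alphanumeric: {archive_id!r}")
    if not record_id:
        raise ValueError("record identifier must not be empty")
    return f"{archive_id}{sep}{record_id}"


def split_full_id(full_id: str, sep: str = SEPARATOR) -> tuple[str, str]:
    """Split a full identifier at the first separator.

    The archive identifier is alphanumeric and so cannot contain the
    separator; the first occurrence therefore always marks the boundary,
    even if the record identifier contains the separator as well.
    """
    _check_separator(sep)
    archive_id, found, record_id = full_id.partition(sep)
    if not found or not archive_id or not record_id:
        raise ValueError(f"not a full identifier: {full_id!r}")
    return archive_id, record_id


full_id = make_full_id("arXiv", "hep-th/9901001")
print(full_id)
print(split_full_id(full_id))
```

Because the split happens at the first separator, record identifiers remain free to use any characters, which keeps decisions about their structure with the archive.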
The identifier of a record in your archive - either the full identifier or the record identifier (without its leading archive identifier) - will be the crucial key for extracting the metadata for a record. In some archives, it may also be the key to get to the full content of the record. In other archives, other metadata elements within a record will point to the full content.
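The role of the identifier as a lookup key can be sketched with a small in-memory model. Everything here is hypothetical and for illustration only: the `Record` and `Archive` classes, the colon separator and the sample metadata are not defined by the convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    metadata: dict                        # always present
    full_content: Optional[bytes] = None  # absent in metadata-only archives


class Archive:
    def __init__(self, archive_id: str, sep: str = ":") -> None:
        self.archive_id = archive_id
        self.sep = sep
        self._records: dict[str, Record] = {}  # keyed by record identifier

    def add(self, record_id: str, record: Record) -> None:
        self._records[record_id] = record

    def _record_id(self, identifier: str) -> str:
        """Accept a full identifier or a bare record identifier."""
        prefix = self.archive_id + self.sep
        if identifier.startswith(prefix):
            return identifier[len(prefix):]
        return identifier

    def get_metadata(self, identifier: str) -> dict:
        """Return the metadata for a record; raises KeyError if unknown."""
        return self._records[self._record_id(identifier)].metadata


archive = Archive("demo")
archive.add("0001", Record(metadata={"title": "An example e-print"}))
print(archive.get_metadata("demo:0001"))
print(archive.get_metadata("0001"))
```

The sketch assumes that no bare record identifier begins with the archive identifier followed by the separator; an archive whose record identifiers can do so would need a stricter rule for telling the two forms apart.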
We recognize that archives will use specific metadata sets and formats that suit the needs of their communities and the types of data they handle. However, interoperability depends on a shared format for exchanging metadata and therefore archives should implement the basic Open Archives Metadata Set (oams) which is described at http://www.openarchives.org/sfc/sfc_oams.htm.
Service providers will be able to provide more powerful services for users if a metadata format that is richer than the basic oams can be harvested from your archive. We encourage data providers to provide access to the full richness of metadata available to support discovery and retrieval of records in their archive, preferably by adopting an exchange format already used by other e-print archives. To help you determine whether an existing format can serve your needs, we maintain a list of such metadata formats at http://www.openarchives.org/sfc/sfc_metadata.htm.
If no existing format suits your needs, you must take on the tough task of compiling your own format. For compliance with the harvesting interface presented in Step 5, you will also have to define an XML representation for it. It is important that you document your metadata format fully and make the documentation available to service providers. Again, it will make their jobs a lot easier. If you inform the Open Archives initiative about the alternative metadata format for your archive, we will include it on the OA list of metadata formats at http://www.openarchives.org/sfc/sfc_metadata.htm, so that others can benefit from your work.
Once your archive has identifiers and supports one or more metadata formats, the next step turns it into an open archive, by making sure that service providers can access data from your archive. Because we anticipate the creation of many open archives and because we want services across those archives to be built in a short to medium time period, we strongly recommend that all archives implement the same harvesting interface. Such a harvesting interface must allow third parties to write software that collects data selectively from the open archives. We propose a harvesting interface that complies with the Open Archives Subset of the Dienst Protocol, which is described in detail at http://www.openarchives.org/sfc/sfc_dienst.htm. To support a better understanding of the recommendations in this convention, we provide a brief description of the protocol subset here.
The Dienst protocol has an HTTP-based implementation. Its Open Archives subset defines a communication procedure, as well as the syntax for the corresponding messages and responses, that will allow service providers to harvest metadata selectively from open archives that comply with the Santa Fe Convention. The procedure has three steps:
The Open Archives Subset of the Dienst protocol defines how such polling requests should be sent and the syntax used by an archive to respond to such requests. It does not define the set of valid responses to the metadata format request. But in the Open Archives context, the list of valid formats and their identifiers is available at http://www.openarchives.org/sfc/sfc_metadata.htm.
Archives should support several criteria to divide their content into logical partitions that are recognized by the Open Archives Dienst Subset. We especially recommend subject-oriented partitioning as well as partitioning based on author affiliation.
We expect these identifiers to be the unique persistent identifiers of the records in the archives. The Open Archives Dienst Subset defines the syntax to request:
The Open Archives Dienst Subset also defines the way in which the list of identifiers will be returned.
The Open Archives Dienst Subset defines the syntax for the harvesting request. In the subset, exchange of metadata is always in XML, whichever metadata set is chosen. A list of formats and details on their interpretation and XML representation is available at http://www.openarchives.org/sfc/sfc_metadata.htm. An archive should respond to a request for metadata in a specified format by returning data rendered according to the corresponding exchange format.
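The idea of answering a metadata request in the requested format, always as XML, can be sketched as follows. This is not the Dienst request syntax and not the oams schema: the format name "demo", the element names and the function names are all invented for the example, and the actual formats are those documented at the address above.

```python
import xml.etree.ElementTree as ET


def render_demo_format(full_id: str, metadata: dict[str, str]) -> str:
    """Render one record as XML using made-up element names."""
    root = ET.Element("record")
    ET.SubElement(root, "identifier").text = full_id
    for name, value in metadata.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


# Maps a metadata format name to the function that renders it.
# "demo" is a hypothetical format name used only in this sketch.
RENDERERS = {"demo": render_demo_format}


def disseminate(full_id: str, metadata: dict[str, str], metadata_format: str) -> str:
    """Return the record rendered in the requested format, or fail clearly."""
    try:
        renderer = RENDERERS[metadata_format]
    except KeyError:
        raise ValueError(f"unsupported metadata format: {metadata_format!r}") from None
    return renderer(full_id, metadata)


print(disseminate("demo:0001", {"title": "Archives & interoperability"}, "demo"))
```

Keeping one renderer per format means an archive can start with a single basic format and add richer ones later without changing how requests are dispatched.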
The last step is to join the community of open e-print archives by informing the Open Archives initiative of the details of your open archive. Use the template we provide for that purpose at http://www.openarchives.org/sfc/data_provider_template.htm. Fill in the appropriate information, install the completed template on your archive server and send us an e-mail at openarchives@openarchives.org to inform us of its URL. We will include your archive in the list of Santa Fe compliant archives that we maintain at http://www.openarchives.org/sfc/sfc_archives.htm, thus helping to make your archive conveniently visible to service providers. The template also lets you inform service providers of any restrictions that apply to the use of your data.
You may also want to insert the Open Archives logo in the main entry point of your archive to indicate that you have joined the initiative.
As you can see from the recommendations we make to data providers, our aim is to make it easy for you - the service provider - to create services based on open archive data. In return, we ask you to comply with the following:
When you create services based on records originating from open archives, keep the original full identifiers associated with the data as a means of indicating the provenance of records.
With regard to the usage of data from the open archives, respect the restrictions that data providers mention in the form describing their archive. For each open archive, this form is accessible via http://www.openarchives.org/sfc/sfc_archives.htm.
Inform the Open Archives initiative, and thus the data providers, about the data that you are harvesting and about the use you make of it. To facilitate this, we provide a template at http://www.openarchives.org/sfc/service_provider_template.htm that you are encouraged to fill out. Install it on your server and inform us of its URL by sending an e-mail to openarchives@openarchives.org. By doing so, you are joining the community of Open Archives. We will list all information on service providers at http://www.openarchives.org/sfc/sfc_services.htm.
You may also want to insert the Open Archives logo in the main entry point of your service to indicate that you have joined the initiative.

get in touch with the Open Archives initiative by contacting openarchives@openarchives.org

last updated January 20th 2001