[OAI-implementers] Net::OAI::Harvester
Ed Summers
ehs@pobox.com
Tue, 8 Jul 2003 13:45:26 -0400
A beta version of a Perl OAI-PMH harvesting library was just uploaded to
CPAN as Net::OAI::Harvester. The idea behind Net::OAI::Harvester is to
provide an object-oriented client interface to the data found in OAI-PMH
repositories (similar to what LWP::UserAgent does for HTTP).
More about OAI-PMH can be found here:
http://www.openarchives.org
And more about Net::OAI::Harvester can be found here:
http://search.cpan.org/author/ESUMMERS/OAI-Harvester-0.1/
All of the 6 OAI-PMH verbs are supported. As an example here is the code to
retrieve a particular record from LC as Dublin Core and display the title.
my $harvester = Net::OAI::Harvester->new(
baseUrl => 'http://memory.loc.gov/cgi-bin/oai2_0'
);
my $record = $harvester->getRecord(
identifier => 'oai:lcoa1.loc.gov:loc.gmd/g3764s.pm003250',
metadataPrefix => 'oai_dc'
);
my $metadata = $record->metadata();
print "title: ", $metadata->title(), "\n";
Features:
- OAI-PMH responses can often be rather large XML files. Net::OAI::Harvester
uses stream based parsing (XML::SAX) and serializes data as Perl objects on
disk (using YAML). This serialized data is then made available through
an iterator interface which means that you keep a relatively low
memory foot print when doing ListRecords or ListIdentifiers requests.
- Net::OAI::Harvester includes Net::OAI::Record::OAI_DC which is an
XML::SAX handler for parsing and providing an object oriented
interface to baseline Dublin Core metadata. It also provides a
framework for dropping in your own XML::SAX handler if you want to
parse other types of metadata. The idea is that as people create their
own handlers they can be easily included in the Net::OAI::Harvester
distribution.
- If you are interested in the XML itself you can easily get a hold of the
temporary file that contains the full XML response, and do what you want
with it.
- You can easily can a hold of the error code and message associated with any
request.
Caveats:
- Net::OAI::Harvester only supports OAI-PMH v.2.
- No support for compression (yet).
- Needs more documentation, and examples.
- You need to handle resumptionTokens explicitly. This means a call to
listRecords() will not go and grab everything, but just the first chunk.
However, there is infrastrucutre and methods to easily get at and pass the
tokens.
Feedback/comments/testser would be appreciated. If you are at all interested in
getting involved in the project please write to me directly, or (preferably)
use perl4lib@perl.org or oai-implementers@oaisrv.nsdl.cornell.edu.
//Ed