[OAI-implementers] Re: oai_dc records (was Re: [EP-tech] Free Text Indexing)
Andy Powell
a.powell@ukoln.ac.uk
Tue, 9 Jul 2002 21:16:09 +0100 (BST)
On Mon, 8 Jul 2002, ePrints Support wrote:
> X-Posted to OAI-Implementers from eprints-tech
>
> I've been thinking hard on this issue.
>
> I think that it is futile to build anything "clever" on top of
> unqualified DC.
>
> What would be far more useful is if a number of interested parties
> could agree on a value-added metadata format for doing "clever stuff".
I guess this is what the Academic Metadata Format (AMF) people have been
trying to do?
http://amf.openlib.org/doc/ebisu.html
I don't have an objection to what you suggest.
Just to clarify the rationale behind my suggestions about the current
oai_dc defaults in the eprints.org software...
1) all eprint archives that support OAI-PMH *must* support oai_dc (to be
compliant with the protocol)
2) we might as well make the default configuration of oai_dc in the
eprints.org software generate metadata that is as useful as possible (even
though it might not be as useful as some other, richer, format).
3) the oai_dc metadata generated by the eprints.org software should
conform to DCMI semantics and guidelines on best practice.
That's all my suggestions were trying to do. I hope this makes sense?
Regards,
Andy.
> E.g. you, the AKT project, the citebase project at Southampton and
> similar projects.
>
> If a good number of service providers all use the same advanced
> metadata type then this will be an incentive for OAI archives to
> support it (esp. if EPrints leads the way).
>
> The alternative is that every archive supports oai_dc, or
> oai_dc plus one arbitrary other format, which is (nearly) useless
> for federating richer metadata.
>
> So the real question is what people would want to use OAI for
> beyond "dumb text search" resource discovery: things that
> involve not just storing and searching the data but actually
> processing it. And what metadata schema would best suit the
> majority of these needs?
>
> On Sat, Jul 06, 2002 at 10:44:33AM +0100, Andy Powell wrote:
> > On Sat, 6 Jul 2002, Andy Powell wrote:
> >
> > > On Fri, 5 Jul 2002, ePrints Support wrote:
> > >
> > > > Has anybody actually modified the ArchiveOAIConfig.pm module? If none or
> > > > very few have, I will feel confident about totally replacing it with a
> > > > different system.
> > > >
> > > > If anyone made any *good* changes, lemmie know, they might be useful!
> > >
> > > Not sure if this is what you were asking for, but I have some
> > > comments/questions about the use of DC in the records exposed using
> > > OAI by default (i.e. without modifying the OAI config in 2.0.1).
> >
> > A simple set of context diffs to achieve most of what I say below (for
> > v2.0.1, sorry I haven't upgraded yet!) is attached. Note: it is probably
> > better to handle MIME typing through the normal mime.types file or similar,
> > if possible. Also, I didn't know how to get at the alternative URLs.
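A minimal Perl sketch of the mime.types idea, assuming the archive's format names match file extensions (as "pdf" does in the record below) and that a file in the usual mime.types layout is available: a MIME type followed by its whitespace-separated extensions, with '#' starting a comment. The parse_mime_types helper and the sample contents are illustrative only, not part of the eprints.org software.

```perl
use strict;
use warnings;

# Build an extension => MIME type map from text in the usual
# mime.types layout. Hypothetical helper, for illustration only.
sub parse_mime_types {
    my ($fh) = @_;
    my %by_ext;
    while ( my $line = <$fh> ) {
        $line =~ s/#.*//;                  # drop comments
        my ( $type, @exts ) = split ' ', $line;
        next unless defined $type;         # blank or comment-only line
        $by_ext{ lc $_ } = $type for @exts;
    }
    return \%by_ext;
}

# Sample input; a real deployment would open the system's mime.types
# file instead (assumed, location varies between systems).
my $sample = <<'END';
# MIME type            extensions
application/pdf        pdf
application/postscript ps eps
text/html              html htm
END

open( my $fh, '<', \$sample ) or die "cannot read sample: $!";
my $mime = parse_mime_types($fh);
close $fh;

# Look up some format names (assumed to equal file extensions).
for my $format (qw(pdf ps html)) {
    printf "%s => %s\n", $format, $mime->{$format} // 'unknown';
}
```

Reading the system file rather than a hardcoded hash means a new document format needs no code change, only a mime.types entry.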
> >
> > Results can be viewed using repository explorer
> >
> > http://oai.dlib.vt.edu/cgi-bin/Explorer/oai2.0/testoai
> >
> > against
> >
> > http://eprints.bath.ac.uk/perl/oai
> >
> > Andy.
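For reference, a harvester (or the repository explorer) asks for these records with an OAI-PMH request whose arguments include verb=ListRecords and metadataPrefix=oai_dc. The Perl sketch below builds such a URL against the Bath base URL given above; the helper names are hypothetical, and the escaping is a simple RFC 3986 unreserved-set encoder rather than a full URI library.

```perl
use strict;
use warnings;

# Percent-encode every character outside the RFC 3986 unreserved set.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build an OAI-PMH request URL from a base URL and request arguments.
sub oai_request_url {
    my ( $base, %args ) = @_;
    my $query = join '&',
      map { uri_escape_simple($_) . '=' . uri_escape_simple( $args{$_} ) }
      sort keys %args;    # sorted only so the output is predictable
    return "$base?$query";
}

my $url = oai_request_url(
    'http://eprints.bath.ac.uk/perl/oai',
    verb           => 'ListRecords',
    metadataPrefix => 'oai_dc',
);
printf "%s\n", $url;
```

Because every OAI-PMH repository must answer metadataPrefix=oai_dc, this one request shape works against any compliant archive.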
> >
> > > Here's a text view (cut-and-paste from the repository explorer) of a
> > > record from ePrints@Bath:
> > >
> > >
> > > title: An OAI Approach to Sharing Subject Gateway Content
> > > creator: Powell, Andy
> > > subject: UKOLN
> > > description: The Resource Discovery Network (RDN) has taken a...
> > > date: 2001-01-01
> > > type: Conference Poster
> > > identifier: http://eprints.bath.ac.uk/archive/00000003/
> > > format: pdf http://eprints.bath.ac.uk/archive/00000003/01/1097.pdf
> > >
> > > I think that dc:identifier should be the URI of the item, not the URI of
> > > the abstract page about the item. For multiple-format items, simply
> > > repeat dc:identifier (as you currently do with dc:format).
> > >
> > > Your current use of dc:format is not ideal. It would conform better to
> > > DCMI recommendations to put a MIME type in dc:format; putting both a
> > > format name and a URI in dc:format doesn't match those recommendations.
> > >
> > > I think the abstract page is related to the resource being described by
> > > the metadata (i.e. the abstract page is related to the item(s)), therefore
> > > it would be better to put the URI of the abstract into dc:relation.
> > >
> > > So, for the record above, I'd prefer to see
> > >
> > > identifier: http://eprints.bath.ac.uk/archive/00000003/01/1097.pdf
> > > format: application/pdf
> > > relation: http://eprints.bath.ac.uk/archive/00000003/
> > >
> > > with identifier and format repeated if more than one format is available.
> > > This isn't perfect (because there's no strong tie between the
> > > format/identifier pairs) but it is more in line with DCMI recommendations
> > > and semantics.
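A minimal Perl sketch of the proposed mapping, using plain hashes as hypothetical stand-ins for the eprint and its documents (real code would go through the EPrints API, as the attached diffs do). Each document contributes a dc:identifier and, where its MIME type is known, a dc:format; the abstract page becomes dc:relation. The dc_locations name and the lookup table are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the eprint's abstract page and documents.
my $abstract_url = 'http://eprints.bath.ac.uk/archive/00000003/';
my @documents = (
    { format => 'pdf', url => 'http://eprints.bath.ac.uk/archive/00000003/01/1097.pdf' },
);

# Assumed format-name => MIME type lookup.
my %mime_types = ( pdf => 'application/pdf', ps => 'application/postscript' );

# One dc:identifier (plus dc:format when the MIME type is known) per
# document, then a single dc:relation pointing at the abstract page.
sub dc_locations {
    my ( $abstract, $mime, @docs ) = @_;
    my @dc;
    for my $doc (@docs) {
        push @dc, [ identifier => $doc->{url} ];
        my $type = $mime->{ $doc->{format} // '' };
        push @dc, [ format => $type ] if defined $type;
    }
    push @dc, [ relation => $abstract ];
    return @dc;
}

my @dcdata = dc_locations( $abstract_url, \%mime_types, @documents );
print "$_->[0]: $_->[1]\n" for @dcdata;
```

Run on the Bath record, this prints the three identifier/format/relation lines shown above. A format missing from the lookup simply gets no dc:format, which reflects the weak identifier/format tie mentioned above.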
> > >
> > > The URIs for 'alternative' locations of the item (e.g. URIs external to
> > > the eprint archive) should also appear in repeated dc:identifier elements
> > > IMHO.
> > >
> > > (Note: there will be some in the DC-camp that would say that in the case
> > > of multiple formats being available, you should expose multiple DC
> > > metadata records using OAI-PMH (one for each available format), perhaps
> > > using dc:relation to tie them together. I would argue that this is
> > > probably over the top and would make life quite difficult for software
> > > trying to process the multiple metadata records).
> > >
> > > Finally, you seem to hardcode the day in the date to "01" (I think). Note
> > > that 2001-09 (i.e. a date without a day) is a perfectly valid ISO8601
> > > date so you don't really need to force the '-01' bit.
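The same point as a short Perl sketch: emit only the date parts that are actually known, so a month-only date stays 2001-09 instead of being padded to 2001-09-01. The function name and argument order are assumptions for illustration.

```perl
use strict;
use warnings;

# Return the most precise ISO 8601 calendar date the data supports
# (YYYY, YYYY-MM or YYYY-MM-DD) without padding missing parts with "01".
sub iso8601_date {
    my ( $year, $month, $day ) = @_;
    return unless defined $year;
    my $date = sprintf '%04d', $year;
    return $date unless defined $month;
    $date .= sprintf '-%02d', $month;
    return $date unless defined $day;
    return $date . sprintf( '-%02d', $day );
}

printf "%s\n", iso8601_date( 2001, 9 );        # 2001-09
printf "%s\n", iso8601_date(2001);             # 2001
printf "%s\n", iso8601_date( 2001, 9, 14 );    # 2001-09-14
```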
> > >
> > > I'd be interested in others' views on all this...
> > >
> > > Andy
> > > --
> > > Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
> > > http://www.ukoln.ac.uk/ukoln/staff/a.powell +44 1225 383933
> > > Resource Discovery Network http://www.rdn.ac.uk/
> >
> > Andy
> >
>
> > *** ArchiveOAIConfig.pm Wed Jun 5 12:43:09 2002
> > --- ArchiveOAIConfig.pm.orig Thu Jun 13 17:29:33 2002
> > ***************
> > *** 280,293 ****
> >
> > $month = "01" if( !defined $month );
> >
> > ! push @dcdata, [ "date", "$year-$month" ];
> > }
> >
> > my $ds = $eprint->get_dataset();
> > push @dcdata, [ "type", $ds->get_type_name( $session, $eprint->get_value( "type" ) ) ];
> >
> > ! # dc:relation is the URL of the abstract
> > ! push @dcdata, [ "relation", $eprint->get_url() ];
> >
> > # Export the type and URL of each actual document, this
> > # is far from ideal, but DC offers no easy solution to
> > --- 280,294 ----
> >
> > $month = "01" if( !defined $month );
> >
> > ! push @dcdata, [ "date", "$year-$month-01" ];
> > }
> >
> > my $ds = $eprint->get_dataset();
> > push @dcdata, [ "type", $ds->get_type_name( $session, $eprint->get_value( "type" ) ) ];
> >
> > ! # The identifier is the URL of the abstract page.
> > ! # possibly this should be the OAI ID, or both.
> > ! push @dcdata, [ "identifier", $eprint->get_url() ];
> >
> > # Export the type and URL of each actual document, this
> > # is far from ideal, but DC offers no easy solution to
> > ***************
> > *** 295,311 ****
> > # citation linking systems, so better to have it than not.
> >
> > my @documents = $eprint->get_all_documents();
> > - my %mime_types = (
> > - pdf => "application/pdf"
> > - );
> > foreach( @documents )
> > {
> > ! push @dcdata, [ "identifier", $_->get_url() ];
> > ! if( $_->is_set( "format" ) )
> > ! {
> > ! my $format = $mime_types{$_->get_value( "format" )};
> > ! push @dcdata, [ "format", $format ] if $format;
> > ! }
> > }
> >
> > return @dcdata;
> > --- 296,304 ----
> > # citation linking systems, so better to have it than not.
> >
> > my @documents = $eprint->get_all_documents();
> > foreach( @documents )
> > {
> > ! push @dcdata, [ "format", $_->get_value( "format" )." ".$_->get_url() ];
> > }
> >
> > return @dcdata;
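The @dcdata list in the diff above is a list of [ element, value ] pairs. As a hedged illustration of how such pairs end up in an OAI-PMH response, this Perl sketch serialises them into a simplified oai_dc:dc container using the oai_dc and Dublin Core 1.1 namespaces. It is not the eprints.org serialisation code, and it omits the xsi:schemaLocation attribute a real response would normally carry.

```perl
use strict;
use warnings;

# Escape the five XML special characters.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Serialise [ element, value ] pairs into a simplified oai_dc:dc record.
sub oai_dc_xml {
    my @pairs = @_;
    my $xml = qq{<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"\n}
            . qq{           xmlns:dc="http://purl.org/dc/elements/1.1/">\n};
    for my $pair (@pairs) {
        my ( $name, $value ) = @$pair;
        $xml .= sprintf "  <dc:%s>%s</dc:%s>\n", $name, xml_escape($value), $name;
    }
    return $xml . "</oai_dc:dc>\n";
}

my $record_xml = oai_dc_xml(
    [ identifier => 'http://eprints.bath.ac.uk/archive/00000003/01/1097.pdf' ],
    [ format     => 'application/pdf' ],
    [ relation   => 'http://eprints.bath.ac.uk/archive/00000003/' ],
);
print $record_xml;
```

Keeping the pairs as plain data until this last step is what lets the mapping change (identifier versus relation, MIME types in format) without touching the XML output code.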
>
>
> --
>
> Christopher Gutteridge eprints-support@ecs.soton.ac.uk
> ePrints2 Coder, Support and Stuff +44 23 8059 4833
>
>
Andy