[OAI-implementers] XML Schema for OAI compliance with NSDL harvesting
Carl Lagoze
lagoze@cs.cornell.edu
Thu, 13 Dec 2001 07:34:20 -0500
Tim,
Thanks for the note. Since I am closely involved in both OAI and NSDL,
I was thinking of this issue also. I hope you don't mind that I've
reflected this issue back to the tech group and to Diane Hillmann, my
colleague at Cornell who is the metadata specialist in NSDL.
I see us getting drawn into the implementation end of what the dc
community calls "application profiles". For a brief bit of background
to the rest of the group:
- DC started out with this concept of 15 elements with some fuzzy
thinking that these elements would make up some kind of interoperable
metadata "record", defined as a packaging of the elements e.g., as a set
of meta tags, or xml file, etc. As noted many times, all elements are
optional and repeatable in instances of such records, and the data type
of element values is undefined. This allows some measure of
interoperability among "dc records", albeit pretty low, essentially saying
"in a set of many 'dc records' you will only find the specified 15
elements, but no guarantee which ones will be there". Essentially, this
is what we've tried to achieve with mandatory dc and dc.xsd in OAI-PMH
1.x.
- DC then moved onto the notion of "qualified dc", where semantics or
value constraints on DC could be tightened. In this process, there was
still some fuzzy thinking that there might be a "dc record" but instead
of being made up of statements like "date is Sept. 1, 2001", it would
consist of statements like "date, created, is 2001-09-01, in ISO 8601
format". The previous loose interoperability assumption then existed
with the added "dumb-down" notion specifying "you will find something
more constrained than a standard dc element/value pair and you will be
able to map it back to its unqualified form". As we all know, in
OAI-PMH we decided
- The latest DC thinking now includes the notion of "application
profiles", saying that a "metadata record" may include dc elements (all
optional, all repeatable) that can be mixed and matched with metadata
elements from one or more other metadata vocabularies. The nature of
the mixing is, I believe, unspecified by DC - i.e., the application
profile notion may allow for a dc element to be nested within the value
structure of an element from another vocabulary (such as the dc-ed
vocabulary mentioned in Tim's note). For example, imagine a metadata
record such as:
   <foo.hairColor>brown</foo.hairColor>
   <foo.origin>
      <dc.date>2000-01-01</dc.date>
      <blatz.personality>nasty</blatz.personality>
   </foo.origin>
Interoperability in this world of application profiles now becomes
"interoperability among a set of records conforming to several
application profiles means that somewhere in each of those records are 0
or more dc elements". Of course, one could restrict interoperability
within one application profile, which is presumably defined by some
schema, but then the presumed cross domain interoperability purpose of
dc is seemingly lost.
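
To make that weak guarantee concrete, here is a minimal sketch in Python
of what a harvester could actually rely on: walking a record and pulling
out whatever dc elements happen to be in it, at any depth. The <record>
wrapper, the "dc." tag prefix test, and the function name are my
assumptions for illustration; nothing here is defined by OAI or DCMI.

```python
import xml.etree.ElementTree as ET

# Hypothetical record modeled on the example above. The <record> wrapper
# is an assumption, added only so the fragment parses as one document.
RECORD = """
<record>
  <foo.hairColor>brown</foo.hairColor>
  <foo.origin>
    <dc.date>2000-01-01</dc.date>
    <blatz.personality>nasty</blatz.personality>
  </foo.origin>
</record>
"""


def find_dc_elements(xml_text, prefix="dc."):
    """Return (tag, text) pairs for every element, at any nesting depth,
    whose tag starts with the given prefix. The result may be empty."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, (el.text or "").strip())
        for el in root.iter()
        if el.tag.startswith(prefix)
    ]


if __name__ == "__main__":
    # The only dc element in the sample record is the nested dc.date.
    print(find_dc_elements(RECORD))
```

Note that the function tells you nothing about what the dc.date is a
date of; that context lives in the enclosing foo.origin element, which a
dc-only harvester does not understand.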
In the OAI world, I see we are left with the following options:
1. Remain at our original, albeit low-level, notion of dc
interoperability: demand, through a schema, a record of unqualified dc
elements.
2. Loosen the schema to allow qualified dc.
3. Allow for full-blast dc "application profiles", in which dc elements
may be mixed with others from other vocabularies in unconstrained ways,
undefined by a schema.
4. Find some proper midpoint: for example, create an xml schema that
specifies 0 or more dc elements at the top level, perhaps qualified,
that are mixed with other top-level elements.
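
As a rough illustration of the difference between options 3 and 4, the
Python sketch below separates the dc elements a harvester would find at
the top level of a record from those buried inside other vocabularies'
elements. This is not a schema, only a check written under my own
assumptions: the <record> wrapper, the "dc." prefix convention, and the
function name are all hypothetical.

```python
import xml.etree.ElementTree as ET


def split_dc_elements(xml_text, prefix="dc."):
    """Return two lists of tags: dc elements that are direct children of
    the record root, and dc elements nested anywhere below that level."""
    root = ET.fromstring(xml_text)
    top_level = [child.tag for child in root if child.tag.startswith(prefix)]
    nested = [
        el.tag
        for child in root
        for el in child.iter()  # iter() yields the child itself first
        if el is not child and el.tag.startswith(prefix)
    ]
    return top_level, nested


# Hypothetical record mixing a top-level dc element with a nested one.
SAMPLE = """
<record>
  <dc.title>Sample</dc.title>
  <foo.hairColor>brown</foo.hairColor>
  <foo.origin>
    <dc.date>2000-01-01</dc.date>
  </foo.origin>
</record>
"""

if __name__ == "__main__":
    print(split_dc_elements(SAMPLE))
```

Under an option 4 style schema, a harvester could count on the first
list being easy to find; under option 3 it would also have to go looking
for the second.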
My, as usual subjective, view is:
1. our original goal of mandatory dc for interoperability in OAI-PMH is
looking somewhat threadbare anyway, in light of the fact that dc as a
record format is inappropriate for non-bibliographic items - people,
events.
2. the application profile stuff sounds fine from a conversational
perspective, but at the level of implementation and interoperability it
is far from clean. I'm not certain what a schema defining the rules for
interoperability among application profiles would look like.
Perhaps Andy Powell can kick in with some thinking since he is more
involved in DCMI than I am. Or Diane Hillmann, whose NSDL and DCMI ties
are close.
Carl
> -----Original Message-----
> From: Tim Cole [mailto:t-cole3@uiuc.edu]
> Sent: Wednesday, December 12, 2001 6:43 PM
> To: Carl Lagoze
> Cc: Thomas G. Habing
> Subject: Re: [OAI-implementers] XML Schema for OAI compliance with NSDL harvesting
>
>
> Carl-
>
> I know XML Schema was one of last week's Tech Committee topics, but
> Zubair's note suggested an additional issue the Committee might want
> to consider:
>
> The way I read the NSDL metadata recommendation, they're endorsing the
> addition of a dc-ed:audience element to the base 15 elements of simple
> DC. Metadata files including this element will not validate against
> either current or our proposed oai_dc.xsd. If OAI decides to revise
> oai_dc XML schema along lines suggested in XML Schema
> (http://oaitech.comm.nsdlib.org/WhitePapers/xml_schema_whitepaper.htm),
> should we also allow for optional inclusion of dc-ed:audience element?
>
> Tom Habing and I have been experimenting this week with what an XSD
> that allows inclusion of dc-ed:audience element might look like. When
> we're satisfied, I'll go ahead and post (on SourceForge XML Schema
> Forum) details of how this might be done for the Committee's
> consideration. Might be a complication we don't want to undertake, but
> on the other hand, it could facilitate use of OAI in NSDL context, and
> likely in IMLS context as well. If we decide to revise oai_dc.xsd, we
> should at least consider it as an option.
>
> Tim Cole
> University of Illinois at Urbana-Champaign
>
> ----- Original Message -----
> From: <zubair@cs.odu.edu>
> To: <schalk@unf.edu>
> Cc: <collections-group@nsdl1.comm.nsdlib.org>;
> <oai-implementers@oaisrv.nsdl.cornell.edu>; "Carl Lagoze"
> <lagoze@cs.cornell.edu>
> Sent: Sunday, December 09, 2001 2:44 PM
> Subject: RE: [OAI-implementers] XML Schema for OAI compliance with NSDL harvesting
>
>
> >
> > Dear Stuart,
> >
> > I was there at the Dec. 3-4 NSDL PIs meeting and may be able to add a
> > little more to what Carl has already stated. After talking to Diane
> > Hillmann and browsing information on the NSDL website, this is my
> > understanding.
> >
> > "The NSDL Standards Working Group has determined that the Dublin Core
> > set of 15 qualified elements, plus the elements recommended by the DC
> > Education Working Group, will be the standard set used by the NSDL
> > metadata repository."
> > (Ref:
> > http://siteforscience.nsdl.cornell.edu/metadata_info/overview.html#NSDL ).
> >
> > Besides DC-Ed, they have also identified other formats NSDL plans to
> > support. You can get information about these from the Web sites:
> >
> > http://siteforscience.nsdl.cornell.edu/metadata_info/outline.html
> > http://www.smete.org/nsdl/workgroups/standards/standards_home.html
> > http://128.253.121.110/NSDLmetaWG/IntroPage.html
> >
> > These metadata formats can be supported as parallel metadata formats
> > in OAI. Regarding Mac OS X, I recently bought a Powerbook G4 with OS X;
> > it has full Java support and comes with the Apache web server. I
> > have not yet tried the Tomcat servlet engine on it - there is one
> > available for it. In summary, you should be able to host an OAI
> > compliant collection using some of the Java based tools available on
> > the tool site pointed out by Carl. In fact you should be able to use
> > even Perl tools (OS X is based on BSD Unix).
> >
> >
> > Zubair
> >
> > "Carl Lagoze" <lagoze@cs.cornell.edu>
> > Sent by: oai-implementers-admin@oaisrv.nsdl.cornell.edu
> > 12/09/2001 01:55 PM
> >
> > To: "Stuart Chalk" <schalk@unf.edu>,
> >     <oai-implementers@oaisrv.nsdl.cornell.edu>
> > cc: <collections-group@nsdl1.comm.nsdlib.org>
> > Subject: RE: [OAI-implementers] XML Schema for OAI compliance with NSDL harvesting
> >
> > Stuart,
> >
> > Thanks for your question. I've taken the liberty to add the members of
> > NSDL collections to this response.
> >
> > Regarding your first question: The OAI already defines an xml schema
> > for its one required metadata format - dublin core. This is explained
> > at
> > http://www.openarchives.org/OAI_protocol/openarchivesprotocol.html#dublincore
> > and the actual schema is at
> > http://www.openarchives.org/OAI/1.1/dc.xsd. As for the other metadata
> > formats that NSDL participants will share, there are as far as I know
> > no established schemas as of yet. Perhaps Diane Hillmann can chip in
> > here and share whether this is true. If there are indeed no schemas as
> > of yet, we should certainly settle on these in the near future.
> >
> > Regarding your second question: This is the right list for discussion
> > of hardware and software for acting as an OAI data provider. At this
> > point I don't know of anyone who is using Mac OS X for this, but that
> > doesn't mean that no one is doing so. There is a growing list of tools
> > at http://www.openarchives.org/tools/tools.html and I believe that the
> > DLESE folks are working on a java based implementation that should run
> > with little problem on OS X.
> >
> > Carl
> >
> > > -----Original Message-----
> > > From: Stuart Chalk [mailto:schalk@unf.edu]
> > > Sent: Sunday, December 09, 2001 6:35 AM
> > > To: oai-implementers@oaisrv.nsdl.cornell.edu
> > > Subject: [OAI-implementers] XML Schema for OAI compliance with NSDL harvesting
> > >
> > >
> > > I would like to thank all of you that were at the meeting talking
> > > about harvesting and OAI. Being a novice at this I appreciate how
> > > well the area of OAI was described and I now have a much better
> > > perspective on its use and implementation.
> > >
> > > My question, I hope, is simple. From reading
> > > http://www.openarchives.org/OAI_protocol/openarchivesprotocol.html
> > > I get that I need to generate a few files that describe the different
> > > schema for supplying metadata via OAI. I also see the formatted XML
> > > that needs to be returned to the requester. My question is - is there
> > > yet an NSDL format for the data returned via XML, and what is the
> > > minimum number of schema files I need to generate so that I can
> > > support NSDL harvesting via OAI? If they do exist where are they and
> > > is there a help file that goes with them?
> > >
> > > Any help greatly appreciated.
> > >
> > > On a separate topic - is there a list like this for discussion of the
> > > hardware and software being used to serve the collection data? As
> > > webmaster of the "Analytical Sciences Digital Library" project I want
> > > to use Mac OS X, Webstar V, Lasso Pro 5, and Filemaker Pro 5.5 all on
> > > the same Mac. I am convinced that this will be both powerful enough
> > > and scalable enough, but my co-PIs are worried I will run into trouble
> > > using this for production. Any places to go greatly appreciated also.
> > >
> > > --
> > > Stuart Chalk, Ph.D.
> > > Department of Natural Sciences     Phone:904-620-2831
> > > University of North Florida        Fax:904-620-3885
> > > 4567 St. Johns Bluff Road S.       "The Flow Analysis Database"
> > > Jacksonville FL 32224 USA          http://www.fia.unf.edu/
> >
> > _______________________________________________
> > OAI-implementers mailing list
> > OAI-implementers@oaisrv.nsdl.cornell.edu
> > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> >
>
>