[OAI-implementers] XML Schema problem?
Young,Jeff
jyoung@oclc.org
Mon, 23 Apr 2001 11:42:05 -0400
Works fine for me. I like it.
Jeff
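
For reference, a minimal sketch of how the approach Thomas proposes in the quoted message below could be checked from Java. This is not the Test.java from the thread. It assumes a current JAXP implementation, so the schema is written against the final 2001 XML Schema namespace rather than the 2000/10 draft namespace used in the quoted XSDs. The class name, the cut-down schema and the urn:example:oai-test namespace are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class OptionalStatusCheck {

    // Hypothetical, cut-down schema: only the status attribute is modeled.
    // Assumption: final 2001 namespace instead of the thread's 2000/10 draft.
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:t='urn:example:oai-test'"
        + " targetNamespace='urn:example:oai-test'"
        + " elementFormDefault='qualified'>"
        + "<element name='record' type='t:recordType'/>"
        + "<complexType name='recordType'>"
        + "<attribute name='status' use='optional' type='t:statusType'/>"
        + "</complexType>"
        + "<simpleType name='statusType'>"
        + "<restriction base='string'>"
        + "<enumeration value='deleted'/>"
        + "</restriction>"
        + "</simpleType>"
        + "</schema>";

    /** Returns the status value, or null if the attribute is absent.
     *  Throws a SAXException if the document does not validate. */
    static String statusOf(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setSchema(schema); // validate against the schema while parsing

        DocumentBuilder db = dbf.newDocumentBuilder();
        // Turn validation errors into exceptions instead of log output.
        db.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });

        Element record = db.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        return record.hasAttribute("status") ? record.getAttribute("status") : null;
    }

    /** True if the document parses and validates against the sketch schema. */
    static boolean isValid(String xml) throws Exception {
        try {
            statusOf(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String ns = " xmlns='urn:example:oai-test'";
        System.out.println("no status:      " + statusOf("<record" + ns + "/>"));
        System.out.println("status deleted: " + statusOf("<record" + ns + " status='deleted'/>"));
        System.out.println("status active valid? " + isValid("<record" + ns + " status='active'/>"));
    }
}
```

With no default declared, the parser should leave a record without a status attribute untouched, accept status="deleted", and reject any other value.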
> -----Original Message-----
> From: herbert van de sompel [mailto:herbertv@cs.cornell.edu]
> Sent: Monday, April 23, 2001 11:21 AM
> To: thabing@uiuc.edu; OAI-implementers
> Subject: Re: [OAI-implementers] XML Schema problem?
>
>
> hi Thomas,
>
> thanks for this. this approach sounds good to me:
>
> * it validates with XSV
>
> * it would be nice if Jeff could test this approach in Xerces
>
> * I checked this against the most recent XML Schema specs, and the nice
> thing about this approach is that it would not require any changes
> when we move over to them at some point.
>
> If Jeff comes back with a positive message, I suggest we go with
> Thomas' approach.
>
> herbert
>
>
>
> "Thomas G. Habing" wrote:
> >
> > Herbert-
> >
> > In the XSDs, wouldn't it be simpler to change the use attribute value in
> > the status attribute declaration to "optional" (deleting the value
> > attribute) and then tie its type to an enumerated list that only allows
> > the value "deleted"? With no value attribute and the use attribute set to
> > optional (as opposed to default or fixed) in the status attribute
> > declaration, the parser shouldn't assume a value. The enumerated list
> > still restricts the allowable values of the status attribute in document
> > instances. This seems to work in other parsers, but we've not tried it in
> > Xerces. Here's the attribute declaration as we're suggesting:
> >
> > <complexType name="recordType">
> >   <sequence>
> >     <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
> >     <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
> >     <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
> >   </sequence>
> >   <attribute name="status" use="optional" type="oai:statusType"/>
> > </complexType>
> >
> > ...
> >
> > <simpleType name="statusType">
> >   <restriction base="string">
> >     <enumeration value="deleted"/>
> >   </restriction>
> > </simpleType>
> >
> > Tim Cole
> > Tom Habing
> > University of Illinois
> >
> > herbert van de sompel wrote:
> > >
> > > hi Jeff,
> > >
> > > Thanks for this. Your consideration is correct: there is a problem in
> > > the schemas that use the "status" attribute, namely those for
> > > GetRecord, ListRecords and ListIdentifiers.
> > >
> > > This is what the September 2000 schema specs say about specifying
> > > occurrences of attributes. The excerpt that I include refers to the
> > > following declaration in an xsd file:
> > >
> > > <xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US"/>
> > >
> > > "
> > > Attributes may appear once or not at all (the default), and so the
> > > syntax for specifying occurrences of attributes is different than the
> > > syntax for elements. In particular, a use attribute is used in an
> > > attribute declaration to indicate whether the attribute is required or
> > > optional, and if optional whether the attribute's value is fixed or
> > > whether there is a default. A second attribute, value, provides any
> > > value that is called for. To illustrate, po.xsd contains a declaration
> > > for the country attribute, which is declared with use and value values
> > > of fixed and US respectively. This declaration means that the
> > > appearance of a country attribute is optional, although its value must
> > > be US if it does appear, and if it does not appear, a schema processor
> > > will create a country attribute with this value.
> > > "
> > >
> > > This last line indicates that Xerces is doing the right thing, which is
> > > obviously not what we want to happen.
> > >
> > > With Michael Nelson, I have revised the XML Schemas that involve a
> > > status attribute. The solution was less straightforward than one would
> > > hope. There is no simple way to express what we really would like to
> > > express: the status attribute may occur, and if it occurs its value must
> > > be "deleted". The workaround is to list the legitimate values of the
> > > status attribute and specify a default. We chose the values "deleted"
> > > and "not deleted", with "not deleted" as the default. With this in
> > > place, one can express in a schema that the status attribute may appear,
> > > and that its default value (if the attribute does not appear) is "not
> > > deleted". One can also express that there is only one other legitimate
> > > value for status, namely "deleted". That value must be specified
> > > explicitly, since it is not the default.
> > >
> > > With this approach nothing really changes for data providers (nor for
> > > service providers, really). But I expect Xerces will now do the right
> > > thing and add the default value of "not deleted" to all records that do
> > > not have the status attribute specified.
> > >
> > > The way to express the above approach in the schema is different for the
> > > Sep/Oct 2000 specs that we use and for the most recent XML Schema specs,
> > > but that is another story, to be addressed later.
> > >
> > > I attach the edited xsd files. I will put them in place, unless someone
> > > disagrees with the approach taken.
> > >
> > > many greetings
> > >
> > > herbert
> > >
> > > "Jeffrey A. Young" wrote:
> > > >
> > > > Someone noticed that my OAIHarvester isn't working correctly lately. It
> > > > turns out that the Xerces XML parser is convinced that all the records I
> > > > harvest are flagged as status="deleted". Since this clearly isn't the
> > > > case, I started stripping the program down until I had a small example
> > > > program showing this effect. The Java source code is attached.
> > > > Basically, if I do DocumentBuilderFactory.setValidating(true) and then
> > > > convert the XML to a DOM Document, it silently "corrects" my records to
> > > > status="deleted". If I dump the Document, all looks fine, but when I
> > > > actually query the status attribute, it reports back with a value of
> > > > "deleted". On the other hand, if I specify setValidating(false),
> > > > everything works fine. I suspect the problem is that the XML Schema
> > > > needs to make the status attribute optional. Another possibility is
> > > > that Xerces is processing the XML Schema incorrectly. I can ignore the
> > > > problem by always using setValidating(false), but that doesn't seem
> > > > right. If someone has a better solution, I would appreciate it. Thanks.
> > > >
> > > > Jeff
> > > >
> > > > ---
> > > > Jeffrey A. Young
> > > > Senior Consulting Systems Analyst
> > > > Office of Research, Mail Code 710
> > > > OCLC Online Computer Library Center, Inc.
> > > > 6565 Frantz Road
> > > > Dublin, OH 43017-3395
> > > > www.oclc.org
> > > >
> > > > Voice: 614-764-4342
> > > > Fax: 614-764-2344
> > > > Email: jyoung@oclc.org
> > > >
> > > >
> > > > ------------------------------------------------------------------------
> > > > Name: Test.java
> > > > Test.java Type: unspecified type (application/octet-stream)
> > > > Encoding: quoted-printable
> > >
> > > --
> > > Herbert Van de Sompel
> > > Visiting Assistant Professor
> > > Cornell University -- Computer Science
> > > tel + 1 - 607 - 255 - 3085
> > > fax + 1 - 607 - 255 - 4428
> > > http://www.cs.cornell.edu/people/herbertv/
> > > digital life in libraries used to be primitive
> > >
> > >
> > > ----------------------------------------------------------------------------
> > > <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
> > >         xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
> > >         targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
> > >         elementFormDefault="qualified"
> > >         attributeFormDefault="unqualified">
> > >
> > >   <annotation>
> > >     <documentation>
> > >       Schema to verify validity of responses to GetRecord OAI-protocol request.
> > >       This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
> > >       with XSV 1.176/1.87 of 2001/02/16 16:38:43
> > >     </documentation>
> > >   </annotation>
> > >
> > >   <element name="GetRecord" type="oai:GetRecordType"/>
> > >
> > >   <!-- response to GetRecord-request -->
> > >
> > >   <complexType name="GetRecordType">
> > >     <sequence>
> > >       <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
> > >       <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
> > >       <element name="record" minOccurs="0" maxOccurs="1" type="oai:recordType"/>
> > >     </sequence>
> > >   </complexType>
> > >
> > >   <!-- define recordType -->
> > >   <!-- a record has a header and a metadata part -->
> > >
> > >   <complexType name="recordType">
> > >     <sequence>
> > >       <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
> > >       <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
> > >       <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
> > >     </sequence>
> > >     <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
> > >   </complexType>
> > >
> > >   <!-- define headerType -->
> > >   <!-- a header has a unique identifier and a datestamp -->
> > >
> > >   <complexType name="headerType">
> > >     <sequence>
> > >       <element name="identifier" minOccurs="1" maxOccurs="1" type="uriReference"/>
> > >       <element name="datestamp" minOccurs="1" maxOccurs="1" type="date"/>
> > >     </sequence>
> > >   </complexType>
> > >
> > >   <!-- define metadataType -->
> > >   <!-- metadata must be expressed in XML that is compliant with another XML Schema -->
> > >   <!-- metadata must be explicitly qualified in the response -->
> > >
> > >   <complexType name="metadataType">
> > >     <sequence>
> > >       <any namespace="##any" processContents="lax"/>
> > >     </sequence>
> > >   </complexType>
> > >
> > >   <!-- define aboutType -->
> > >   <!-- data "about" the record must be expressed in XML -->
> > >   <!-- that is compliant with an XML Schema defined by a community -->
> > >
> > >   <complexType name="aboutType">
> > >     <sequence>
> > >       <any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="1"/>
> > >     </sequence>
> > >   </complexType>
> > >
> > >   <!-- define statusType -->
> > >   <!-- a record can have a status of "deleted" or "not deleted". -->
> > >
> > >   <simpleType name="statusType">
> > >     <restriction base="string">
> > >       <enumeration value="deleted"/>
> > >       <enumeration value="not deleted"/>
> > >     </restriction>
> > >   </simpleType>
> > >
> > > </schema>
> > >
> > >
> > > ----------------------------------------------------------------------------
> > > <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
> > >         xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_ListRecords"
> > >         targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_ListRecords"
> > >         elementFormDefault="qualified"
> > >         attributeFormDefault="unqualified">
> > >
> > >   <annotation>
> > >     <documentation>
> > >       Schema to verify validity of responses to ListRecords OAI-protocol request.
> > >       This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
> > >       with XSV 1.176/1.87 of 2001/02/16 16:38:43
> > >     </documentation>
> > >   </annotation>
> > >
> > >   <element name="ListRecords" type="oai:ListRecordsType"/>
> > >
> > >   <!-- response to ListRecords-request -->
> > >   <!-- this response may contain an optional resumptionToken -->
> > >
> > >   <complexType name="ListRecordsType">
> > >     <sequence>
> > >       <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
> > >       <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
> > >       <element name="record" minOccurs="0" maxOccurs="unbounded" type="oai:recordType"/>
> > >       <element name="resumptionToken" minOccurs="0" maxOccurs="1" type="string"/>
> > >     </sequence>
> > >   </complexType>
> > >
> > >   <!-- define recordType -->
> > >   <!-- a record has a header and a metadata part -->
> > >
> > >   <complexType name="recordType">
> > >     <sequence>
> > >       <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
> > >       <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
> > >       <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
> > >     </sequence>
> > >     <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
> > >   </complexType>
> > >
> > >   <!-- define headerType -->
> > >   <!-- a header has a unique identifier and a datestamp -->
> > >
> > >   <complexType name="headerType">
> > >     <sequence>
> > >       <element name="identifier" minOccurs="1" maxOccurs="1" type="uriReference"/>
> > >       <element name="datestamp" minOccurs="1" maxOccurs="1" type="date"/>
> > >     </sequence>
> > >   </complexType>
> > >
> > >   <!-- define metadataType -->
> > >   <!-- metadata must be expressed in XML that complies with another XML Schema -->
> > >   <!-- metadata must be explicitly qualified in the response -->
> > >
> > >   <complexType name="metadataType">
> > >     <sequence>
> > >       <any namespace="##any" processContents="lax"/>
> > >     </sequence>
> > >   </complexType>
> > >
> > >   <!-- define aboutType -->
> > >   <!-- data "about" the record must be expressed in XML -->
> > >   <!-- that is compliant with an XML Schema defined by a community -->
> > >
> > >   <complexType name="aboutType">
> > >     <sequence>
> > >       <any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="1"/>
> > >     </sequence>
> > >   </complexType>
> > >
> > >   <!-- define statusType -->
> > >   <!-- a record can have a status of "deleted" or "not deleted". -->
> > >
> > >   <simpleType name="statusType">
> > >     <restriction base="string">
> > >       <enumeration value="deleted"/>
> > >       <enumeration value="not deleted"/>
> > >     </restriction>
> > >   </simpleType>
> > >
> > > </schema>
> > >
> > >
> > > ----------------------------------------------------------------------------
> > > <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
> > >         xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
> > >         targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
> > >         elementFormDefault="qualified"
> > >         attributeFormDefault="unqualified">
> > >
> > >   <annotation>
> > >     <documentation>
> > >       Schema to verify validity of responses to ListIdentifiers OAI-protocol request.
> > >       This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
> > >       with XSV 1.176/1.87 of 2001/02/16 16:38:43
> > >     </documentation>
> > >   </annotation>
> > >
> > >   <element name="ListIdentifiers" type="oai:ListIdentifiersType"/>
> > >
> > >   <!-- response to ListIdentifiers-request -->
> > >   <!-- records have an optional "deleted" status -->
> > >   <!-- this response may contain an optional resumptionToken -->
> > >
> > >   <complexType name="ListIdentifiersType">
> > >     <sequence>
> > >       <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
> > >       <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
> > >       <element ref="oai:identifier" minOccurs="0" maxOccurs="unbounded"/>
> > >       <element name="resumptionToken" minOccurs="0" maxOccurs="1" type="string"/>
> > >     </sequence>
> > >   </complexType>
> > >
> > >   <element name="identifier">
> > >     <complexType>
> > >       <simpleContent>
> > >         <extension base="uriReference">
> > >           <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
> > >         </extension>
> > >       </simpleContent>
> > >     </complexType>
> > >   </element>
> > >
> > >   <!-- define statusType -->
> > >   <!-- a record can have a status of "deleted" or "not deleted". -->
> > >
> > >   <simpleType name="statusType">
> > >     <restriction base="string">
> > >       <enumeration value="deleted"/>
> > >       <enumeration value="not deleted"/>
> > >     </restriction>
> > >   </simpleType>
> > >
> > > </schema>
> >
> > --
> > Thomas G. Habing
> > Research Programmer, Digital Library Initiative
> > University of Illinois at Urbana-Champaign
> > 052 Grainger Engineering Library, MC-274
> > thabing@uiuc.edu, (217) 244-7809
> > _______________________________________________
> > OAI-implementers mailing list
> > OAI-implementers@oaisrv.nsdl.cornell.edu
> > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>
>
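
The default-value behaviour discussed in the thread above (a schema processor supplying an attribute value the document never contained) can be observed with a short Java sketch. It is only an illustration under stated assumptions: it uses a current JAXP implementation and therefore the final 2001 XML Schema syntax (default="not deleted") rather than the draft syntax use="default" value="not deleted" from the quoted XSDs. The class name, the cut-down schema and the urn:example:oai-test namespace are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DefaultStatusCheck {

    // Hypothetical, cut-down schema: status has two legal values and a default.
    // Assumption: final 2001 syntax, where "default" replaces use="default" value="...".
    static final String XSD =
          "<schema xmlns='http://www.w3.org/2001/XMLSchema'"
        + " xmlns:t='urn:example:oai-test'"
        + " targetNamespace='urn:example:oai-test'"
        + " elementFormDefault='qualified'>"
        + "<element name='record' type='t:recordType'/>"
        + "<complexType name='recordType'>"
        + "<attribute name='status' default='not deleted' type='t:statusType'/>"
        + "</complexType>"
        + "<simpleType name='statusType'>"
        + "<restriction base='string'>"
        + "<enumeration value='deleted'/>"
        + "<enumeration value='not deleted'/>"
        + "</restriction>"
        + "</simpleType>"
        + "</schema>";

    /** Returns the status value seen in the DOM, or null if the attribute is absent.
     *  When useSchema is false the document is parsed without any validation. */
    static String statusOf(String xml, boolean useSchema) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        if (useSchema) {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // With a schema set, the validator may add declared defaults to the DOM.
            dbf.setSchema(sf.newSchema(new StreamSource(new StringReader(XSD))));
        }
        Element record = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return record.hasAttribute("status") ? record.getAttribute("status") : null;
    }

    public static void main(String[] args) throws Exception {
        String bare = "<record xmlns='urn:example:oai-test'/>";
        System.out.println("without schema: " + statusOf(bare, false));
        System.out.println("with schema:    " + statusOf(bare, true));
    }
}
```

The same record is parsed twice. Without the schema the attribute should be absent; with the schema the DOM is expected to carry the declared default, which is the effect that made every record look "deleted" when the draft schema fixed the value.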