[OAI-implementers] XML Schema problem?

Young,Jeff jyoung@oclc.org
Mon, 23 Apr 2001 11:42:05 -0400


Works fine for me. I like it.

Jeff

> -----Original Message-----
> From: herbert van de sompel [mailto:herbertv@cs.cornell.edu]
> Sent: Monday, April 23, 2001 11:21 AM
> To: thabing@uiuc.edu; OAI-implementers
> Subject: Re: [OAI-implementers] XML Schema problem?
> 
> 
> hi Thomas,
> 
> thanks for this. this approach sounds good to me:
> 
> * it validates with XSV
> 
> * it would be nice if Jeff could test this approach in Xerces
> 
> * I checked this with the most recent XML Schema specs, and the nice
> thing about it is that this approach would not require any changes to
> be made when moving over at a certain point.
> 
> If Jeff comes back with a positive message, I suggest we go with Thomas'
> approach.
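
A minimal sketch of what "moving over" might look like. It assumes the later W3C XML Schema Recommendation syntax (namespace http://www.w3.org/2001/XMLSchema), in which default and fixed values are written as attributes of their own rather than through the use/value pair. Under that assumption, Thomas' attribute declaration carries over unchanged apart from the schema namespace, whereas a use="default" declaration would need rewriting. The recordType here is trimmed to the attribute under discussion.

```xml
<!-- Sketch only: assumes the 2001 Recommendation namespace, which the
     schemas in this thread do not yet use. -->
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
        targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">

  <complexType name="recordType">
    <!-- Same declaration as in the October 2000 syntax: no value is
         supplied, so a processor has nothing to add when the attribute
         is absent. -->
    <attribute name="status" use="optional" type="oai:statusType"/>
    <!-- By contrast, use="default" value="not deleted" would have to be
         rewritten as default="not deleted" in this syntax. -->
  </complexType>

  <simpleType name="statusType">
    <restriction base="string">
      <enumeration value="deleted"/>
    </restriction>
  </simpleType>

</schema>
```
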
> 
> herbert
> 
> 
> 
> "Thomas G. Habing" wrote:
> > 
> > Herbert-
> > 
> > In the XSDs, wouldn't it be simpler to change the use attribute value in
> > the status attribute declaration to "optional" (delete the value
> > attribute) and then tie its type to an enumerated list that only allows
> > the value "deleted"?  With no value attribute and the use attribute set
> > to optional (as opposed to default or fixed) in the status attribute
> > declaration, the parser shouldn't assume a value.  The enumerated list
> > still restricts the allowable values of the status attribute in document
> > instances.  This seems to work in other parsers but we've not tried it in
> > Xerces.  Here's the attribute declaration as we're suggesting:
> > 
> >   <complexType name="recordType">
> >    <sequence>
> >      <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
> >      <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
> >      <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
> >    </sequence>
> >    <attribute name="status" use="optional" type="oai:statusType"/>
> >   </complexType>
> > 
> >  ...
> > 
> >   <simpleType name="statusType">
> >    <restriction base="string">
> >      <enumeration value="deleted"/>
> >    </restriction>
> >   </simpleType>
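
To make the effect on instance documents concrete, here is a hypothetical ListRecords response checked against the declaration above. The identifiers, dates, and request URL are invented for illustration, and status="active" is not a protocol value; it only stands in for any value outside the enumeration.

```xml
<!-- Hypothetical ListRecords response; all values are made up for illustration. -->
<ListRecords xmlns="http://www.openarchives.org/OAI/1.0/OAI_ListRecords">
  <responseDate>2001-04-23T11:42:05-04:00</responseDate>
  <requestURL>http://example.org/oai?verb=ListRecords</requestURL>

  <!-- No status attribute: valid, and the parser should not assume a value. -->
  <record>
    <header>
      <identifier>oai:example:1</identifier>
      <datestamp>2001-04-20</datestamp>
    </header>
  </record>

  <!-- status="deleted": valid, the only value the enumeration allows. -->
  <record status="deleted">
    <header>
      <identifier>oai:example:2</identifier>
      <datestamp>2001-04-21</datestamp>
    </header>
  </record>

  <!-- Any other value should be rejected by a validating parser. -->
  <record status="active">
    <header>
      <identifier>oai:example:3</identifier>
      <datestamp>2001-04-22</datestamp>
    </header>
  </record>
</ListRecords>
```
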
> > 
> > Tim Cole
> > Tom Habing
> > University of Illinois
> > 
> > herbert van de sompel wrote:
> > >
> > > hi Jeff,
> > >
> > > Thanks for this.  Your consideration is correct, there is a problem in
> > > the schemas that use the "status" attribute.  Those are GetRecord,
> > > ListRecords and ListIdentifiers.
> > >
> > > This is what the September 2000 schema specs say about specifying
> > > occurrences of attributes.  In the excerpt that I include, reference is
> > > made to the following declaration in an xsd file:
> > >
> > > <xsd:attribute name="country" type="xsd:NMTOKEN" use="fixed" value="US"/>
> > >
> > > "
> > > Attributes may appear once or not at all (the default), and so the
> > > syntax for specifying occurrences of attributes is different than the
> > > syntax for elements. In particular, a use attribute is used in an
> > > attribute declaration to indicate whether the attribute is required or
> > > optional, and if optional whether the attribute's value is fixed or
> > > whether there is a default. A second attribute, value, provides any
> > > value that is called for. To illustrate, po.xsd contains a declaration
> > > for the country attribute, which is declared with use and value values
> > > of fixed and US respectively. This declaration means that the
> > > appearance of a country attribute is optional, although its value must
> > > be US if it does appear, and if it does not appear, a schema processor
> > > will create a country attribute with this value.
> > > "
> > >
> > > This last line indicates that Xerces is doing the right thing, which is
> > > obviously not what we want to happen.
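
The original status declaration is not quoted anywhere in this thread. Assuming it followed the same pattern as the country example, it may have looked roughly like the sketch below, which would explain why a validating parser supplies status="deleted" on every record. The type shown is a guess.

```xml
<!-- Assumed form of the problematic declaration (October 2000 syntax);
     not copied from the actual OAI schema files. -->
<attribute name="status" type="string" use="fixed" value="deleted"/>

<!-- With a declaration like this, a record written as
       <record> ... </record>
     is reported by a schema-aware parser as if it had been written
       <record status="deleted"> ... </record> -->
```
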
> > >
> > > With Michael Nelson, I have revised the XML Schemas that involve a
> > > status attribute.  The solution was less straightforward than one would
> > > hope.  There is no simple way to express what we really would like to
> > > express: the status attribute may occur, and if it occurs its value
> > > must be "deleted".  The workaround is to list the legitimate values of
> > > the status attribute and specify a default.  We chose the values to be
> > > "deleted" and "not deleted", with "not deleted" as the default.  With
> > > this in place, one can express in a schema that the status attribute
> > > may appear, and that its default value (if the attribute does not
> > > appear) is "not deleted".  One can also express that there is only one
> > > other legitimate value for status: "deleted".  This one must be
> > > specified explicitly, since it is not the default value.
> > >
> > > Using this approach nothing really changes for data providers (nor
> > > for service providers, really).  But I guess Xerces will now do the
> > > right thing and add the default value of "not deleted" to all records
> > > that do not have the status attribute specified.
> > >
> > > The way to express the above approach in the schema is different for
> > > the Sep/Oct 2000 specs that we use and for the most recent XML Schema
> > > specs.  But that is another story, to be addressed later.
> > >
> > > I attach the edited xsd files.  I will put them in place, unless
> > > someone disagrees with the approach taken.
> > >
> > > many greetings
> > >
> > > herbert
> > >
> > > "Jeffrey A. Young" wrote:
> > > >
> > > > Someone noticed that my OAIHarvester isn't working correctly lately.
> > > > It turns out that the Xerces XML parser is convinced that all the
> > > > records I harvest are flagged as status="deleted". Since this clearly
> > > > isn't the case, I started stripping the program down until I had a
> > > > small example program showing this effect. The Java source code is
> > > > attached. Basically, if I do DocumentBuilderFactory.setValidating(true)
> > > > and then convert the XML to a DOM Document, it silently "corrects" my
> > > > records to status="deleted". If I dump the Document, all looks fine,
> > > > but when I actually query the status attribute, it reports back with a
> > > > value of "deleted". On the other hand, if I specify
> > > > setValidating(false), everything works fine. I suspect the problem is
> > > > that the XML Schema needs to make the status attribute optional.
> > > > Another possibility is that Xerces is processing the XML Schema
> > > > incorrectly. I can ignore the problem by always using
> > > > setValidating(false), but that doesn't seem right. If someone has a
> > > > better solution, I would appreciate it. Thanks.
> > > >
> > > > Jeff
> > > >
> > > > ---
> > > > Jeffrey A. Young
> > > > Senior Consulting Systems Analyst
> > > > Office of Research, Mail Code 710
> > > > OCLC Online Computer Library Center, Inc.
> > > > 6565 Frantz Road
> > > > Dublin, OH   43017-3395
> > > > www.oclc.org
> > > >
> > > > Voice:  614-764-4342
> > > > Fax:            614-764-2344
> > > > Email:  jyoung@oclc.org
> > > >
> > > >   ------------------------------------------------------------------------
> > > >                 Name: Test.java
> > > >    Test.java    Type: unspecified type (application/octet-stream)
> > > >             Encoding: quoted-printable
> > >
> > > --
> > > Herbert Van de Sompel
> > > Visiting Assistant Professor
> > > Cornell University -- Computer Science
> > > tel + 1 - 607 - 255 - 3085
> > > fax + 1 - 607 - 255 - 4428
> > > http://www.cs.cornell.edu/people/herbertv/
> > > digital life in libraries used to be primitive
> > >
> > >   ----------------------------------------------------------------------------
> > > <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
> > >          xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
> > >          targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
> > >          elementFormDefault="qualified"
> > >          attributeFormDefault="unqualified">
> > >
> > >  <annotation>
> > >   <documentation>
> > >     Schema to verify validity of responses to GetRecord OAI-protocol request.
> > >     This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
> > >     with XSV 1.176/1.87 of 2001/02/16 16:38:43
> > >   </documentation>
> > >  </annotation>
> > >
> > >  <element name="GetRecord" type="oai:GetRecordType"/>
> > >
> > >  <!-- response to GetRecord-request -->
> > >
> > >  <complexType name="GetRecordType">
> > >   <sequence>
> > >     <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
> > >     <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
> > >     <element name="record" minOccurs="0" maxOccurs="1" type="oai:recordType"/>
> > >   </sequence>
> > >  </complexType>
> > >
> > >  <!-- define recordType -->
> > >  <!-- a record has a header and a metadata part -->
> > >
> > >  <complexType name="recordType">
> > >   <sequence>
> > >     <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
> > >     <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
> > >     <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
> > >   </sequence>
> > >   <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
> > >  </complexType>
> > >
> > >  <!-- define headerType -->
> > >  <!-- a header has a unique identifier and a datestamp -->
> > >
> > >  <complexType name="headerType">
> > >   <sequence>
> > >     <element name="identifier" minOccurs="1" maxOccurs="1" type="uriReference"/>
> > >     <element name="datestamp" minOccurs="1" maxOccurs="1" type="date"/>
> > >   </sequence>
> > >  </complexType>
> > >
> > >  <!-- define metadataType -->
> > >  <!-- metadata must be expressed in XML that is compliant with another XML Schema -->
> > >  <!-- metadata must be explicitly qualified in the response -->
> > >
> > >  <complexType name="metadataType">
> > >   <sequence>
> > >    <any namespace="##any" processContents="lax"/>
> > >   </sequence>
> > >  </complexType>
> > >
> > >  <!-- define aboutType -->
> > >  <!-- data "about" the record must be expressed in XML -->
> > >  <!-- that is compliant with an XML Schema defined by a community -->
> > >
> > >  <complexType name="aboutType">
> > >   <sequence>
> > >    <any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="1"/>
> > >   </sequence>
> > >  </complexType>
> > >
> > >  <!-- define statusType -->
> > >  <!-- a record can have a status of "deleted" or "not deleted". -->
> > >
> > >  <simpleType name="statusType">
> > >   <restriction base="string">
> > >    <enumeration value="deleted"/>
> > >    <enumeration value="not deleted"/>
> > >   </restriction>
> > >  </simpleType>
> > >
> > >  </schema>
> > >
> > >   ----------------------------------------------------------------------------
> > > <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
> > >           xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_ListRecords"
> > >           targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_ListRecords"
> > >           elementFormDefault="qualified"
> > >           attributeFormDefault="unqualified">
> > >
> > >   <annotation>
> > >    <documentation>
> > >      Schema to verify validity of responses to ListRecords OAI-protocol request.
> > >      This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
> > >      with XSV 1.176/1.87 of 2001/02/16 16:38:43
> > >    </documentation>
> > >   </annotation>
> > >
> > >   <element name="ListRecords" type="oai:ListRecordsType"/>
> > >
> > >   <!-- response to ListRecords-request -->
> > >   <!-- this response may contain an optional resumptionToken -->
> > >
> > >   <complexType name="ListRecordsType">
> > >    <sequence>
> > >      <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
> > >      <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
> > >      <element name="record" minOccurs="0" maxOccurs="unbounded" type="oai:recordType"/>
> > >      <element name="resumptionToken" minOccurs="0" maxOccurs="1" type="string"/>
> > >    </sequence>
> > >   </complexType>
> > >
> > >   <!-- define recordType -->
> > >   <!-- a record has a header and a metadata part -->
> > >
> > >   <complexType name="recordType">
> > >    <sequence>
> > >      <element name="header" minOccurs="1" maxOccurs="1" type="oai:headerType"/>
> > >      <element name="metadata" minOccurs="0" maxOccurs="1" type="oai:metadataType"/>
> > >      <element name="about" minOccurs="0" maxOccurs="1" type="oai:aboutType"/>
> > >    </sequence>
> > >    <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
> > >   </complexType>
> > >
> > >   <!-- define headerType -->
> > >   <!-- a header has a unique identifier and a datestamp -->
> > >
> > >   <complexType name="headerType">
> > >    <sequence>
> > >      <element name="identifier" minOccurs="1" maxOccurs="1" type="uriReference"/>
> > >      <element name="datestamp" minOccurs="1" maxOccurs="1" type="date"/>
> > >    </sequence>
> > >   </complexType>
> > >
> > >   <!-- define metadataType -->
> > >   <!-- metadata must be expressed in XML that complies with another XML Schema -->
> > >   <!-- metadata must be explicitly qualified in the response -->
> > >
> > >   <complexType name="metadataType">
> > >    <sequence>
> > >     <any namespace="##any" processContents="lax"/>
> > >    </sequence>
> > >   </complexType>
> > >
> > >   <!-- define aboutType -->
> > >   <!-- data "about" the record must be expressed in XML -->
> > >   <!-- that is compliant with an XML Schema defined by a community -->
> > >
> > >   <complexType name="aboutType">
> > >    <sequence>
> > >     <any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="1"/>
> > >    </sequence>
> > >   </complexType>
> > >
> > >   <!-- define statusType -->
> > >   <!-- a record can have a status of "deleted" or "not deleted". -->
> > >
> > >   <simpleType name="statusType">
> > >    <restriction base="string">
> > >     <enumeration value="deleted"/>
> > >     <enumeration value="not deleted"/>
> > >    </restriction>
> > >   </simpleType>
> > >
> > > </schema>
> > >
> > >   ----------------------------------------------------------------------------
> > > <schema xmlns="http://www.w3.org/2000/10/XMLSchema"
> > >          xmlns:oai="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
> > >          targetNamespace="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
> > >          elementFormDefault="qualified"
> > >          attributeFormDefault="unqualified">
> > >
> > >  <annotation>
> > >   <documentation>
> > >     Schema to verify validity of responses to ListIdentifiers OAI-protocol request.
> > >     This Schema validated at http://www.w3.org/2000/09/webdata/xsv on 2001-04-22
> > >     with XSV 1.176/1.87 of 2001/02/16 16:38:43
> > >   </documentation>
> > >  </annotation>
> > >
> > >  <element name="ListIdentifiers" type="oai:ListIdentifiersType"/>
> > >
> > >  <!-- response to ListIdentifiers-request -->
> > >  <!-- records have an optional "deleted" status -->
> > >  <!-- this response may contain an optional resumptionToken -->
> > >
> > >  <complexType name="ListIdentifiersType">
> > >   <sequence>
> > >     <element name="responseDate" minOccurs="1" maxOccurs="1" type="timeInstant"/>
> > >     <element name="requestURL" minOccurs="1" maxOccurs="1" type="string"/>
> > >     <element ref="oai:identifier" minOccurs="0" maxOccurs="unbounded"/>
> > >     <element name="resumptionToken" minOccurs="0" maxOccurs="1" type="string"/>
> > >   </sequence>
> > >  </complexType>
> > >
> > >  <element name="identifier">
> > >   <complexType>
> > >    <simpleContent>
> > >     <extension base="uriReference">
> > >      <attribute name="status" use="default" value="not deleted" type="oai:statusType"/>
> > >     </extension>
> > >    </simpleContent>
> > >   </complexType>
> > >  </element>
> > >
> > >  <!-- define statusType -->
> > >  <!-- a record can have a status of "deleted" or "not deleted". -->
> > >
> > >  <simpleType name="statusType">
> > >   <restriction base="string">
> > >    <enumeration value="deleted"/>
> > >    <enumeration value="not deleted"/>
> > >   </restriction>
> > >  </simpleType>
> > >
> > >  </schema>
> > 
> > --
> > Thomas G. Habing
> > Research Programmer, Digital Library Initiative
> > University of Illinois at Urbana-Champaign
> > 052 Grainger Engineering Library, MC-274
> > thabing@uiuc.edu, (217) 244-7809
> > _______________________________________________
> > OAI-implementers mailing list
> > OAI-implementers@oaisrv.nsdl.cornell.edu
> > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> 
> -- 
> Herbert Van de Sompel
> Visiting Assistant Professor
> Cornell University -- Computer Science
> tel + 1 - 607 - 255 - 3085
> fax + 1 - 607 - 255 - 4428
> http://www.cs.cornell.edu/people/herbertv/
> digital life in libraries used to be primitive
>