[OAI-implementers] Re: [OAI-PMH] an error in a regular expression describing an OAI Identifier
Simeon Warner
simeon at cs.cornell.edu
Thu Jun 7 15:50:03 EDT 2007
I have updated the oai-identifier schema as proposed below. Current schema:
http://openarchives.org/OAI/2.0/oai-identifier.xsd
Previous version, for reference:
http://openarchives.org/OAI/2.0/oai-identifier.2002-06-21.xsd
Updated test instance, which points at the current schema:
http://openarchives.org/OAI/2.0/oai-identifier-test4.xml
Cheers,
Simeon
On Tue, 22 May 2007, Simeon Warner wrote:
> Hi All,
>
> Agnieszka Lewandowska has pointed out an error in the patterns matching
> domain names in the oai-identifier.xsd schema (message excerpt below): the
> current schema doesn't permit single-letter subdomain names. Such names
> should be permitted (see: http://www.ietf.org/rfc/rfc1035.txt), so I propose
> the following updates:
>
> in definition of repositoryIdentifierType:
>
> < <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"/>
> ---
>> <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"/>
>
> and in definition of sampleIdentifierType:
>
> < <pattern
> value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"/>
> ---
>> <pattern
>> value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"/>
>
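The effect of the one-character change (`+` to `*` in the repeated label group) can be checked outside a schema validator. This is a minimal sketch, assuming Python's `re` interprets these particular patterns the same way an XML Schema processor does. XML Schema patterns are implicitly anchored to the whole value, so the sketch uses `re.fullmatch`; the sample host names are only illustrative.

```python
import re

# repositoryIdentifierType patterns from the diff above.
OLD_REPO = r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"
NEW_REPO = r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"


def matches(pattern, value):
    # XML Schema patterns must match the entire value, hence fullmatch.
    return re.fullmatch(pattern, value) is not None


for host in ["arXiv.org", "ebipol.p.lodz.pl", "a.b.c"]:
    print(f"{host:20} old={matches(OLD_REPO, host)} new={matches(NEW_REPO, host)}")
```

Only labels after the first were affected: the leading `[a-zA-Z][a-zA-Z0-9\-]*` already allowed a single-letter first label, so a name such as `a.bc` matched even under the old pattern.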
> I have put the updated schema online at
> http://openarchives.org/OAI/2.0/oai-identifier.xsd.2007-05-22
> and there is a test instance at
> http://openarchives.org/OAI/2.0/oai-identifier-test4.xml
>
> This change should not invalidate any currently valid instance. Unless
> someone points out an error I will update the live schema version in a week
> or two.
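The claim that no currently valid instance becomes invalid follows from the change itself: `[a-zA-Z0-9\-]+` matches a subset of the strings `[a-zA-Z0-9\-]*` matches, and the rest of each pattern is identical, so anything the old pattern accepts the new one also accepts. A brute-force sanity check over short strings is sketched below, assuming (as an illustration, not a proof) that one character from each class the patterns distinguish is a representative enough alphabet.

```python
import itertools
import re

OLD_REPO = r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"
NEW_REPO = r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"

# One character from each class the patterns care about: letter, digit,
# hyphen and dot. Case is irrelevant since both patterns allow a-z and A-Z.
ALPHABET = "a1-."


def compare(max_len=6):
    """Enumerate every string over ALPHABET up to max_len characters and
    return (only_old, only_new): strings accepted by just one pattern."""
    only_old, only_new = [], []
    for n in range(1, max_len + 1):
        for chars in itertools.product(ALPHABET, repeat=n):
            s = "".join(chars)
            old_ok = re.fullmatch(OLD_REPO, s) is not None
            new_ok = re.fullmatch(NEW_REPO, s) is not None
            if old_ok and not new_ok:
                only_old.append(s)
            elif new_ok and not old_ok:
                only_new.append(s)
    return only_old, only_new


only_old, only_new = compare()
print("previously valid, now rejected:", only_old)
print("newly accepted (first few):", only_new[:5])
```

The first list prints as `[]`; the second contains names with a single-character label after a dot, such as `a.a`.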
>
> Cheers,
> Simeon
>
>
>
> (For the really pedantic: the schema pattern is too broad in that it permits
> a subdomain name ending in a hyphen (e.g. "a-.com"), which is not valid
> according to RFC 1035. Correcting this would make the patterns more
> complicated, and I think it probably isn't worth changing to something
> like
> <pattern
> value="[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?(\.[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?)+"/>
> However, we could do this if people see value in it.)
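To see what the stricter alternative would change in practice, the adopted pattern and the RFC 1035 variant from the note above can be compared on a few host names. This is a sketch that again uses Python's `re.fullmatch` as a stand-in for an XML Schema validator; `STRICT_REPO` is the proposed pattern copied as-is, and the host names are illustrative.

```python
import re

# Pattern adopted in the update.
LOOSE_REPO = r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"
# Stricter variant from the note: each label starts with a letter, ends with
# a letter or digit, and may contain hyphens only in between.
STRICT_REPO = (r"[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?"
               r"(\.[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?)+")

for host in ["a-.com", "a.b-", "ebipol.p.lodz.pl", "my-repo.example.org"]:
    loose = re.fullmatch(LOOSE_REPO, host) is not None
    strict = re.fullmatch(STRICT_REPO, host) is not None
    print(f"{host:22} loose={loose} strict={strict}")
```

The two patterns disagree only on labels ending in a hyphen (`a-.com`, `a.b-`), which is the narrow case the note describes.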
>
>
> On Tue, 22 May 2007, Agnieszka Lewandowska wrote:
>> There may be an error in one of the documents describing the format of the
>> OAI Identifier. The regular expression
>> "oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"
>>
>> from a file under URI:
>>
>> http://www.openarchives.org/OAI/2.0/oai-identifier.xsd
>>
>> does not match a valid host name, 'ebipol.p.lodz.pl' (the '.p' part causes
>> the mismatch). Furthermore, the regular expression on the page
>>
>> http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
>>
>> (specifically point 2.1) accepts 'ebibpol.p.lodz.pl'. After a small
>> change to the regular expression:
>>
>> "oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"
>>
>> it works.
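The corrected expression above is identical to the new `sampleIdentifierType` pattern in the proposed diff. A minimal sketch checking both versions against complete OAI identifiers, assuming Python's `re` behaves like the schema processor for these patterns; the local parts after the second colon are made up for illustration.

```python
import re

# sampleIdentifierType patterns; only the repository label quantifier differs
# ("+" before the change, "*" after it).
OLD_SAMPLE = (r"oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"
              r":[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+")
NEW_SAMPLE = (r"oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"
              r":[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+")

# Illustrative identifiers (hypothetical local parts).
SAMPLES = ["oai:ebipol.p.lodz.pl:123", "oai:arXiv.org:cs/0112017"]

for ident in SAMPLES:
    old_ok = re.fullmatch(OLD_SAMPLE, ident) is not None
    new_ok = re.fullmatch(NEW_SAMPLE, ident) is not None
    print(f"{ident:28} old={old_ok} new={new_ok}")
```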
>