[OAI-implementers] Re: [OAI-PMH] an error in a regular expression describing an OAI Identifier
Simeon Warner
simeon at cs.cornell.edu
Tue May 22 17:21:10 EDT 2007
Hi All,
Agnieszka Lewandowska has pointed out an error in the patterns matching
domain names in the oai-identifier.xsd schema (message excerpt below): the
current schema doesn't permit single-letter subdomain names after the first
component, such as the "p" in ebipol.p.lodz.pl. Such names should be
permitted (see: http://www.ietf.org/rfc/rfc1035.txt), so I propose the
following updates:
in the definition of repositoryIdentifierType:
< <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"/>
---
> <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"/>
and in the definition of sampleIdentifierType:
< <pattern value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"/>
---
> <pattern value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"/>
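A quick way to see the effect of the change is to run the old and new patterns against the domain from the report. This is a sketch in Python, assuming that Python's re module interprets these particular patterns the same way an XML Schema validator does (they use only character classes, groups and quantifiers). re.fullmatch stands in for the implicit anchoring of XSD patterns, and the local identifier "123" is a made-up example.

```python
import re

# Patterns copied verbatim from the schema diff. XML Schema patterns are
# implicitly anchored at both ends, so re.fullmatch() is used, not re.search().
OLD_REPO = r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"
NEW_REPO = r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"
OLD_ID = r"oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"
NEW_ID = r"oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"


def valid(pattern: str, value: str) -> bool:
    """Return True if value matches pattern in full, as an XSD facet would."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    samples = [
        (OLD_REPO, NEW_REPO, "ebipol.p.lodz.pl"),
        (OLD_ID, NEW_ID, "oai:ebipol.p.lodz.pl:123"),  # hypothetical local id
        (OLD_ID, NEW_ID, "oai:arXiv.org:cs/0112017"),
    ]
    for old, new, value in samples:
        print(f"{value}: old={valid(old, value)} new={valid(new, value)}")
```

The old patterns reject the first two samples because ".p" is followed by a dot, while the `+` requires at least one more label character; both versions accept the last one.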
I have put the updated schema online at
http://openarchives.org/OAI/2.0/oai-identifier.xsd.2007-05-22
and there is a test instance at
http://openarchives.org/OAI/2.0/oai-identifier-test4.xml
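This change should not invalidate any currently valid instance: the only
difference is relaxing "+" to "*", so the new patterns accept everything the
old ones did. Unless someone points out an error, I will update the live
schema version in a week or two.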
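Anything matched by `[a-zA-Z0-9\-]+` is also matched by `[a-zA-Z0-9\-]*`, which is why no valid instance should be lost. As a bounded sanity check (a sketch, not a proof), one can enumerate every short string over a small, hypothetical alphabet and confirm that nothing accepted by the old repositoryIdentifierType pattern is rejected by the new one. Python's re.fullmatch is assumed to behave like XSD pattern matching for these patterns.

```python
import re
from itertools import product

OLD_REPO = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+")
NEW_REPO = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+")

# Hypothetical small alphabet: letters, a digit, a hyphen and a dot, so that
# the enumeration hits both valid and invalid domain-like strings.
ALPHABET = "ab1-."


def find_regressions(max_len=7):
    """Return every string up to max_len chars that the old pattern accepts
    but the new pattern rejects (expected to be empty)."""
    regressions = []
    for n in range(1, max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            s = "".join(chars)
            if OLD_REPO.fullmatch(s) and not NEW_REPO.fullmatch(s):
                regressions.append(s)
    return regressions


if __name__ == "__main__":
    print(find_regressions())  # prints []
```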
Cheers,
Simeon
(For the really pedantic: the schema pattern is too broad in that it
permits a subdomain name ending in a hyphen (e.g. "a-.com"), which is not
valid according to RFC 1035. Correcting this would make the patterns more
complicated, and I think it probably isn't worth changing to
something like
<pattern value="[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?(\.[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?)+"/>
However, we could do this if people see value in it.)
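For comparison, here is a sketch testing the stricter repositoryIdentifierType pattern from the postscript against the proposed one, again using Python's re.fullmatch as a stand-in for XSD pattern matching. The sample domains are hypothetical. The stricter pattern rejects a label ending in a hyphen while still accepting single-letter labels.

```python
import re

# Proposed (looser) pattern and the stricter RFC 1035-style alternative.
LOOSE = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+")
STRICT = re.compile(
    r"[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?"
    r"(\.[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?)+"
)

if __name__ == "__main__":
    for domain in ["a-.com", "a-b.com", "ebipol.p.lodz.pl", "www.example-.org"]:
        loose = LOOSE.fullmatch(domain) is not None
        strict = STRICT.fullmatch(domain) is not None
        print(f"{domain}: loose={loose} strict={strict}")
```

Each label in the stricter pattern must be either a single letter or a letter followed by letters, digits or hyphens that ends in a letter or digit, which is what rules out "a-".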
On Tue, 22 May 2007, Agnieszka Lewandowska wrote:
> There might be an error in one of the documents describing the format of
> the OAI Identifier. The regular expression
> "oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"
>
> from the file at URI:
>
> http://www.openarchives.org/OAI/2.0/oai-identifier.xsd
>
> does not validate a proper domain name: 'ebipol.p.lodz.pl' (the '.p'
> part causes the failure). Furthermore, the regular expression on the page
>
> http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
>
> (in particular point 2.1) accepts 'ebipol.p.lodz.pl'. After a small
> change to the regular expression:
>
> "oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"
>
> it works.