[OAI-implementers] OAI identifier resolver
Patrick CH Hochstenbach
Patrick Hochstenbach <hochsten@lanl.gov>
Mon, 20 Oct 2003 21:39:17 -0600 (MDT)
An MD5 hash is an incredibly safe hash to use. Because MD5 produces 128-bit
values, a collision only becomes likely after about 2^64 unique
identifiers (the birthday bound). Generating 1,000,000,000 identifiers per
second, it would take you about 585 years to reach that many.
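The arithmetic behind these figures can be checked directly. This is a small sketch that assumes the usual birthday approximation p ≈ 1 - exp(-n^2 / 2N) for n identifiers spread uniformly over N = 2^128 possible MD5 values. At n = 2^64 that gives roughly a 39% chance of at least one collision, which is what "it takes about 2^64 identifiers" means in practice.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

def birthday_collision_probability(n: float, bits: int) -> float:
    """Approximate chance of at least one collision among n uniform bits-bit hashes."""
    space = 2.0 ** bits
    return -math.expm1(-n * n / (2.0 * space))

n = 2.0 ** 64            # identifiers generated
rate = 1_000_000_000     # identifiers per second, as in the message above
years = n / rate / SECONDS_PER_YEAR

print(f"{years:.0f} years")                              # prints "585 years"
print(f"{birthday_collision_probability(n, 128):.2f}")   # prints "0.39"
```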
Patrick
On Mon, 20 Oct 2003, Xiaoming Liu wrote:
>
> On Mon, 20 Oct 2003, Lonnie D. Harvel wrote:
>
> >
> > I am in favor of just the URL:[collection name] approach. Why make it
> > more complicated than necessary? URLs are unique. Is there a particular
> > reason why it needs to be shorter?
>
> This brings us back to the question of why we need a resolver. If both the
> baseURL and the record identifier have to be supplied, it doesn't make much
> sense to develop a resolver. I think the motivation is to provide a "cool"
> URL for each record and to make it easy to exchange information using the
> REST model.
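To make the resolver idea concrete, here is a minimal sketch of the target URL such a resolver could compute for a "cool" record URL. The path layout `/<repository-name>/<record-identifier>`, the `REPOSITORIES` table and the example.org addresses are hypothetical; the generated request uses the standard OAI-PMH `GetRecord` verb with its `identifier` and `metadataPrefix` arguments.

```python
from urllib.parse import unquote, urlencode

# Hypothetical table mapping short repository names to OAI-PMH baseURLs.
REPOSITORIES = {
    "example": "http://repository.example.org/oai",
}

def resolve(path: str, metadata_prefix: str = "oai_dc") -> str:
    """Turn a resolver path '/<repository>/<identifier>' into an OAI-PMH
    GetRecord request URL. Splits on the first '/' only, so record
    identifiers may themselves contain slashes."""
    repository, _, identifier = path.lstrip("/").partition("/")
    base_url = REPOSITORIES[repository]  # KeyError if the name is unknown
    query = urlencode({
        "verb": "GetRecord",
        "identifier": unquote(identifier),
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"

print(resolve("/example/oai:example.org:1234"))
# prints "http://repository.example.org/oai?verb=GetRecord&identifier=oai%3Aexample.org%3A1234&metadataPrefix=oai_dc"
```

A real resolver would return this URL as an HTTP redirect. How the names in the table get assigned (a central registry or something derived from the baseURL) is the open question in the rest of this thread.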
>
> OAI has no centralized mechanism to maintain unique repository names. That
> can be done either by a centralized registry, like the UIUC registry, or in a
> distributed way, like hashing the baseURL (or something better). In the
> distributed approach, I can add a link to the Purl-OAI resolver without prior
> knowledge of how repository names are maintained in the Purl-OAI resolver.
> That's my reason for favoring the distributed method.
>
> xiaoming
>
> >
> > Adam Farquhar wrote:
> >
> > > Xiaoming,
> > >
> > > Selecting an approach that is certain to fail, but unpredictably,
> > > is not a good 'engineering' approach, especially when there are other
> > > approaches that do not fail. For example, taking a base64 encoding of
> > > the base URL or just using the base URL itself would both provide a
> > > unique identifier.
> > >
> > > Adam.
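Adam's base64 alternative can be sketched as follows, assuming URL-safe base64 (RFC 4648) with the trailing '=' padding stripped so the result fits neatly in a URL path. Because the encoding is reversible, two different baseURLs can never map to the same name. The function names are illustrative.

```python
import base64

def encode_base_url(base_url: str) -> str:
    """Reversible (hence collision-free) name: URL-safe base64, padding stripped."""
    return base64.urlsafe_b64encode(base_url.encode("utf-8")).decode("ascii").rstrip("=")

def decode_name(name: str) -> str:
    """Invert encode_base_url by restoring the stripped '=' padding."""
    padded = name + "=" * (-len(name) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")

url = "http://repository.example.org/oai"
name = encode_base_url(url)
assert decode_name(name) == url
print(len(url), len(name))  # prints "33 44"
```

The trade-off is length: base64 output is about 4/3 the size of its input, so the name comes out longer than the baseURL it replaces, which works against the goal of a shorter identifier discussed in this thread.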
> > >
> > >>>Hash algorithms such as MD5 or CRC32 cannot be used to generate unique
> > >>>identifiers. These algorithms will occasionally produce the same output for
> > >>>different input strings (this is why hash tables require a mechanism for dealing
> > >>>with collisions). Common approaches to generating unique identifiers use some
> > >>>sort of a registration mechanism to appropriately partition the space of possible
> > >>>values. Successful ones will leverage an existing registration mechanism, such
> > >>>as DNS.
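The pigeonhole argument can be made visible by shrinking the output space. This sketch truncates MD5 to 16 bits, a choice made only for the demonstration (`truncated_md5` and `find_collision` are illustrative names). With only 2^16 possible outputs, a collision is guaranteed within 2^16 + 1 distinct inputs, and by the birthday bound one typically appears after a few hundred.

```python
import hashlib

def truncated_md5(s: str, bits: int = 16) -> int:
    """First `bits` bits of MD5(s): a deliberately tiny hash for demonstration."""
    digest = int.from_bytes(hashlib.md5(s.encode("utf-8")).digest(), "big")
    return digest >> (128 - bits)

def find_collision(bits: int = 16):
    """Hash 'id-0', 'id-1', ... until two distinct inputs share a value.
    The pigeonhole principle guarantees success within 2**bits + 1 inputs."""
    seen = {}  # hash value -> first input that produced it
    for i in range(2 ** bits + 1):
        s = f"id-{i}"
        h = truncated_md5(s, bits)
        if h in seen:
            return seen[h], s, i + 1
        seen[h] = s
    raise AssertionError("unreachable by the pigeonhole principle")

first, second, tried = find_collision()
print(f"{first!r} and {second!r} collide after {tried} inputs")
```

The same argument applies to full-length MD5 or CRC32; only the number of inputs needed before a collision becomes likely changes.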
> > >>>
> > >>>
> > >>
> > >>I agree a hash algorithm is not a "perfect" way to generate a unique
> > >>identifier for a repository, but it may be acceptable from an engineering
> > >>perspective: the collision probability will be very low at the current
> > >>scale of OAI data providers (<500?).
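"Pretty low" can be quantified. This sketch applies the birthday approximation to 500 repositories, the rough estimate above, for a few digest lengths; it assumes the hash spreads baseURLs uniformly over its output space.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: chance that n uniformly random bits-bit
    values contain at least one duplicate."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)

providers = 500  # the rough upper estimate quoted above
for bits in (32, 64, 128):
    print(f"{bits:3d} bits: {collision_probability(providers, bits):.1e}")
```

For 500 providers this comes to roughly 3e-5 for a 32-bit digest, 7e-15 for 64 bits and 4e-34 for the full 128-bit MD5: small, but never zero, which is the heart of the disagreement in this thread.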
> > >>
> > >>I think the basic problem is how to map an OAI baseURL to a shorter,
> > >>readable string without collisions. The algorithm should be repeatable:
> > >>anyone can use the same algorithm to generate the same output given a
> > >>baseURL. I would be glad to see other approaches.
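One repeatable scheme along these lines is sketched below. Every choice in it is an assumption for illustration: the normalization rule, the use of MD5, truncation to 80 bits, and lowercase base32 encoding, which yields a 16-character name made of letters and the digits 2 to 7. Anyone running the same code on the same baseURL gets the same name, with no registry involved.

```python
import base64
import hashlib
from urllib.parse import urlsplit, urlunsplit

def normalize(base_url: str) -> str:
    """Hypothetical normalization: lowercase the scheme and netloc (host and
    port, both case-insensitive); leave path and query untouched."""
    parts = urlsplit(base_url.strip())
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path, parts.query, parts.fragment))

def short_name(base_url: str, bits: int = 80) -> str:
    """Deterministic short name: first `bits` bits of MD5(normalized baseURL),
    base32-encoded in lowercase without padding (assumes bits % 8 == 0)."""
    digest = hashlib.md5(normalize(base_url).encode("utf-8")).digest()
    truncated = digest[: bits // 8]
    return base64.b32encode(truncated).decode("ascii").rstrip("=").lower()

a = short_name("http://Repository.Example.org/oai")
b = short_name("http://repository.example.org/oai")
assert a == b and len(a) == 16
print(a)
```

Truncation trades length for collision resistance: at 80 bits the birthday approximation puts the collision chance for 500 repositories around 1e-19, but unlike the base64 approach, uniqueness here is probabilistic rather than guaranteed.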
> > >>
> > >>
> > >>
> > > _______________________________________________
> > > OAI-implementers mailing list
> > > List information, archives, preferences and to unsubscribe:
> > > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> >
> >
> >
>
--
Patrick Hochstenbach -------------------, ,==. ,---------- PO Box 1663
Los Alamos National Laboratory / /@ | / Los Alamos
Research Library, MS P362 / /_ < / New Mexico 87544-7113
+1 (505) 665 1475 -------------------' =" `g' '-------- hochsten@lanl.gov