[OAI-implementers] issues with OAI-PMH specifications for
OAI-Provider implementations using a cache
Michael Nelson
mln at cs.odu.edu
Tue Jun 2 11:21:39 EDT 2009
On Tue, 2 Jun 2009, Hussein Suleman wrote:
> hi Rozita
>
> if you use a purpose-built cache, hopefully you can update the datestamp in
> the cache so the datestamps of the cache are used to answer queries instead
> of the original datestamps ... if you do this, you will not have a problem,
> and i do believe this is the recommended OAI-PMH usage for
> hierarchical/intermediate systems (i am sure it is written down somewhere but
> i can't recall where)
I just realized that my response was essentially the same as Hussein's
here -- I should have sent my message in reply to, and in support of, this one.
regards,
Michael
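
A minimal sketch of the datestamp approach in the quoted paragraph above, assuming a hypothetical in-memory cache. The names `CachedRecord` and `Cache`, and the integer timestamps, are illustrative only and not taken from any real provider. Each record is stamped with the time it entered the cache, and from/until queries are answered against that stamp instead of the repository's original datestamp:

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    identifier: str
    source_datestamp: int   # when the record changed in the repository
    cache_datestamp: int    # when the cache picked the change up


class Cache:
    """Hypothetical cache that answers selective harvesting from its own datestamps."""

    def __init__(self):
        self.records = {}

    def update(self, changes, now):
        # changes: iterable of (identifier, source_datestamp) pulled from the repository
        for identifier, source_datestamp in changes:
            self.records[identifier] = CachedRecord(identifier, source_datestamp, now)

    def list_identifiers(self, from_, until, use_cache_datestamp=True):
        # from/until are treated as inclusive bounds
        def stamp(record):
            if use_cache_datestamp:
                return record.cache_datestamp
            return record.source_datestamp

        return sorted(r.identifier for r in self.records.values()
                      if from_ <= stamp(r) <= until)


cache = Cache()
cache.update([("rec-a", 5)], now=10)      # cache update at T1 = 10
# rec-b changes in the repository at 15; the harvester asks at T2 = 20 for [0, 20]
first = cache.list_identifiers(0, 20)
cache.update([("rec-b", 15)], now=25)     # next cache update happens after T2
# the harvester's next request covers [21, 30]
second_cache = cache.list_identifiers(21, 30)
second_source = cache.list_identifiers(21, 30, use_cache_datestamp=False)
```

With the cache datestamps, `second_cache` contains rec-b even though it changed in the repository before T2; with the original datestamps, `second_source` is empty and the change is lost. For the responses to stay consistent with the queries, the datestamp reported for each record would presumably also have to be the cache datestamp.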
>
> then, regarding cache downtime, i was going to say what Simeon has just
> written regarding using multiple 503s ...
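
A rough sketch of the "multiple 503s" idea, assuming the provider cannot predict how long a cache update will take and therefore answers every request during the update with a short, fixed Retry-After. The function names, the 60-second default and the simulated provider are all hypothetical; Retry-After is assumed to be given in seconds (HTTP also allows a date):

```python
def provider_response(cache_updating, retry_after_seconds=60):
    """Hypothetical provider hook: answer 503 with a short Retry-After during an update."""
    if cache_updating:
        return 503, {"Retry-After": str(retry_after_seconds)}
    return 200, {}


def harvest_with_retries(fetch, sleep, max_attempts=10):
    """Call fetch() until it stops returning 503, waiting Retry-After seconds each time."""
    for attempt in range(1, max_attempts + 1):
        status, headers = fetch()
        if status != 503:
            return status, attempt
        sleep(int(headers.get("Retry-After", "60")))
    raise RuntimeError("provider still unavailable after %d attempts" % max_attempts)


# Simulated provider whose cache update lasts for the first three requests.
state = {"calls": 0}


def fake_fetch():
    state["calls"] += 1
    return provider_response(cache_updating=state["calls"] <= 3)


waits = []  # records the requested waits instead of actually sleeping
status, attempts = harvest_with_retries(fake_fetch, waits.append)
```

Here the harvester is told to come back three times before the fourth request succeeds, so the provider never needs to know the total duration of the update in advance. A real harvester would pass `time.sleep` instead of `waits.append`.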
>
> (a day granularity may be restrictive, but it does depend on specifics of
> your application)
>
> regarding the metadata issue, the reason for the requirement is so that
> metadata records are self-contained and can be stored, verified and moved
> around without losing namespace information. this requirement exists to some
> degree because OAI-PMH was designed in the early and somewhat "wild-west"
> days of XML when XML parsers were not very namespace-aware ... although i
> should add that even today if you programmatically extract an XML sub-tree
> with many parsing tools, you will still not have fully validatable
> XML unless namespace information is in the inner tags ... so it is
> all about maintaining verification information within records come what may
> ...
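
To illustrate the point about self-contained records, here is a small sketch using Python's standard `xml.etree.ElementTree`. It assumes the requirement under discussion is that the metadata root element carries its own namespace declarations and an `xsi:schemaLocation` attribute; the helper name `is_self_contained` and the sample records are made up for illustration:

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def is_self_contained(metadata_xml):
    """Return True if the fragment parses on its own, is namespace-qualified,
    and carries xsi:schemaLocation on its root element."""
    try:
        root = ET.fromstring(metadata_xml)
    except ET.ParseError:
        # e.g. a prefix whose xmlns declaration was left behind on an outer element
        return False
    has_namespace = root.tag.startswith("{")
    has_schema_location = ("{%s}schemaLocation" % XSI) in root.attrib
    return has_namespace and has_schema_location


good = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ '
    'http://www.openarchives.org/OAI/2.0/oai_dc.xsd">'
    '<dc:title>Example</dc:title></oai_dc:dc>'
)
# Same record cut out of a larger document as text, declarations left behind.
orphan = '<oai_dc:dc><dc:title>Example</dc:title></oai_dc:dc>'
# Well-formed but with no namespace information at all.
bare = '<dc><title>Example</title></dc>'
```

Only `good` passes the check: `orphan` cannot even be parsed once it is separated from the element that declared its prefixes, and `bare` parses but can no longer be tied to a schema.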
>
> ttfn,
> ----hussein
>
> =====================================================================
> hussein suleman ~ hussein at cs.uct.ac.za ~ http://www.husseinsspace.com
> =====================================================================
>
>
> Fridman, Rozita wrote:
>> Hello all,
>>
>> we developed an OAI-Provider for Escidoc repositories.
>> Escidoc-OAI-Provider is based on the Fedora-OAI-Provider, which uses a
>> cache to reduce response time. Escidoc repositories are intended to
>> contain several million objects. The Escidoc-Core framework only requires
>> that object metadata stored in an Escidoc repository be well-formed
>> XML structures. Using a cache in the Escidoc-OAI-Provider is therefore
>> essential to ensure the validity of metadata in OAI-PMH responses and an
>> acceptable response time.
>> But the current OAI-PMH protocol specification doesn't account for some
>> issues caused by the use of a cache.
>> The main problem is the time lag between a harvester request and the last
>> cache update:
>> A harvester asks the OAI-Provider for all records that have changed
>> between T0 and T2 in the underlying repository. The last cache update
>> was at T1. The harvester gets records that have changed between T0 and
>> T1, but assumes that it got all changes between T0 and T2. Therefore in
>> the next request it asks for records that have changed between T2 and T3
>> and misses all changes between T1 and T2. If the cache update interval
>> is long and the next cache update takes place after T3, the harvester
>> also misses all changes between T2 and T3, and so on.
>> One proposal would be to put the datestamp of the last cache update into
>> the OAI-PMH response, in order to inform a harvester about possibly
>> missed records.
>> Does anybody face the same problem? What do you think about it? Are
>> there better solutions for this problem?
>>
>> The other issue is that, depending on the OAI-Provider implementation, a
>> cache may be in an inconsistent state while a cache update process is
>> running. Are there means in the OAI-PMH protocol to respond to harvester
>> requests during a cache update? A possible solution would be to respond
>> with an HTTP status code 503 (Service Unavailable; section 3.1.2.2 of the
>> specification), but the problem is how to specify the Retry-After period.
>> The duration of a cache update is not constant; it depends on the changes
>> in the repository.
>>
>> Thanks a lot,
>> Rozita
>>
>> Fachinformationszentrum Karlsruhe, Gesellschaft für
>> wissenschaftlich-technische Information mbH. Registered office:
>> Eggenstein-Leopoldshafen, Mannheim District Court HRB 101892.
>> Managing Director: Sabine Brünger-Weilandt. Chairman of the Supervisory
>> Board: MinR Hermann Riehl.
>>
>> _______________________________________________
>> OAI-implementers mailing list
>> List information, archives, preferences and to unsubscribe:
>> http://www.openarchives.org/mailman/listinfo/oai-implementers
>>
----
Michael L. Nelson mln at cs.odu.edu http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)