[OAI-implementers] issues with OAI-PMH specifications for OAI-Provider implementations using a cache

Hussein Suleman hussein at cs.uct.ac.za
Tue Jun 2 09:46:09 EDT 2009


hi Rozita

if you use a purpose-built cache, hopefully you can update the datestamp 
in the cache so the datestamps of the cache are used to answer queries 
instead of the original datestamps ... if you do this, you will not have 
a problem, and i do believe this is the recommended OAI-PMH usage for 
hierarchical/intermediate systems (i am sure it is written down 
somewhere but i can't recall where)
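
a minimal sketch of that idea, under assumptions: the class and field 
names (RecordCache, CachedRecord, cache_datestamp) are made up for 
illustration and are not taken from the Fedora or Escidoc code. each 
record is stamped when the cache ingests it, and from/until queries are 
answered against that stamp rather than the repository's own datestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CachedRecord:
    identifier: str
    metadata: str
    source_datestamp: datetime  # when the repository changed the record
    cache_datestamp: datetime   # when the cache ingested the record


class RecordCache:
    """Hypothetical cache that exposes its own datestamps to harvesters."""

    def __init__(self):
        self._records = {}

    def ingest(self, identifier, metadata, source_datestamp, now=None):
        # Stamp the record with the time of the cache update.
        now = now or datetime.now(timezone.utc)
        self._records[identifier] = CachedRecord(
            identifier, metadata, source_datestamp, now
        )

    def list_records(self, from_=None, until=None):
        # Selective harvesting uses cache_datestamp, with inclusive bounds.
        hits = [
            r for r in self._records.values()
            if (from_ is None or r.cache_datestamp >= from_)
            and (until is None or r.cache_datestamp <= until)
        ]
        return sorted(hits, key=lambda r: r.cache_datestamp)
```

with this, a record changed in the repository at 10:30 but ingested by 
the cache at 12:00 is absent from a harvest covering 09:00 to 11:00 and 
is returned by the next harvest covering 11:00 to 13:00, so the change 
is delayed but not lost.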

then, regarding cache downtime, i was going to say what Simeon has just 
written regarding using multiple 503s ...
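
as i read the "multiple 503s" suggestion, the provider does not need to 
know how long the cache update will take: it can answer 503 with a short 
Retry-After and simply answer 503 again if the update is still running. 
a rough sketch, with no real network code; provider_response and harvest 
are hypothetical names, and Retry-After is assumed to be given in seconds 
(HTTP also allows a date there):

```python
import time


def provider_response(cache_updating, retry_after=60):
    """Return (status, headers, body) for one request (illustrative only)."""
    if cache_updating:
        # The total update time is unknown, so only a short wait is promised.
        return 503, {"Retry-After": str(retry_after)}, ""
    return 200, {"Content-Type": "text/xml"}, "<OAI-PMH/>"


def harvest(fetch, sleep=time.sleep, max_attempts=10):
    """Call fetch() until it succeeds, honouring Retry-After on each 503."""
    for _ in range(max_attempts):
        status, headers, body = fetch()
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"unexpected HTTP status {status}")
        # Assumes the delay-seconds form of Retry-After.
        sleep(int(headers.get("Retry-After", "60")))
    raise RuntimeError("provider still unavailable after retries")
```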

(a day granularity may be restrictive, but it does depend on specifics 
of your application)

regarding the metadata issue, the reason for the requirement is so that 
metadata records are self-contained and can be stored, verified and 
moved around without losing namespace information. this requirement 
exists to some degree because OAI-PMH was designed in the early and 
somewhat "wild-west" days of XML when XML parsers were not very 
namespace-aware ... although i should add that even today if you 
programmatically extract an XML sub-tree with many parsing tools, you 
will still not have fully valid XML unless 
namespace information is in the inner tags ... so it is all about 
maintaining verification information within records come what may ...
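
a small illustration of the point, using only Python's standard library. 
the two sample documents and the helper names (cut_fragment, 
parses_alone) are invented for this sketch; the raw-text cut stands in 
for any tool that lifts a sub-tree out without carrying the outer 
namespace declarations along.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Namespace declared only on the outer element.
outer_only = (
    f'<record xmlns:dc="{DC}">'
    "<metadata><dc:title>Example</dc:title></metadata>"
    "</record>"
)

# Namespace declared on the inner element itself.
self_contained = (
    "<record>"
    f'<metadata><dc:title xmlns:dc="{DC}">Example</dc:title></metadata>'
    "</record>"
)


def cut_fragment(xml_text, start_tag, end_tag):
    """Cut a sub-tree out as raw text, the way a naive tool might."""
    start = xml_text.index(start_tag)
    end = xml_text.index(end_tag) + len(end_tag)
    return xml_text[start:end]


def parses_alone(fragment):
    """True if the fragment is parseable XML without its original context."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False
```

cutting `<dc:title>...</dc:title>` out of outer_only leaves the dc prefix 
unbound, so the fragment no longer parses on its own; the same cut from 
self_contained keeps its declaration and parses fine.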

ttfn,
----hussein

=====================================================================
hussein suleman ~ hussein at cs.uct.ac.za ~ http://www.husseinsspace.com
=====================================================================


Fridman, Rozita wrote:
> Hello all,
> 
> we developed an OAI-Provider for Escidoc repositories.
> Escidoc-OAI-Provider is based on the Fedora-OAI-Provider, which uses a
> cache to reduce response time. Escidoc repositories are intended to
> contain many millions of objects. The Escidoc-Core framework only requires
> that object metadata stored in an Escidoc repository be well-formed
> XML structures. Therefore the use of a cache in the Escidoc-OAI-Provider
> is essential to ensure the validity of metadata in OAI-PMH responses and
> an acceptable response time.
> 
> But the current OAI-PMH protocol specification doesn't account for some
> issues caused by the use of a cache.
>  
> The main problem is the time lag between a harvester request and the
> last cache update:
> A harvester asks the OAI-Provider for all records that have changed
> between T0 and T2 in the underlying repository. The last cache update
> was at T1. The harvester gets records that have changed between T0 and
> T1, but assumes that it got all changes between T0 and T2. Therefore in
> the next request it asks for records that have changed between T2 and T3
> and misses all changes between T1 and T2. If the cache update interval
> is long and the next cache update takes place after T3, the harvester
> also misses all changes between T2 and T3, and so on.
>    
> One proposal would be to put a date stamp of the last cache update into
> the OAI-PMH response, in order to inform a harvester about possibly
> missed records. 
> 
> Does anybody face the same problem? What do you think about it? Maybe
> there are better solutions for this problem?
> 
> The other issue is that, depending on the OAI-Provider implementation, a
> cache may be in an inconsistent state while a cache update process is
> running. Are there means in the OAI-PMH protocol to respond to harvester
> requests during a cache update? A possible solution would be to respond
> with HTTP status code 503 (Service Unavailable; section 3.1.2.2 of the
> specification), but the problem is how to specify the Retry-After
> period. The duration of a cache update is not constant; it depends on
> the changes in the repository.
> 
> Thanks a lot,
> Rozita


