[OAI-implementers] new records in combination with a resumptionToken
Hussein Suleman
hussein@vt.edu
Wed, 23 May 2001 20:32:54 -0400
hi
this is an interesting problem so i'm going to share some of our
discussions here at virginia tech that are relevant to this problem ...
of course there is no general solution since i think the OAI quite
deftly avoided handling too much complication in the protocol ... that
said, there are two very interesting "solutions", one of which is
probably relevant to you:
firstly, i recall a while back someone (can't remember who) related how
they implemented the protocol by making a temporary table to support
resumptions ... this would probably solve your problem but would require
a bit more work ...
the alternative is to consider how service providers work (at least this
is how we thought it through when building our first experimental
harvester):
a) since records can be added at any time during the day and the
granularity of harvesting is a day, you cannot trust that the data you
got for the current day is complete.
b) since dates are local to different timezones, if the data provider is
west of the service provider, asking for everything up until yesterday
is not "interoperationally stable" since it could still be yesterday at
the data provider.
now there are multiple solutions to this and we tried implementing some:
a) don't get anything newer than 2 days old
b) always ask for a 2 day overlap ending on the current date
c) use a 1-day overlap and operate in the timezone of the data provider
(extract an initial responseDate from the data provider and then
increment locally)
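the three options above can be sketched as date-window calculations. this is only one reading of them, in python, and not our harvester code: the function names are made up, the window boundaries are my interpretation of each option, and the responseDate is assumed to be an ISO 8601 timestamp with a timezone offset.

```python
from datetime import date, datetime, timedelta


def window_skip_recent(last_until, today):
    """(a) harvest only up to 2 days ago; start right after the last window."""
    return last_until + timedelta(days=1), today - timedelta(days=2)


def window_two_day_overlap(last_harvest_day, today):
    """(b) re-request the 2 days before the last harvest, up to today."""
    return last_harvest_day - timedelta(days=2), today


def provider_today(response_date):
    """Wall-clock date at the data provider, read from its responseDate.

    Assumes a value such as '2001-05-23T20:32:54-04:00'.
    """
    return datetime.fromisoformat(response_date).date()


def window_provider_clock(last_provider_day, current_provider_day):
    """(c) 1-day overlap, with both dates in the data provider's timezone."""
    return last_provider_day - timedelta(days=1), current_provider_day
```

in each case the pair returned is the (from, until) range for the next date-based request; with overlapping windows the harvester has to tolerate seeing the same record twice.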
as far as we can figure, any service provider that wants to avoid
missing data entries has to do something like this ... since new data is
not "stable" for harvesting it is not trusted and/or not harvested
immediately and your problem of database updates pretty much disappears
as long as harvesting is by date (which i trust it almost always is)
ok, i know this is probably way too much detail for this question :) but
i just wanted to share these thoughts to see if they aligned with the
harvesting approaches used by other people building service provider
interfaces ...
any further comments will be appreciated ...
ttfn
----hussein
--
========================================================================
hussein suleman -- hussein@vt.edu -- vtcs -- http://purl.org/net/hussein
========================================================================