Ugh, sorry, I hit send too quickly. The two bullet points after the
sentence I quoted clear it up (at least with regard to what the spec
defines). It still seems a little ambiguous, though, doesn't it? To
harvest in the most correct way you almost have to know how the
provider is implemented, which rather defeats the purpose of a spec. I'm still curious whether there's a de facto standard
that most providers are using?<br><br><br><div class="gmail_quote">On Tue, Feb 1, 2011 at 10:26 AM, Benjamin Anderson <span dir="ltr"><<a href="http://benanderson.us">benanderson.us</a>@<a href="http://gmail.com">gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Thanks Simeon. I'm looking over the section you linked to...<br><br><blockquote style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;" class="gmail_quote">
Repositories that implement <code>resumptionTokens</code> <b>must</b> do so in a manner that allows
harvesters to resume a sequence of requests for incomplete lists by re-issuing a
list request with the most recent <code>resumptionToken</code><br></blockquote><div><br>I'm having a hard time understanding this sentence. What is meant by "incomplete list"? What is meant by "re-issuing a list request"?<br>
<br>I was just thinking that my harvester assumption wouldn't work for the given scenario:<br><br>Let's assume a provider that allows for updates during harvests and that this provider only keeps the most recent updated date (not all update dates). If a record was updated before t0 and again after t0 (but before it was included in the harvest initiated at t0), then the harvester will not get the record even though it should have. That's probably a rare case, but nevertheless bound to happen. Are there guidelines for the best way to use an until as a harvester?<br>
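<br>A minimal sketch of that edge case, assuming inclusive from/until bounds, a provider that stores only the latest datestamp per record, and a follow-up harvest that starts at from=t0. The integer ticks and variable names are hypothetical stand-ins for UTC datestamps:<br>

```python
def in_window(stamp, from_=None, until=None):
    """True if stamp falls inside the inclusive [from_, until] range."""
    after_from = from_ is None or stamp >= from_
    before_until = until is None or until >= stamp
    return after_from and before_until

# Hypothetical integer ticks standing in for UTC datestamps.
T0 = 10                      # harvest starts, harvester sends until=T0
datestamp_at_t0 = 5          # record was last updated before T0
datestamp_when_served = 12   # record updated again before its page was served

# At T0 the record qualified for the harvest...
would_have_matched = in_window(datestamp_at_t0, until=T0)

# ...but the provider only remembers the newer datestamp, so it is skipped.
first_harvest_sees_it = in_window(datestamp_when_served, until=T0)

# A follow-up harvest with from=T0 (and a later until) includes it again.
next_harvest_sees_it = in_window(datestamp_when_served, from_=T0, until=20)
```

Under these assumptions the record is delayed to the next harvest rather than lost, but only if that next harvest really does start at t0.<br>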
<br>Thanks again,<br>Ben<br><br></div><div><div></div><div class="h5"><br><div class="gmail_quote">On Tue, Feb 1, 2011 at 10:05 AM, Simeon Warner <span dir="ltr"><<a href="mailto:simeon.warner@cornell.edu" target="_blank">simeon.warner@cornell.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">Hi Ben,<br>
<br>
This is covered in section 3.5.1 of the specification:<br>
<br>
<a href="http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm#Idempotency" target="_blank">http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm#Idempotency</a><br>
<br>
I think your solution for the harvester is the correct one. Provided the harvester starts again with from=t0, all changes between t0 and t2 will be harvested, irrespective of whether they were included in the original response (modulo the known problems with items that move between sets in set-selective requests).<br>
<br>
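The difference between restarting at t0 and at t2 can be sketched as follows. This is only an illustration: the timeline, the identifiers, and the helper changed_since are hypothetical, and it assumes the "from" bound is inclusive:<br>

```python
from datetime import datetime, timezone

def changed_since(datestamps, from_):
    """Identifiers whose datestamp is at or after from_ (inclusive bound)."""
    return sorted(rid for rid, stamp in datestamps.items() if stamp >= from_)

# Hypothetical timeline matching the scenario in this thread.
t0 = datetime(2011, 2, 1, 14, 0, tzinfo=timezone.utc)   # harvest starts
t1 = datetime(2011, 2, 1, 14, 5, tzinfo=timezone.utc)   # record 101 is added
t2 = datetime(2011, 2, 1, 14, 10, tzinfo=timezone.utc)  # harvest finishes

# Repository state after t2; the identifiers are made up for illustration.
repo = {
    "oai:example.org:100": datetime(2011, 1, 15, tzinfo=timezone.utc),
    "oai:example.org:101": t1,
}

missed_with_t2 = changed_since(repo, t2)  # next harvest uses from=t2
caught_with_t0 = changed_since(repo, t0)  # next harvest uses from=t0
```

Restarting at t0 may return records the harvester already has, so in this sketch the harvester would need to tolerate duplicates, for example by keying stored records on their identifier.<br>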
Cheers,<br>
Simeon<div><div></div><div><br>
<br>
On 02/01/2011 09:09 AM, Benjamin Anderson wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div><div></div><div>
Hi,<br>
<br>
I'm wondering what others are doing when a ListRecords request w/out an<br>
until comes in. Consider this scenario:<br>
<br>
t0 - harvest request (with no until) is initiated<br>
t1 - record 101 is added to the repo<br>
t2 - harvest is finished (it took multiple requests to complete)<br>
<br>
Should record 101 be included in the harvest data? If not, should the<br>
client issue its next harvest with from=t0? (A from=t2 would be wrong<br>
because it would miss record 101.)<br>
<br>
We have implemented both OAI-PMH harvesters and providers, so I have to<br>
consider both ends of this. Here's what I'm thinking...<br>
<br>
As a Provider<br>
I will simply lock the repo so that the above scenario can't happen. If<br>
someone is already harvesting (there exist unexpired resumptionTokens)<br>
then I will not update the repository.<br>
<br>
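A rough sketch of that locking rule, assuming the provider tracks an expiration time for every resumptionToken it issues. The class TokenRegistry, its method names, and the one-hour lifetime are all hypothetical:<br>

```python
from datetime import datetime, timedelta, timezone

class TokenRegistry:
    """Tracks outstanding resumptionTokens and their expiration times."""

    def __init__(self):
        self._expires = {}  # token -> expiration datetime

    def issue(self, token, now, ttl=timedelta(hours=1)):
        # The one-hour lifetime is an arbitrary choice for illustration.
        self._expires[token] = now + ttl

    def complete(self, token):
        # Called when a harvest finishes and its token is no longer needed.
        self._expires.pop(token, None)

    def updates_allowed(self, now):
        # Drop expired tokens, then allow updates only if none remain.
        self._expires = {t: e for t, e in self._expires.items() if e > now}
        return not self._expires

registry = TokenRegistry()
start = datetime(2011, 2, 1, 14, 0, tzinfo=timezone.utc)
registry.issue("token-1", start)

locked_during_harvest = not registry.updates_allowed(start + timedelta(minutes=5))
unlocked_after_expiry = registry.updates_allowed(start + timedelta(hours=2))
```

The cost of this approach is that a steady stream of harvesters could keep the repository locked for long periods.<br>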
As a Harvester<br>
I will always use the until parameter with the value of the time the<br>
harvest was initially started.<br>
<br>
I think this keeps me clear of any problems. Anyone else have thoughts<br>
or care to share your solutions?<br>
<br>
Thanks,<br>
Ben Anderson<br>
<br>
<br>
<br>
<br></div></div>
_______________________________________________<br>
OAI-implementers mailing list<br>
List information, archives, preferences and to unsubscribe:<br>
<a href="http://www.openarchives.org/mailman/listinfo/oai-implementers" target="_blank">http://www.openarchives.org/mailman/listinfo/oai-implementers</a><br>
<br>
</blockquote>
<br>
</blockquote></div><br>
</div></div></blockquote></div><br>