[OAI-implementers] List Id's for multiple sets
deridder
deridder@cs.utk.edu
Fri, 9 Feb 2001 09:51:20 -0500 (EST)
Good gracious, Tim! That *is* complex. What happens if you have 40
harvesters working on your program at once? You would have multiple
tables--- are you using cookies? And do you have time limitations on
accessing those temp tables? If so, how do you implement that--- and do
you remove all current temp tables on each new query? It seems like that
would interfere with several concurrent accesses. But unaccessed tables could
build up too, so I certainly see that they would need to be periodically
cleared out.
Whew!
And yes, I for one would like to see your OAI "bits"; I'd love to
compare how I'm doing things with how others are, to see if I can improve
on my methods.
---jody
On Fri, 9 Feb 2001, Tim Brody wrote:
> On Thu, 8 Feb 2001, deridder wrote:
>
> > This is looking more complicated than I expected. With no dates
> > specified, and no sets specified, the list could be enormous; and as more
> > and more sets are added, the resumption tokens could get pretty hairy too.
>
> Excuse my ignorance if this is already obvious to you:
>
> (as suggested by Chris Gutteridge, this is how I have implemented
> resumptionTokens)
>
> Initial request:
> Build a temporary table of all the identifiers that match the request.
> This CAN get huge, but if you want harvesters to get all of your repository
> there isn't much choice... (indeed, I would argue this is more efficient
> than enumerating over sets)
> Output the first 400 records (or whatever) from the temporary table, using
> the identifiers as an index into your database/file system. The
> resumptionToken will be the name of your temporary table and an encoded
> string to tell you what the metadataFormat is (required for ListRecords).
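
One way to build such a token, sketched in Python. The message does not say how the current position is carried between requests, so this assumes it travels in the token alongside the table name. The `!` separator, the URL-safe base64 encoding of the metadataPrefix, and the helper names `make_token`/`parse_token` are all hypothetical.

```python
import base64


def make_token(table: str, start: int, metadata_prefix: str) -> str:
    """Pack temp-table name, last position sent and metadataPrefix into a token.

    Hypothetical layout: "<table>!<start>!<urlsafe-base64(metadataPrefix)>".
    Assumes table names never contain "!".
    """
    encoded = base64.urlsafe_b64encode(metadata_prefix.encode("utf-8")).decode("ascii")
    return f"{table}!{start}!{encoded}"


def parse_token(token: str) -> tuple[str, int, str]:
    """Reverse make_token; raises ValueError on a malformed token.

    (binascii.Error, raised by b64decode on bad input, subclasses ValueError.)
    """
    table, start, encoded = token.split("!")
    prefix = base64.urlsafe_b64decode(encoded.encode("ascii")).decode("utf-8")
    return table, int(start), prefix


token = make_token("tmp_1a2b", 400, "oai_dc")
print(token)
print(parse_token(token))
```

Putting the metadataPrefix in the token keeps the server stateless about it; the alternative mentioned below (storing it in the temp index) makes the token shorter.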
>
> Temporary table is:
> pos int, auto_increment
> id char(64) ... this is the OAI identifier or your archive identifier; if
> you use the OAI identifier as the index, ListIdentifiers only needs the
> temporary table
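
The initial request and the table layout above might look like this in Python, with SQLite standing in for the MySQL-style `auto_increment` column (`INTEGER PRIMARY KEY AUTOINCREMENT`). The helper name `start_list`, the `tmp_` naming scheme, and passing the matching identifiers in as a plain iterable (how they are selected from the archive is repository-specific) are assumptions for illustration.

```python
import sqlite3
import uuid
from typing import Iterable, Optional

BATCH_SIZE = 400  # "the first 400 records (or whatever)"


def start_list(
    conn: sqlite3.Connection, matching_ids: Iterable[str]
) -> tuple[list[str], str, Optional[int]]:
    """Handle an initial ListIdentifiers/ListRecords request.

    Copies every matching identifier into a fresh table and returns
    (first_batch, table_name, next_start). next_start is the last pos sent,
    or None if everything fitted in one batch (no resumptionToken needed).
    """
    # Hypothetical naming scheme: generated here, never taken from the client,
    # so it is safe to splice into SQL (table names cannot be bound parameters).
    table = "tmp_" + uuid.uuid4().hex
    conn.execute(
        f"CREATE TABLE {table} (pos INTEGER PRIMARY KEY AUTOINCREMENT, id CHAR(64))"
    )
    conn.executemany(
        f"INSERT INTO {table} (id) VALUES (?)", ((i,) for i in matching_ids)
    )
    conn.commit()
    # Fetch one extra row: its presence means a resumptionToken is needed.
    rows = conn.execute(
        f"SELECT pos, id FROM {table} ORDER BY pos LIMIT ?", (BATCH_SIZE + 1,)
    ).fetchall()
    more = len(rows) > BATCH_SIZE
    rows = rows[:BATCH_SIZE]
    next_start = rows[-1][0] if more else None
    return [row[1] for row in rows], table, next_start


conn = sqlite3.connect(":memory:")
batch, table, next_start = start_list(conn, (f"oai:example:{n}" for n in range(1000)))
print(len(batch), batch[0], next_start)
```

An ordinary table is used rather than SQLite's `CREATE TEMP TABLE`, because a TEMP table disappears when its connection closes, and a CGI script typically opens a new connection for every request.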
>
> Latter requests:
> Get the appropriate list of identifiers by selecting the rows where "pos > start".
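
A matching sketch of the later requests, again in Python with SQLite. It assumes the table name and start position have already been recovered from the resumptionToken. Because that name comes back from the harvester, the sketch checks that it is a plain identifier before splicing it into SQL. `next_batch` is a hypothetical helper name.

```python
import re
import sqlite3
from typing import Optional

BATCH_SIZE = 400
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def next_batch(
    conn: sqlite3.Connection, table: str, start: int
) -> tuple[list[str], Optional[int]]:
    """Fetch the identifiers after position `start` ("pos > start").

    Returns (ids, next_start); next_start is None when the list is exhausted,
    i.e. when no further resumptionToken should be issued.
    """
    # The table name arrives via the resumptionToken (client input) and cannot
    # be a bound parameter, so reject anything that is not a bare identifier.
    if not _TABLE_NAME.fullmatch(table):
        raise ValueError(f"bad table name in resumptionToken: {table!r}")
    rows = conn.execute(
        f"SELECT pos, id FROM {table} WHERE pos > ? ORDER BY pos LIMIT ?",
        (start, BATCH_SIZE + 1),  # one extra row tells us whether more remain
    ).fetchall()
    more = len(rows) > BATCH_SIZE
    rows = rows[:BATCH_SIZE]
    next_start = rows[-1][0] if more else None
    return [row[1] for row in rows], next_start


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_demo (pos INTEGER PRIMARY KEY AUTOINCREMENT, id CHAR(64))")
conn.executemany(
    "INSERT INTO tmp_demo (id) VALUES (?)", [(f"oai:example:{n}",) for n in range(1000)]
)
start: Optional[int] = 0
while start is not None:
    ids, start = next_batch(conn, "tmp_demo", start)
    print(len(ids), start)
```

Since `pos` is the primary key, each `pos > start` query is an index range scan, which is why the later requests stay quick however large the table is.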
>
> To manage the temporary tables I have another table, the temp index, which
> stores the table names and the last time they were accessed. Whenever a
> query is started I remove old temporary tables and their associated
> entries in the temp index. To make the resumptionToken even simpler, you
> could store the metadataPrefix in the index ...
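
The temp-index housekeeping can be sketched the same way. Calling `purge_stale` at the start of each request mirrors "whenever a query is started", and because only tables idle longer than a cut-off are dropped, several harvesters paging through their own tables at once are not disturbed, which bears on the concurrency question at the top of this message. The 24-hour cut-off, the `temp_index` column names, and the helper functions are assumptions, not details from the message.

```python
import sqlite3
import time
from typing import Optional

# Hypothetical expiry period; the message does not say how old "old" is.
MAX_AGE_SECONDS = 24 * 60 * 60


def ensure_index(conn: sqlite3.Connection) -> None:
    """Create the temp index: one row per temporary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS temp_index ("
        " name TEXT PRIMARY KEY,"
        " last_access REAL NOT NULL,"
        " metadata_prefix TEXT)"  # optional: keeps the prefix out of the token
    )


def register(conn: sqlite3.Connection, name: str, metadata_prefix: Optional[str],
             now: Optional[float] = None) -> None:
    """Record a newly built temporary table."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT INTO temp_index (name, last_access, metadata_prefix) VALUES (?, ?, ?)",
        (name, now, metadata_prefix),
    )
    conn.commit()


def touch(conn: sqlite3.Connection, name: str, now: Optional[float] = None) -> None:
    """Refresh the last-access time whenever a resumptionToken for `name` is used."""
    now = time.time() if now is None else now
    conn.execute("UPDATE temp_index SET last_access = ? WHERE name = ?", (now, name))
    conn.commit()


def purge_stale(conn: sqlite3.Connection, now: Optional[float] = None) -> list[str]:
    """Drop temporary tables idle for longer than MAX_AGE_SECONDS; return their names."""
    now = time.time() if now is None else now
    stale = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM temp_index WHERE last_access < ? ORDER BY last_access",
            (now - MAX_AGE_SECONDS,),
        ).fetchall()
    ]
    for name in stale:
        # Names come from our own index, never from the client.
        conn.execute(f'DROP TABLE IF EXISTS "{name}"')
        conn.execute("DELETE FROM temp_index WHERE name = ?", (name,))
    conn.commit()
    return stale


conn = sqlite3.connect(":memory:")
ensure_index(conn)
for name, idle in [("tmp_recent", 60), ("tmp_abandoned", 2 * MAX_AGE_SECONDS)]:
    conn.execute(f'CREATE TABLE "{name}" (pos INTEGER PRIMARY KEY AUTOINCREMENT, id CHAR(64))')
    register(conn, name, "oai_dc", now=time.time() - idle)
print(purge_stale(conn))  # only the abandoned table is dropped
```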
>
> The initial request can be very slow, as it has to enumerate over your
> entire archive, but subsequent requests are very quick. Each harvester (if
> it is well behaved) will only need to do this once; subsequent queries
> should use "from" to grab only the latest data.
>
> e.g. (liable to be broken and knackered as is my wont)
> http://cite-base.ecs.soton.ac.uk/cgi-bin/oai/OAI-script?debug=1&verb=ListRecords&metadataPrefix=oai_dc
>
> As an aside, I have tried to write my OAI "bits" as a separate,
> non-archive-specific library. Would people be interested in access to it?
> (I cannot guarantee its correctness or robustness, only that it supports
> the bits of OAI that I've needed.)
>
> All the best,
> Tim Brody
> Computer Science, University of Southampton
> email: tdb198@soton.ac.uk
> Web: http://www.ecs.soton.ac.uk/~tdb198/
>
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>