[OAI-implementers] Sets and stuff, OAI 2.0

Wed, 15 May 2002 09:12:54 +1000

On Tue, May 14, 2002 at 05:58:15PM -0400, deridder wrote:
> The dilemma is: how to implement the database to return records in a
> timely manner, and be scalable.
> 
> If I allow a record to be in 0-5 sets, and the set fields are in the
> same table as the record fields, 5 selects on the same table are required
> to respond to a single ListRecord request with set argument.
> 
> If I put the sets in a secondary table, pull out all the identifiers for
> a given set (same request), then when I have a request for ListRecords
> *without* a set argument, I need to do a select on the set table for each
> record returned in the ListRecord response.

I actually don't have a suggestion here. The database engine I am using
(ours! :-) supports nested repeating structures, so we can store and index
multiple sets directly in the record without problem. For a relational
database, what you are saying makes sense.

If you want a hacky suggestion, you could have a field in the same table
as the record which contains all the set names (separated by, say, spaces
or commas), so when you fetch the record you can return the set names
efficiently. Then, to allow efficient querying by set, keep a redundant
separate table of set names which can be joined back to the main record
table. But I am not speaking from any experience here.
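
To make the idea above a bit more concrete, here is a rough sketch using
Python's sqlite3 module. The table and column names (record, record_set,
set_specs) and the sample identifiers are made up for illustration, and it
assumes set names never contain spaces, so a space-separated field is safe.
It is untuned and only shows the shape of the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (
    identifier TEXT PRIMARY KEY,
    datestamp  TEXT NOT NULL,
    metadata   TEXT NOT NULL,
    set_specs  TEXT NOT NULL DEFAULT ''  -- redundant copy, space separated
);
CREATE TABLE record_set (                -- one row per (record, set) pair
    identifier TEXT NOT NULL REFERENCES record(identifier),
    set_spec   TEXT NOT NULL,
    PRIMARY KEY (identifier, set_spec)
);
CREATE INDEX record_set_by_spec ON record_set (set_spec, identifier);
""")


def add_record(identifier, datestamp, metadata, set_specs):
    """Write the record and both copies of its set membership together."""
    with conn:  # one transaction, so the two copies cannot drift apart
        conn.execute(
            "INSERT INTO record VALUES (?, ?, ?, ?)",
            (identifier, datestamp, metadata, " ".join(set_specs)),
        )
        conn.executemany(
            "INSERT INTO record_set VALUES (?, ?)",
            [(identifier, s) for s in set_specs],
        )


def list_records(set_spec=None):
    """Return (identifier, datestamp, [sets]) using a single SELECT."""
    if set_spec is None:
        # No set argument: set names come straight from the record row.
        rows = conn.execute(
            "SELECT identifier, datestamp, set_specs FROM record "
            "ORDER BY identifier"
        )
    else:
        # Set argument: the join table selects, the record row answers.
        rows = conn.execute(
            "SELECT r.identifier, r.datestamp, r.set_specs "
            "FROM record r JOIN record_set s ON s.identifier = r.identifier "
            "WHERE s.set_spec = ? ORDER BY r.identifier",
            (set_spec,),
        )
    return [(ident, stamp, specs.split()) for ident, stamp, specs in rows]


add_record("oai:example:1", "2002-05-14", "<record/>", ["physics", "museum"])
add_record("oai:example:2", "2002-05-15", "<record/>", [])
add_record("oai:example:3", "2002-05-15", "<record/>", ["physics"])
```

Either way a ListRecords response needs only one SELECT. The price is
that every insert, update and delete has to keep the two copies of the
set membership in step.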

> Maybe I should forget the sets altogether.  For those of you with
> harvesters and search engines:  how do you use the repository sets
>  (or do you?)

Our intent is to let our implementation be configured so the person
controlling the harvest selects which sets to use. The idea, as I
understand it, is that a museum with lots of different sorts of
information can make it all available, but a physics department could
harvest from lots of different sources only the information relating to
physics (eg: how physics is used in carbon dating or something). But it's
up to the data provider to define sets, then up to the harvester to decide
which sets look interesting.
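
As a rough illustration of what I mean by a configurable harvest, the
sketch below turns a list of chosen sets into ListRecords request URLs.
The function name, the base URL and the set name "physics" are invented
for the example. The verb, metadataPrefix and set arguments are the ones
from the OAI 2.0 protocol, and oai_dc is assumed as the metadata format.

```python
from urllib.parse import urlencode


def list_records_urls(base_url, sets, metadata_prefix="oai_dc"):
    """Build one ListRecords URL per configured set.

    An empty list of sets means "harvest everything", so a single
    request without a set argument is produced.
    """
    urls = []
    for set_spec in sets or [None]:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
        urls.append(base_url + "?" + urlencode(params))
    return urls


# Hypothetical repository, harvested for one set only.
physics_only = list_records_urls("http://example.org/oai", ["physics"])
```

A real harvester would also have to follow resumption tokens and pass
from/until dates for incremental harvests, which this leaves out.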

> (Oh, and if you can recommend which ListRecord fields you have found
> useful, I'd like to hear about that also;  I'd like to standardize my
> returns.)

I plan to keep the whole <record> in the database and so let applications
use what they want out of it. So I guess I would encourage you to return
as much as you can. Are there some specific areas you had in mind?

Hope this was a little help,
Alan
-- 
Alan Kent (mailto:ajk@mds.rmit.edu.au, http://www.mds.rmit.edu.au/~ajk/)
Project: TeraText Technical Director, InQuirion Pty Ltd (www.inquirion.com)
Postal: Multimedia Database Systems, RMIT, GPO Box 2476V, Melbourne 3001.
Where: RMIT MDS, Bld 91, Level 3, 110 Victoria St, Carlton 3053, VIC Australia.
Phone: +61 3 9925 4114  Reception: +61 3 9925 4099  Fax: +61 3 9925 4098