[OAI-implementers] Qualified Dublin Core

Thu Aug 12 21:09:09 EDT 2004

Hi Jeff,

You wrote:
> Speaking only for myself as a service provider, I doubt that I will be
> interested in a generic container of "things". I simply don't have the
> time or patience to figure out what's what in a blob of data.
> Including a "schema" attribute and "element" element might help, but
> the investigation still sounds difficult. In addition, my sense is
> that taking the easy route and blindly pulling DC and DCQ elements
> from the blob (or elements from any other namespace) will produce a
> meaningless jumble 9 times out of 10. I could be wrong, though.

I agree that it would be silly to do what you describe.  But that's
not what I'm talking about.

1)  I am not talking about random blobs of data.  I'm talking about
    "DC-like" metadata which consists of:

       *  one (or more) published schema identifiers,
       *  a collection of element/encoding/language/value tuples
          where the element name, encoding and language tags are
          defined by the schema.

    If I recognize the schema identifier as identifying a schema that
    is derived from DCQ [+], then I have a reasonable basis for inferring 
    the meaning of an element whose name is (say) DC.date.published.
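
    To make the record shape concrete, here is a minimal Python sketch.
    The class names, field names and the schema identifier are made up
    for illustration; they are not taken from any published schema or
    from our software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """One element/encoding/language/value tuple."""
    name: str                       # e.g. "DC.date.published"
    value: str
    encoding: Optional[str] = None  # e.g. "W3CDTF"
    language: Optional[str] = None  # e.g. "en"


@dataclass
class Record:
    """A "DC-like" record: schema identifiers plus a bag of elements."""
    schema_ids: List[str]
    elements: List[Element] = field(default_factory=list)


# Hypothetical identifiers for schemas my software knows to be DCQ-derived.
KNOWN_DCQ_SCHEMAS = {"urn:example:schema:dcq"}


def is_dcq_derived(record: Record) -> bool:
    """True if at least one schema identifier on the record is recognized."""
    return any(sid in KNOWN_DCQ_SCHEMAS for sid in record.schema_ids)
```

    Only when is_dcq_derived() holds would I go on to interpret an
    element such as DC.date.published.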

2)  I am not talking about randomly pulling out elements.  I'm talking
    about extracting elements that are meaningful to my metadata collection
    from records whose schema identifiers my software recognizes.  Indeed,
    I may well use different filtering for different schemas, or even
    for different source repositories.

    The validation step then decides if the filtered record is acceptable.
    For example, it checks

       *  that all elements required BY MY SCHEMA are present,
       *  that all encodings and languages are known, and
       *  that the values for the elements are acceptable according to
          the specified encodings.
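
    The filter-then-validate steps can be sketched as follows.  This is
    only an illustration: the element names, the language list and the
    (deliberately simplified, date-only) W3CDTF check are assumptions,
    and a real harvester would load them from its schema configuration.

```python
import re
from typing import List, NamedTuple, Optional


class Element(NamedTuple):
    name: str
    value: str
    encoding: Optional[str] = None
    language: Optional[str] = None


# Assumed local schema: what I require, and what I am willing to keep.
REQUIRED = {"DC.Title", "DC.Date"}
WANTED = REQUIRED | {"DC.Creator"}
KNOWN_LANGUAGES = {None, "en", "fr"}
# Value checkers keyed by encoding name; None means "no encoding given".
ENCODING_CHECKS = {
    None: lambda v: True,
    "W3CDTF": lambda v: re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", v) is not None,
}


def filter_elements(elements: List[Element]) -> List[Element]:
    """Keep only the elements that are meaningful to my collection."""
    return [e for e in elements if e.name in WANTED]


def validate(elements: List[Element]) -> List[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    present = {e.name for e in elements}
    for name in sorted(REQUIRED - present):
        problems.append(f"missing required element {name}")
    for e in elements:
        if e.language not in KNOWN_LANGUAGES:
            problems.append(f"unknown language {e.language} on {e.name}")
        if e.encoding not in ENCODING_CHECKS:
            problems.append(f"unknown encoding {e.encoding} on {e.name}")
        elif not ENCODING_CHECKS[e.encoding](e.value):
            problems.append(f"bad {e.encoding} value on {e.name}")
    return problems
```

    Different WANTED sets could be selected per schema or per source
    repository before validate() is called.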

2a) Another alternative is to retain the "unknown" elements from the 
    harvested records.  You can separately decide whether or not to 
    show them to end-users or to pass them on to other repositories.

    This is roughly the model we use in HotMeta / MetaSuite.  Our
    repository understands multiple metadata schemas, and can store
    records that conform to any ... or none of them.  When an end-user
    does a metadata query, the results show a configurable,
    context-dependent subset of the available elements.
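
    A rough sketch of the "retain, but show selectively" idea follows.
    It is not how HotMeta / MetaSuite is implemented; the view names and
    element names are invented for the example.

```python
from typing import Dict, List, Set, Tuple

NamedValue = Tuple[str, str]  # (element name, value)

# Hypothetical display contexts and the elements each one shows.
VIEWS: Dict[str, Set[str]] = {
    "brief": {"DC.Title"},
    "full": {"DC.Title", "DC.Creator", "DC.Date"},
}


def partition(elements: List[NamedValue], known: Set[str]):
    """Split a harvested record into (known, unknown) elements; nothing is dropped."""
    kept = [e for e in elements if e[0] in known]
    unknown = [e for e in elements if e[0] not in known]
    return kept, unknown


def visible(elements: List[NamedValue], context: str) -> List[NamedValue]:
    """Select the subset of stored elements shown in a given context."""
    allowed = VIEWS.get(context, set())
    return [e for e in elements if e[0] in allowed]
```

    The unknown elements stay in the store, so they can still be passed
    on to another repository even though no view displays them.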

3)  I am not talking about harvesting metadata from random places.
    Rather, I'm talking about harvesting from reputable metadata 
    repositories. I'm assuming that:

       *  the repository owner is not going to wantonly abuse the
          schemas he/she purports to use, e.g. by systematically
          putting the author name in DC.Title, and
       *  the repository owner has adequate quality control in place
          to ensure that the element values are reasonably accurate.

    If these are not true, it is not sensible to harvest metadata from
    the repository in question. 

[+]  If you send me a schema identifier for some schema FNORD that I've
never heard of, I probably cannot do anything with your metadata. If you
>>also<< send me the schema identifier for DCQ, I know that the records
are DCQ compatible (if you ignore non-DCQ elements).  Alternatively, if
the schema identifier for FNORD points at a schema definition that says
that FNORD is an extension to DCQ, I can work this out for myself, either
manually or (maybe in the future) automatically.  Then I can configure
my software to treat FNORD records as DCQ records ... or do something
else. 
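
The "work this out automatically" case might look something like the
sketch below.  It assumes schema parentage has already been extracted
into a simple lookup table; no standard format for declaring parentage
is implied, and the table contents are invented.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set

# Assumed parentage table: schema id -> schemas it declares itself an extension of.
SCHEMA_PARENTS: Dict[str, List[str]] = {
    "FNORD": ["DCQ"],
    "DCQ": ["DC"],
    "DC": [],
}


def ancestors(schema_id: str, parents: Dict[str, List[str]]) -> List[str]:
    """Breadth-first list of ancestor schemas, nearest first."""
    seen: List[str] = []
    queue = deque(parents.get(schema_id, []))
    while queue:
        sid = queue.popleft()
        if sid not in seen:
            seen.append(sid)
            queue.extend(parents.get(sid, []))
    return seen


def treat_as(schema_ids: Iterable[str], understood: Set[str],
             parents: Dict[str, List[str]]) -> Optional[str]:
    """Pick a schema my software understands for this record, or None."""
    ids = list(schema_ids)
    for sid in ids:                      # an identifier sent directly wins
        if sid in understood:
            return sid
    for sid in ids:                      # otherwise fall back to declared ancestry
        for anc in ancestors(sid, parents):
            if anc in understood:
                return anc
    return None
```

With this table, a record tagged only FNORD would be handled as DCQ by
software that understands DCQ, and a record tagged with an unlisted
schema would be left alone.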

Obviously, this will work better if there is an agreed standard for
specifying schemas that includes some way of specifying the parentage of
a schema and the semantic relationships with elements in the ancestor
schemas.  We don't need complete semantic information, just enough to
say (for example) that the meaning of DC.Title in a DC record subsumes
the meaning of DC.Title in AGLS (or vice-versa).

-- Steve

+----------------------------------+----------------------------------------
| Stephen Crawley                  | MetaSuite Project Leader
| Level 7, GP South Building (78)  | Distributed Systems Technology CRC
| Staff House Road                 | Tel   : +61 7 3365 4310
| The University of Queensland     | Fax   : +61 7 3365 4311
| Queensland 4072                  | Email : crawley at dstc.edu.au
| Australia                        | WWW   : http://www.dstc.edu.au
|                                  | DSTC is the Australian W3C Office
+----------------------------------+----------------------------------------