[OAI-implementers] Qualified Dublin Core
Stephen Crawley
crawley at dstc.edu.au
Thu Aug 12 21:09:09 EDT 2004
Hi Jeff,
You wrote:
> Speaking only for myself as a service provider, I doubt that I will be
> interested in a generic container of "things". I simply don't have the
> time or patience to figure out what's what in a blob of data.
> Including a "schema" attribute and "element" element might help, but
> the investigation still sounds difficult. In addition, my sense is
> that taking the easy route and blindly pulling DC and DCQ elements
> from the blob (or elements from any other namespace) will produce a
> meaningless jumble 9 times out of 10. I could be wrong, though.
I agree that it would be silly to do what you describe. But that's
not what I'm talking about.
1) I am not talking about random blobs of data. I'm talking about
"DC-like" metadata which consists of:
* one (or more) published schema identifiers,
* a collection of element/encoding/language/value tuples
where the element name, encoding and language tags are
defined by the schema.
If I recognize the schema identifier as identifying a schema that
is derived from DCQ [+], then I have a reasonable basis for inferring
the meaning of an element whose name is (say) DC.date.published.
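To make point 1 concrete, here is a minimal Python sketch of such a "DC-like" record. It holds one or more schema identifiers plus element/encoding/language/value tuples, and it can check whether any declared schema is one the harvester already knows to be derived from DCQ. The `urn:example:` identifiers, the `DCQ_DERIVED` set and the class names are hypothetical placeholders. They do not come from any published OAI or DCMI specification.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema identifiers, for illustration only.
DCQ = "urn:example:dcq"
AGLS = "urn:example:agls"

# Schemas this harvester has been configured to treat as DCQ-derived (assumption).
DCQ_DERIVED = {DCQ, AGLS}


@dataclass(frozen=True)
class Element:
    """One element/encoding/language/value tuple; the tags are defined by the schema."""
    name: str                        # e.g. "DC.date.published"
    value: str
    encoding: Optional[str] = None   # e.g. "W3CDTF"
    language: Optional[str] = None   # e.g. "en"


@dataclass
class Record:
    schemas: set[str]                                    # published schema identifiers
    elements: list[Element] = field(default_factory=list)

    def is_dcq_derived(self) -> bool:
        """True if any declared schema is one we already know derives from DCQ."""
        return bool(self.schemas & DCQ_DERIVED)


record = Record(
    schemas={AGLS},
    elements=[Element("DC.date.published", "2004-08-12", encoding="W3CDTF")],
)
print(record.is_dcq_derived())  # True
```

Only the schema identifier is consulted here. The element names are interpreted later, once the schema is known.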
2) I am not talking about randomly pulling out elements. I'm talking
about extracting elements that are meaningful to my metadata collection
from records whose schema identifiers my software recognizes. Indeed,
I may well use different filtering for different schemas, or even
for different source repositories.
The validation step then decides whether the filtered record is acceptable.
For example, it checks
* that all elements required BY MY SCHEMA are present,
* that all encodings and languages are known, and
* that the element values are acceptable according to
the specified encodings.
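The filter-then-validate steps in point 2 can be sketched in Python as follows. The required and accepted element sets, the known languages, and the coarse W3CDTF shape check are all hypothetical stand-ins for "MY SCHEMA". A real harvester might hold a different `accepted` set for each source schema or repository.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Element:
    name: str
    value: str
    encoding: Optional[str] = None
    language: Optional[str] = None


# Hypothetical local ("MY") schema: what my collection requires and accepts.
REQUIRED = {"DC.Title", "DC.Creator"}
ACCEPTED = REQUIRED | {"DC.date.published", "DC.Subject"}
KNOWN_LANGUAGES = {None, "en", "fr"}  # None = no language tag

# Value checkers keyed by encoding scheme; None = plain, unencoded text.
ENCODING_CHECKS: dict[Optional[str], Callable[[str], bool]] = {
    None: lambda v: bool(v.strip()),
    # Coarse shape check for W3CDTF dates: YYYY, YYYY-MM or YYYY-MM-DD.
    "W3CDTF": lambda v: re.fullmatch(r"\d{4}(-\d{2}(-\d{2})?)?", v) is not None,
}


def filter_elements(elements, accepted=ACCEPTED):
    """Keep only the elements that are meaningful to my collection."""
    return [e for e in elements if e.name in accepted]


def validate(elements) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    problems = []
    present = {e.name for e in elements}
    for name in sorted(REQUIRED - present):
        problems.append(f"missing required element {name}")
    for e in elements:
        check = ENCODING_CHECKS.get(e.encoding)
        if check is None:
            problems.append(f"{e.name}: unknown encoding {e.encoding}")
        elif not check(e.value):
            problems.append(f"{e.name}: {e.value!r} rejected by encoding {e.encoding}")
        if e.language not in KNOWN_LANGUAGES:
            problems.append(f"{e.name}: unknown language {e.language}")
    return problems


harvested = [
    Element("DC.Title", "Qualified Dublin Core", language="en"),
    Element("DC.Creator", "Crawley, Stephen"),
    Element("DC.date.published", "2004-08-12", encoding="W3CDTF"),
    Element("FNORD.widget", "42"),  # not meaningful to my collection
]
kept = filter_elements(harvested)
print([e.name for e in kept])  # ['DC.Title', 'DC.Creator', 'DC.date.published']
print(validate(kept))          # []
```

Filtering and validation are kept separate deliberately. The same validator can then sit behind several per-source filters.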
2a) Another alternative is to retain the "unknown" elements from the
harvested records. You can separately decide whether or not to
show them to end-users or to pass them on to other repositories.
This is roughly the model we use in HotMeta / MetaSuite. Our
repository understands multiple metadata schemas, and can store
records that conform to any ... or none of them. When an end-user
does a metadata query, the results show a configurable,
context-dependent subset of the available elements.
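A rough Python sketch of point 2a follows. The repository stores every harvested element, including unknown ones, and applies a per-context view only when displaying results. The `VIEWS` table and its context names are hypothetical. This is not how HotMeta / MetaSuite is actually implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    name: str
    value: str
    encoding: Optional[str] = None
    language: Optional[str] = None


@dataclass
class StoredRecord:
    schemas: frozenset[str]
    elements: tuple[Element, ...]  # everything harvested, "unknown" elements included


# Hypothetical per-context display configuration: which element names to show.
VIEWS: dict[str, Optional[set[str]]] = {
    "public": {"DC.Title", "DC.Creator", "DC.date.published"},
    "curator": None,  # None means "show every stored element"
}


def view(record: StoredRecord, context: str) -> list[Element]:
    """Return the configurable, context-dependent subset of a stored record."""
    shown = VIEWS[context]
    if shown is None:
        return list(record.elements)
    return [e for e in record.elements if e.name in shown]


rec = StoredRecord(
    schemas=frozenset({"urn:example:fnord"}),
    elements=(Element("DC.Title", "Qualified Dublin Core"), Element("FNORD.widget", "42")),
)
print([e.name for e in view(rec, "public")])   # ['DC.Title']
print([e.name for e in view(rec, "curator")])  # ['DC.Title', 'FNORD.widget']
```

Nothing is discarded at harvest time. Whether to show or forward an element is decided later, per context.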
3) I am not talking about harvesting metadata from random places.
Rather, I'm talking about harvesting from reputable metadata
repositories. I'm assuming that:
* the repository owner is not going to wantonly abuse the
schemas he/she purports to use, e.g. by systematically putting
the author's name in DC.Title, and
* the repository owner has adequate quality control in place
to ensure that the element values are reasonably accurate.
If these are not true, it is not sensible to harvest metadata from
the repository in question.
[+] If you send me a schema identifier for some schema FNORD that I've
never heard of, I probably cannot do anything with your metadata. If you
>>also<< send me the schema identifier for DCQ, I know that the records
are DCQ compatible (if you ignore non-DCQ elements). Alternatively, if
the schema identifier for FNORD points at a schema definition that says
FNORD is an extension of DCQ, I can work this out for myself, either
manually or (maybe in the future) automatically. Then I can configure
my software to treat FNORD records as DCQ records ... or do something
else.
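The "work it out for myself" step can be sketched in Python by walking a schema's declared parent chain until a known schema is reached. The `PARENT` table is a hypothetical stand-in for whatever a schema definition would declare. No agreed standard for expressing this is assumed, and the identifiers are placeholders.

```python
from typing import Optional

# Hypothetical: what each schema definition says about its parent schema.
PARENT: dict[str, Optional[str]] = {
    "urn:example:fnord2": "urn:example:fnord",
    "urn:example:fnord": "urn:example:dcq",
    "urn:example:agls": "urn:example:dcq",
    "urn:example:dcq": None,
}


def ancestors(schema_id: str) -> list[str]:
    """Walk the declared parent chain, nearest first, guarding against cycles."""
    chain, seen = [], {schema_id}
    parent = PARENT.get(schema_id)  # unknown schemas have no declared parent
    while parent is not None and parent not in seen:
        chain.append(parent)
        seen.add(parent)
        parent = PARENT.get(parent)
    return chain


def handling_for(declared: set[str], known=frozenset({"urn:example:dcq"})) -> Optional[str]:
    """Pick a known schema to treat the record as, or None if nothing is recognisable."""
    # A directly declared known schema wins (the ">>also<< send me DCQ" case).
    for s in sorted(declared):
        if s in known:
            return s
    # Otherwise fall back to the nearest known ancestor of any declared schema.
    for s in sorted(declared):
        for a in ancestors(s):
            if a in known:
                return a
    return None


print(ancestors("urn:example:fnord2"))            # ['urn:example:fnord', 'urn:example:dcq']
print(handling_for({"urn:example:fnord"}))        # urn:example:dcq
print(handling_for({"urn:example:mystery"}))      # None
```

Returning `None` corresponds to "I probably cannot do anything with your metadata". The caller can then skip the record or flag it for manual configuration.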
Obviously, this will work better if there is an agreed standard for
specifying schemas that includes some way of specifying the parentage of
a schema and the semantic relationships with elements in the ancestor
schemas. We don't need complete semantic information, just enough
to say (for example) that the meaning of DC.Title in a DC record subsumes
the meaning of DC.Title in AGLS (or vice versa).
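The element-level "subsumes" relationship described above can be sketched in Python. Each term is a (schema, element) pair. A table records which broader term subsumes it, and a lookup maps a term onto the nearest broader term in a target schema. The `SUBSUMED_BY` entries, including the DC.date.published to DC.Date link, are illustrative declarations only, and the short schema names stand in for real schema identifiers.

```python
from typing import Optional

Term = tuple[str, str]  # (schema identifier, element name)

# Hypothetical declarations: the right-hand term's meaning subsumes the left-hand one's.
SUBSUMED_BY: dict[Term, Term] = {
    ("AGLS", "DC.Title"): ("DC", "DC.Title"),
    ("DCQ", "DC.date.published"): ("DC", "DC.Date"),
}


def broader(term: Term) -> list[Term]:
    """All terms whose meaning subsumes `term`, nearest first."""
    out, seen = [], {term}
    nxt = SUBSUMED_BY.get(term)
    while nxt is not None and nxt not in seen:
        out.append(nxt)
        seen.add(nxt)
        nxt = SUBSUMED_BY.get(nxt)
    return out


def map_to(term: Term, target_schema: str) -> Optional[Term]:
    """Map a term onto the nearest broader term in `target_schema`, if any."""
    if term[0] == target_schema:
        return term
    for t in broader(term):
        if t[0] == target_schema:
            return t
    return None


print(map_to(("DCQ", "DC.date.published"), "DC"))  # ('DC', 'DC.Date')
print(map_to(("DC", "DC.Title"), "AGLS"))          # None
```

Mapping is allowed only toward broader terms. Treating a value as a narrower term would claim more than the source record actually says.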
-- Steve
+----------------------------------+----------------------------------------
| Stephen Crawley | MetaSuite Project Leader
| Level 7, GP South Building (78) | Distributed Systems Technology CRC
| Staff House Road | Tel : +61 7 3365 4310
| The University of Queensland | Fax : +61 7 3365 4311
| Queensland 4072 | Email : crawley at dstc.edu.au
| Australia | WWW : http://www.dstc.edu.au
| | DSTC is the Australian W3C Office
+----------------------------------+----------------------------------------