[UPS] Problems/Comments with Santa Fe Metadata Set
Mark Doyle
doyle@aps.org
Tue, 16 Nov 1999 13:13:14 -0500
Greetings Carl,
> From: Carl Lagoze <lagoze@cs.cornell.edu>
> Date: 1999-11-15 06:41:17 -0500
Sorry, I was unable to answer this sooner... Since I was the one who
initiated the addition of this element, I feel I should address it. I
understand your point of view, but I think that we live in an imperfect world
and one needs to have pragmatic solutions to otherwise vexing problems. The
whole point of these repositories and overlaid services is to make material
available to researchers in a variety of formats, some of which may be much
richer than others (the variation is both within a repository and across
repositories). Formats may be added and removed as the underlying technology
changes. Any service which chooses to just display a subset of a repository's
formats (say, just PostScript or PDF) is likely to shortchange users. For
instance, xxx offers many flavors of PostScript, some of which require the
user to understand additional issues (e.g., font installation). So the simple
goal (again in the context of doing things on the six month scale) is to
give users a path to the definitive interface of a repository, preferably
anchored around the target that the user is actually interested in at the
moment. I feel it is much more useful to a user of the services to wind up at a
"wrapper" page than at just the home page of the arXiv. Furthermore, the URLs
in the display ID are to be persistent and freely accessible (some
repositories may have to limit who can access certain components and a
mechanism for authentication has to be made available).
> Our view throughout the design of Dienst (and digital object repositories in
> general) is that a repository is not in the business of human presentation.
> It simply provides sufficient information through a protocol so that other
> services can use its contents. From the perspective of human interaction, it
> provides protocol requests that can be used by any user interface to
> construct "display pages", which are pages that access specific disseminations or
> parts of disseminations. Thus, there may be many user interfaces and many
> "display Ids" for a particular digital object. Furthermore, a repository
> does not have any record of what these display Ids are (i.e., does the
> publisher of a book know every house, library, bookstore that their book
> sits in).
This is all well and good in theory, but where the rubber hits the road, I
think it fails. Not all repositories are the same. The selection of
repository services that an overlay service makes visible to the reader is
not likely to be the complete set of services. This is a disadvantage to the
users who may not even be aware that they have other choices for retrieval of
information.
> The display ID metadata element presumes that not only does the repository
> or digital object know about these URLs but endows one with the property of
> being the "correct" one (a rather wrong concept since the display ID for an
> Italian audience should be different than for a US audience).
I strongly disagree. The fact is that most repositories have a definitive
wrapper page that provides links to all available repository services
relevant to a particular item in the repository (and a dynamic set of
services at that). To use phrasing from physics, this is a "natural" URL -
"naturalness" is not a statement about correctness (as you imply), but
rather, it allows for a choice of a distinguished member of a class (here,
the class of display URLs). There may be other ways to make the choice
(just as natural), but each arXiv has a very good sense of which URL is
potentially the most useful to end users. The mere fact that these stable,
persistent URLs exist and are made available by the repository distinguishes
them from the rest of the URLs.
Should all overlays be required to track all of the services and mirrors of
their underlying repositories? I think that is what your point of view requires
(and from below, you seem to acknowledge this). You seem to want to keep
users confined to a specific box without even giving them a chance to see
that there may be more in the world than the box you give them.
> Furthermore
> it imprints it as part of the metadata for the digital object, which
> philosophically is a rather persistent entity - yes, objects should be
> persistent but the user interfaces that present them should be malleable.
The URLs were meant to be as persistent as the object itself. The
malleability is in what the URL points to, not what the URL is. URLs are not
inherently non-persistent.
> For a little idea of how this works in the Dienst software take a look at
> the following example:
>
> A document with the URN ncstrl.cornell/TR94-1418
>
> Its display page from the Cornell ncstrl user interface is:
> http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/ncstrl.cornell/TR94-1418
> This information is put together from three protocol requests to the object
> in the cornell repository:
[A rich set of wonderful examples deleted]
> This uses the same raw repository requests to construct its information.
>
> In fact, this is exactly the way that NCSTRL and XXX/CoRR interact. Take a
> look at the URL:
> http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/xxx.cs.DL/9812020
> <http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/xxx.cs.DL/9812020>
>
> and you will see a document in XXX presented through the NCSTRL user
> interface. You could go to http://xxx.lanl.gov/archive/cs/intro.html
> and get the same document through the XXX User interface.
This is the main counterexample right here (thanks for providing it!). I do
not object to your presentation of the information through the NCSTRL
interface (having the uniform interface is quite nice), but I do not
understand why you don't give the user the natural URL
http://xxx.lanl.gov/abs/cs/9812020. Why force the user to navigate from
http://xxx.lanl.gov/archive/cs/intro.html? Actually, this example isn't
really the best because your article is only available in a single format.
Instead, I give http://xxx.lanl.gov/abs/cs.CL/9911006
(http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/xxx.cs.CL/9911006)
with source, pdf, and other formats (dvi and about 8 flavors of PS). You
suppress source and dvi, and you chose a single resolution of bitmapped fonts
for the PS (I prefer resolution-independent Type 1 fonts). Where would
NCSTRL give me an opportunity for discovering that I can choose a mirror,
that I can choose a default download format (or even that other choices
exist), that author names are conveniently linked for searching, or, in the
case of some physics archives, that xxx provides "cited by" and "refers to"
links?
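As a rough illustration of the point above, here is a minimal sketch of how an
overlay could show the repository's natural URL next to its own display page.
The two base URLs are taken from the examples in this message; the function
names and the rule "strip the leading xxx. from the handle" are assumptions
made for illustration only, not part of Dienst or of xxx. The rule reproduces
the cs.CL/9911006 pair; the earlier cs.DL example uses the shorter form
abs/cs/9812020, so a real overlay would take the URL from the metadata record
rather than derive it.

```python
# Hypothetical sketch: pair an overlay display page with the natural URL.
# Base URLs come from the examples in this message; everything else is assumed.
DIENST_DISPLAY_BASE = "http://cs-tr.cs.cornell.edu:80/Dienst/UI/1.0/Display/"
XXX_ABS_BASE = "http://xxx.lanl.gov/abs/"
XXX_PREFIX = "xxx."


def dienst_display_url(handle: str) -> str:
    """Build the NCSTRL display page URL for a handle such as 'xxx.cs.CL/9911006'."""
    return DIENST_DISPLAY_BASE + handle


def natural_url(handle: str) -> str:
    """Guess the xxx abstract page for an xxx handle (assumed mapping rule)."""
    if not handle.startswith(XXX_PREFIX):
        raise ValueError(f"not an xxx handle: {handle!r}")
    return XXX_ABS_BASE + handle[len(XXX_PREFIX):]


def display_links(handle: str) -> dict:
    """Return both links so a user interface can offer the reader a choice."""
    return {
        "overlay": dienst_display_url(handle),
        "natural": natural_url(handle),
    }


if __name__ == "__main__":
    for name, url in display_links("xxx.cs.CL/9911006").items():
        print(f"{name}: {url}")
```

An overlay that carried the display ID through from the repository's metadata
would not need the guessed rule in natural_url at all.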
> Sorry to assault you with all this detail but we at Cornell have been
> somewhat in the business of trying to get DL protocols correct and this
> "display URL" violates some of our thinking on separation of concerns. I
> don't have a real good answer here, since the "correct" answer (from the
> Dienst perspective) involves some more burden on the external services
> (understanding more protocol requests).
Exactly. My point is that there exist natural URLs which may give enhanced
services to users. It may be that some repositories will just give the Dienst
display URL and be done with it. But I submit that the majority of
repositories will function not just as faceless warehouses, but will also
present their own particular view of the world, will have a persistent URL
mechanism for accessing that view, and some set of users will find benefit in
the repository's view. I think you need to change your vocabulary a bit. Try
"natural" or "canonical" rather than "correct."
All that said, I might be persuaded that the display ID doesn't have to be
mandatory, but I think the act of a repository committing to persistent
natural URLs (i.e., the notion of making them readable as well as writable)
is one of the foundational principles for making them function as true
repositories. Thus, I don't think any repository should choose to omit it.
Nor do I think any overlay should throw away this item of information if it
is provided.
Cheers,
Mark