[OAI-general] Re: [BOAI] Re: Cliff Lynch on Institutional Archives

Thu, 27 Mar 2003 10:05:54 -0500 (EST)

I have thought about trying to make sets for each subject entry, and then ran
across the idea of a "home set" identifier that would point to the original
association.  But I am just beginning to work with OAI and probably need to
read all the archives. :)
--Paul Cummins
UT Library, Systems
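
A minimal sketch of what per-subject sets look like from the harvester's
side, assuming an archive that exposes one set per subject. The base URL and
the set name "subject:physics" are hypothetical; "verb", "metadataPrefix" and
"set" are the standard OAI-PMH request arguments, and "oai_dc" is the metadata
format every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL.

    If set_spec is given, the harvest is restricted to that set
    (selective harvesting); otherwise the whole archive is requested.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # setSpec values may be hierarchical, with ':' separating levels.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical institutional archive and subject set.
    url = list_records_url("http://archive.example.edu/oai",
                           set_spec="subject:physics")
    print(url)
    # Decode the query string again to show the arguments actually sent.
    print(parse_qs(urlsplit(url).query))
```

A subject service provider would issue the set-restricted request, while a
general one would leave the set argument out and harvest everything.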


> hi
>
> this may be stating the obvious, but why not use sets for the separate
> disciplines, aimed at particular service providers? i say it that way
> because some disciplines are not well-defined (namely, computer science) so
> such archives may want to play ball with multiple service providers and
> hence may need different sets.
>
> in any event, for something like physics, a simple set might do the trick
> at the source. then, somewhat in keeping with the Kepler model (as
> published in DLib a while back), the service provider can provide an
> interface for potential data providers to self-register. i know this sounds
> dodgy, but think of it as an alternative mechanism for
> contribution. either individual users submit individual papers or groups
> submit baseURLs - both go through some kind of review and while one leads
> to once-off storage, the other leads to periodic harvesting.
>
> what remains a difficult problem, however, is how to recreate the metadata
> used by the service provider as its native format. so, for a typical
> example, if arXiv classifies items using a specific set
> structure, this is certainly not going to be the default for an
> institutional archive. does the service provider automatically or manually
> reclassify? or does it not allow browsing by categories? in either event,
> the quality of the metadata from the perspective of the service provider
> may be an impetus for potential users to want to replicate their effort
> rather than rely on the automated submission from their own institutions
> ... this needs more thought ...
>
> ttfn,
> ----hussein
>
>
> Christopher Gutteridge wrote:
>> Disciplinary/subject archives vs. Institutional/Organisation/Region based
>> archives. This is going to be a key challenge now open archives begin to
>> gain momentum.
>>
>> For example, we are planning a University-wide eprints archive. I am
>> concerned that some physicists will want to place their items in both the
>> university eprints service AND the arXiv physics archive. They may be
>> required to use the university service, but want to use arXiv as it is the
>> primary source for their discipline. This is a duplication of effort and
>> a potential irritation.
>>
>> Ultimately, of course, I'd hope that disciplinary archives will be replaced
>> with subject-specific OAI service providers harvesting from the
>> institutional archives. But there is going to be a very long transition
>> period in which the solution evolves from our experience.
>>
>> What I'm asking is: has anyone given consideration to ways of smoothing
>> over this duplication of effort? Possibly some negotiated automated
>> process for institutional archives uploading to the subject archive, or at
>> least assisting the author in the process.
>>
>> This isn't the biggest issue, but it'd be good to address it before it
>> becomes more of a problem.
>>
>>   Christopher Gutteridge
>>   GNU EPrints Head Developer
>>   http://software.eprints.org/
>>
>> On Sun, Mar 16, 2003 at 02:15:56 +0000, Stevan Harnad wrote:
>>
>>>On Sat, 15 Mar 2003, Thomas Krichel wrote:
>>>
>>>
>>>>  Stevan Harnad writes:
>>>>
>>>>sh> There is no need -- in the age of OAI-interoperability -- for
>>>>sh> institutional archives to "feed" central disciplinary archives:
>>>>
>>>>  I do not share what I see as a blind faith in interoperability through
>>>> a technical protocol.
>>>
>>>I am quite happy to defer to the technical OAI experts on this one, but
>>> let us put the question precisely:
>>>
>>>Thomas Krichel suggests that institutional (OAI) data-archives
>>>(full-texts) should "feed" disciplinary (OAI) data-archives,
>>>because OAI-interoperability is somehow not enough. I suggest that
>>> OAI-interoperability (if I understand it correctly) should be enough. No
>>> harm in redundant archiving, of course, for backup and security, but not
>>> necessary for the usage and functionality itself. In fact, if I understand
>>> correctly the intent of the OAI distinction between OAI data-providers --
>>> http://www.openarchives.org/Register/BrowseSites.pl
>>>-- and OAI service-providers --
>>>http://www.openarchives.org/service/listproviders.html
>>>-- it is not the full-texts of data-archives that need to be "fed" to
>>> (i.e., harvested by) the OAI service providers, but only their metadata.
>>>
>>>Hence my conclusion that distributed, interoperable OAI institutional
>>> archives are enough (and the fastest route to open-access). No need to
>>> harvest their contents into central OAI discipline-based archives (except
>>> perhaps for redundancy, as backup). Their OAI interoperability should be
>>> enough so that the OAI service-providers can (among other things) do the
>>> "virtual aggregation" by discipline (or any other computable criterion) by
>>> harvesting the metadata alone, without the need to harvest full-text
>>> data-contents too.
>>>
>>>It should be noted, though, that Thomas Krichel's excellent RePEc archive
>>> and service in Economics -- http://repec.org/ -- goes
>>>well beyond the confines of OAI-harvesting! RePEc harvests non-OAI content
>>> too, along lines similar to the way ResearchIndex/citeseer --
>>> http://citeseer.nj.nec.com/cs -- harvests non-OAI content in computer
>>> science. What I said about there being no need to "feed" institutional OAI
>>> archive content into disciplinary OAI archives certainly does not apply to
>>> *non-OAI* content, which would otherwise be scattered willy-nilly all over
>>> the net and not integrated in any way. Here RePEc's and ResearchIndex's
>>> harvesting is invaluable, especially as RePEc already does (and
>>> ResearchIndex has announced that it plans to) make all its harvested
>>> content OAI-compliant!
>>>
>>>To summarize: The goal is to get all research papers, pre- and
>>>post-peer-review, openly accessible (and OAI-interoperable) as soon as
>>> possible. (These are BOAI Strategies 1 [self-archiving] and 2
>>>[open-access journals]: http://www.soros.org/openaccess/read.shtml ). In
>>> principle this can be done by (1) self-archiving them in central OAI
>>> disciplinary archives like the Physics arXiv (the biggest and first of its
>>> kind) -- http://arxiv.org/show_monthly_submissions
>>>-- by (2) self-archiving them in distributed institutional OAI
>>>Archives -- http://www.ecs.soton.ac.uk/~harnad/Temp/tim.ppt -- by (3)
>>> self-archiving them on arbitrary Web and FTP sites (and hoping they will
>>> be found or harvested by services like RePEc or ResearchIndex) or by (4)
>>> publishing them in open-access journals (BOAI Strategy 2:
>>> http://www.soros.org/openaccess/journals.shtml ).
>>>
>>>My point was only that because researchers and their institutions (*not*
>>> their disciplines) have shared interests vested in maximizing their joint
>>> research impact and its rewards, institution-based
>>>self-archiving (2) is a more promising way to go -- in the age of
>>> OAI-interoperability -- than discipline-based self-archiving (1), even
>>> though the latter began earlier. It is also obvious that both (1) and (2)
>>> are preferable to arbitrary Web and FTP self-archiving (3), which began
>>> even earlier (although harvesting arbitrary Website and FTP contents into
>>> OAI-compliant Archives is still a welcome makeshift strategy until the
>>> practise of OAI self-archiving is up to speed). Creating new open-access
>>> journals and converting the established (20,000) toll-access journals to
>>> open-access is desirable too, but it is obviously a much slower and more
>>> complicated path to open access than self-archiving, so should be pursued
>>> in parallel.
>>>
>>>My conclusion in favor of institutional self-archiving is based on the
>>> evidence and on logic, and it represents a change of thinking,
>>>for I had originally advocated (3) Web/FTP self-archiving --
>>>http://www.arl.org/scomm/subversive/toc.html -- then switched allegiance
>>> to central self-archiving (1), even creating a discipline-based archive:
>>> http://cogprints.ecs.soton.ac.uk/ But with the advent of OAI in 1999, plus
>>> a little reflection, it became apparent that
>>>institutional self-archiving (2) was the fastest, most direct, and most
>>> natural road to open access: http://www.eprints.org/
>>>And since then its accumulating momentum seems to be confirming that this
>>> is indeed so: http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2212.html
>>> http://www.ecs.soton.ac.uk/~harnad/Temp/tim.ppt
>>>
>>>
>>>>  The primary sense of belonging
>>>>  of a scholar in her research activities is with the disciplinary
>>>> community of which she thinks herself a part... It certainly
>>>>  is not with the institution.
>>>
>>>That may or may not be the case, but in any case it is irrelevant to the
>>> question of which is the more promising route to open-access. Our primary
>>> sense of belonging may be with our family, our community, our creed, our
>>> tribe, or even our species. But our rewards (research grant funding and
>>> overheads, salaries, postdocs and students attracted to our research,
>>> prizes and honors) are intertwined and shared with our institutions (our
>>> employers) and not our disciplines (which are often in fact the locus of
>>> competition for those same rewards!)
>>>
>>>
>>>>  Therefore, if you want to fill
>>>>  institutional archives---which I agree is the best long-run way to
>>>> enhance access and preservation to scholarly research--- [the]
>>>> institutional archive has to be accompanied by a discipline-based
>>>> aggregation process.
>>>
>>>But the question is whether this "aggregation" needs to be the "feeding"
>>> of institutional OAI archive contents into disciplinary OAI archives, or
>>> merely the "feeding" of OAI metadata into OAI services.
>>>
>>>
>>>>   The RePEc project has produced such an aggregator
>>>>  for economics for a while now. I am sure that other, similar
>>>>  projects will follow the same aims, but, with the benefit of
>>>>  hindsight, offer superior service. The lack of such services
>>>>  in many disciplines, or the lack of interoperability between
>>>> disciplinary and institutional archives, are major obstacles to
>>>> filling the institutional archives. There are no
>>>>  inherent contradictions between institution-based archives
>>>>  and disciplinary aggregators,
>>>
>>>There is no contradiction. In fact, I suspect this will prove to be a
>>> non-issue, once we confirm that (a) we agree on the need for
>>>OAI-compliance and (b) "aggregation" amounts to metadata-harvesting and
>>> OAI service-provision when the full-texts in the institutional archives
>>> are OAI-compliant (and calls for full-text harvesting only if/when they
>>> are not). Content "aggregation," in other words, is a paper-based notion.
>>> In the online era, it merely means digital sorting of the pointers to the
>>> content.
>>>
>>>
>>>>  In the paper that Stevan refers to, Cliff Lynch writes,
>>>>  at http://www.arl.org/newsltr/226/ir.html
>>>>
>>>>cl> But consider the plight of a faculty member seeking only broader
>>>>cl> dissemination and availability of his or her traditional journal
>>>>cl> articles, book chapters, or perhaps even monographs through use of
>>>>cl> the network, working in parallel with the traditional scholarly
>>>>cl> publishing system.
>>>>
>>>>  I am afraid there are more and more such faculty members. Many
>>>>  of the research papers found over the Internet are deposited
>>>>  in this way. This trend is growing, not declining.
>>>
>>>You mean self-archiving in arbitrary non-OAI author websites? There is
>>> another reason why institutional OAI archives and official institutional
>>> self-archiving policies (and assistance) are so important. In reality, it
>>> is far easier to deposit and maintain one's papers in institutional OAI
>>> archives like Eprints than to set up and maintain one's own website. All
>>> that is needed is a clear official institutional policy, plus some startup
>>> help in launching it. (No such thing is possible at a "discipline" level.)
>>>http://www.ecs.soton.ac.uk/~lac/archpol.html
>>>http://www.eprints.org/self-faq/#institution-facilitate-filling
>>> http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
>>>http://paracite.eprints.org/cgi-bin/rae_front.cgi
>>>
>>>
>>>>cl> Such a faculty member faces several time-consuming problems. He or
>>>>cl> she must exercise stewardship over the actual content and its
>>>>cl> metadata: migrating the content to new formats as they evolve over
>>>>cl> time, creating metadata describing the content, and ensuring the
>>>>cl> metadata is available in the appropriate schemas and formats and
>>>>cl> through appropriate protocol interfaces such as open archives
>>>>cl> metadata harvesting.
>>>>
>>>>  Sure, but academics do not like their work-, and certainly
>>>>  not their publishing-habits, [to] be interfered with by external
>>>> forces. Organizing academics is like herding cats!
>>>
>>>I am sure academics didn't like to be herded into publishing with the
>>> threat of perishing either. Nor did they like switching from paper to
>>> word-processors. Their early counterparts probably clung to the oral
>>> tradition, resisting writing too; and monks did not like to be herded from
>>> their peaceful manuscript-illumination chambers to the clamour of printing
>>> presses. But where there is a causal contingency -- as there is between
>>> (a) the research impact and its rewards, which academics like as much as
>>> anyone else, and (b) the accessibility of their research -- academics are
>>> surely no less responsive than Prof. Skinner's pigeons and rats to those
>>> causal contingencies, and which buttons they will have to press in order
>>> to maximize their rewards!
>>>http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.htm
>>>
>>>Besides, it is not *publishing* habits that need to be changed, but
>>> *archiving* habits, which are an online supplement, not a substitute, for
>>> existing (and unchanged) publishing habits.
>>>
>>>
>>>>cl> Faculty are typically best at creating new
>>>>cl> knowledge, not maintaining the record of this process of
>>>>cl> creation. Worse still, this faculty member must not only manage
>>>>cl> content but must manage a dissemination system such as a personal Web
>>>>cl> site, playing the role of system administrator (or the manager of
>>>>cl> someone serving as a system administrator).
>>>>
>>>>  There are a lot of ways in which to maintain a web site or to get access
>>>> to a maintained one. It is a customary activity these days and no
>>>> longer requires much technical expertise. A primitive integration of
>>>> the contents can be done by Google; it requires no metadata. Academics
>>>> don't care about long-run preservation, so that problem remains
>>>> unsolved. In the meantime, the academic who uploads papers to a web
>>>> site takes steps to resolve the most pressing problem, access.
>>>
>>>Agreed. And uploading it into a departmental OAI Eprints Archive is by
>>> far the simplest and most effective way to do all of that. All it
>>> needs is a policy to mandate it:
>>>http://www.ecs.soton.ac.uk/~lac/archpol.html
>>>
>>>
>>>>cl> Over the past few years, this has ceased to be a reasonable activity
>>>>cl> for most amateurs; software complexity, security risks, backup
>>>>cl> requirements, and other problems have generally relegated effective
>>>>cl> operation of Web sites to professionals who can exploit economies of
>>>>cl> scale, and who can begin each day with a review of recently issued
>>>>cl> security patches.
>>>>
>>>>  These are technical concerns. When you operate a linux box
>>>>  on the web you simply fire up a script that will download
>>>>  the latest version. That is easy enough. Most departments
>>>>  have separate web operations. Arguing for one institutional
>>>>  archive for digital contents is akin to calling for a single web site
>>>> for an institution. The diseconomies of scale of central administration
>>>> impose other types of costs than the ones that it was meant to reduce. The
>>>> secret is to find a middle way.
>>>
>>>I couldn't quite follow all of this. The bottom line is this: The free
>>> Eprints.org software (for example) can be installed within a few days. It
>>> can then be replicated to handle all the departmental or research group
>>> archives a university wants, with minimal maintenance time or costs. The
>>> rest is just down to self-archiving, which takes a few minutes for the
>>> first paper, and even less time for subsequent papers (as the repeating
>>> metadata -- author, institution, etc., can be "cloned" into each new
>>> deposit template). An institution may wish to impose an institutional
>>> "look" on all of its separate eprints archives; but apart from that, they
>>> can be as autonomous and as distributed and as many as desired:
>>> OAI-interoperability works locally just as well as it does globally.
>>>
>>>
>>>>cl> Today, our faculty time is being wasted, and expended ineffectively,
>>>>cl> on system administration activities and content curation. And,
>>>>cl> because system administration is ineffective, it places our
>>>>cl> institutions at risk: because faculty are generally not capable of
>>>>cl> responding to the endless series of security exposures and patches,
>>>>cl> our university networks are riddled with vulnerable faculty machines
>>>>cl> intended to serve as points of distribution for scholarly works.
>>>>
>>>>  This is the fight many faculty face every day, where they
>>>>  want to innovate scholarly communication, but someone
>>>>  in the IT department does not give the necessary permission
>>>>  for network access...
>>>
>>>I don't think I need to get into this. It's not specific to
>>>self-archiving, and a tempest in a teapot as far as that is concerned. An
>>> efficient system can and will be worked out once there is an effective
>>> institutional self-archiving policy. There are already plenty of excellent
>>> examples, such as CalTech:
>>>http://library.caltech.edu/digital/
>>>See also:
>>>http://software.eprints.org/#ep2
>>>
>>>Stevan Harnad
>>
>>
>
>
> --
> =====================================================================
> hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com
> =====================================================================
>
> _______________________________________________
> OAI-general mailing list
> OAI-general@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-general