[OAI-general] OAI and web crawlers
Eric Hellman
eric@openly.com
Tue, 24 Jul 2001 11:10:32 -0400
Well, you can pay Inktomi about 25 cents per record to crawl your
site. In the current economic environment, you won't find much
interest from the mainstream search engines unless indexing your
content adds to their bottom line.
Google is probably the most sophisticated crawler. Google ignores all
metadata that you provide it, on the reasonable assumption that all
webmasters stuff keywords to try to rig search results. Google will
follow a link into a DL; having a fixed, unique URL for each item
will maximize traffic from Google.
Google has roots in the Stanford DL program, so approach them from
that direction.
You can see the results of our "overtures to the webcrawling
community". Search for "the origins of chinese communism" at Google.
Northern Light is probably a good first target for outreach because
they build specialty search engines.
Danny Sullivan's Search Engine Watch, at
http://www.searchenginewatch.com/ is a good place to learn more.
Getting Danny interested in OAI would be a good way to reach out to
this community.
Eric
At 3:34 PM -0400 7/23/01, Michael L. Nelson wrote:
>I've just added:
>
> # please use our Open Archives Initiative (OAI) interface instead!
> # http://naca.larc.nasa.gov/oai/
> # see http://www.openarchives.org/ for more info
>
>to my robots.txt file for my two DLs (LTRS & NACATRS). I doubt these
>messages will be read by humans, but stranger things have happened.
>
>if your DL is like mine, at any given time webcrawlers from Inktomi,
>Google, etc. are meandering about. I don't discourage this behavior
>(cf. arXiv), mostly because it's never been too much of a problem.
>
>but it would seem that these crawlers would benefit from using the OAI
>interface where possible. OAI is doing quite well within the publishing /
>library community, but has anyone made any overtures to the webcrawling
>community? Any ideas on how to do so? I would expect the potential for
>reduced network traffic and increased indexed content could cause them to
>modify their robots to understand OAI...
>
>regards,
>
>Michael
>
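
To make the idea in the quoted message concrete, here is a minimal
sketch of what an OAI-aware crawler might do with a robots.txt note
like the one above. It is only an illustration under stated
assumptions: no crawler is known to read such comments, the comment
format is not a standard, the function names find_oai_hint and
build_list_records_url are hypothetical, and it assumes the URL in the
note accepts OAI requests directly. The verb, metadataPrefix and from
parameters follow the OAI protocol.

```python
import re
from urllib.parse import urlencode

# The robots.txt comment lines quoted in the message above.
ROBOTS_TXT = """\
# please use our Open Archives Initiative (OAI) interface instead!
# http://naca.larc.nasa.gov/oai/
# see http://www.openarchives.org/ for more info
"""


def find_oai_hint(robots_txt):
    """Return the first URL in a comment at or after one mentioning OAI.

    Hypothetical convention: robots.txt comments are free text, so this
    is a heuristic, not part of any standard.
    """
    seen_oai = False
    for line in robots_txt.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue  # only comment lines can carry the hint
        if "OAI" in line:
            seen_oai = True
        if seen_oai:
            match = re.search(r"https?://\S+", line)
            if match:
                return match.group(0)
    return None


def build_list_records_url(base_url, from_date=None):
    """Build a ListRecords request for Dublin Core records.

    Passing from_date asks only for records changed since that date,
    which is where the reduced network traffic would come from.
    """
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


base = find_oai_hint(ROBOTS_TXT)
if base:
    print(build_list_records_url(base, from_date="2001-07-01"))
```

A crawler that found no hint would simply fall back to following
links as it does today, so the note costs nothing for robots that
ignore it.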
--
Eric Hellman
Openly Informatics, Inc.
http://www.openly.com/
1cate: 1-Click Access To Everything