[OAI-implementers] converting filenames of metadata records
John Weatherley
jweather@ucar.edu
Wed, 22 Oct 2003 11:03:11 -0600 (MDT)
Thomas,
For now it is not possible to tell the DLESE OAI software to leave the
colons in file names unencoded; it always converts them to %3A. The
software does this because Windows file systems do not accept colons (and
some other characters) in file names, and the software is designed to be
cross-platform compatible.
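To illustrate the kind of encoding involved, here is a minimal Java sketch
that maps an OAI identifier to a file name that avoids the colon, and back
again. It uses java.net.URLEncoder, which happens to turn ":" into "%3A";
that choice is an assumption for illustration and not necessarily the
routine the DLESE OAI software uses. The class and method names
(RecordFileNames, toFileName, toIdentifier) and the ".xml" suffix handling
are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of the DLESE OAI software.
final class RecordFileNames {

    // Percent-encode characters such as ':' and append the ".xml" suffix.
    static String toFileName(String oaiIdentifier) throws UnsupportedEncodingException {
        return URLEncoder.encode(oaiIdentifier, "UTF-8") + ".xml";
    }

    // Reverse the mapping: drop the ".xml" suffix and decode the %XX escapes.
    static String toIdentifier(String fileName) throws UnsupportedEncodingException {
        String base = fileName.endsWith(".xml")
                ? fileName.substring(0, fileName.length() - 4)
                : fileName;
        return URLDecoder.decode(base, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String name = toFileName("oai:sammelpunkt.philo.at:103");
        System.out.println(name);               // oai%3Asammelpunkt.philo.at%3A103.xml
        System.out.println(toIdentifier(name)); // oai:sammelpunkt.philo.at:103
    }
}
```

Note that URLEncoder leaves "*" untouched, so this sketch is not a complete
Windows-safe scheme; it only shows the colon case discussed here.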
A number of people have reported having this problem, however, so I may
change the way file names are encoded in future releases of the software
to make them easier to work with (suggestions anyone?).
That being said, I have had success opening and reading files that are
encoded this way using the dom4j XML APIs (available at
http://www.dom4j.org/). Sample code:
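import java.io.File;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;
...
// One reader can be reused for every file.
SAXReader reader = new SAXReader();
File dir = new File("/home/jweather/ ... /dlese.org/oai/provider/avc/nsdl_dc");
// listFiles() returns the names as stored on disk, %3A included.
File[] files = dir.listFiles();
Document document;
for (int i = 0; i < files.length; i++) {
    // read(File) can throw DocumentException; handle it in the enclosing method.
    document = reader.read(files[i]);
    // Process the doc...
}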
Another possibility for your code below: instead of passing a java.io.File
to the builder.build(...) method, try passing in a java.io.InputStream,
java.io.Reader or java.net.URL and see if that works.
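A minimal sketch of the stream-based idea follows. To keep it
self-contained it uses the XML parser that ships with the JDK rather than
JDOM, so the class and method names (EncodedRecordReader, parse,
toEscapedUrl) are hypothetical; with JDOM the same stream could be handed
to SAXBuilder.build(InputStream). The underlying assumption, not verified
here, is that the trouble starts when the File is turned into a file: URL
and the "%3A" is then read as an escaped colon. Opening the stream yourself
avoids that step, and File.toURI() escapes the "%" itself if a URL is
preferred.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper for illustration; uses the JDK parser instead of JDOM.
final class EncodedRecordReader {

    // Open the file directly so the name is never interpreted as a URL.
    static Document parse(File recordFile) throws Exception {
        InputStream in = new FileInputStream(recordFile);
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
        } finally {
            in.close();
        }
    }

    // Build a URL in which a literal '%' in the file name is escaped as "%25".
    static URL toEscapedUrl(File recordFile) throws IOException {
        return recordFile.toURI().toURL();
    }
}
```

With this sketch, a file named oai%3Asammelpunkt.philo.at%3A103.xml is
opened under exactly that name, and toEscapedUrl(...) yields a URL whose
last segment contains "%253A" rather than "%3A".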
- john
On Wed, 22 Oct 2003, Thomas Krämer wrote:
> Hello
>
> I am developing a middleware that uses metadata harvested with the
> DLESE OAI software.
>
> As a result, there is a directory with hundreds of metadata records that
> are not sorted, and no queries can be formulated to retrieve the
> relevant ones among them.
>
> Q1: Am I right in assuming that repositories DO NOT offer any search
> interfaces, but provide their entire metadata and nothing more?
>
> Q2: Am I right in assuming that the DLESE OAI software has the Apache
> Lucene search API integrated, but that it is not yet working?
>
>
> However, I am currently trying to use the Apache Lucene search API to
> index these records and make them searchable.
>
> A problem appears when I try to read a record:
>
>
> SAXBuilder builder = new SAXBuilder();
> try {
> Document doc = builder.build(recordfile.getAbsoluteFile());
> Element root = doc.getRootElement();
> listChildren(root, 0);
> }
>
> I always get a java.io.FileNotFoundException, as the OAI software changes
> the separator ":" into "%3A".
> The path name shown while debugging is the correct one (it uses "%3A",
> like the record files on my system),
>
> but the exception tells me:
>
> java.io.FileNotFoundException:
> /home/tom/mwd/metadata/7374617475733D696E7072657373/oai_dc/oai:sammelpunkt.philo.at:103.xml
> (No such file or directory)
>
> I am working on a Linux system.
>
>
>
> Q3: Is it possible to tell the DLESE OAI software to save the records on
> the local system using ":" instead of the hex representation, or to
> wrap the record's file name in a way that
> allows the native Java classes to open the records?
>
>
>
> Thanks a lot for any hint
>
> Thomas
--
John Weatherley
Software Engineer
DLESE Program Center
University Corporation for Atmospheric Research (UCAR)
Box 3000
Boulder, CO 80307-3000
jweather@ucar.edu (e-mail)
(303) 497-2680 (tel)
(303) 497-8336 (fax)
http://www.dlese.org