[OAI-implementers] Special characters, UNICODE, and OAI tools
Xiaoming Liu
liu_x@cs.odu.edu
Tue, 13 Feb 2001 02:16:42 -0500 (EST)
hi,
On Mon, 12 Feb 2001, Caroline Arms wrote:
>
> Several (although not all) special characters are coming through when I
> use ARC with Netscape 4.7 on Windows. Internet Explorer 5.5 doesn't do any better
> than Netscape 4.7. Also, not coming through are a few "XML sanity"
> entities, which we have been expressing as "old-fashioned" character
> entities. I don't claim to be an XML character encoding expert; for OAI
> we accepted the recommendation of our standards office to keep using this
> handful of character entities (e.g. ') in that form. What do others
> think the practice should be on these? They presumably validate against
> the schema because they get through Hussein's Explorer.
>
Thanks for the message. The problem is **partially** solved now. The part
that is solved was a bug in my program. The part that is not solved needs
more testing.
Solved part:
"Entity Reference"s are widely used in the LoC archive. As DOM Level 1
specifies
(http://www.w3.org/TR/2000/WD-DOM-Level-1-20000929/level-one-core.html#ID-11C98490),
an "Entity Reference" may or may **not** be expanded into Unicode
during parsing. Unfortunately, the parser I am using (the Java XML parser
from Sun) does not expand them consistently. I had not noticed this problem
before and treated every reference as already expanded. Sample URL 1 works
fine after the bug fix.
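
For anyone hitting the same thing, here is a minimal sketch of the idea,
assuming the standard JAXP/DOM API. It is not the ARC code; the class and
method names (EntityText, textOf, rootText) are hypothetical. It asks the
parser to expand entity references, and still walks into any
EntityReference nodes that remain instead of assuming the text is flat.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of ARC): reads element text without
// assuming that the parser has already expanded entity references.
final class EntityText {

    // Concatenates all character data below a node.
    static String textOf(Node node) {
        StringBuilder out = new StringBuilder();
        collect(node, out);
        return out.toString();
    }

    private static void collect(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                out.append(node.getNodeValue());
                break;
            case Node.COMMENT_NODE:
            case Node.PROCESSING_INSTRUCTION_NODE:
                break; // not part of the visible text
            default:
                // Elements and EntityReference nodes: descend into children,
                // so text held under an unexpanded reference is still read.
                for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                    collect(c, out);
                }
        }
    }

    // Parses an XML string and returns the text of its root element.
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setExpandEntityReferences(true); // request expansion explicitly
            f.setCoalescing(true);             // merge CDATA and adjacent text
            DocumentBuilder b = f.newDocumentBuilder();
            Document d = b.parse(new InputSource(new StringReader(xml)));
            return textOf(d.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}
```

Calling EntityText.rootText on a record fragment returns the text with
references resolved. Note that the predefined entities (such as the one
for the apostrophe) and numeric character references are always expanded
by the parser; only entities declared in a DTD can show up as
EntityReference nodes.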
Not solved:
In sample 2, some expanded "Entity Reference"s are not processed correctly.
I need to do more testing.
regards,
liu
> Sample GetRecord URLs that show the issues are:
>
> http://memory.loc.gov/cgi-bin/oai1_0?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lcoa1:loc.music/musdi.213
>
> Title includes apostrophe in d'une
>
> and
>
> http://memory.loc.gov/cgi-bin/oai1_0?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lcoa1:loc.music/musdi.215
>
> 4 special czech characters (regular letters with diacritics)
>
>
> Any thoughts and experiences welcome.
>
> Thanks. Caroline Arms caar@loc.gov
> National Digital Library Program
> Library of Congress
>
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>