[OAI-implementers] Perl (5.005), utf-8 and special characters: is this a FAQ?
Marin Balgarensky
marinb@gmx.net
Fri, 27 Dec 2002 04:25:26 +0100 (MET)
Thanks Simeon,
> XML::Writer just escapes the characters that are special in XML
> (that is & < > "). Everything else is expected to be in the appropriate
> character encoding already.
This is my first experience with the writer, but it seems to do its
work very well...
>
> > An invalid XML character (Unicode: 0x1b2ea5) was found in the element
> > content of the document.
>
> I have no idea how you might get character 0x1b2ea5 -- this is not a
> valid Unicode character. ö should be 0xf6
Neither do I. And you are right, it's 0xf6.
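One possible explanation for the odd value, offered as a guess and not
something confirmed in this thread: in UTF-8, the byte 0xF6 has the bit
pattern of a four-byte lead byte, so a parser that does not validate
continuation bytes could merge a Latin-1 ö with the three bytes that
follow it into one out-of-range code point. The sketch below uses Python
only to show the arithmetic; the helper name lenient_decode4 and the
sample text "örze" are hypothetical (any three following bytes with the
same low six bits would give the same result).

```python
def lenient_decode4(data: bytes) -> int:
    """Combine 4 bytes as one UTF-8 sequence WITHOUT validating them.

    Hypothetical helper: mimics a lenient decoder that trusts the lead
    byte and just takes the low 6 bits of each following byte.
    """
    lead, b1, b2, b3 = data[:4]
    return (
        ((lead & 0x07) << 18)   # 3 payload bits from the lead byte
        | ((b1 & 0x3F) << 12)   # 6 bits from each "continuation" byte
        | ((b2 & 0x3F) << 6)
        | (b3 & 0x3F)
    )

# Assumed sample input: Latin-1 text where ö is the single byte 0xF6.
latin1_bytes = "örze".encode("latin-1")   # b'\xf6rze'
code_point = lenient_decode4(latin1_bytes)
print(hex(code_point))                    # 0x1b2ea5
```

The result is above 0x10FFFF, the highest Unicode code point, which
would match the "invalid XML character" complaint.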
> (from: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)
> 00F6;LATIN SMALL LETTER O WITH DIAERESIS
>
> My guess is that you are not correctly producing UTF-8 from whatever
> source documents you have (which likely use some other character encoding).
Correct. I was too naive, thinking that Perl or XML::Writer would do
the conversion for me... I am now using the Unicode::String module
and it seems to work as expected...
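For anyone hitting the same problem, here is a minimal sketch of the
conversion step itself. It is written in Python purely as an
illustration, not as the Unicode::String code I actually use, and it
assumes the source data is ISO-8859-1 (Latin-1); the function name
latin1_to_utf8 and the sample word are made up for the example.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode Latin-1 bytes as UTF-8 (assumes the input really is Latin-1)."""
    return raw.decode("latin-1").encode("utf-8")

# Assumed sample input: "schön" stored as Latin-1, where ö is the byte 0xF6.
source = b"sch\xf6n"
utf8 = latin1_to_utf8(source)
print(utf8)   # b'sch\xc3\xb6n'
```

The single byte 0xF6 becomes the two bytes 0xC3 0xB6, which is what an
XML parser expecting UTF-8 needs to see. The conversion has to happen
before the text is handed to the writer.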
Thanks again,
Marin