[OAI-implementers] Perl (5.005), utf-8 and special characters:
is this a FAQ?
Simeon Warner
simeon@cs.cornell.edu
Thu, 26 Dec 2002 18:35:50 -0500 (EST)
On Fri, 27 Dec 2002, Marin Balgarensky wrote:
> Hi all,
>
> first of all, best wishes for the New Year!
>
> Can anybody tell me how to handle special characters like ö
> in the XML output?
The appropriate UTF-8 representation of the character should be used.
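To make "the UTF-8 representation" concrete, here is a small sketch in Python, used only for illustration; the helper name `utf8_two_byte` is hypothetical. U+00F6 is the single byte 0xF6 in ISO-8859-1 but the two bytes 0xC3 0xB6 in UTF-8. The helper shows the bit arithmetic UTF-8 uses for code points in the two-byte range U+0080 to U+07FF.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes (hypothetical helper)."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("only the two-byte UTF-8 range is handled here")
    lead = 0xC0 | (cp >> 6)    # 110xxxxx: carries the top 5 bits
    cont = 0x80 | (cp & 0x3F)  # 10xxxxxx: carries the low 6 bits
    return bytes([lead, cont])

o_umlaut = "\u00f6"  # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS

utf8_bytes = o_umlaut.encode("utf-8")      # what an XML file declared as UTF-8 must contain
latin1_bytes = o_umlaut.encode("latin-1")  # what an ISO-8859-1 source typically contains

print(utf8_bytes.hex())            # c3b6
print(latin1_bytes.hex())          # f6
print(utf8_two_byte(0xF6).hex())   # c3b6
```

If the bytes written for ö are the single byte f6 rather than c3 b6, a parser expecting UTF-8 will reject the document.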
> I thought Perl and XML::Writer were doing
> the conversion automatically, but for now I am getting the error:
XML::Writer just escapes the characters that are special in XML
(that is & < > "). Everything else is expected to be in the appropriate
character encoding already.
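As a rough illustration of that division of labour, here is a minimal Python sketch of escaping only the XML-special characters. It is not XML::Writer's actual code, and `escape_xml_text` is a hypothetical name. Note that ö passes through untouched, so its encoding is still the caller's job, and that a pre-built entity such as `&ouml;` gets its ampersand escaped.

```python
def escape_xml_text(text: str) -> str:
    """Escape only & < > " and leave every other character as-is (hypothetical helper)."""
    # Escape & first so the entities produced below are not escaped a second time.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;"))

print(escape_xml_text('a < b & "c"'))        # a &lt; b &amp; &quot;c&quot;
print(escape_xml_text("Tanja A. B\u00f6l"))  # Tanja A. Böl  (unchanged)
print(escape_xml_text("&ouml;"))             # &amp;ouml;
```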
> An invalid XML character (Unicode: 0x1b2ea5) was found in the element
> content of the document.
I have no idea how you might get character 0x1b2ea5; it is not a
valid Unicode character, since Unicode code points stop at 0x10ffff. ö should be 0xf6
(from: http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)
00F6;LATIN SMALL LETTER O WITH DIAERESIS
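A quick Python check of both points, offered only as a sketch (`in_code_space` is a hypothetical helper): ö really is U+00F6, and 0x1B2EA5 lies outside the Unicode code space, which ends at U+10FFFF. XML's own definition of allowed characters is stricter still, so this check is necessary but not sufficient.

```python
import unicodedata

MAX_CODE_POINT = 0x10FFFF  # the highest code point Unicode defines

def in_code_space(cp: int) -> bool:
    """True if cp lies within the Unicode code space (hypothetical helper)."""
    return 0 <= cp <= MAX_CODE_POINT

print(hex(ord("\u00f6")), unicodedata.name("\u00f6"))  # 0xf6 LATIN SMALL LETTER O WITH DIAERESIS
print(in_code_space(0xF6))      # True
print(in_code_space(0x1B2EA5))  # False
```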
My guess is that you are not correctly producing UTF-8 from whatever
source documents you have (which likely use some other character encoding).
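A minimal Python sketch of that diagnosis and fix, assuming (the thread does not say) that the source data is ISO-8859-1. `is_utf8` and `latin1_to_utf8` are hypothetical helpers. The Latin-1 byte 0xF6 is not a valid UTF-8 sequence, and decoding as Latin-1 then re-encoding as UTF-8 yields the expected 0xC3 0xB6.

```python
def is_utf8(raw: bytes) -> bool:
    """True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def latin1_to_utf8(raw: bytes) -> bytes:
    """Transcode bytes ASSUMED to be ISO-8859-1 into UTF-8 (hypothetical helper)."""
    # Every byte maps to U+0000..U+00FF in Latin-1, so this decode never fails,
    # which also means it cannot detect input that was in a different encoding.
    return raw.decode("latin-1").encode("utf-8")

source = b"Tanja A. B\xf6l"  # hypothetical Latin-1 input
print(is_utf8(source))                         # False
print(latin1_to_utf8(source))                  # b'Tanja A. B\xc3\xb6l'
print(is_utf8(latin1_to_utf8(source)))         # True
```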
Cheers,
Simeon.
> respectively in IE:
>
> An Invalid character was found in text content. Line 15, Position 28
>
> <dc:creator>Tanja A. B?l</dc:creator>
> ---------------------------^
>
>
> The question mark is supposed to be the german o with the two dots...
>
> If I encode those characters as HTML entities then they are not
> interpreted correctly by the reading program because the ampersands
> are escaped with &amp;.
>
> For now I am using this approach. It is not quite correct but at least
> it is readable without errors...
>
> Any help is very appreciated,
> Marin