[OAI-implementers] character vs entity references
Todd White
tmwhite@merit.edu
Wed, 5 Nov 2003 08:37:09 -0500 (EST)
since i've been "bugging" the list with my recent questions about
character encoding, i thought i would share the current solution that i've
implemented in our OAI repository. it's one line and i added it to a
function that i had already implemented for processing all data as it
passes from database to XML...
$str =~ s/([^ -~])/'&#' . ord($1) . ';'/eg;
this looks for any characters outside of the range from [space] to [tilde]
and transforms each to its proper character reference. for example, if an
n-tilde is encountered, it is transformed into &#241;
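
a minimal sketch of the same substitution wrapped in a function, in case it helps anyone try it out. the function name to_char_refs and the sample string are hypothetical, and the decode step assumes the database returns UTF-8 bytes (your setup may differ). note that ord() only yields the code point if the string has been decoded to characters; on raw UTF-8 bytes each byte would get its own reference.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper name; not part of the original post.
sub to_char_refs {
    my ($str) = @_;
    # Replace each character outside [space]..[tilde] with &#N;,
    # where N is the decimal value returned by ord().
    $str =~ s/([^ -~])/'&#' . ord($1) . ';'/eg;
    return $str;
}

# Assumption: the data arrives as UTF-8 bytes, so decode it first.
my $bytes = "Espa\xC3\xB1a";
my $chars = decode('UTF-8', $bytes);

print to_char_refs($chars), "\n";   # Espa&#241;a
print to_char_refs($bytes), "\n";   # Espa&#195;&#177;a (undecoded bytes)
```

since &, < and > fall inside the [space]-to-[tilde] range, this substitution leaves them untouched, so they still need escaping elsewhere.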
thanks for the help many of you provided!
On Tue, 4 Nov 2003, Ed Summers wrote:
> On Tue, Nov 04, 2003 at 09:58:55AM -0500, Todd White wrote:
> > $string =~ tr/\0-\x{ff}//UC;
>
> Search for tr/ in the following pages for some fun Perl archaeology.
>
> http://www.perldoc.com/perl5.005_03/pod/perlop.html
> http://www.perldoc.com/perl5.6.0/pod/perlop.html
> http://www.perldoc.com/perl5.6.1/pod/perlop.html
>
> You can see the UC modifiers were introduced in 5.6.0 and quickly
> dropped in 5.6.1 (and in versions thereafter). 5.6.0 is a notoriously
> buggy release, I think in part because of its UTF8 handling. These
> problems have been fixed in versions >= 5.8.0, which is the first
> recommended release of Perl for safely working with UTF8.
>
> Funny, I always thought Perl held backwards compatibility sacrosanct...
> not including Perl6 of course :)
>
> You might be interested in this list for Perl library folks:
> http://perl4lib.perl.org for discussion of Perl esoterica and more.
>
> //Ed