Bug 475250

Summary: perl-XML-SAX does not properly decode UTF-8 characters
Product: Red Hat Enterprise Linux 5
Component: perl-XML-SAX
Version: 5.3
Status: CLOSED ERRATA
Severity: medium
Priority: low
Reporter: Trevin Beattie <tbeattie>
Assignee: Marcela Mašláňová <mmaslano>
QA Contact: BaseOS QE <qe-baseos-auto>
CC: ovasik, psplicha, robin.norwood
Target Milestone: rc
Keywords: Regression
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-01-06 08:57:09 UTC
Attachments:
Perl script which demonstrates the bug (flags: none)
Input file for the test script (flags: none)


Description Trevin Beattie 2008-12-08 17:23:40 UTC
Description of problem:
I have a Perl module which reads a language translation file in XML with UTF-8 encoding and performs string substitution on a template.  The module uses XML::Simple, which in turn utilizes XML::SAX.  Our current production systems are running RHEL 3 with perl-XML-Simple-2.14-2 and perl-XML-SAX-0.12-1 (the latter built in-house).  In testing our web site on RHEL 5.3 beta with perl-XML-Simple-2.14-4.fc6 and perl-XML-SAX-0.14-5, I see that all non-ASCII characters on the translated pages come out as garbage.

If I simply replace the Red Hat perl-XML-SAX-0.14 package with our older 0.12 package, the page then renders correctly.

However, I also have a system running Fedora Core 6 which is using the same perl-XML-Simple-2.14-4.fc6 package and perl-XML-SAX-0.14-2.  On this system the test code I wrote writes out correct UTF-8 characters, so there must be some subtle difference between FC6 and EL5 which would account for the error.

Version-Release number of selected component (if applicable):
perl-XML-SAX-0.14-5

How reproducible:
Every time.

Steps to Reproduce:
1. Run the attached Perl script as follows:
   test-XML-SAX-0.13.pl test-XML-SAX-0.13.xml test-XML-SAX-0.13.out
2. Examine the output file:
   cat test-XML-SAX-0.13.out
3. Check the encoding by passing the output file to hexdump:
   hexdump -C test-XML-SAX-0.13.out
  
Actual results (hexdump on left, output on right):
00000000  c3 a9 c2 83 c2 bd c3 a5  c2 b8 c2 82 0a           |é½å¸.|

Expected results:
00000000  e9 83 bd e5 b8 82 0a                              |都市.|

Additional info:
This seems to have been broken at version 0.13 (introduced by update request in bug #176161):
http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2006-03/msg02219.html
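
The two dumps under "Actual results" and "Expected results" differ in a recognizable way: the actual bytes are what results when the expected UTF-8 bytes are misread as Latin-1 (one character per byte) and then encoded to UTF-8 a second time. The short Python check below illustrates this using only the bytes from the report. It is a sketch of the symptom, not of the code path inside XML::SAX, and the variable names are illustrative.

```python
# Bytes copied from the two hexdump lines in the report.
expected = bytes.fromhex("e9 83 bd e5 b8 82 0a")
actual = bytes.fromhex("c3 a9 c2 83 c2 bd c3 a5 c2 b8 c2 82 0a")

# Misread each UTF-8 byte as one Latin-1 character, then encode to UTF-8 again.
double_encoded = expected.decode("latin-1").encode("utf-8")
print(double_encoded == actual)   # True

# Undoing that step recovers the expected bytes.
recovered = actual.decode("utf-8").encode("latin-1")
print(recovered == expected)      # True

# The expected bytes decode to the two characters U+90FD U+5E02 plus a newline.
print(expected.decode("utf-8") == "\u90fd\u5e02\n")  # True
```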

Comment 1 Trevin Beattie 2008-12-08 17:24:44 UTC
Created attachment 326150
Perl script which demonstrates the bug

Comment 2 Trevin Beattie 2008-12-08 17:25:14 UTC
Created attachment 326151
Input file for the test script

Comment 3 Trevin Beattie 2008-12-10 21:38:34 UTC
I upgraded perl-XML-SAX on my Fedora Core 6 system to version 0.14-5 from the Red Hat EL 5.3b distribution, and the test script still runs correctly.

I then upgraded perl itself to version 5.8.8-18.el5, and it still runs correctly.

Given that, I can't say whether the perl-XML-SAX package is the source of the bug.  As both systems now have the exact same perl, perl-XML-Simple, and perl-XML-SAX packages, and the latter packages are pure Perl code, what could possibly be different between FC6 and EL5 that would cause the script output to differ?

Comment 4 Marcela Mašláňová 2008-12-11 14:26:40 UTC
Well, there were problems with the scriptlets that install ParserDetails.ini. Those scriptlets are present in FC-6 but not in RHEL-5, and that is the only difference between the two XML::SAX packages. The scriptlets are quite problematic, because they are probably the reason for the problematic updates from RHEL-4 to RHEL-5. I'll be working on a fix.

Comment 5 Trevin Beattie 2008-12-11 16:24:42 UTC
That must be a ghost file, because even though rpm says it belongs to perl-XML-SAX, it isn't part of the package's file listing.

I see this morning that the contents of the file are different between my two systems, but only in the order in which the sections are defined.  Here's the file from FC6:

[XML::SAX::PurePerl]
http://xml.org/sax/features/namespaces = 1

[XML::LibXML::SAX::Parser]
http://xml.org/sax/features/namespaces = 1

[XML::LibXML::SAX]
http://xml.org/sax/features/namespaces = 1


And here is the file from EL5:

[XML::LibXML::SAX::Parser]
http://xml.org/sax/features/namespaces = 1

[XML::LibXML::SAX]
http://xml.org/sax/features/namespaces = 1

[XML::SAX::PurePerl]
http://xml.org/sax/features/namespaces = 1


I did notice that after I downgraded perl-XML-SAX to version 0.12 and then upgraded back to 0.14 again, ParserDetails.ini had disappeared.  I noticed because my program would not run at all -- I got the error "could not find ParserDetails.ini in /usr/lib/perl5/vendor_perl/5.8.8/XML/SAX".  I had to completely remove and then re-install the package to fix that little problem.

I'm able to confirm that the order of entries in ParserDetails.ini *does* make a difference!  When I swapped this file between the two systems, the test script broke on FC6 and worked properly on EL5.
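
This result would fit the behaviour described in the XML::SAX documentation, where XML::SAX::ParserFactory, absent an explicit choice, uses the last parser listed in ParserDetails.ini. Under that assumption, the FC6 file selects XML::LibXML::SAX while the EL5 file selects XML::SAX::PurePerl. The Python sketch below only models that selection rule against the two files quoted above; `default_parser` is a hypothetical helper, not part of XML::SAX.

```python
import re

# ParserDetails.ini contents as quoted in this comment.
FC6_INI = """\
[XML::SAX::PurePerl]
http://xml.org/sax/features/namespaces = 1

[XML::LibXML::SAX::Parser]
http://xml.org/sax/features/namespaces = 1

[XML::LibXML::SAX]
http://xml.org/sax/features/namespaces = 1
"""

EL5_INI = """\
[XML::LibXML::SAX::Parser]
http://xml.org/sax/features/namespaces = 1

[XML::LibXML::SAX]
http://xml.org/sax/features/namespaces = 1

[XML::SAX::PurePerl]
http://xml.org/sax/features/namespaces = 1
"""


def default_parser(ini_text):
    """Return the last [section] name, modelling the assumed
    'last listed parser wins' rule. Hypothetical helper."""
    sections = re.findall(r"^\[([^\]]+)\]\s*$", ini_text, flags=re.MULTILINE)
    return sections[-1] if sections else None


print(default_parser(FC6_INI))  # XML::LibXML::SAX
print(default_parser(EL5_INI))  # XML::SAX::PurePerl
```

If the assumption holds, the garbled output on EL5 would come from the pure-Perl parser and the correct output on FC6 from the libxml2-based one, which would explain why swapping the files swaps the behaviour.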

Comment 6 Marcela Mašláňová 2008-12-12 11:42:00 UTC
So the missing ParserDetails.ini file is a separate bug, filed in rhbz as #289061.

The UTF-8 problem was fixed in the latest version of XML::SAX as upstream bug http://rt.cpan.org/Public/Bug/Display.html?id=26588. It is a regression relative to 0.12.

Comment 14 errata-xmlrpc 2010-01-06 08:57:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0008.html

Comment 15 Trevin Beattie 2010-01-12 23:38:27 UTC
After upgrading to perl-XML-LibXML-1.58-6.i386.rpm and perl-XML-SAX-0.14-8.noarch.rpm:

[tbeattie@admin tmp]$ ./test-XML-SAX-0.13.pl test-XML-SAX-0.13.xml test-XML-SAX-0.13.out
could not find ParserDetails.ini in /usr/lib/perl5/vendor_perl/5.8.8/XML/SAX

but the output file was correctly encoded.

After removing and cleanly re-installing the packages, the test script ran without any errors.