Bug 75409

Summary: With LANG=en_US.UTF-8, Locale::Language breaks
Product: [Retired] Red Hat Linux
Reporter: Mathieu Chouquet-Stringer <mathieu-acct>
Component: perl
Assignee: Chip Turner <cturner>
Status: CLOSED CURRENTRELEASE
QA Contact: David Lawrence <dkl>
Severity: medium
Priority: medium
Version: 8.0
CC: franz.sirl-kernel, toniw
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-03-08 19:42:11 UTC

Description Mathieu Chouquet-Stringer 2002-10-08 04:45:15 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 Galeon/1.2.6 (X11; Linux i686; U;) Gecko/20020827

Description of problem:
Loading Locale::Language in a Perl program while LANG is set to en_US.UTF-8
produces "Malformed UTF-8 character" warnings.

Version-Release number of selected component (if applicable):
5.8.0-55

How reproducible:
Always

Steps to Reproduce:
1. LANG=en_US.UTF-8 perl -we 'use Locale::Language'


Actual Results:  
Malformed UTF-8 character (unexpected end of string) at
/usr/lib/perl5/5.8.0/Locale/Language.pm line 115, <DATA> line 109.
Malformed UTF-8 character (unexpected end of string) at
/usr/lib/perl5/5.8.0/Locale/Language.pm line 117, <DATA> line 109.
Malformed UTF-8 character (unexpected non-continuation byte 0x6c, immediately
after start byte 0xe5) in lc at /usr/lib/perl5/5.8.0/Locale/Language.pm line
117, <DATA> line 109.
Malformed UTF-8 character (unexpected end of string) at
/usr/lib/perl5/5.8.0/Locale/Language.pm line 115, <DATA> line 178.
Malformed UTF-8 character (unexpected end of string) at
/usr/lib/perl5/5.8.0/Locale/Language.pm line 117, <DATA> line 178.
Malformed UTF-8 character (unexpected non-continuation byte 0x6b, immediately
after start byte 0xfc) in lc at /usr/lib/perl5/5.8.0/Locale/Language.pm line
117, <DATA> line 178.


Expected Results:  No warnings should be printed. Unsetting LANG and re-running
the same command gives clean output.
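
The byte pairs named in the warnings (0xe5 followed by 0x6c, and 0xfc followed by 0x6b) are what Latin-1 text looks like when it is read as UTF-8. A minimal sketch of that mismatch follows, in Python rather than Perl and purely for illustration. The sample names "Bokmål" and "Volapük" are an assumption: they fit the reported bytes, but the report does not quote the actual <DATA> lines.

```python
# Assumption: the offending <DATA> entries are Latin-1 language names such
# as "Bokmål" (0xe5 0x6c) and "Volapük" (0xfc 0x6b). The bug report only
# gives the byte values, not the lines themselves.
SAMPLES = [b"Bokm\xe5l", b"Volap\xfck"]


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for raw in SAMPLES:
    # Each sample is fine as Latin-1 but is rejected by a strict UTF-8 decoder,
    # because the high byte is followed by a plain ASCII letter.
    print(raw, "valid UTF-8:", is_valid_utf8(raw), "as Latin-1:", raw.decode("latin-1"))
```

The same names encoded as real UTF-8 pass the check, which is consistent with the warnings coming from data in one encoding being interpreted as another.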

Additional info:

The good news is that this has already been reported upstream:
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=rt-17439-38139.10.5873486677133%40bugs6.perl.org&rnum=3&prev=/groups%3Fq%3DMalformed%2BUTF-8%2Bcharacter%2B(unexpected%2Bend%2Bof%2Bstring)%26meta%3Dsite%253Dgroups

The patch included in that thread has been applied to perl, but I can't verify
this because I don't have a login to perl.org.

I actually found this bug while dealing with another one. More on that later:
I haven't found the culprit yet and can't submit an incomplete bug.

Comment 1 Toni Willberg 2002-11-06 18:46:42 UTC
This is a very urgent issue.

There's a working patch already, so I suggest Red Hat publish an update soon.

I have a default setup of RH 8.0 and ran into this problem when trying my www
perl script on this box: it prints these errors.




Comment 2 Chip Turner 2002-11-06 20:24:48 UTC
I have integrated the patch from upstream (perl change 17927).  There are other
issues preventing an immediate errata of Perl itself, however.  If you would
like to test a candidate package, it can be arranged, but please be aware it
would be unsupported.

Comment 3 Chip Turner 2002-12-15 23:21:35 UTC
A package fixing this and other utf8 issues should be in rawhide soon (and
should recompile on stock 8.0 with no trouble).

Comment 4 franz.sirl-kernel 2003-01-08 20:03:29 UTC
Now with recent RawHide (I tested -79, -81, -82) regexps with UTF-8 seem to fail:

[mirror@entropy home]$ LANG=en_US mirror packages/rawhide.srpm
package=RawHide alviss.et.tudelft.nl:/pub/redhat/rawhide/SRPMS/SRPMS ->
/usr/local/mirror/redhat/rawhide/SRPMS/SRPMS
No files to transfer
[mirror@entropy home]$ LANG=en_US.UTF-8 mirror packages/rawhide.srpm
unknown input in "/etc/mirror.defaults" line 10 of: package=defaults
unknown keyword in "/etc/mirror.defaults" line 10 of:
[mirror@entropy home]$ rpm -q mirror
mirror-2.9-11
[mirror@entropy home]$ rpm -q perl
perl-5.8.0-82

The regexp in question is: /^\s*([^\s=+]+)\s*([=+])(.*)?$/

If I go back to perl-5.8.0-73 all is fine again.
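
For reference, here is a small sketch of what that pattern is expected to capture from the line quoted in the error message (`package=defaults`). It uses Python's `re` module, not Perl's regex engine, so it only illustrates the intended match and does not reproduce the UTF-8 failure. The helper name `parse_line` is hypothetical.

```python
import re

# Pattern quoted in the comment above, copied unchanged.
LINE_RE = re.compile(r"^\s*([^\s=+]+)\s*([=+])(.*)?$")


def parse_line(line):
    """Return (keyword, operator, value) for a matching line, else None."""
    m = LINE_RE.match(line)
    return m.groups() if m else None


# The line that mirror reports as "unknown input" under LANG=en_US.UTF-8.
print(parse_line("package=defaults"))
```

A correct match splits the line into the keyword `package`, the operator `=`, and the value `defaults`, which is what mirror appears to get under LANG=en_US but not under LANG=en_US.UTF-8.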


Comment 5 Milan Kerslager 2003-12-17 23:06:21 UTC
See bug #82652 for a patch.

Comment 6 Miloslav Trmac 2004-03-08 19:42:11 UTC
Fix confirmed in perl-5.8.3-10