Bug 112339 - Perl is unable to work with regex under UTF-8 locale with input readed from files
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: perl
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jason Vas Dias
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks: 187539
 
Reported: 2003-12-17 23:19 UTC by Milan Kerslager
Modified: 2007-11-30 22:06 UTC (History)

Fixed In Version: RHBA-2006-0294
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-05-10 21:55:14 UTC
Target Upstream Version:
Embargoed:


Attachments
Upstream patch 18608 (1.96 KB, patch)
2004-03-28 22:14 UTC, Bernd Schmidt
Upstream patch 18609 (1.00 KB, patch)
2004-03-28 22:15 UTC, Bernd Schmidt


Links
System                  ID              Private  Priority  Status        Summary                Last Updated
Red Hat Product Errata  RHBA-2004:391   0        normal    SHIPPED_LIVE  Updated perl packages  2004-09-02 04:00:00 UTC
Red Hat Product Errata  RHBA-2006:0294  0        normal    SHIPPED_LIVE  perl bug fix update    2006-07-19 19:03:00 UTC

Description Milan Kerslager 2003-12-17 23:19:44 UTC
It is impossible to compare strings read from files when any UTF-8
locale is set. Since UTF-8 has been the default since RHL 8.0, every Perl
script would need to be fixed by adding this line at the beginning:

exec 'env', 'LANG=C', $0, @ARGV unless $ENV{"LANG"} eq "C";

It seems that Perl versions later than the one in RHEL 3 have this issue
fixed; see bug #82652 for more info. The line above sets the C locale for
the perl script by re-executing it.
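
As an alternative to editing every script, the same workaround can be applied from the calling shell. The following is only a sketch under stated assumptions, not something proposed in this report: run_in_c_locale is a hypothetical helper name, and it sets LC_ALL in addition to LANG because LC_ALL, when set, takes precedence over LANG.

```shell
# Hypothetical helper (not part of perl or RHEL): run any command with the
# C locale forced, so a UTF-8 default locale never reaches the child process.
run_in_c_locale() {
    # env applies the variables only to the command it launches;
    # the calling shell's own locale settings are left untouched.
    env LANG=C LC_ALL=C "$@"
}
```

For example, "run_in_c_locale perl Makefile.PL" would run a module's build script under the C locale without modifying the script itself.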

Comment 1 Milan Kerslager 2004-01-04 07:33:43 UTC
This bug is a real candidate for RHEL3 errata IMHO.

Test case from bug #82652 (using UTF-8 locale [cs_CZ.UTF-8]):

echo abc | perl -ne 'use locale;print if /[^\s]+/'

Actual output: (nothing)
Expected output: abc

This test works when export LANG=C is used.

Comment 2 Howard Owen 2004-03-07 04:28:44 UTC
This bug is breaking scripts at Cisco

Comment 3 grosen 2004-03-12 23:04:39 UTC
Still breaks anything that tries to do a moderately complicated perl
Makefile.PL. (SpamAssassin--any version, have fun!--is a good example.)

This has been a problem for about a year now, and bugs keep getting
closed citing RAWHIDE.

Could we maybe have a fix in the version of the OS we've already paid
for? Thanks.

Comment 4 Bernd Schmidt 2004-03-28 22:13:31 UTC
I've done a bit of archeology and found upstream patches #18608 and
#18609, which seem to fix it. I'll attach them; they can also be found
at http://www.nntp.perl.org/group/perl.perl5.changes/6597 and
http://www.nntp.perl.org/group/perl.perl5.changes/6598.

Comment 5 Bernd Schmidt 2004-03-28 22:14:36 UTC
Created attachment 98912 [details]
Upstream patch 18608

Comment 6 Bernd Schmidt 2004-03-28 22:15:16 UTC
Created attachment 98913 [details]
Upstream patch 18609

Comment 7 Mimmus 2004-04-14 08:38:03 UTC
Two source patches are not useful to me; I'd like to keep an RPM-managed system.

Could we maybe have a fix in the version of the OS we've already paid
for? Thanks.

Comment 8 Milan Kerslager 2004-04-14 14:28:36 UTC
I built test packages (perl-*5.8.0-88.4.ker.rhel3) with the two
patches from this page:

ftp://ftp.vslib.cz/pub/local/milan.kerslager/RHEL-3/test/
ftp://ftp.linux.cz/pub/linux/people/milan_kerslager/RHEL-3/test/

At least the test from comment #1 works now. I have not done any further testing yet.

Comment 10 Chip Turner 2004-06-10 11:42:34 UTC
The plan to issue an erratum upgrading to 5.8.3 or 5.8.4 is on hold. I'll look into
merging select patches back into 5.8.0, including the two here.

Comment 11 Chip Turner 2004-06-28 19:23:54 UTC
perl 5.8.0 with these two patches applied should be available in U3. Look for
perl 5.8.0-88.7 or higher; it should include this fix.

Comment 12 John Flanagan 2004-09-02 06:06:05 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-391.html


Comment 13 Jonathan Manning 2004-12-06 18:17:01 UTC
I'm using perl-5.8.0-88.7 on RHEL3 ES and I am still hit with this bug
when installing certain CPAN modules. By default LANG=en_US.UTF-8, and
setting it to LANG=C or LANG=en_US fixes the problem.

I'm using the ERRATA version, so why is this still present?

perl -MCPAN -e shell
install Storable
install ExtUtils::MakeMaker
install Test::Simple
install Digest::MD5

Errors:
Some occur during Makefile.PL processing and creating the Makefile
(makefile clearly has corruption/errors).

Writing Makefile for Digest::MD5
Makefile:45: *** missing separator.  Stop.

Others appear during make test time.

Setting LANG to C or en_US causes everything to work perfectly.

Comment 14 Jay Turner 2005-06-20 14:44:27 UTC
Reopening based on comment 13.

Comment 15 James Black 2005-08-11 01:52:55 UTC
I am using perl-5.8.0-89.10 on RHEL3 ES, and I also still have this issue. I have
changed LANG to en_US. It still crashes during Makefile.PL on DBD::mysql.

dbdimp.c: In function `mysql_db_quote':
dbdimp.c:3865: dereferencing pointer to incomplete type
make: *** [dbdimp.o] Error 1

Any help/fix would be good....

Comment 16 Jason Vas Dias 2005-11-10 21:09:42 UTC
The problems described in this bug report about bad Makefiles should be fixed
with perl-5.8.0-90.2, which backports some of the UTF support from 5.8.7. 

Also, DBD-mysql's Makefile.PL now specifically tests for perl-5.8.0 and 
sets $ENV{LANG}='C' if so, so its CPAN build now succeeds on RHEL-3.

It's like this: PERL's unicode/UTF-8 support is still a work in progress in the
bleadperl 5.9.3 release, and still has serious bugs there, e.g. perl bug 37646
(which I raised for RHEL-3 bug #135978, and am endeavouring to fix).
 
PERL's unicode/UTF-8 support affects almost every aspect of the whole perl 
system, as the upstream maintainers were aiming at seamless compatibility
with byte mode operation (but they have failed as yet). 

PERL 5.8.0 - 5.8.2 had a terminally broken unicode/UTF-8 implementation, that
was unfortunately enabled by default in those releases when a UTF-8 locale
is in effect, regardless of any 'use locale;'. RHEL-3 ships with a UTF-8
locale by default (en_US.UTF-8), so this broken perl unicode/UTF-8 support
is enabled by default in RHEL-3.

Somewhere around 5.8.3 - 5.8.4, the upstream perl maintainers wisely decided
to disable unicode by default, UNLESS there was a "PERL_UNICODE" environment
variable OR the -C option was given to perl.

perl 5.8.5 - 5.8.7 saw vast improvements in the unicode/UTF-8 support, but
still there are major problems with it.

While I am trying to fix all the outstanding RHEL-3 UTF bugs, eg. bug 122378,
there is no fully working perl unicode/UTF-8 support in the latest upstream
release.

So now we are in a bind with RHEL-3 perl: one set of customers who want UTF-8
support have raised bugs complaining that the UTF-8 support doesn't work
properly, and another set of customers who don't want UTF-8 support complain
that UTF-8 support breaks seemingly non-UTF related features.

So we can go the upstream way, meaning disabling the broken UTF-8 support by
default, which would suddenly require all RHEL customers who want UTF support
to explicitly enable it, or we can try to fix the existing UTF support - making
it even better than it is in the upstream bleadperl release - I will try, but
this is a large, non-trivial development effort.

With RHEL releases, we are committed to introducing no functionality 
changes in update releases - we can only fix existing functionality
where it is broken.

So it looks like we'll have to continue to support enabling perl's
unicode/UTF-8 mode by default, unless we can upgrade PERL
wholesale to the upstream 5.8.7 release, which is unlikely
to be allowed.

Perhaps a way around this issue would be to provide support for a
'PERL_DISABLE_UNICODE' environment variable, which could be set in
/etc/profile or users' rc files, and which would engage the upstream
behaviour: do not enable unicode support unless the 'PERL_UNICODE'
environment variable is set. That is, with no *UNICODE environment
variables set (the default), the behaviour would remain as it is today,
with unicode enabled by default. With PERL_DISABLE_UNICODE set, users
would have to explicitly set 'PERL_UNICODE', or 'use locale;' / 'use utf8;',
to enable it.
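
The proposed rule can be summarised as a small decision function. This is purely an illustration of the proposal in this comment: PERL_DISABLE_UNICODE is not an existing perl feature, perl_unicode_default is a hypothetical name, treating any non-empty value as "set" is a simplifying assumption, and the in-script 'use locale;' / 'use utf8;' route is not modelled.

```shell
# Sketch of the proposed rule. Takes the effective locale name as $1 and
# prints "on" or "off" for perl's unicode/UTF-8 mode.
perl_unicode_default() {
    if [ -n "${PERL_DISABLE_UNICODE:-}" ]; then
        # Proposed opt-out is active: unicode only when explicitly requested.
        if [ -n "${PERL_UNICODE:-}" ]; then echo on; else echo off; fi
    else
        # No opt-out: keep the existing RHEL-3 behaviour, where a UTF-8
        # locale switches unicode mode on.
        case "$1" in
            *UTF-8*|*utf8*) echo on ;;
            *)              echo off ;;
        esac
    fi
}
```

With neither variable set, "perl_unicode_default en_US.UTF-8" prints on, matching the behaviour described as the current default; with PERL_DISABLE_UNICODE=1 and no PERL_UNICODE, the same call prints off.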

So I'll try to fix perl unicode/UTF-8 support in RHEL-3, but perhaps one
temporary solution would be to provide a way of disabling it globally,
as is done in the upstream version.

Any ideas / opinions on this issue would be gratefully received.

Comment 17 Jason Vas Dias 2006-03-24 17:42:15 UTC
The latest version, perl-5.8.0-92.0, fixes the remaining UTF-8 issue (bug 122378).
It should be a candidate for inclusion in RHEL-3-U8, and is available from:
  http://people.redhat.com/~jvdias/perl/RHEL-3
The install of the CPAN modules listed above now completes OK,
so I think this bug can now be closed.

Comment 25 Red Hat Bugzilla 2006-05-10 21:55:14 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2006-0294.html


