Bug 104540 - UTF8 breaks [^\w] regexp matches
Summary: UTF8 breaks [^\w] regexp matches
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: perl
Version: 9
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jason Vas Dias
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2003-09-16 21:58 UTC by Jamie Zawinski
Modified: 2007-04-18 16:57 UTC (History)
0 users

Fixed In Version: ALL
Clone Of:
Environment:
Last Closed: 2005-11-11 23:43:50 UTC
Embargoed:



Description Jamie Zawinski 2003-09-16 21:58:52 UTC
If $LANG contains "utf8", then [^\w] doesn't work right:

      setenv LANG en_US
      echo -n "foo.bar" | \
      perl -e '$_ = <>; print join (" | ", split (/([^\w]+)/)) . "\n";'

            ===> "foo | . | bar" (right)


      setenv LANG en_US.utf8
      echo -n "foo.bar" | \
      perl -e '$_ = <>; print join (" | ", split (/([^\w]+)/)) . "\n";'

            ===> "foo.bar" (wrong!)


It works fine in both cases if you do $_ = "foo.bar" instead of reading
the text from stdin.

    This is perl, v5.8.0 built for i386-linux-thread-multi
    (with 1 registered patch, see perl -V for more detail)

    perl-5.8.0-88
    Red Hat Linux release 9 (Shrike)
    Linux 2.4.20-8smp #1 SMP Thu Mar 13 16:43:01 EST 2003 i686 athlon i386

Maybe this is a dup of bug 102106, I can't tell.

Comment 1 Jason Vas Dias 2005-11-11 23:43:50 UTC
Very sorry for the long delay in processing this bug report.
This bug is no longer a problem with the perl in any current Red Hat OS release.


