Bug 104540 - UTF8 breaks [^\w] regexp matches
Product: Red Hat Linux
Classification: Retired
Component: perl
Hardware: i386, OS: Linux
Priority: medium, Severity: medium
Assigned To: Jason Vas Dias
QA Contact: David Lawrence
Reported: 2003-09-16 17:58 EDT by Jamie Zawinski
Modified: 2007-04-18 12:57 EDT

Fixed In Version: ALL
Doc Type: Bug Fix
Last Closed: 2005-11-11 18:43:50 EST

Description Jamie Zawinski 2003-09-16 17:58:52 EDT
If $LANG contains "utf8", then [^\w] fails to match non-word characters in text read from stdin:

      setenv LANG en_US
      echo -n "foo.bar" | \
      perl -e '$_ = <>; print join (" | ", split (/([^\w]+)/)) . "\n";'

            ===> "foo | . | bar" (right)

      setenv LANG en_US.utf8
      echo -n "foo.bar" | \
      perl -e '$_ = <>; print join (" | ", split (/([^\w]+)/)) . "\n";'

            ===> "foo.bar" (wrong!)

It works fine in both cases if you do $_ = "foo.bar" instead of reading
the text from stdin.

    This is perl, v5.8.0 built for i386-linux-thread-multi
    (with 1 registered patch, see perl -V for more detail)

    Red Hat Linux release 9 (Shrike)
    Linux 2.4.20-8smp #1 SMP Thu Mar 13 16:43:01 EST 2003 i686 athlon i386

Maybe this is a dup of bug 102106; I can't tell.
Comment 1 Jason Vas Dias 2005-11-11 18:43:50 EST
Very sorry for the long delay in processing this bug report.
This bug is no longer a problem with the perl in any current Red Hat OS release.
