Bug 104540 - UTF8 breaks [^\w] regexp matches
Product: Red Hat Linux
Classification: Retired
Component: perl   
Version: 9
Hardware: i386
OS: Linux
Assignee: Jason Vas Dias
QA Contact: David Lawrence
Reported: 2003-09-16 21:58 UTC by Jamie Zawinski
Modified: 2007-04-18 16:57 UTC (History)

Fixed In Version: ALL
Doc Type: Bug Fix
Last Closed: 2005-11-11 23:43:50 UTC

Description Jamie Zawinski 2003-09-16 21:58:52 UTC
If $LANG contains "utf8", then [^\w] fails to match in text read from stdin:

      setenv LANG en_US
      echo -n "foo.bar" | \
      perl -e '$_ = <>; print join (" | ", split (/([^\w]+)/)) . "\n";'

            ===> "foo | . | bar" (right)

      setenv LANG en_US.utf8
      echo -n "foo.bar" | \
      perl -e '$_ = <>; print join (" | ", split (/([^\w]+)/)) . "\n";'

            ===> "foo.bar" (wrong!)

It works fine in both cases if you do $_ = "foo.bar" instead of reading
the text from stdin.

    This is perl, v5.8.0 built for i386-linux-thread-multi
    (with 1 registered patch, see perl -V for more detail)

    Red Hat Linux release 9 (Shrike)
    Linux 2.4.20-8smp #1 SMP Thu Mar 13 16:43:01 EST 2003 i686 athlon i386

This may be a duplicate of bug 102106; I can't tell.

Comment 1 Jason Vas Dias 2005-11-11 23:43:50 UTC
Very sorry for the long delay in processing this bug report.
This bug is no longer a problem with the perl shipped in any current Red Hat OS release.
