Bug 102106 - [^\s],[^\S],[^\w], etc. don't work with UTF-8 scalars (ok in vanilla perl5.8.0)
Product: Red Hat Linux
Classification: Retired
Component: perl
i386 Linux
Priority: medium, Severity: medium
Assigned To: Chip Turner
David Lawrence
Reported: 2003-08-11 10:15 EDT by Petr Pajas
Modified: 2007-04-18 12:56 EDT (History)
Doc Type: Bug Fix
Last Closed: 2003-08-11 10:29:50 EDT

Description Petr Pajas 2003-08-11 10:15:15 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030718

Description of problem:
Some of Red Hat's patches appear to break the regexp engine (compared to
vanilla perl compiled from source). The following script
should match 'oo' in "foo" via f([^\s]+). It does when
"foo" is a raw scalar, but fails when it is a UTF-8 flagged scalar.
Similar problems appear with other negated character classes
[^...] and with \S, \w, \W.

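use strict;
use warnings;
use Encode;

my $exp = "foo";

# raw variant (ok)
print "(raw):  ";
print $exp =~ /f([^\s]+)/ ? "Matches '$1'\n" : "no match\n";

# utf8 variant (BROKEN!!!): decode() returns a UTF-8 flagged copy,
# kept in its own variable instead of redeclaring $exp
print "(utf8): ";
my $utf8 = decode('iso-8859-1', $exp);
print $utf8 =~ /f([^\s]+)/ ? "Matches '$1'\n" : "no match\n";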
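
Since the description says similar problems appear with other negated classes and with \S, \w, \W, a slightly broader check may help narrow down which patterns are affected. The following is only a sketch: the helper name check_classes and the list of patterns are assumptions chosen for illustration, not taken from the report. Every pattern in the list should capture 'oo' from "foo" on a correctly working perl, for both the raw and the UTF-8 flagged scalar.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: tries several character-class patterns against
# one string and returns one report line per pattern.
sub check_classes {
    my ($label, $str) = @_;
    my @report;
    # Illustrative pattern list; each one should capture 'oo' from "foo".
    for my $pat ('f([^\s]+)', 'f(\S+)', 'f(\w+)', 'f([^\W]+)', 'f([^\d]+)') {
        my $result = $str =~ /$pat/ ? "matches '$1'" : "no match";
        push @report, sprintf("%-7s %-12s %s", $label, $pat, $result);
    }
    return @report;
}

my $raw  = "foo";
my $utf8 = decode('iso-8859-1', $raw);   # same characters, UTF-8 flag on

print "$_\n" for check_classes('(raw)', $raw), check_classes('(utf8)', $utf8);
```

Any "(utf8)" line that reports "no match" while the corresponding "(raw)" line matches would point at the same problem as the script above.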

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Run the script from the Description


Actual Results:  
(raw):  Matches 'oo'
(utf8): no match

Expected Results:  
(raw):  Matches 'oo'
(utf8): Matches 'oo'

Additional info:

The bug is present in the perl shipped with both Red Hat Linux 8 and Red Hat Linux 9.
Comment 1 Chip Turner 2003-08-11 10:29:50 EDT
I get correct behavior using the perl in rawhide. Can you test that to confirm
it behaves properly for you?
Comment 2 D. Hugh Redelmeier 2003-09-20 21:22:09 EDT
This smells like 104540.

This bug causes the "mirror" program (http://sunsite.org.uk/packages/mirror/) to
fail if LANG includes utf8 (default in RHL9).
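
This comment does not say how strings inside "mirror" end up UTF-8 flagged, so the following is only a hypothetical diagnostic, not something taken from the mirror sources. It uses Encode::is_utf8 to report whether a given scalar carries the UTF-8 flag, which is the condition the original report ties the failure to. The helper name flag_state is an assumption made for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: describes whether a scalar has the UTF-8 flag set.
sub flag_state {
    my ($str) = @_;
    return Encode::is_utf8($str) ? 'UTF-8 flagged' : 'raw bytes';
}

my $raw  = "foo";
my $utf8 = decode('iso-8859-1', $raw);   # decoded copy of the same text

print "raw:     ", flag_state($raw),  "\n";
print "decoded: ", flag_state($utf8), "\n";
```

Calling flag_state on a string just before a failing match could show whether the input was flagged at that point.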
