Bug 102106

Summary: [^\s],[^\S],[^\w], etc. don't work with UTF-8 scalars (ok in vanilla perl5.8.0)
Product: [Retired] Red Hat Linux Reporter: Petr Pajas <pajas>
Component: perl Assignee: Chip Turner <cturner>
Status: CLOSED RAWHIDE QA Contact: David Lawrence <dkl>
Severity: medium Docs Contact:
Priority: medium    
Version: 9 CC: hugh
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2003-08-11 14:29:50 UTC Type: ---

Description Petr Pajas 2003-08-11 14:15:15 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030718

Description of problem:
Some of RH's patches break the regexp engine (compared to
vanilla perl compiled from source). The following script
should match 'oo' in "foo" via f([^\s]+). It does if
"foo" is a raw scalar, but fails if it is a UTF-8 flagged scalar.
Similar problems appear with other negated character sets
[^...] and with \S, \w, \W.

#!/usr/bin/perl
use Encode;

my $exp = "foo";

# raw variant (ok): plain byte string
print "(raw):  ";
print $exp =~ /f([^\s]+)/ ? "Matches '$1'\n" : "no match\n";

# utf8 variant (BROKEN!!!): same characters as a UTF-8 flagged scalar
print "(utf8): ";
$exp = decode('iso-8859-1', $exp);   # reuse $exp rather than redeclaring it
print $exp =~ /f([^\s]+)/ ? "Matches '$1'\n" : "no match\n";



Version-Release number of selected component (if applicable):
perl-5.8.0-88

How reproducible:
Always

Steps to Reproduce:
1. Run the script from the Description.

    

Actual Results:  
(raw):  Matches 'oo'
(utf8): no match


Expected Results:  
(raw):  Matches 'oo'
(utf8): Matches 'oo'

Additional info:

The bug is present in the perl shipped with both Red Hat Linux 8 and Red Hat Linux 9.
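
The description says that \S, \w, \W and other negated sets are affected as well, but the script only exercises [^\s]. The following is a rough sketch for checking the remaining classes in one run. The test strings, the expected captures, and the helper name check_negated_classes are illustrative assumptions, not part of the original report, and utf8::upgrade is used here in place of decode() as one way to get a UTF-8 flagged copy of the same characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each case: pattern, test string, expected capture.
# The strings are made up for illustration; only the patterns
# come from the bug summary and description.
my @cases = (
    [ 'f([^\s]+)', 'foo', 'oo' ],
    [ 'f([^\S]+)', 'f  ', '  ' ],
    [ 'f([^\w]+)', 'f!!', '!!' ],
    [ 'f(\S+)',    'foo', 'oo' ],
    [ 'f(\w+)',    'foo', 'oo' ],
    [ 'f(\W+)',    'f!!', '!!' ],
);

# Runs every pattern against a raw copy and a UTF-8 flagged copy of
# the same string and returns a label for each case that did not
# capture the expected text.
sub check_negated_classes {
    my @failures;
    for my $case (@cases) {
        my ($re, $text, $want) = @$case;

        my $flagged = $text;
        utf8::upgrade($flagged);   # same characters, UTF-8 internal form

        my %variant = (raw => $text, utf8 => $flagged);
        for my $kind (sort keys %variant) {
            my ($got) = $variant{$kind} =~ /$re/;
            push @failures, "($kind) /$re/"
                unless defined $got && $got eq $want;
        }
    }
    return @failures;
}

my @failures = check_negated_classes();
print "FAILED: $_\n" for @failures;
print "all cases match\n" unless @failures;
```

On a perl without this bug the raw and utf8 variants should agree, so no FAILED lines are expected. On an affected build, the (utf8) lines would presumably show which classes are broken.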

Comment 1 Chip Turner 2003-08-11 14:29:50 UTC
I get correct behavior using the perl in rawhide. Can you test that to confirm
it behaves properly for you?

Comment 2 D. Hugh Redelmeier 2003-09-21 01:22:09 UTC
This smells like bug 104540.

This bug causes the "mirror" program (http://sunsite.org.uk/packages/mirror/) to
fail if LANG includes utf8 (default in RHL9).