Bug 102106 - [^\s],[^\S],[^\w], etc. don't work with UTF-8 scalars (ok in vanilla perl5.8.0)
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: perl
Version: 9
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Chip Turner
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-08-11 14:15 UTC by Petr Pajas
Modified: 2007-04-18 16:56 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-08-11 14:29:50 UTC
Embargoed:


Description Petr Pajas 2003-08-11 14:15:15 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5a) Gecko/20030718

Description of problem:
Some of RH's patches break the regexp engine (compared to
vanilla perl compiled from sources). The following script
should match 'oo' in "foo" via f([^\s]+). It does if
"foo" is a raw scalar, but fails if it is a UTF-8 flagged scalar.
Similar problems appear with other negated character sets
[^...] and with \S, \w and \W.

#!/usr/bin/perl
use Encode;

my $exp="foo";

# raw variant (ok)
print "(raw):  ";
print $exp=~/f([^\s]+)/ ? "Matches '$1'\n" : "no match\n";

# utf8 variant (BROKEN!!!)
print "(utf8): ";
$exp=decode('iso-8859-1',$exp);  # reuse $exp; a second "my" would mask the first declaration
print $exp=~/f([^\s]+)/ ? "Matches '$1'\n" : "no match\n";



Version-Release number of selected component (if applicable):
perl-5.8.0-88

How reproducible:
Always

Steps to Reproduce:
1. Run the script from the Description.

    

Actual Results:  
(raw):  Matches 'oo'
(utf8): no match


Expected Results:  
(raw):  Matches 'oo'
(utf8): Matches 'oo'

Additional info:

The bug is present in the perl shipped with both Red Hat Linux 8 and Red Hat Linux 9.
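
The script in the Description exercises only [^\s]. A minimal sketch of a broader check, covering the other classes named in the report, might look like the following. The helper name check_patterns and the choice of patterns are illustrative assumptions, not part of the original report. The sketch also prints utf8::is_utf8() for the decoded copy rather than assuming the flag state.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode;

# Hypothetical helper (not from the original report): run each pattern
# against the raw string and against a decoded copy, returning the
# captured text (or undef when there is no match) for both variants.
sub check_patterns {
    my ($raw) = @_;
    my $decoded = decode('iso-8859-1', $raw);   # same characters, decoded
    my %patterns = (
        '[^\s]' => qr/f([^\s]+)/,
        '\S'    => qr/f(\S+)/,
        '\w'    => qr/f(\w+)/,
        '[^\W]' => qr/f([^\W]+)/,
    );
    my %result;
    for my $name (sort keys %patterns) {
        my $re = $patterns{$name};
        $result{$name} = {
            raw  => ($raw     =~ $re ? $1 : undef),
            utf8 => ($decoded =~ $re ? $1 : undef),
        };
    }
    # Record the flag state so the output shows what was actually tested.
    $result{_flag} = utf8::is_utf8($decoded) ? 'on' : 'off';
    return \%result;
}

my $res = check_patterns("foo");
print "UTF-8 flag on decoded copy: $res->{_flag}\n";
for my $name (sort grep { $_ ne '_flag' } keys %$res) {
    for my $kind (qw(raw utf8)) {
        my $m = $res->{$name}{$kind};
        printf "%-6s (%s): %s\n", $name, $kind,
            defined $m ? "Matches '$m'" : "no match";
    }
}
```

On an unaffected perl, every line would be expected to report a match of 'oo', in line with the Expected Results above. Any "no match" on a utf8 line would point at the behaviour described in this report.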

Comment 1 Chip Turner 2003-08-11 14:29:50 UTC
I get correct behavior using the perl in rawhide. Can you test that to confirm
it behaves properly for you?

Comment 2 D. Hugh Redelmeier 2003-09-21 01:22:09 UTC
This smells like bug 104540.

This bug causes the "mirror" program (http://sunsite.org.uk/packages/mirror/) to
fail if LANG includes utf8 (default in RHL9).
