Bug 136292 - wrong handling of regex on UTF-8 locale
Summary: wrong handling of regex on UTF-8 locale
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: grep
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Tim Waugh
QA Contact: Mike McLean
URL:
Whiteboard:
Keywords: i18n
Depends On:
Blocks:
 
Reported: 2004-10-19 08:28 UTC by Akira TAGOH
Modified: 2007-11-30 22:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2004-10-19 08:50:00 UTC



Description Akira TAGOH 2004-10-19 08:28:38 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; ja-JP; rv:1.7.3) Gecko/20041007 Debian/1.7.3-5

Description of problem:
The handling of character ranges in regular expressions seems wrong in UTF-8 locales.

Version-Release number of selected component (if applicable):
grep-2.5.1-31

How reproducible:
Always

Steps to Reproduce:
1. echo A | LANG=en_US.UTF-8 grep '[a-z]'
2. echo A | LANG=C grep '[a-z]'

Actual Results:  'A' is output under en_US.UTF-8.

Expected Results:  No output in either locale.

Additional info:

In a UTF-8 locale, the character range seems to be ordered a A b B ... by
wcscoll in dfa.c, so the range a-z also includes 'A'.
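
The ordering difference can be made visible with sort, which also collates according to the locale. This is only a rough sketch: collation_order is a hypothetical helper, not part of grep, and the UTF-8 result depends on the C library and on which locales are installed.

```shell
# Hypothetical helper: sort four letters under the given locale
# and print them on one line, separated by spaces.
collation_order() {
    printf '%s\n' a A b B | LC_ALL="$1" sort | paste -sd ' ' -
}

# The C locale sorts by byte value, so uppercase comes first.
collation_order C    # prints: A B a b

# Assumption: on a glibc system with en_US.UTF-8 installed, the line
# below typically prints "a A b B", the interleaved order described above.
# collation_order en_US.UTF-8
```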

As another test case, the result differs between UTF-8 locales:

[tagoh@devel02 ~]$ echo A | LANG=ja_JP.UTF-8 grep '[a-z]'
[tagoh@devel02 ~]$ echo A | LANG=ko_KR.UTF-8 grep '[a-z]'
[tagoh@devel02 ~]$ echo A | LANG=zh_CN.UTF-8 grep '[a-z]'
A
[tagoh@devel02 ~]$ echo A | LANG=zh_TW.UTF-8 grep '[a-z]'
A

Comment 1 Tim Waugh 2004-10-19 08:50:00 UTC
No, you meant to use '[[:lower:]]'. [a-z] is never what you want unless you are
in the C locale.
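
A minimal sketch of the suggested form, assuming a POSIX grep. has_lower is a hypothetical helper used only for illustration. The [[:lower:]] class is defined by character classification rather than collation order, so an ASCII 'A' should not match in any locale.

```shell
# Hypothetical helper: print the argument only if it contains
# a lowercase letter according to the current locale's character class.
has_lower() {
    printf '%s\n' "$1" | grep '[[:lower:]]'
}

has_lower a                       # prints: a
has_lower A || echo "no match"    # prints: no match
```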

