Bug 1171806

Summary: grep matches lowercase when only searching for uppercase range
Product: Red Hat Enterprise Linux 6
Reporter: Filip Krska <fkrska>
Component: grep
Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA
QA Contact: Jan Kepler <jkejda>
Severity: high
Docs Contact:
Priority: high
Version: 6.6
CC: dkutalek, jkejda, jskarvad
Target Milestone: rc
Keywords: EasyFix, Patch, Reproducer
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: grep-2.20-1.el6
Doc Type: Rebase: Bug Fixes and Enhancements
Doc Text:
Rebase package(s) to version: grep-2.20
Highlights, important fixes, or notable enhancements:
Fixed several problems that could lead to a crash. Sped up various operations; some can now be an order of magnitude faster. Recursive grep now uses fts for directory traversal, so it can handle much larger directories without reporting errors such as "File name too long", and it can run much faster on large directory hierarchies.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-22 06:18:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Filip Krska 2014-12-08 16:19:07 UTC
Description of problem:

grep '^[A-Z]*$' does not work as expected in a multibyte locale: it also matches lines containing lowercase letters.

Version-Release number of selected component (if applicable):

grep-2.6.3-6.el6


How reproducible:

Always

Steps to Reproduce:
1. echo -e "B\na\nb\n\nc"|grep  "^[A-Z]*$"


Actual results:

$ echo -e "B\na\nb\n\nc"|grep  "^[A-Z]*$"
B
b

c
$

Expected results:

$ echo -e "B\na\nb\n\nc"|grep  "^[A-Z]*$"
B

$

Additional info:

A backport of the following upstream commit is attached:

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=7311894799583626ec1208039abd13d430c8a452

Comment 1 Jaroslav Škarvada 2014-12-08 16:32:13 UTC
Not a bug.

grep-2.21:
echo -e "B\na\nb\n\nc"|grep  "^[A-Z]*$"
B
a
b

c

This is due to the collation order of some multibyte locales, e.g. AaBbCc..., in which 'a' falls within the [A-B] range. Please use the C locale or character classes, e.g.:
$ echo -e "B\na\nb\n\nc"|grep  "^[[:upper:]]*$"
B

For details, see the grep manual page, section "Character Classes and Bracket Expressions".
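
The two workarounds suggested above (the C locale and a character class) can be sketched as follows. This is only an illustration: the function names upper_only_c_locale and upper_only_class are hypothetical, and printf is used in place of echo -e purely for portability.

```shell
# Hypothetical helper: force the C locale for this one grep call, so that
# [A-Z] is interpreted as the plain byte range 'A'..'Z'.
upper_only_c_locale() {
    LC_ALL=C grep '^[A-Z]*$'
}

# Hypothetical helper: use the POSIX character class, which does not depend
# on the collation order of the current locale.
upper_only_class() {
    grep '^[[:upper:]]*$'
}

# Same input as the reproducer. Each call should print "B" and the empty
# line, because '*' also allows zero uppercase letters.
printf 'B\na\nb\n\nc\n' | upper_only_c_locale
printf 'B\na\nb\n\nc\n' | upper_only_class
```

Setting LC_ALL only on the grep command keeps the rest of the shell session in its original locale.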

Comment 5 Jaroslav Škarvada 2014-12-09 15:41:20 UTC
It is a bug:

$ LANG=en_US.UTF-8

$ echo -e 'A\nb' | grep '^[A-Z]*$'
A
b

$ echo -e 'A\nb' | grep '^[A-Z]$'
A

but:

$ LANG=cs_CZ.UTF-8

$ echo -e 'A\nb' | grep '^[A-Z]*$'
A
b

$ echo -e 'A\nb' | grep '^[A-Z]$'
A
b

For a given locale, the behavior must be consistent for both commands. This is addressed by the rebase request in bug 1064668.
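
A rough way to check this consistency requirement for single-character input lines is sketched below. The function name check_consistency is hypothetical, and the locale passed to it is assumed to be installed (see locale -a); if it is not, grep will most likely behave as in the C locale, which would hide the problem.

```shell
# Hypothetical helper: for each test character, compare whether '^[A-Z]*$'
# and '^[A-Z]$' agree on a one-character line under the given locale.
check_consistency() {
    # $1: locale name, assumed to be installed on the system
    for ch in A b; do
        # grep -c prints the match count; '|| true' ignores the non-zero
        # exit status that grep returns when nothing matches.
        star=$(printf '%s\n' "$ch" | LC_ALL="$1" grep -c '^[A-Z]*$' || true)
        plain=$(printf '%s\n' "$ch" | LC_ALL="$1" grep -c '^[A-Z]$' || true)
        if [ "$star" != "$plain" ]; then
            echo "inconsistent for '$ch' in $1"
            return 1
        fi
    done
    echo "consistent in $1"
}

check_consistency C
```

Running it with en_US.UTF-8 or cs_CZ.UTF-8 instead of C would exercise the locales from the comment above, provided they are installed.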

Comment 6 Jaroslav Škarvada 2015-01-20 15:34:29 UTC
The patch is OK, but unnecessary due to the approved rebase request (bug 1064668).

Comment 7 Jaroslav Škarvada 2015-01-29 15:41:27 UTC
It is not reproducible in grep-2.20.

Comment 11 errata-xmlrpc 2015-07-22 06:18:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1447.html