Bug 1171806

Summary:	grep matches lowercase when only searching for uppercase range
Product:	Red Hat Enterprise Linux 6	Reporter:	Filip Krska <fkrska>
Component:	grep	Assignee:	Jaroslav Škarvada <jskarvad>
Status:	CLOSED ERRATA	QA Contact:	Jan Kepler <jkejda>
Severity:	high	Docs Contact:
Priority:	high
Version:	6.6	CC:	dkutalek, jkejda, jskarvad
Target Milestone:	rc	Keywords:	EasyFix, Patch, Reproducer
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	grep-2.20-1.el6	Doc Type:	Rebase: Bug Fixes and Enhancements
Doc Text:	Rebase package(s) to version: grep-2.20. Highlights, important fixes, or notable enhancements: Fixed several problems that could lead to a crash. Sped up various operations; some can now be an order of magnitude faster. Recursive grep now uses fts for directory traversal, so it can handle much larger directories without reporting errors such as "File name too long", and it can run much faster on large directory hierarchies.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2015-07-22 06:18:15 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Filip Krska 2014-12-08 16:19:07 UTC

Description of problem:

grep '^[A-Z]*$' does not work as expected in a multibyte locale: it also matches lines that contain lowercase letters.

Version-Release number of selected component (if applicable):

grep-2.6.3-6.el6


How reproducible:

Always

Steps to Reproduce:
1. echo -e "B\na\nb\n\nc"|grep  "^[A-Z]*$"


Actual results:

$ echo -e "B\na\nb\n\nc"|grep  "^[A-Z]*$"
B
b

c
$

Expected results:

$ echo -e "B\na\nb\n\nc"|grep  "^[A-Z]*$"
B

$

Additional info:

A backport of upstream commit

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=7311894799583626ec1208039abd13d430c8a452

is attached.

Comment 1 Jaroslav Škarvada 2014-12-08 16:32:13 UTC

Not a bug.

grep-2.21:
echo -e "B\na\nb\n\nc"|grep  "^[A-Z]*$"
B
a
b

c

This is due to the collation order of some multi-byte locales, e.g. AaBbCc..., in which 'a' sorts between 'A' and 'B' and therefore matches the [A-B] range. Please use the C locale or a character class, e.g.:
$ echo -e "B\na\nb\n\nc"|grep  "^[[:upper:]]*$"
B

For details, see the grep manual page, section "Character Classes and Bracket Expressions".
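
The two workarounds above can be sketched side by side as follows. This is only an illustration using the input from the report; the helper name `sample` and the two variable names are made up for this example. It counts matching lines with `grep -c` so that the empty line, which matches because of `*`, is not lost.

```shell
# Hypothetical helper: emits the five input lines used in this report
# ("B", "a", "b", an empty line, "c").
sample() { printf 'B\na\nb\n\nc\n'; }

# Workaround 1: force the C locale, where [A-Z] covers only the
# uppercase ASCII letters.
c_locale_count=$(sample | LC_ALL=C grep -c '^[A-Z]*$')

# Workaround 2: use the POSIX character class, which does not depend
# on the collation order of the current locale.
upper_class_count=$(sample | grep -c '^[[:upper:]]*$')

# Both counts should cover "B" and the empty line only.
echo "C locale:      $c_locale_count"
echo "[[:upper:]]:   $upper_class_count"
```

Either form avoids relying on how a range expression such as [A-Z] is interpreted in the current locale.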

Comment 5 Jaroslav Škarvada 2014-12-09 15:41:20 UTC

It is a bug:

$ LANG=en_US.UTF-8

$ echo -e 'A\nb' | grep '^[A-Z]*$'
A
b

$ echo -e 'A\nb' | grep '^[A-Z]$'
A

but:

$ LANG=cs_CZ.UTF-8

$ echo -e 'A\nb' | grep '^[A-Z]*$'
A
b

$ echo -e 'A\nb' | grep '^[A-Z]$'
A
b

For a given locale, the behavior must be consistent for both commands. This is addressed by the rebase request in bug 1064668.
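
The consistency requirement can be checked with a small script. This is a minimal sketch, not part of the fix: the function name `check_range_consistency` is hypothetical, and it assumes the locale passed in is installed (if it is not, grep typically falls back to the C locale). Because the input contains only single-character lines, '^[A-Z]*$' and '^[A-Z]$' should select exactly the same lines.

```shell
# Hypothetical helper: prints "consistent" if both patterns select the
# same lines from single-character input under the given locale,
# and "inconsistent" otherwise.
check_range_consistency() {
    locale_name=$1
    # "|| true" keeps the assignment from failing when grep finds no match.
    star=$(printf 'A\nb\n' | LC_ALL=$locale_name grep '^[A-Z]*$' || true)
    single=$(printf 'A\nb\n' | LC_ALL=$locale_name grep '^[A-Z]$' || true)
    if [ "$star" = "$single" ]; then
        echo consistent
    else
        echo inconsistent
    fi
}

# Example: check the two locales used in this comment.
check_range_consistency en_US.UTF-8
check_range_consistency cs_CZ.UTF-8
```

With the output shown in this comment, grep-2.6.3 would be reported as inconsistent for en_US.UTF-8, since the first pattern matched both lines and the second matched only "A".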

Comment 6 Jaroslav Škarvada 2015-01-20 15:34:29 UTC

The patch is OK, but it is unnecessary because the rebase request (bug 1064668) has been approved.

Comment 7 Jaroslav Škarvada 2015-01-29 15:41:27 UTC

It is not reproducible in grep-2.20.

Comment 11 errata-xmlrpc 2015-07-22 06:18:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1447.html