Bug 799863

Summary: inconsistent \w and [[:alnum:]] behaviour
Product: Red Hat Enterprise Linux 6
Component: grep
Version: 6.2
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Lukas Zachar <lzachar>
Assignee: Jaroslav Škarvada <jskarvad>
QA Contact: Jan Kepler <jkejda>
CC: dkutalek, jkejda
Keywords: Patch
Fixed In Version: grep-2.20-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: Previously, the behavior of the \w and \W symbols in regular expressions was inconsistent with the behavior of the [:alnum:] character class. Consequence: Some expressions could match or fail to match incorrectly. Fix: An upstream fix that makes the behavior consistent was backported. Result: \w is now a synonym for [_[:alnum:]] and \W for [^_[:alnum:]].
Last Closed: 2015-07-22 06:17:47 UTC
Bug Blocks: 1159012, 1187243
Attachments:
Description      Flags
Backported fix   none
grep-2.20 fix    none

Description Lukas Zachar 2012-03-05 10:02:23 UTC
Description of problem:

\w and [[:alnum:]] seem to match different sets of characters:

$ echo 'á' | grep '\w'
$ echo 'á' | grep '[[:alnum:]]'
á

Their negations are inconsistent as well:

$ echo 'á' | grep '[^[:alnum:]]'
$ echo 'á' | grep '\W'
á

This doesn't seem to be a locale problem (I tried en_US.UTF-8 and cs_CZ.UTF-8; both gave the same results).

The affected characters are the accented letters Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž (I tested only these).
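
The comparison above can be repeated for each listed character with a small loop. This is only a sketch: the helper name classes_agree is made up for illustration, and it assumes a grep that understands \w and \W and a shell running in a UTF-8 locale. It reports whether \w agrees with [_[:alnum:]] and \W agrees with [^_[:alnum:]], which is the relationship the fix is meant to establish.

```shell
#!/bin/sh
# Hypothetical helper (not part of grep): prints "consistent" when \w agrees
# with [_[:alnum:]] and \W agrees with [^_[:alnum:]] for the given text,
# and "inconsistent" otherwise.
classes_agree() {
    ch=$1
    # grep -c prints the number of matching lines (0 or 1 here);
    # "|| true" hides grep's non-zero exit status when nothing matches.
    w=$(printf '%s\n' "$ch" | grep -c '\w' || true)
    alnum=$(printf '%s\n' "$ch" | grep -c '[_[:alnum:]]' || true)
    nw=$(printf '%s\n' "$ch" | grep -c '\W' || true)
    nalnum=$(printf '%s\n' "$ch" | grep -c '[^_[:alnum:]]' || true)
    if [ "$w" = "$alnum" ] && [ "$nw" = "$nalnum" ]; then
        echo consistent
    else
        echo inconsistent
    fi
}

# Run the check over the characters named in this report.
for ch in Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž á; do
    printf '%s: %s\n' "$ch" "$(classes_agree "$ch")"
done
```

On an affected grep the loop should flag the accented characters as inconsistent; plain ASCII input such as a, _ or - should be reported as consistent either way.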


Version-Release number of selected component (if applicable):
grep-2.6.3-2

How reproducible:
always

Steps to Reproduce:
1. echo 'á' | grep '\w'
  
Actual results:
-empty-

Expected results:
á
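
The step and the expected result above can be turned into a repeatable check. The following is a sketch, not an official test: match_status is a hypothetical helper name, and the script assumes the en_US.UTF-8 locale (one of the two locales tried in this report) is installed on the machine.

```shell
#!/bin/sh
# Hypothetical helper: prints "match" if the pattern ($2) matches the
# text ($1), and "no match" otherwise.
match_status() {
    if printf '%s\n' "$1" | grep -q -- "$2"; then
        echo match
    else
        echo 'no match'
    fi
}

# Assumption: en_US.UTF-8 is available. The subshell keeps the locale
# change from leaking into the caller's environment.
(
    LC_ALL=en_US.UTF-8
    export LC_ALL
    # Per this report, an affected grep prints "no match" for the first
    # call; the expected result is "match" for both calls.
    match_status 'á' '\w'
    match_status 'á' '[[:alnum:]]'
)
```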

Additional info:

Comment 2 Jaroslav Škarvada 2012-03-20 15:37:46 UTC
Upstream ticket:
http://savannah.gnu.org/bugs/?19637

Also reproducible with latest grep-2.11.

Comment 3 RHEL Product and Program Management 2012-09-07 05:04:46 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 4 Jaroslav Škarvada 2014-10-30 16:57:42 UTC
Created attachment 952258 [details]
Backported fix

Comment 6 Jaroslav Škarvada 2014-10-30 17:00:29 UTC
This can also be resolved by rebasing to grep > 2.20.

Comment 7 Jaroslav Škarvada 2015-01-29 14:36:03 UTC
Created attachment 985631 [details]
grep-2.20 fix

(In reply to Jaroslav Škarvada from comment #6)
> This can also be resolved by rebasing to grep > 2.20.

This is now the preferred way; a patch for grep-2.20 is attached.

Comment 8 Jaroslav Škarvada 2015-01-29 15:37:21 UTC
RHEL-7 is also affected, so I am cloning this bug to RHEL-7 to avoid a regression there.

Comment 12 errata-xmlrpc 2015-07-22 06:17:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1447.html