Bug 799863 - inconsistent \w and [[:alnum:]] behaviour
Summary: inconsistent \w and [[:alnum:]] behaviour
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: grep
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jaroslav Škarvada
QA Contact: Jan Kepler
URL:
Whiteboard:
Depends On:
Blocks: 1159012 1187243
 
Reported: 2012-03-05 10:02 UTC by Lukas Zachar
Modified: 2015-07-22 06:17 UTC

Fixed In Version: grep-2.20-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: Previously, the behavior of the \w and \W symbols in regular expressions was inconsistent with the behavior of the [:alnum:] character class. Consequence: Some expressions could match, or fail to match, incorrectly. Fix: An upstream fix that makes the behavior consistent was backported. Result: \w is now a synonym for [_[:alnum:]] and \W for [^_[:alnum:]].
Clone Of:
Clones: 1159012 1187243
Environment:
Last Closed: 2015-07-22 06:17:47 UTC


Attachments
Backported fix (3.30 KB, patch)
2014-10-30 16:57 UTC, Jaroslav Škarvada
grep-2.20 fix (3.89 KB, patch)
2015-01-29 14:36 UTC, Jaroslav Škarvada


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2015:1447 | normal | SHIPPED_LIVE | Low: grep security, bug fix, and enhancement update | 2015-07-20 18:43:55 UTC

Description Lukas Zachar 2012-03-05 10:02:23 UTC
Description of problem:

\w and [[:alnum:]] seem to match different sets of characters:

$ echo 'á' | grep '\w'
$ echo 'á' | grep '[[:alnum:]]'
á

Their negations are inconsistent as well:

$ echo 'á' | grep '[^[:alnum:]]'
$ echo 'á' | grep '\W'
á

This doesn't seem to be a locale problem (I tried it with en_US.UTF-8 and cs_CZ.UTF-8; both gave the same results).

The affected characters include the accented letters Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž (these are the only ones I tested).
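
The check over the characters listed above can be scripted. The following is only a sketch: compare_patterns is a hypothetical helper (it is not part of grep), and the script assumes it is run under a UTF-8 locale such as en_US.UTF-8. It uses the pairs \w / [_[:alnum:]] and \W / [^_[:alnum:]] from the Doc Text of this bug, and prints for each character whether the two patterns of a pair agree.

```shell
#!/bin/sh
# Hypothetical helper, not part of grep: reports whether two patterns agree
# on matching a one-line input in the current locale.
# Note: a grep error (exit status 2) is counted as "no-match" here.
compare_patterns() {
    cp_input=$1
    cp_first=$2
    cp_second=$3
    if printf '%s\n' "$cp_input" | grep -q -e "$cp_first"; then
        cp_a=match
    else
        cp_a=no-match
    fi
    if printf '%s\n' "$cp_input" | grep -q -e "$cp_second"; then
        cp_b=match
    else
        cp_b=no-match
    fi
    if [ "$cp_a" = "$cp_b" ]; then
        printf '%s: consistent (%s)\n' "$cp_input" "$cp_a"
    else
        printf '%s: INCONSISTENT (%s vs %s)\n' "$cp_input" "$cp_a" "$cp_b"
    fi
}

# Characters listed in this report; the locale is taken from the environment.
for ch in Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž; do
    compare_patterns "$ch" '\w' '[_[:alnum:]]'
    compare_patterns "$ch" '\W' '[^_[:alnum:]]'
done
```

On an affected grep, at least some of these lines would be expected to read INCONSISTENT; on a build with the fix, every line should read consistent.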


Version-Release number of selected component (if applicable):
grep-2.6.3-2

How reproducible:
always

Steps to Reproduce:
1. echo 'á' | grep '\w'
Actual results:
-empty-

Expected results:
á

Additional info:

Comment 2 Jaroslav Škarvada 2012-03-20 15:37:46 UTC
Upstream ticket:
http://savannah.gnu.org/bugs/?19637

Also reproducible with the latest grep-2.11.

Comment 3 RHEL Product and Program Management 2012-09-07 05:04:46 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 4 Jaroslav Škarvada 2014-10-30 16:57:42 UTC
Created attachment 952258 [details]
Backported fix

Comment 6 Jaroslav Škarvada 2014-10-30 17:00:29 UTC
This can be also resolved by rebase to grep > 2.20.

Comment 7 Jaroslav Škarvada 2015-01-29 14:36:03 UTC
Created attachment 985631 [details]
grep-2.20 fix

(In reply to Jaroslav Škarvada from comment #6)
> This can be also resolved by rebase to grep > 2.20.

This is now the preferred way; a patch for grep-2.20 is attached.

Comment 8 Jaroslav Škarvada 2015-01-29 15:37:21 UTC
RHEL-7 is also affected by this, so I am cloning this bug to RHEL-7 to avoid a regression there.

Comment 12 errata-xmlrpc 2015-07-22 06:17:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1447.html

