Bug 799863 - inconsistent \w and [[:alnum:]] behaviour
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: grep
Hardware: Unspecified  OS: Unspecified
Priority: medium  Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Jaroslav Škarvada
QA Contact: Jan Kepler
Keywords: Patch
Depends On:
Blocks: 1159012 1187243
Reported: 2012-03-05 05:02 EST by Lukas Zachar
Modified: 2015-07-22 02:17 EDT

See Also:
Fixed In Version: grep-2.20-1.el6
Doc Type: Bug Fix
Doc Text:
Cause: Previously, the behavior of the \w and \W symbols in regular expressions was inconsistent with the behavior of the [:alnum:] character class.
Consequence: Some expressions could produce an incorrect match or non-match.
Fix: An upstream fix that makes the behavior consistent was backported.
Result: \w is now a synonym for [_[:alnum:]] and \W for [^_[:alnum:]].
Story Points: ---
Clone Of:
: 1159012 1187243
Last Closed: 2015-07-22 02:17:47 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
Backported fix (3.30 KB, patch)
2014-10-30 12:57 EDT, Jaroslav Škarvada
grep-2.20 fix (3.89 KB, patch)
2015-01-29 09:36 EST, Jaroslav Škarvada

Description Lukas Zachar 2012-03-05 05:02:23 EST
Description of problem:

\w and [[:alnum:]] seem to match different sets of characters:

$ echo 'á' | grep '\w'
$ echo 'á' | grep '[[:alnum:]]'
á

Their negations are inconsistent as well:

$ echo 'á' | grep '[^[:alnum:]]'
$ echo 'á' | grep '\W'
á

This doesn't seem to be a locale problem (I tried it with en_US.UTF-8 and cs_CZ.UTF-8; both gave the same results).

Affected accented characters: Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž (I have tested only these)
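
The comparison above can be repeated for each listed character with a small loop. This is only a sketch, not part of the original report: the helper name check_char is hypothetical, and it assumes GNU grep and an active UTF-8 locale such as en_US.UTF-8.

```shell
# Sketch only: check_char is a hypothetical helper, not part of grep.
# Assumes GNU grep (\w is a GNU extension) and a UTF-8 locale, as in the report.
check_char() {
    c=$1
    # Does \w match the character?
    if printf '%s\n' "$c" | grep -q '\w'; then w=match; else w=no-match; fi
    # Does [[:alnum:]] match the character?
    if printf '%s\n' "$c" | grep -q '[[:alnum:]]'; then a=match; else a=no-match; fi
    # Flag the character when the two results differ.
    if [ "$w" = "$a" ]; then verdict=consistent; else verdict=INCONSISTENT; fi
    printf '%s: \\w=%s [[:alnum:]]=%s %s\n' "$c" "$w" "$a" "$verdict"
}

# The characters named in this report, plus the lowercase example.
for c in Á Č Ď É Ě Í Ň Ó Ř Š Ť Ú Ů Ý Ž á; do
    check_char "$c"
done
```

On an affected grep the accented characters would be expected to show up as INCONSISTENT. Note that '_' is always reported as INCONSISTENT by this helper, because \w includes the underscore and [[:alnum:]] does not.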

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. echo 'á' | grep '\w'
Actual results:

Expected results:

Additional info:
Comment 2 Jaroslav Škarvada 2012-03-20 11:37:46 EDT
Upstream ticket:

Also reproducible with the latest grep-2.11.
Comment 3 RHEL Product and Program Management 2012-09-07 01:04:46 EDT
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.
Comment 4 Jaroslav Škarvada 2014-10-30 12:57:42 EDT
Created attachment 952258 [details]
Backported fix
Comment 6 Jaroslav Škarvada 2014-10-30 13:00:29 EDT
This can also be resolved by rebasing to grep > 2.20.
Comment 7 Jaroslav Škarvada 2015-01-29 09:36:03 EST
Created attachment 985631 [details]
grep-2.20 fix

(In reply to Jaroslav Škarvada from comment #6)
> This can also be resolved by rebasing to grep > 2.20.

This is now the preferred way; a patch for grep-2.20 is attached.
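
To check whether an installed grep already treats \w as [_[:alnum:]] and \W as [^_[:alnum:]], something along these lines could be used. It is a sketch under assumptions: the helper name same_result and the sample characters are illustrative, and GNU grep with a UTF-8 locale is assumed.

```shell
# Sketch only: same_result is a hypothetical helper, not part of grep.
# Usage: same_result PATTERN_A PATTERN_B CHAR
# Succeeds when both patterns give the same match/non-match result for CHAR.
same_result() {
    if printf '%s\n' "$3" | grep -q -e "$1"; then ra=0; else ra=1; fi
    if printf '%s\n' "$3" | grep -q -e "$2"; then rb=0; else rb=1; fi
    [ "$ra" -eq "$rb" ]
}

status=0
# Illustrative sample: ASCII letters, a digit, underscore, a non-word
# character, and two accented characters of the kind named in this report.
for c in a Z 7 _ - á Ž; do
    same_result '\w' '[_[:alnum:]]' "$c" ||
        { printf '%s differs for %s\n' '\w' "$c"; status=1; }
    same_result '\W' '[^_[:alnum:]]' "$c" ||
        { printf '%s differs for %s\n' '\W' "$c"; status=1; }
done

if [ "$status" -eq 0 ]; then
    echo "consistent for all tested characters"
fi
```

The loop only compares results for the sampled characters, so a clean run suggests, rather than proves, that the fix is present.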
Comment 8 Jaroslav Škarvada 2015-01-29 10:37:21 EST
RHEL-7 is also affected by this, so cloning to RHEL-7 to avoid a regression there.
Comment 12 errata-xmlrpc 2015-07-22 02:17:47 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

