Bug 683753 - grep does not correctly handle unicode symbol property
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: grep
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Jaroslav Škarvada
QA Contact: Martin Frodl
URL: http://savannah.gnu.org/patch/?3934
Whiteboard:
Depends On:
Blocks: 947782
Reported: 2011-03-10 10:00 UTC by Petr Šplíchal
Modified: 2016-06-01 01:41 UTC

Fixed In Version: grep-2.6.3-6.el6
Doc Type: Bug Fix
Doc Text:
Previously, the grep utility did not request processing of UTF-8 from the Perl-compatible regular expressions (PCRE) library if a UTF-8 locale was in effect. As a consequence, Unicode symbols were not correctly matched if a Perl regular expression (the "-P" option) was used with a UTF-8 locale. This update adds a request for UTF-8 processing to the PCRE library, and grep now correctly handles Unicode symbols in the described situation.
Clone Of:
Environment:
Last Closed: 2014-06-04 09:03:20 UTC


Attachments
Proposed fix (1.24 KB, patch)
2012-08-24 08:41 UTC, Petr Pisar


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:0622 normal SHIPPED_LIVE grep bug fix update 2014-06-04 13:03:09 UTC

Description Petr Šplíchal 2011-03-10 10:00:57 UTC
Description of problem:

The Perl regular expression '\p{S}' should match any symbol; similarly,
'\P{S}' should match any character that is not a symbol. This
works for basic ASCII characters such as '$' but is broken
for non-ASCII Unicode characters such as '€': both patterns match.

# echo '€' | grep -P '\p{S}'
€
# echo '€' | grep -P '\P{S}'
€

# echo '$' | grep -P '\p{S}'
$
# echo '$' | grep -P '\P{S}'

This looks like a grep bug, because the same patterns work correctly in pcregrep:

# echo '€' | pcregrep -u '\p{S}'
€
# echo '€' | pcregrep -u '\P{S}'

Tested under cs_CZ.UTF-8 and en_US.UTF-8 locales.

Version-Release number of selected component (if applicable):
grep-2.6.3-2.el6.x86_64

Comment 1 RHEL Product and Program Management 2011-07-06 01:36:48 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 2 Petr Pisar 2012-08-24 08:07:57 UTC
This is a bug in grep. It does not request UTF-8 processing from PCRE when running in a UTF-8 locale.

Current behavior is identical to ASCII mode:

$ printf '/%s/\n%s\n' '\P{S}' '€' | ./pcretest 
PCRE version 8.31 2012-07-06

  re> data>  0: \xe2
data> 

But it should run in UTF-8 mode:

$ printf '/%s/8\n%s\n' '\P{S}' '€' | ./pcretest 
PCRE version 8.31 2012-07-06

  re> data> No match
data> 

The difference is the "/8" modifier. The PCRE API provides the PCRE_UTF8 option of pcre_compile(3) for this, and grep should pass it to pcre_compile(3) when a UTF-8 locale is in effect.
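
The difference between the two modes can be illustrated without PCRE. The following is only a sketch in Python, not grep's or PCRE's code: it assumes that in non-UTF-8 mode each input byte is treated as a separate character with a code point below 256, and it uses the standard unicodedata module to stand in for the \p{S} property lookup. The helper names (is_symbol, byte_mode_units, utf8_mode_units, matches) are hypothetical.

```python
import unicodedata


def is_symbol(ch: str) -> bool:
    # Stand-in for \p{S}: general categories Sm, Sc, Sk and So.
    return unicodedata.category(ch).startswith("S")


def byte_mode_units(text: str) -> list:
    # Assumed non-UTF-8 behavior: every byte of the UTF-8 encoding
    # is looked at as its own character (code point < 256).
    return [chr(b) for b in text.encode("utf-8")]


def utf8_mode_units(text: str) -> list:
    # UTF-8 mode: the input is decoded into real code points first.
    return list(text)


def matches(units: list, want_symbol: bool) -> bool:
    # want_symbol=True models \p{S}; want_symbol=False models \P{S}.
    # An unanchored pattern matches if any single unit qualifies.
    return any(is_symbol(u) == want_symbol for u in units)


for label, split in (("byte mode", byte_mode_units),
                     ("UTF-8 mode", utf8_mode_units)):
    units = split("€")
    print(label,
          [unicodedata.category(u) for u in units],
          "\\p{S}:", matches(units, True),
          "\\P{S}:", matches(units, False))
```

Under these assumptions, '€' (bytes e2 82 ac) splits in byte mode into a lowercase letter, a control character and the symbol U+00AC, so both \p{S} and \P{S} find something to match. That is consistent with the "0: \xe2" result from pcretest above and with grep printing the line for both patterns. In UTF-8 mode the input is the single symbol U+20AC, so only \p{S} matches.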

Comment 3 Petr Pisar 2012-08-24 08:41:05 UTC
Created attachment 606796
Proposed fix

This is a patch against development grep; it also applies to 2.14.

Comment 4 Petr Pisar 2012-08-24 08:46:24 UTC
There already seems to be an upstream report with a different fix: <http://savannah.gnu.org/patch/?3934>.

Comment 5 RHEL Product and Program Management 2012-09-07 05:11:32 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 6 Jaroslav Škarvada 2012-10-03 07:55:35 UTC
(In reply to comment #4)
Thanks.

> There seems to be already an upstream report with different fix
> <http://savannah.gnu.org/patch/?3934>.
>
Also linked to this BZ, waiting for upstream resolution.

Comment 7 Paolo Bonzini 2012-10-03 09:20:56 UTC
Fixed in upstream commit 003797b2d498fd16f67790a3c1129df9d0eb4722.

Comment 8 Jaroslav Škarvada 2012-10-04 07:26:38 UTC
(In reply to comment #7)
> Fixed in upstream commit 003797b2d498fd16f67790a3c1129df9d0eb4722.

Thanks.

Comment 15 Jaroslav Škarvada 2014-03-26 10:21:52 UTC
Fixed in grep-2.6.3-6.el6. A test is included; it is run during the build.

Comment 20 errata-xmlrpc 2014-06-04 09:03:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0622.html

