Bug 683753

Summary: grep does not correctly handle the Unicode symbol property
Product: Red Hat Enterprise Linux 6
Reporter: Petr Šplíchal <psplicha>
Component: grep
Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED ERRATA
QA Contact: Martin Frodl <mfrodl>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.1
CC: fholec, jkejda, mfrodl, mnavrati, ohudlick, ovasik, pbonzini, ppisar, tlavigne
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: http://savannah.gnu.org/patch/?3934
Whiteboard:
Fixed In Version: grep-2.6.3-6.el6
Doc Type: Bug Fix
Doc Text:
Previously, the grep utility did not request processing of UTF-8 from the Perl-compatible regular expressions (PCRE) library if a UTF-8 locale was in effect. As a consequence, Unicode symbols were not correctly matched if a Perl regular expression (the "-P" option) was used with a UTF-8 locale. This update adds a request for UTF-8 processing to the PCRE library, and grep now correctly handles Unicode symbols in the described situation.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-04 09:03:20 UTC
Type: ---
Bug Depends On:    
Bug Blocks: 947782    
Attachments:
Description    Flags
Proposed fix   none

Description Petr Šplíchal 2011-03-10 10:00:57 UTC
Description of problem:

The Perl regular expression '\p{S}' should match any symbol; similarly,
'\P{S}' should match any character that is not a symbol. This
works for basic ASCII characters such as '$' but is broken
for Unicode characters such as '€'.

# echo '€' | grep -P '\p{S}'
€
# echo '€' | grep -P '\P{S}'
€

# echo '$' | grep -P '\p{S}'
$
# echo '$' | grep -P '\P{S}'

This looks like a grep bug, as the same patterns work correctly in pcregrep:

# echo '€' | pcregrep -u '\p{S}'
€
# echo '€' | pcregrep -u '\P{S}'

Tested under cs_CZ.UTF-8 and en_US.UTF-8 locales.

Version-Release number of selected component (if applicable):
grep-2.6.3-2.el6.x86_64
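
For reference, the expected results above follow from the Unicode general category of each character: '\p{S}' covers the Symbol categories (Sm, Sc, Sk, So). The sketch below is not part of grep or PCRE; it only uses Python's standard unicodedata module to show that both '€' and '$' are symbols, so '\P{S}' should match neither. The helper name is_symbol is hypothetical.

```python
import unicodedata


def is_symbol(ch):
    # A character is a "symbol" in the \p{S} sense when its Unicode
    # general category starts with "S" (Sm, Sc, Sk or So).
    return unicodedata.category(ch).startswith("S")


for ch in ("€", "$"):
    # Print the character, its general category, and whether \p{S} should match.
    print(repr(ch), unicodedata.category(ch), is_symbol(ch))
```

Both characters are in category Sc (currency symbol), which is why a line containing only '€' should be printed for '\p{S}' and suppressed for '\P{S}'.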

Comment 1 RHEL Program Management 2011-07-06 01:36:48 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 2 Petr Pisar 2012-08-24 08:07:57 UTC
This is a bug in grep. It does not request UTF-8 processing from PCRE when running in a UTF-8 locale.

Current behavior is identical to ASCII mode:

$ printf '/%s/\n%s\n' '\P{S}' '€' | ./pcretest 
PCRE version 8.31 2012-07-06

  re> data>  0: \xe2
data> 

But it should run in UTF-8 mode:

$ printf '/%s/8\n%s\n' '\P{S}' '€' | ./pcretest 
PCRE version 8.31 2012-07-06

  re> data> No match
data> 

The difference is the "/8" modifier. The PCRE API provides the PCRE_UTF8 option of pcre_compile(3) for this purpose, and grep should pass it to pcre_compile(3) when a UTF-8 locale is in effect.
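
The two pcretest results can be illustrated outside PCRE. The sketch below is a rough model only, assuming that in non-UTF-8 mode each input byte is treated as a separate character with a code point below 256; the function names are hypothetical and Python's unicodedata module stands in for PCRE's property tables.

```python
import unicodedata


def categories_bytewise(text):
    # Model of non-UTF-8 mode: every byte of the UTF-8 encoding is
    # looked up as if it were its own character (code point 0-255).
    return [("\\x%02x" % b, unicodedata.category(chr(b)))
            for b in text.encode("utf-8")]


def categories_utf8(text):
    # Model of UTF-8 mode: the byte sequence is decoded first, so the
    # property lookup sees whole code points.
    return [("U+%04X" % ord(c), unicodedata.category(c)) for c in text]


print(categories_bytewise("€"))
print(categories_utf8("€"))
```

Under this model the first byte, \xe2, is a letter (Ll), which is consistent with '\P{S}' matching \xe2 in the first pcretest run, and the last byte, \xac, is a math symbol (Sm), which would explain why '\p{S}' also matched. Decoded as UTF-8, the input is the single code point U+20AC in category Sc, so only '\p{S}' can match.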

Comment 3 Petr Pisar 2012-08-24 08:41:05 UTC
Created attachment 606796 [details]
Proposed fix

This is a patch against the development version of grep; it applies to 2.14 too.

Comment 4 Petr Pisar 2012-08-24 08:46:24 UTC
There already seems to be an upstream report with a different fix: <http://savannah.gnu.org/patch/?3934>.

Comment 5 RHEL Program Management 2012-09-07 05:11:32 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unable to address this
request at this time.

Red Hat invites you to ask your support representative to
propose this request, if appropriate, in the next release of
Red Hat Enterprise Linux.

Comment 6 Jaroslav Škarvada 2012-10-03 07:55:35 UTC
(In reply to comment #4)
Thanks.

> There seems to be already an upstream report with different fix
> <http://savannah.gnu.org/patch/?3934>.
>
Also linked to this BZ, waiting for upstream resolution.

Comment 7 Paolo Bonzini 2012-10-03 09:20:56 UTC
Fixed in upstream commit 003797b2d498fd16f67790a3c1129df9d0eb4722.

Comment 8 Jaroslav Škarvada 2012-10-04 07:26:38 UTC
(In reply to comment #7)
> Fixed in upstream commit 003797b2d498fd16f67790a3c1129df9d0eb4722.

Thanks.

Comment 15 Jaroslav Škarvada 2014-03-26 10:21:52 UTC
Fixed in grep-2.6.3-6.el6. A test is included; it is run during the build.

Comment 20 errata-xmlrpc 2014-06-04 09:03:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0622.html