Bug 683764

Summary: Perl regular expressions using negations broken in grep
Product: Red Hat Enterprise Linux 5
Reporter: Petr Šplíchal <psplicha>
Component: grep
Assignee: Jaroslav Škarvada <jskarvad>
Status: CLOSED WONTFIX
QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: medium
Priority: medium
Version: 5.6
CC: ohudlick
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-07-31 22:18:30 UTC

Description Petr Šplíchal 2011-03-10 10:39:51 UTC
Description of problem:

Currently, some Perl regular expressions containing negations
do not work as expected in grep -P. This does not seem to be a
PCRE bug, as the same expressions work fine when tested with
pcregrep.

Version-Release number of selected component (if applicable):
grep-2.5.1-55.el5.x86_64

Class negation:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo 'abcdef'  | grep -P '[^a-f]'
abcdef
# echo 'abcdef'  | pcregrep '[^a-f]'
# echo 'abcdef'  | grep '[^a-f]'

Digit vs. non-digit:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo 3 | grep -P '\d'
3
# echo 3 | grep -P '\D'
3
# echo 3 | pcregrep '\d'
3
# echo 3 | pcregrep '\D'


Word vs. non-word:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo x | grep -P '\w'
x
# echo x | grep -P '\W'
x
# echo x | pcregrep '\w'
x
# echo x | pcregrep '\W'


White-space vs. non-white-space:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo nospace | grep -P '\s'
nospace
# echo nospace | grep -P '\S'
nospace
# echo nospace | pcregrep '\s'
# echo nospace | pcregrep '\S'
nospace

Unicode properties:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo 'letter'  | grep -P '\p{L}'
letter
# echo 'letter'  | grep -P '\P{L}'
letter
# echo 'letter'  | pcregrep '\p{L}'
letter
# echo 'letter'  | pcregrep '\P{L}'

Similarly for other negative properties:

> :: [   FAIL   ] :: Running 'echo 'letter'  | grep -P '\P{L}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo 'lowerr'  | grep -P '\P{Ll}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo 'UPPERR'  | grep -P '\P{Lu}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo '999999'  | grep -P '\P{N}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo '......'  | grep -P '\P{P}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo '€€€€€€'  | grep -P '\P{S}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo '      '  | grep -P '\P{Z}'' (Expected 1, got 0)

Comment 3 RHEL Program Management 2011-05-31 15:15:24 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 4 Jaroslav Škarvada 2011-07-31 22:18:30 UTC
PCRE cannot limit matching to single lines, so each line in the buffer must be matched separately to get correct results. In RHEL-5 this causes problems only with the PCRE matcher, and only for some expressions. The fix is included in grep-2.6 in RHEL-6, which is why I am closing this as WONTFIX.
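
A rough sketch of the difference between matching a whole buffer and matching each line separately, for anyone who wants to see the effect without an affected grep build. This is not grep's code and it does not use PCRE: a POSIX shell pattern `[!a-f]` stands in for the regex `[^a-f]`, and the helper name `has_non_a_to_f` is made up for the illustration. It assumes the stray match comes from the line terminator being part of the text handed to the matcher.

```shell
#!/bin/sh
# Illustration only: shell pattern [!a-f] stands in for the regex [^a-f].
LC_ALL=C
export LC_ALL

# Hypothetical helper: prints "match" if $1 contains any character
# outside the range a-f, otherwise "no-match".
has_non_a_to_f() {
    case "$1" in
        *[!a-f]*) echo match ;;
        *)        echo no-match ;;
    esac
}

# Whole buffer with its line terminator: the newline itself is
# "not a-f", so the negated class finds something to match.
buffer='abcdef
'
whole=$(has_non_a_to_f "$buffer")

# Line by line: read strips the terminator, so only the characters
# of the line itself are tested.
per_line=$(printf '%s' "$buffer" | while IFS= read -r line; do
    has_non_a_to_f "$line"
done)

echo "whole buffer: $whole"
echo "per line:     $per_line"
```

The same reasoning would apply to \D, \W, \S and the \P{...} properties from the description, since a newline is not a digit, not a word character, and not in the L, N, P or S categories. It does not obviously account for the \s and \P{Z} cases, so treat this as a partial picture only.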