Bug 683764 - Perl regular expressions using negations broken in grep
Summary: Perl regular expressions using negations broken in grep
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: grep
Version: 5.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jaroslav Škarvada
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-03-10 10:39 UTC by Petr Šplíchal
Modified: 2016-06-01 01:41 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-31 22:18:30 UTC
Target Upstream Version:



Description Petr Šplíchal 2011-03-10 10:39:51 UTC
Description of problem:

Currently, some Perl regular expressions containing negations do
not work as expected with grep -P. This does not appear to be a
PCRE bug, as the same expressions work correctly when tested with
pcregrep.

Version-Release number of selected component (if applicable):
grep-2.5.1-55.el5.x86_64

Class negation:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo 'abcdef'  | grep -P '[^a-f]'
abcdef
# echo 'abcdef'  | pcregrep '[^a-f]'
# echo 'abcdef'  | grep '[^a-f]'

Digit vs. non-digit:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo 3 | grep -P '\d'
3
# echo 3 | grep -P '\D'
3
# echo 3 | pcregrep '\d'
3
# echo 3 | pcregrep '\D'


Word vs. non-word:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo x | grep -P '\w'
x
# echo x | grep -P '\W'
x
# echo x | pcregrep '\w'
x
# echo x | pcregrep '\W'


White-space vs. non-white-space:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo nospace | grep -P '\s'
nospace
# echo nospace | grep -P '\S'
nospace
# echo nospace | pcregrep '\s'
# echo nospace | pcregrep '\S'
nospace

Unicode properties:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# echo 'letter'  | grep -P '\p{L}'
letter
# echo 'letter'  | grep -P '\P{L}'
letter
# echo 'letter'  | pcregrep '\p{L}'
letter
# echo 'letter'  | pcregrep '\P{L}'

Similarly for other negative properties:

> :: [   FAIL   ] :: Running 'echo 'letter'  | grep -P '\P{L}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo 'lowerr'  | grep -P '\P{Ll}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo 'UPPERR'  | grep -P '\P{Lu}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo '999999'  | grep -P '\P{N}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo '......'  | grep -P '\P{P}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo '€€€€€€'  | grep -P '\P{S}'' (Expected 1, got 0)
> :: [   FAIL   ] :: Running 'echo '      '  | grep -P '\P{Z}'' (Expected 1, got 0)

Comment 3 RHEL Program Management 2011-05-31 15:15:24 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated in the
current release, Red Hat is unfortunately unable to address this
request at this time. Red Hat invites you to ask your support
representative to propose this request, if appropriate and relevant,
in the next release of Red Hat Enterprise Linux.

Comment 4 Jaroslav Škarvada 2011-07-31 22:18:30 UTC
PCRE cannot restrict matching to a single line, so each line in the buffer must be matched separately to get correct results. In RHEL-5 this causes problems only with the PCRE matcher, and only for some expressions. The fix is included in grep-2.6 in RHEL-6, which is why I am closing this as WONTFIX.
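
A minimal sketch of the difference between matching the whole buffer and matching each line separately. One plausible reading of the explanation above, consistent with the transcripts in the description but not confirmed in this report, is that the line terminator left in the buffer is what the negated expressions match. Python's re module stands in for PCRE here, and the function names match_whole_buffer and match_per_line are hypothetical; this is not grep's actual code.

```python
import re


def match_whole_buffer(pattern, buffer):
    """Search the raw buffer, line terminators included (assumed faulty approach)."""
    return re.search(pattern, buffer) is not None


def match_per_line(pattern, buffer):
    """Search each line separately, with its terminator removed."""
    return [line for line in buffer.splitlines() if re.search(pattern, line)]


buffer = "3\n"  # what `echo 3` writes to the pipe

# The trailing newline is itself a non-digit, so the negated class matches.
print(match_whole_buffer(r"\D", buffer))  # True

# Matched line by line, '3' contains no non-digit, so nothing is selected.
print(match_per_line(r"\D", buffer))      # []

# The positive class still selects the line.
print(match_per_line(r"\d", buffer))      # ['3']
```

The same effect would account for the '[^a-f]', '\W' and '\s' cases in the description, since a newline is outside a-f, is a non-word character, and is white space.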

