Bug 77531

Summary: expr 2.0.12 regex * operator too greedy
Product: [Retired] Red Hat Linux
Reporter: Derek Price <oberon>
Component: glibc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED RAWHIDE
QA Contact: Ben Levenson <benl>
Severity: low
Priority: medium
Version: 8.0
CC: fweimer, twaugh
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2002-11-08 22:32:38 UTC

Description Derek Price 2002-11-08 17:11:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823
Netscape/7.0

Description of problem:
expr's regex operator treats '.*' as overly greedy.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
$ expr 'abc
xxxxx
def' : 'abc
.*
def'

Actual Results:  0


Expected Results:  13

Additional info:

This used to work.  It broke with 8.0.

I believe that the behavior of the * operator is defined under POSIX.

The expected result (13) is what I get on at least six other platforms, including
Mac OS X (Darwin fool.ximbiot.com 5.5 Darwin Kernel Version 5.5: Thu May 30
14:51:26 PDT 2002; root:xnu/xnu-201.42.3.obj~1/RELEASE_PPC  Power Macintosh
powerpc), IBM AIX (AIX rioscpu2 3 4 000030498200), Solaris (SunOS
sun120.sdrc.com 5.8 Generic sun4u sparc SUNW,Ultra-5_10), HP HP-UX (HP-UX hp253
B.11.00 A 9000/785 2000761248 two-user license), SGI IRIX (IRIX64 sgiop131 6.5
04191226 IP30), and BSDI BSD/OS (BSD/OS thor.sdrc.com 4.0.1 BSDI BSD/OS 4.0.1
Kernel #3: Thu Mar  9 11:29:16 EST 2000).
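
For illustration, here is a minimal sketch of the backtracking that is expected
from '.*', using a single-line string so that newlines play no part. The string
'abcxxxxxdef' is made up for this example and is not part of the original
report:

```shell
# '.*' first consumes as much as it can, then has to give characters back
# so that the rest of the pattern ('def') can still match.
expr 'abcxxxxxdef' : 'abc.*def'       # prints 11 (the whole string matched)

# With \( \), expr prints the captured text instead of the match length.
expr 'abcxxxxxdef' : 'abc\(.*\)def'   # prints xxxxx
```

The multi-line case in the steps to reproduce should behave the same way, only
with newlines on either side of the '.*'.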

Comment 1 Tim Waugh 2002-11-08 18:28:49 UTC
expr sets re_syntax_options to RE_SYNTAX_POSIX_BASIC, which implies
RE_DOT_NEWLINE, so I think this is a regex bug.

Comment 2 Derek Price 2002-11-08 18:47:22 UTC
. matches newline just fine:

$ expr 'abc
> xxxxx
> def' : 'abc.*'
13
$ 

The problem, as near as I can tell, is that when .* matches to the end of the
string, the matcher does not back up again to see whether the part of the
pattern after the * can still match:

$ expr 'abc
> xxxxx
> def' : 'abc
> .*
> def'
0
$ 

I don't think that this is the complete story, though. In the pattern above (the
same one I initially reported), .* didn't need to match a newline at all, since
both the string and the pattern contain all the necessary newlines. Yet I just
discovered that this pattern works as expected:

$ expr 'abc
> xxxxx
> def' : 'abc.*def'
13
$ 

so something is definitely wrong, but the problem may be some combination of the
above.

Perhaps the regex matcher won't back up over a newline, so it drops out with no
match after backing up only far enough to decide that (I've made the newlines
explicit):

abc\n
.*

matched

abc\n
xxxxx\n

just fine, but that the

\n
def\n

left in the pattern didn't match the

def\n

which was as far as it would back up into the string since it hit a newline?

Or some such.
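
The three cases in this comment can be collected into a small check. This is
only a sketch: the check function and its labels are hypothetical names
introduced here, and it assumes that 13 is the correct answer for all three
patterns, as reported for the other platforms.

```shell
#!/bin/sh
# The 13-character test string from this report (two embedded newlines).
s='abc
xxxxx
def'

# check LABEL PATTERN: run expr and compare the match length against 13.
check() {
    got=$(expr "$s" : "$2")
    if [ "$got" = 13 ]; then
        echo "ok: $1"
    else
        echo "FAIL: $1 (got $got, expected 13)"
    fi
}

check 'dot-star to end of string' 'abc.*'
check 'dot-star followed by def'  'abc.*def'
check 'explicit newlines'         'abc
.*
def'
```

On the affected system only the last case would be expected to print FAIL,
matching the transcripts above.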

Comment 3 Derek Price 2002-11-08 19:47:50 UTC
P.S. This stymies my CVS development under Linux since I can't run the test
suite and it doesn't seem to be an easy task to back up to RH 7.3's glibc 2.2.5
or to move forward to Rawhide's glibc 2.3.1 all by my lonesome.

Derek

Comment 4 Derek Price 2002-11-08 20:56:59 UTC
There are likely to be some other programs, possibly even related to security
issues (LogWatch?), that are relying on proper regex behavior as well.

Comment 5 Derek Price 2002-11-08 21:31:27 UTC
I attempted to build GNU coreutils (formerly sh-utils, I guess) version 4.5.3 on
the advice of its (and sh-utils') maintainer, Jim Meyering, as he was not
experiencing this problem.

I still experience the bug, lending credence to the hypothesis that this bug is
in the glibc regex libraries rather than in expr, and thus the sh-utils package.
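
Since the result seems to depend on the C library rather than on expr itself, it
may help to record the glibc version next to the expr result. A rough sketch,
assuming getconf is available; the libc_version helper is a hypothetical name,
and GNU_LIBC_VERSION is a glibc-specific variable, so on other C libraries this
just reports "unknown":

```shell
#!/bin/sh
# Print the glibc version, or "unknown" if getconf cannot report it.
libc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown
}

s='abc
xxxxx
def'
p='abc
.*
def'

# expr exits non-zero when it prints 0, so keep the script going either way.
result=$(expr "$s" : "$p") || true

echo "libc: $(libc_version)"
echo "expr result: $result (13 expected)"
```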

Comment 6 Jakub Jelinek 2002-11-08 22:05:29 UTC
Seems to work just fine in glibc-2.3.1-{2,5,6}.
I wonder why you cannot upgrade to glibc-2.3.1-5; what is in rawhide will
eventually become the 8.0 bugfix errata.

Comment 7 Derek Price 2002-11-08 22:16:31 UTC
Ah, that's where Rawhide fits in.  I actually wasn't sure.

My biggest problem was finding and downloading all the relevant packages, but I
have them all installed now and, sure enough, expr works properly again.

Thanks for all the help.

Comment 8 Jakub Jelinek 2002-11-08 22:21:57 UTC
Well, rawhide is certainly not defined this way; it can contain anything from
very experimental to errata candidate packages.
But in this case ATM glibc should be ok (it has not gone through official QA,
so if you want that, you need to wait, but otherwise you can try it).

Comment 9 Derek Price 2002-11-08 22:32:32 UTC
Well, I've installed 2.3.1-5 and it works so far.  Hopefully it won't introduce
any other problems.  If it does, I suppose I can try downgrading to 2.2.5 until
the official upgrade release.  For now, this lets me work on CVS again.

Thanks again.