Bug 77531 - expr 2.0.12 regex * operator too greedy
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: glibc
Version: 8.0
Hardware: i686
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Ben Levenson
Reported: 2002-11-08 17:11 UTC by Derek Price
Modified: 2016-11-24 15:24 UTC (History)

Doc Type: Bug Fix
Last Closed: 2002-11-08 22:32:38 UTC

Description Derek Price 2002-11-08 17:11:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823
Netscape/7.0

Description of problem:
Expr's regex operator treats '.*' as overly greedy.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
$ expr 'abc
xxxxx
def' : 'abc
.*
def'

Actual Results:  0


Expected Results:  13

Additional info:

This used to work.  It broke with 8.0.

I believe that the behavior of the * operator is defined under POSIX.

This is the behavior on at least six other platforms including Mac OSX (Darwin
fool.ximbiot.com 5.5 Darwin Kernel Version 5.5: Thu May 30 14:51:26 PDT 2002;
root:xnu/xnu-201.42.3.obj~1/RELEASE_PPC  Power Macintosh powerpc), IBM AIX (AIX
rioscpu2 3 4 000030498200), Solaris (SunOS sun120.sdrc.com 5.8 Generic sun4u
sparc SUNW,Ultra-5_10), HP HP-UX (HP-UX hp253 B.11.00 A 9000/785 2000761248
two-user license), SGI IRIX (IRIX64 sgiop131 6.5 04191226 IP30), & BSDI BSD/OS
(BSD/OS thor.sdrc.com 4.0.1 BSDI BSD/OS 4.0.1 Kernel #3: Thu Mar  9 11:29:16 EST
2000).
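
A minimal sketch of the reproduction as a script, assuming a POSIX shell. The variable names (subject, pattern, matched, expected) are illustrative and not part of the original report. `expr STRING : REGEX` prints the number of characters matched, and the pattern here should consume the whole string, so the expected count is simply the string's length (13, as stated above).

```shell
#!/bin/sh
# Subject string and pattern from the report; each contains two literal newlines.
subject='abc
xxxxx
def'
pattern='abc
.*
def'

# expr prints the number of characters matched (0 when nothing matches).
# It also exits with status 1 when the result is 0, so `|| true` keeps the
# script going if it is run under `set -e`.
matched=$(expr "$subject" : "$pattern") || true

# The pattern should cover the entire string, so the expected count is its length.
expected=${#subject}

if [ "$matched" -eq "$expected" ]; then
    echo "ok: matched $matched of $expected characters"
else
    echo "BUG: matched $matched, expected $expected"
fi
```

On an affected system the script should take the BUG branch with a count of 0, matching the Actual Results above.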

Comment 1 Tim Waugh 2002-11-08 18:28:49 UTC
expr sets re_syntax_options to RE_SYNTAX_POSIX_BASIC, which implies
RE_DOT_NEWLINE, so I think this is a regex bug.

Comment 2 Derek Price 2002-11-08 18:47:22 UTC
. matches newline just fine:

$ expr 'abc
> xxxxx
> def' : 'abc.*'
13
$ 

The problem, as near as I can tell, is that when .* matches to the end of the
string, the matcher does not back up again to check whether the characters
after the * in the pattern can then match:

$ expr 'abc
> xxxxx
> def' : 'abc
> .*
> def'
0
$ 

I don't think that this is the complete story, though. In the pattern above (the
same as I initially reported), .* didn't need to match a newline at all, since
both the string and the pattern contain all the necessary newlines. However, I
just discovered that this pattern works as expected:

$ expr 'abc
> xxxxx
> def' : 'abc.*def'
13
$ 

so something is definitely wrong, but the problem may be some combination of the
above.

Perhaps the regex matcher wouldn't back up over newline, causing the matcher to
drop out with no match after backing up the match far enough to decide that
(I've made the newlines explicit):

abc\n
.*

matched

abc\n
xxxxx\n

just fine, but that the

\n
def\n

left in the pattern didn't match the

def\n

which was as far as it would back up into the string since it hit a newline?

Or some such.
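
The cases in this comment can be run side by side to narrow down which combination fails. This is only a sketch, assuming a POSIX shell: the helper name `try` and the variable `nl` are hypothetical, and the last two probes are extra variations that do not appear in this thread. With a correct regex library every row should report 13, since each pattern can match the whole 13-character string.

```shell
#!/bin/sh
# Subject string from the report (two embedded newlines, 13 characters).
subject='abc
xxxxx
def'

# A single literal newline, used to build the multi-line patterns below.
nl='
'

# try LABEL PATTERN: print LABEL followed by the number of characters that
# expr matched when applying PATTERN to $subject (0 means no match).
try() {
    count=$(expr "$subject" : "$2") || true
    printf '%-20s %s\n' "$1" "$count"
}

try 'abc.*'            'abc.*'                 # reported as working
try 'abc.*def'         'abc.*def'              # reported as working
try 'abc<NL>.*<NL>def' "abc${nl}.*${nl}def"    # reported as returning 0
try 'abc<NL>.*'        "abc${nl}.*"            # extra probe: newline before .*
try '.*<NL>def'        ".*${nl}def"            # extra probe: newline after .*
```

If only the rows with a literal newline after `.*` come back as 0, that would point at backtracking across a newline, as hypothesized above.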

Comment 3 Derek Price 2002-11-08 19:47:50 UTC
P.S. This stymies my CVS development under Linux since I can't run the test
suite and it doesn't seem to be an easy task to back up to RH 7.3's glibc 2.2.5
or to move forward to Rawhide's glibc 2.3.1 all by my lonesome.

Derek

Comment 4 Derek Price 2002-11-08 20:56:59 UTC
There are likely to be some other programs, possibly even related to security
issues (LogWatch?), that are relying on proper regex behavior as well.

Comment 5 Derek Price 2002-11-08 21:31:27 UTC
I attempted to build GNU coreutils (formerly sh-utils, I guess) version 4.5.3 on
the advice of its (and sh-utils') maintainer, Jim Meyering, as he was not
experiencing this problem.

I still experience the bug, lending credence to the hypothesis that this bug is
in the glibc regex libraries rather than in expr, and thus the sh-utils package.

Comment 6 Jakub Jelinek 2002-11-08 22:05:29 UTC
Seems to work just fine in glibc-2.3.1-{2,5,6}.
I wonder why you cannot upgrade to glibc-2.3.1-5; what is in rawhide will
eventually become the 8.0 bugfix errata.

Comment 7 Derek Price 2002-11-08 22:16:31 UTC
Ah, that's where Rawhide fits in.  I actually wasn't sure.

My biggest problem was finding and downloading all the relevant packages, but I
got them all installed now and sure enough, expr works properly again.

Thanks for all the help.

Comment 8 Jakub Jelinek 2002-11-08 22:21:57 UTC
Well, rawhide is certainly not defined this way, it can contain anything from
very experimental to errata candidate packages.
But in this case ATM glibc should be ok (it has not gone through official QA,
so if you want that, you need to wait, but otherwise you can try it).

Comment 9 Derek Price 2002-11-08 22:32:32 UTC
Well, I've installed 2.3.1-5 and it works so far.  Hopefully it won't introduce
any other problems.  If it does, I suppose I can try downgrading to 2.2.5 until
the official upgrade release.  For now, this lets me work on CVS again.

Thanks again.

