From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0

Description of problem:
expr's regex operator treats '.*' as overly greedy.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
$ expr 'abc
> xxxxx
> def' : 'abc
> .*
> def'

Actual Results:
0

Expected Results:
13

Additional info:
This used to work; it broke with Red Hat Linux 8.0. I believe the behavior of the * operator is specified by POSIX. The expected result is what I get on at least six other platforms:
- Mac OS X (Darwin fool.ximbiot.com 5.5 Darwin Kernel Version 5.5: Thu May 30 14:51:26 PDT 2002; root:xnu/xnu-201.42.3.obj~1/RELEASE_PPC Power Macintosh powerpc)
- IBM AIX (AIX rioscpu2 3 4 000030498200)
- Solaris (SunOS sun120.sdrc.com 5.8 Generic sun4u sparc SUNW,Ultra-5_10)
- HP HP-UX (HP-UX hp253 B.11.00 A 9000/785 2000761248 two-user license)
- SGI IRIX (IRIX64 sgiop131 6.5 04191226 IP30)
- BSDI BSD/OS (BSD/OS thor.sdrc.com 4.0.1 BSDI BSD/OS 4.0.1 Kernel #3: Thu Mar 9 11:29:16 EST 2000)
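For a test suite it can help to run the same reproduction non-interactively. The sketch below builds the string and pattern from the report with embedded newlines taken from a shell variable instead of typed continuation lines. It assumes a POSIX `sh` and whichever `expr` is first on `PATH`; the variable names are illustrative only.

```shell
#!/bin/sh
# Sketch: reproduce the report without interactive continuation lines.
# nl holds a single literal newline character.
nl='
'
string="abc${nl}xxxxx${nl}def"   # 3 + 1 + 5 + 1 + 3 = 13 characters
pattern="abc${nl}.*${nl}def"

# expr's ':' operator anchors the match at the start of the string and
# prints the number of characters matched (0 when there is no match).
# expr exits with status 1 when it prints 0, so don't let that abort the script.
result=$(expr "$string" : "$pattern") || true
echo "$result"
```

On a correct implementation this prints 13; the report above shows 0 on the affected system.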
expr sets re_syntax_options to RE_SYNTAX_POSIX_BASIC, which implies RE_DOT_NEWLINE, so expr is correctly asking for '.' to match newline. I think this is a regex bug.
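Whether `.` matches a newline, as RE_DOT_NEWLINE implies, can be checked from the shell in isolation. This is a minimal sketch; the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: check that '.' in an expr regex matches a newline character.
nl='
'
# "a<newline>b" is 3 characters; 'a.b' should match all of them.
dot_len=$(expr "a${nl}b" : 'a.b') || true
echo "$dot_len"
```

A result of 3 means `.` consumed the newline; 0 would mean it did not.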
. matches newline just fine:

$ expr 'abc
> xxxxx
> def' : 'abc.*'
13
$

The problem, as near as I can tell, is that when .* matches to the end of the string, the matcher does not back up again to see whether it can match the characters that follow the * in the pattern:

$ expr 'abc
> xxxxx
> def' : 'abc
> .*
> def'
0
$

I don't think that is the complete story, though. In the pattern above (the same one I initially reported), .* never needs to match a newline, since both the string and the pattern contain all the necessary newlines. Yet I just discovered that this pattern works as expected:

$ expr 'abc
> xxxxx
> def' : 'abc.*def'
13
$

So something is definitely wrong, but the problem may be some combination of the above. Perhaps the regex matcher won't back up over a newline, so it gives up with no match after backing up only that far. With the newlines made explicit: the pattern abc\n.* matched the string abc\nxxxxx\n just fine, but the \ndef\n left in the pattern didn't match the def\n remaining in the string, which was as far as the matcher would back up because it hit a newline? Or some such.
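The experiments above can be collected into one script that runs every pattern against the same string. The fourth pattern, abc.*\ndef, is an addition not in the report: it keeps only the newline that follows .*, which might help separate the "won't back up over a newline" hypothesis from other explanations. This is a sketch; the `check` helper and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: run each pattern from the report against the same string.
nl='
'
string="abc${nl}xxxxx${nl}def"

# $1: human-readable label (newlines written as \n), $2: the actual pattern.
check() {
    len=$(expr "$string" : "$2") || true
    printf '%-16s %s\n' "$1" "$len"
}

check 'abc.*'         "abc.*"
check 'abc\n.*\ndef'  "abc${nl}.*${nl}def"
check 'abc.*def'      "abc.*def"
check 'abc.*\ndef'    "abc.*${nl}def"
```

On a correct implementation every line reports 13, because each pattern can match the whole string once .* gives back enough characters.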
P.S. This stymies my CVS development under Linux, since I can't run the test suite, and it doesn't seem to be an easy task to go back to RH 7.3's glibc 2.2.5 or forward to Rawhide's glibc 2.3.1 all by my lonesome.

Derek
Other programs, possibly even security-related ones (LogWatch?), are likely to rely on correct regex behavior as well.
I attempted to build GNU coreutils (formerly sh-utils, I guess) version 4.5.3 on the advice of its (and sh-utils') maintainer, Jim Meyering, since he was not experiencing this problem. I still see the bug, which lends credence to the hypothesis that it is in the glibc regex library rather than in expr, and thus not in the sh-utils package.
It seems to work just fine in glibc-2.3.1-{2,5,6}. Why can't you upgrade to glibc-2.3.1-5? What is in Rawhide will eventually become the 8.0 bugfix errata.
Ah, that's where Rawhide fits in; I actually wasn't sure. My biggest problem was finding and downloading all the relevant packages, but I've got them all installed now and, sure enough, expr works properly again. Thanks for all the help.
Well, Rawhide is certainly not defined that way; it can contain anything from very experimental packages to errata candidates. But in this case the glibc currently there should be OK. It has not gone through official QA, so if you need that you'll have to wait; otherwise, you can try it.
Well, I've installed 2.3.1-5 and it works so far. Hopefully it won't introduce any other problems. If it does, I suppose I can try downgrading to 2.2.5 until the official upgrade is released. For now, this lets me work on CVS again. Thanks again.