Bug 220087

Summary: bash - incorrect regexp matching in rawhide version
Product: [Fedora] Fedora
Reporter: Michal Jaegermann <michal>
Component: initscripts
Assignee: Bill Nottingham <notting>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: rawhide
CC: anssi.hannula, boklm, curtis, kevquinn, rvokal, twaugh
Keywords: Reopened
Hardware: All
OS: Linux
Fixed In Version: 8.57-1
Doc Type: Bug Fix
Last Closed: 2007-09-25 15:08:53 UTC
Attachments:
  updated diff (flags: none)

Description Michal Jaegermann 2006-12-18 20:27:15 UTC
Description of problem:

The following constructs:

( [[ 0 =~ '0' ]] && echo match || echo no match )
( [[ 0 =~ '^0' ]] && echo match || echo no match )

both print "match" in FC5 and FC6 as expected.  This is not the
case with a version from rawhide.  It prints "match" in the first
case but "no match" in the second one.

Actually, it looks as if '=~' in rawhide behaves exactly the
same as '=='.

This affects, among other things, scripts in /etc/sysconfig/network-scripts/
which now are "broken".

Version-Release number of selected component (if applicable):
bash-3.2-1.fc7

How reproducible:
always

Comment 1 Tim Waugh 2007-01-16 12:16:29 UTC
Seems to be to do with quoting:

http://www.mail-archive.com/bug-bash@gnu.org/msg02382.html


Comment 2 Michal Jaegermann 2007-01-16 18:11:26 UTC
> Seems to be to do with quoting

It appears to be even worse than that.  I just tried
( [[ 0 =~ ^0 ]] && echo match || echo no match )
( [[ 0 =~ \^0 ]] && echo match || echo no match )

With bash-3.1-9.fc5.1 both print "match" - as expected;
but with bash-3.2-1.fc7 you will see "match" and "no match".
Consequences are easy to see.

Comment 3 Tim Waugh 2007-01-18 13:58:04 UTC
It is to do with quoting: regular expressions to match against are now treated
in the same way as patterns to match against.  So in comment #2, the first
regular expression is ^0, while the second is \^0 (i.e. ^ loses its special
meaning).  Similarly, the quotes are taken to be part of the regular expression
in the original test case.
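
For illustration, a minimal sketch of the behaviour described in this comment, assuming a bash with the upstream 3.2 (or later) semantics and without the Fedora patch. The variable names r1 to r4 are hypothetical and exist only to collect the results:

```shell
# Assumes bash with the upstream 3.2 (or later) =~ semantics.

# Unquoted: ^ is an anchor, so the string "0" matches.
[[ 0 =~ ^0 ]] && r1="match" || r1="no match"

# Backslash-escaped: ^ is a literal character, so "0" does not match ...
[[ 0 =~ \^0 ]] && r2="match" || r2="no match"

# ... but a string that really contains "^0" does.
[[ '^0' =~ \^0 ]] && r3="match" || r3="no match"

# Single-quoted: every character in the quotes is matched literally.
[[ 0 =~ '^0' ]] && r4="match" || r4="no match"

echo "$r1 / $r2 / $r3 / $r4"
```

On such a shell this should print "match / no match / match / no match"; an older or patched bash is expected to report "match" for the second and fourth tests instead.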

Comment 4 Tim Waugh 2007-01-18 14:49:29 UTC
This is picked out as one of the intentional changes in the FAQ, so I think
we'll have to fix any occurrences we find.  The correct way, and the way that
works with both old and new versions, is this sort of thing:

[[ 0 =~ ^0 ]] && echo match || echo no match

i.e. no quoting, just the regexp.


Comment 5 Tim Waugh 2007-01-18 15:11:13 UTC
I've asked upstream for clarification, since there now seems to be no way to
include character classes in the regex.

Comment 6 Michal Jaegermann 2007-01-18 17:27:16 UTC
Re comment #4:
This was, on purpose, a trivial example to highlight the issue.
Even assuming the interpretation that no quoting is allowed in a
regexp, then, besides seriously breaking existing scripts, this
simply does not work.  How do you propose to put a blank in a pattern
(to keep things simple)?  Even [[:space:]] will fail.  And how do you
propose to rewrite the matching in /etc/sysconfig/network-scripts/ifup-aliases
and /etc/sysconfig/network-scripts/ifup-routes?  A long list
of regexp characters have a totally different meaning to the shell and
will get expanded, so no quoting is simply not an option.

The worst thing is that the breakage, in who knows how many
scripts and where, will be silent.  Network startup in rawhide
pretends that it works, and it even does work in simple cases, while
in reality it is now utterly broken.  Even banning regexp
matching outright would be vastly saner.
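
One commonly suggested way around the blank and character-class problem is to keep the regexp in a variable and expand it unquoted. The sketch below assumes this form is treated as a regular expression by both the old and the new bash behaviour; the names `re` and `result` are hypothetical:

```shell
# Assumption: an unquoted variable on the right-hand side of =~ is
# interpreted as a regular expression; word splitting does not happen
# inside [[ ]], so the blank in the pattern survives.
re='^[[:digit:]] [[:lower:]]'   # contains a blank and two character classes

if [[ "0 g" =~ $re ]]; then
    result="match"
else
    result="no match"
fi
echo "$result"
```

The single quotes here belong to the assignment, not to the =~ operand, so they only protect the pattern from the shell while it is being stored.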

Comment 7 Tim Waugh 2007-01-18 18:16:41 UTC
Michal: see comment #5, in which I mention the character classes problem.  I
discovered it while making a patch for initscripts.

See my post to the bug-bash mailing list.

Comment 8 Tim Waugh 2007-01-19 16:50:00 UTC
No upstream response so far.  I'll back out the change that was introduced in
3.2 for now.

Comment 9 Kevin F. Quinn 2007-02-12 10:45:00 UTC
Upstream response:
http://www.mail-archive.com/bug-bash@gnu.org/msg02445.html

Problem is that bash is relying on undefined behaviour.  BSD libc chooses to 
interpret a back-slashed non-special character in regexes as the character, 
however glibc chooses to interpret them as a back-slash and the character.

Comment 10 Michal Jaegermann 2007-02-12 18:54:14 UTC
> Problem is that bash is relying on undefined behaviour.

If some form of quoting is not available (are different
forms really different internally in bash?) then regular
expressions are not of much use.  Many characters which
you would have to employ there, starting with a white-space,
will be interpreted by bash before getting to regexp.

The original report did not use backslash quoting at all.

Comment 11 Tim Waugh 2007-03-06 10:02:18 UTC
Patch 10 has been released for bash-3.2, which seems to make the situation better:

$ [[ "0 g" =~ ^[[:digit:]]\ [[:lower:]] ]] && echo match || echo no match
match


Comment 12 Tim Waugh 2007-03-06 11:00:42 UTC
Oh, never mind, it still breaks with grouping expressions as in '(#.*)'.

In other words, I can now write all the expressions I'd like to in bash-3.2, but
not in a way that still works with bash-3.1. :-(

I've sent mail to bug-bash about this.  Re-closing for now.
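
For grouping expressions such as '(#.*)', a possible form that avoids quoting on the =~ line altogether is again to hold the pattern in a variable. This is only a sketch under the assumption that unquoted variable expansion is handled as a regexp by both 3.1 and 3.2; `line`, `re` and `comment` are hypothetical names:

```shell
# Assumption: the pattern in $re is compiled as a regular expression
# because the expansion on the right-hand side of =~ is unquoted.
line='PATH=/bin   # default search path'
re='(#.*)'

if [[ $line =~ $re ]]; then
    # BASH_REMATCH[1] holds the text matched by the first group.
    comment=${BASH_REMATCH[1]}
else
    comment=
fi
echo "$comment"
```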

Comment 13 Nicolas Vigier 2007-09-25 12:33:33 UTC
In case it is useful, here's how we fixed it in Mandriva:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/soft?view=rev&revision=229542
http://qa.mandriva.com/show_bug.cgi?id=32501


Comment 14 Tim Waugh 2007-09-25 12:44:50 UTC
Nicolas: that's really great, thanks.

Reassigning to initscripts to get those fixes in, so that initscripts can work
with upstream bash-3.2 as well as with versions we currently ship.

Comment 15 Tim Waugh 2007-09-25 12:45:23 UTC
Reassigning to owner.

Comment 16 Bill Nottingham 2007-09-25 14:02:51 UTC
Hm, some of those seem excessive - there's nothing quoted in the expressions
that are being changed. I'll attach a modified patch.

Comment 17 Bill Nottingham 2007-09-25 14:12:04 UTC
Moreover, the changes listed don't actually do anything at all (they don't
change any expressions from matching to non-matching).

So I'm not seeing the point here.

Comment 18 Tim Waugh 2007-09-25 14:30:21 UTC
It won't change things when running our shipped bash because we patch out the
upstream behaviour change (bash-cond-rmatch.patch).  But I'm not sure we want to
carry that patch indefinitely.

Comment 19 Bill Nottingham 2007-09-25 15:04:10 UTC
Created attachment 205581 [details]
updated diff

Here's an updated diff. The other tests should be fine, as they're not using
escapes or character classes. (The ones that are just ^something could be
rewritten to use %%, but... eh.)
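
A sketch of what such a %% rewrite might look like for a plain ^something test. This is not taken from the attached diff; the helper `has_prefix` is hypothetical, and it relies only on parameter expansion, so no regexp quoting rules are involved:

```shell
# Hypothetical helper: succeeds if $1 starts with the literal prefix $2.
has_prefix() {
    local str=$1 prefix=$2
    # Removing the longest suffix that begins with the prefix leaves an
    # empty string only when the prefix sits at the very start of $str.
    # The -n guard stops an empty $str from counting as a match.
    [ -n "$str" ] && [ -z "${str%%"$prefix"*}" ]
}

dev=/dev/foo
if has_prefix "$dev" /dev/; then
    echo foo
fi
```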

Comment 20 Bill Nottingham 2007-09-25 15:08:53 UTC
Will be in 8.57-1.

Comment 21 Anssi Hannula 2007-09-25 19:44:46 UTC
Well, [[ "$dst" =~ "^something" ]] does not work in 3.2, as ^ is considered a
literal ^ there.

However, [[ "$dst" =~ ^something ]] works with both 3.2 and 3.1, so shouldn't 
the rest of the tests have their quotes removed at least?

Comment 22 Bill Nottingham 2007-09-25 19:49:05 UTC
? Works fine for me.

[notting@nostromo: ~]$ dev=/dev/foo
[notting@nostromo: ~]$ if [[ "$dev" =~ "^/dev/" ]] ;then echo foo ; fi
foo


Comment 23 Anssi Hannula 2007-09-25 19:57:53 UTC
With 3.1:
$ dev=/dev/foo; if [[ "$dev" =~ "^/dev/" ]] ;then echo foo ; fi
foo
$

With 3.2:
$ dev=/dev/foo; if [[ "$dev" =~ "^/dev/" ]] ;then echo foo ; fi
$

This is also consistent with the FAQ section E14:
ftp://ftp.cwru.edu/pub/bash/FAQ

The relevant part:
In bash-3.2, the shell was changed to internally quote characters in single-
and double-quoted string arguments to the =~ operator, which suppresses the
special meaning of the characters special to regular expression processing
(`.', `[', `\', `(', `)', `*', `+', `?', `{', `|', `^', and `$') and forces
them to be matched literally.

Are you sure you were not using the Red Hat-patched bash, as Tim indicated?

Comment 24 Bill Nottingham 2007-09-25 20:18:11 UTC
Aha, that test was on an old terminal still running the old shell. Fixed in CVS.
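
Since a long-running terminal may still be executing an older bash binary, a small probe can show which behaviour the current shell actually has. The sketch below is an illustration only; the function name `regex_quoting_mode` and its output strings are hypothetical:

```shell
# Hypothetical probe: reports how the running shell treats a quoted
# regexp on the right-hand side of =~.
regex_quoting_mode() {
    if [[ a =~ '^a' ]]; then
        # The quoted ^ acted as an anchor: 3.1-style or patched behaviour.
        echo "quoted-is-regexp"
    else
        # The quoted ^ was matched literally: upstream 3.2-style behaviour.
        echo "quoted-is-literal"
    fi
}

echo "bash $BASH_VERSION: $(regex_quoting_mode)"
```

Running this in the same terminal as a suspicious test shows the version and behaviour of the shell that is actually in use, rather than those of the installed package.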