Bug 220087
Summary: | bash - incorrect regexp matching in rawhide version
---|---
Product: | [Fedora] Fedora
Component: | initscripts
Version: | rawhide
Hardware: | All
OS: | Linux
Status: | CLOSED RAWHIDE
Severity: | medium
Priority: | medium
Keywords: | Reopened
Fixed In Version: | 8.57-1
Doc Type: | Bug Fix
Last Closed: | 2007-09-25 15:08:53 UTC
Reporter: | Michal Jaegermann <michal>
Assignee: | Bill Nottingham <notting>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | anssi.hannula, boklm, curtis, kevquinn, rvokal, twaugh

Description
Michal Jaegermann
2006-12-18 20:27:15 UTC
Seems to be to do with quoting: http://www.mail-archive.com/bug-bash@gnu.org/msg02382.html

> Seems to be to do with quoting

It appears to be even worse than that. I just tried
( [[ 0 =~ ^0 ]] && echo match || echo no match )
( [[ 0 =~ \^0 ]] && echo match || echo no match )
With bash-3.1-9.fc5.1 both print "match" - as expected;
but with bash-3.2-1.fc7 you will see "match" and "no match".
Consequences are easy to see.
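
The two tests above can be extended into a small sketch that also shows what the backslash-quoted form does match. This assumes a bash with the 3.2 matching semantics; the variable names are only for illustration. Each test is written out literally rather than passed through a helper, because the quoting on the right-hand side of =~ is the thing being demonstrated.

```shell
#!/bin/bash
# Unquoted: ^ is a regex anchor, so the string "0" matches.
if [[ 0 =~ ^0 ]]; then unquoted=match; else unquoted="no match"; fi

# Backslash-quoted: ^ is taken as a literal character,
# so the string "0" no longer matches.
if [[ 0 =~ \^0 ]]; then escaped=match; else escaped="no match"; fi

# The same pattern does match a string that really contains "^0".
if [[ 'x^0' =~ \^0 ]]; then literal=match; else literal="no match"; fi

echo "unquoted ^0 against 0:    $unquoted"
echo "escaped \\^0 against 0:    $escaped"
echo "escaped \\^0 against x^0:  $literal"
```

Under the 3.1 behaviour described above, the second test would report "match" instead.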
It is to do with quoting: regular expressions to match against are now treated in the same way as patterns to match against. So in comment #2, the first regular expression is ^0, while the second is \^0 (i.e. ^ loses its special meaning). Similarly, the quotes are taken to be part of the regular expression in the original test case.

This is picked out as one of the intentional changes in the FAQ, so I think we'll have to fix any occurrences we find.

The correct way, and the way that works with both old and new versions, is this sort of thing:

[[ 0 =~ ^0 ]] && echo match || echo no match

i.e. no quoting, just the regexp.

I've asked upstream for clarification, since there now seems to be no way to include character classes in the regex.

Re comment #4: This was, on purpose, a trivial example to highlight the issue. Still, assuming the interpretation that no quoting is allowed in a regexp, then besides seriously breaking existing scripts, this simply does not work. How do you propose to put a blank in a pattern (to keep things simple)? Even [[:space:]] will fail. And how do you propose to rewrite matching in /etc/sysconfig/network-scripts/ifup-aliases and /etc/sysconfig/network-scripts/ifup-routes? A long list of regexp characters has a totally different meaning to a shell and will get expanded, so no quoting is simply not an option.

The worst thing is that the breakage, caused in who knows how many scripts and where, will be silent. A network startup in rawhide pretends that it works, and it even does work in simple cases, while in reality it is currently utterly broken. Even banning regexp matching outright would be vastly saner.

Michal: see comment #5, in which I mention the character classes problem. I discovered it while making a patch for initscripts. See my post to the bug-bash mailing list. No upstream response so far.

I'll back out the change that was introduced in 3.2 for now.

Upstream response: http://www.mail-archive.com/bug-bash@gnu.org/msg02445.html

Problem is that bash is relying on undefined behaviour. BSD libc chooses to interpret a back-slashed non-special character in regexes as the character, however glibc chooses to interpret them as a back-slash and the character.

> Problem is that bash is relying on undefined behaviour.

If some form of quoting is not available (are different
forms really different internally in bash?) then regular
expressions are not of much use. Many characters which
you would have to employ there, starting with a white-space,
will be interpreted by bash before getting to regexp.
The original report did not use backslash quoting at all.
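
One way around the shell interpreting blanks and other special characters first is to keep the regular expression in a variable and expand it unquoted after =~. This is not something proposed in the comments above; it is a sketch based on the advice in the bash manual that storing the regex in a shell variable avoids quoting problems. The variable names and sample strings are made up for illustration, and whether this behaves identically on every 3.1 and 3.2 patch level is an assumption worth checking on the versions you care about.

```shell
#!/bin/bash
# Single quotes protect the blank and the parentheses from the shell
# parser when the variable is assigned.
re_space='^[[:digit:]] [[:lower:]]'
re_group='^(#.*)$'

# Expanding the variable unquoted after =~ keeps its contents a regex.
if [[ "0 g" =~ $re_space ]]; then space_result=match; else space_result="no match"; fi
if [[ "# a comment" =~ $re_group ]]; then group_result=match; else group_result="no match"; fi

echo "blank in regex: $space_result"
echo "grouping:       $group_result"
```

Quoting the expansion, as in "$re_space", would bring back literal matching on a 3.2-style bash, so the expansion has to stay unquoted.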
Patch 10 has been released for bash-3.2, which seems to make the situation better:

$ [[ "0 g" =~ ^[[:digit:]]\ [[:lower:]] ]] && echo match || echo no match
match

Oh, never mind, it still breaks with grouping expressions as in '(#.*)'. In other words, I can now write all the expressions I'd like to in bash-3.2, but not in a way that still works with bash-3.1. :-(

I've sent mail to bug-bash about this.

Re-closing for now.

If it can be useful, here's how we fixed it on Mandriva:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/soft?view=rev&revision=229542
http://qa.mandriva.com/show_bug.cgi?id=32501

Nicolas: that's really great, thanks. Reassigning to initscripts to get those fixes in, so that initscripts can work with upstream bash-3.2 as well as with versions we currently ship.

Reassigning to owner.

Hm, some of those seem excessive - there's nothing quoted in the expressions that are being changed. I'll attach a modified patch.

Moreover, the changes listed don't actually work at all (it doesn't change any expressions from matching to non-matching.) So I'm not seeing the point here. It won't change things when running our shipped bash because we patch out the upstream behaviour change (bash-cond-rmatch.patch). But I'm not sure we want to carry that patch indefinitely.

Created attachment 205581
updated diff
Here's an updated diff. The other tests should be fine, as they're not using
escapes or character classes. (The ones that are just ^something could be
rewritten to use %%, but... eh.)
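
For reference, the %% rewrite mentioned for the plain ^something tests could look roughly like the following. This is only a sketch: has_prefix is a hypothetical helper, not something taken from initscripts or from the attached diff.

```shell
#!/bin/bash
# has_prefix STRING PREFIX  (hypothetical helper)
# Succeeds when STRING starts with PREFIX, without using =~ at all.
has_prefix() {
    local string=$1 prefix=$2
    # ${string%%"$prefix"*} removes the longest suffix that begins with
    # PREFIX. If STRING itself starts with PREFIX, nothing is left.
    # The -n guard stops an empty STRING from counting as a match.
    [[ -n $string && -z ${string%%"$prefix"*} ]]
}

dev=/dev/foo
if has_prefix "$dev" /dev/; then echo foo; fi
```

Because PREFIX is quoted inside the expansion, it is compared literally, so the result does not depend on how a given bash version treats quoting in regular expressions.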
Will be in 8.57-1.

Well, [[ "$dst" =~ "^something" ]] does not work in 3.2, as ^ is considered literal ^ there. However, [[ "$dst" =~ ^something ]] works with both 3.2 and 3.1, so shouldn't the rest of the tests have their quotes removed at least?

? Works fine for me.

[notting@nostromo: ~]$ dev=/dev/foo
[notting@nostromo: ~]$ if [[ "$dev" =~ "^/dev/" ]] ;then echo foo ; fi
foo

With 3.1:

$ dev=/dev/foo; if [[ "$dev" =~ "^/dev/" ]] ;then echo foo ; fi
foo
$

With 3.2:

$ dev=/dev/foo; if [[ "$dev" =~ "^/dev/" ]] ;then echo foo ; fi
$

This is also consistent with the FAQ section E14: ftp://ftp.cwru.edu/pub/bash/FAQ

The relevant part:

In bash-3.2, the shell was changed to internally quote characters in single- and double-quoted string arguments to the =~ operator, which suppresses the special meaning of the characters special to regular expression processing (`.', `[', `\', `(', `)', `*', `+', `?', `{', `|', `^', and `$') and forces them to be matched literally.

Are you sure you were not using the redhat patched bash as Tim indicated?

Aha, that test was on an old terminal still running the old shell. Fixed in CVS.