Bug 218312
Summary: | Change of behaviour in escapes | | |
---|---|---|---
Product: | Red Hat Enterprise Linux 3 | Reporter: | Bastien Nocera <bnocera>
Component: | sed | Assignee: | Petr Machata <pmachata>
Status: | CLOSED NOTABUG | Severity: | medium
Priority: | medium | Version: | 3.0
CC: | mnewsome, tao | Keywords: | Regression
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2006-12-08 21:22:11 UTC
Description
Bastien Nocera
2006-12-04 16:25:31 UTC
It looks like sed tries to use { as the opening brace of an interval (repetition count) expression. Furthermore, with the -r option on the command line, the behavior is inverted. It seems sed doesn't like the \{ escape in some way.

This indeed looks weird; I'll look into it.

Thank you. This is a seemingly trivial issue, but it is having a significant impact on the client: it is breaking a number of their scripts and holding up an update push. Would it be possible to expedite looking into this?

Thank you,
Frank

Frank/Bastien, I traced the behavior change to the upstream update from 4.0.5 to 4.0.6. But RHEL-3 should already have sed-4.0.7-3, and for RHEL-3U8 sed was updated to sed-4.0.7-8.EL3. In other words, the only changes between 3U5 and 3U8 are in our patches. But with those turned off, the problem is still reproducible. So which version are we talking about here?

Aha, but when I install the two old packages, the behavior does change. The spec files are identical apart from changelog entries and additional patches, so it must have been something in the build system. Unfortunately this was built in beehive, so anything is possible :-/ Anyway, I'll try to track down what happens, and see whether the spec is missing some prerequisite.

    <snip>
    [root@dl585 ~]# rpm -q sed
    sed-4.0.7-3
    [root@dl585 ~]# echo 'A{A' | sed 's/\{/x/'
    AxA
    [root@dl585 ~]# rpm -Uvh sed-4.0.7-5.EL3.x86_64.rpm
    Preparing... ########################################### [100%]
    1:sed ########################################### [100%]
    [root@dl585 ~]# rpm -q sed
    sed-4.0.7-5.EL3
    [root@dl585 ~]# echo 'A{A' | sed 's/\{/x/'
    sed: -e expression #1, char 7: Invalid preceding regular expression
    </snip>

Now the interesting part is that if I rebuild the working package on RHEL4, it breaks:

    [root@dl585 redhat]# rpm -ivh ~/sed-4.0.7-3.src.rpm
    1:sed ########################################### [100%]
    [root@dl585 redhat]# rpmbuild -bb SPECS/sed.spec
    Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.89556
    + umask 022
    + cd /usr/src/redhat/BUILD
    + LANG=C
    <snip>
    Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/sed-root
    Wrote: /usr/src/redhat/RPMS/x86_64/sed-4.0.7-3.x86_64.rpm
    Wrote: /usr/src/redhat/RPMS/x86_64/sed-debuginfo-4.0.7-3.x86_64.rpm
    Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.30817
    + umask 022
    + cd /usr/src/redhat/BUILD
    + cd sed-4.0.7
    + rm -rf /var/tmp/sed-root
    + exit 0
    [root@dl585 redhat]# rpm -Uvh --force /usr/src/redhat/RPMS/x86_64/sed-4.0.7-3.x86_64.rpm
    Preparing... ########################################### [100%]
    1:sed ########################################### [100%]
    [root@dl585 redhat]# echo 'A{A' | sed 's/\{/x/'
    sed: -e expression #1, char 7: Invalid preceding regular expression
    [root@dl585 redhat]# rpm -q sed
    sed-4.0.7-3

Yes, that's in accordance with my findings. Perhaps some package installed at the time of the RHEL-3 sed build was picked up by the configure script, and some #ifdef got defined because of it. Later seds will probably all be built in such a way that \{ reports "Invalid preceding regular expression".

I'm eyeing the sources. sed uses glibc's regex matcher, whose syntax flags are defined in regex.h. Without -r, sed uses RE_SYNTAX_POSIX_BASIC; with -r it uses RE_SYNTAX_POSIX_EXTENDED (plus tweaking RE_ICASE, RE_NO_POSIX_BACKTRACKING, and RE_UNMATCHED_RIGHT_PAREN_ORD). The flag sets are computed at compile time and hardwired into the binary.
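The mechanism described above can be illustrated with glibc's GNU regex interface. That interface consists of `re_set_syntax()` and `re_compile_pattern()` from `<regex.h>`, which are visible only with `_GNU_SOURCE`. This is a minimal, glibc-only sketch and not sed's actual code. The helper names are hypothetical. The two flag sets are reconstructions that assume sed's basic-mode syntax is RE_SYNTAX_POSIX_BASIC plus RE_NO_POSIX_BACKTRACKING, with and without the RE_CONTEXT_INVALID_DUP bit discussed further down in this report.

```c
#define _GNU_SOURCE            /* exposes the GNU regex API and RE_* bits */
#include <regex.h>
#include <string.h>

/* Compile PATTERN through glibc's GNU interface using the given syntax
   bits.  Returns NULL on success, or glibc's error message on failure.
   re_syntax_options is process-global, so the old value is restored. */
static const char *compile_with_syntax(const char *pattern, reg_syntax_t syntax)
{
    struct re_pattern_buffer buf;
    memset(&buf, 0, sizeof buf);   /* no buffer, fastmap or translate table */

    reg_syntax_t saved = re_set_syntax(syntax);
    const char *err = re_compile_pattern(pattern, strlen(pattern), &buf);
    re_set_syntax(saved);

    regfree(&buf);                 /* safe on a zeroed or failed buffer */
    return err;
}

/* Hypothetical reconstruction of the older sed's flag set:
   basic POSIX syntax without RE_CONTEXT_INVALID_DUP. */
static reg_syntax_t old_sed_syntax(void)
{
    return (RE_SYNTAX_POSIX_BASIC | RE_NO_POSIX_BACKTRACKING)
           & ~RE_CONTEXT_INVALID_DUP;
}

/* Hypothetical reconstruction of the newer sed's flag set:
   the same bits with RE_CONTEXT_INVALID_DUP set. */
static reg_syntax_t new_sed_syntax(void)
{
    return RE_SYNTAX_POSIX_BASIC | RE_NO_POSIX_BACKTRACKING
           | RE_CONTEXT_INVALID_DUP;
}
```

On a current glibc, compiling the pattern `\{` with `old_sed_syntax()` should succeed, and the brace is treated as an ordinary character. With `new_sed_syntax()` it should fail with "Invalid preceding regular expression". That mirrors the two sed transcripts. The old syntax is restored after each call because every later GNU-API compile in the process would otherwise inherit it.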
What I think happened is that the precise contents of those flag sets changed between glibc releases. Thus sed-4.0.7-3 uses syntax flags 328390 (decimal), while sed-4.0.7-8.EL3 uses flags 17105606 (decimal). This is how I obtained the value for sed-4.0.7-8.EL3; the procedure is the same for the older sed:

    # gdb /bin/sed
    (gdb) break re_set_syntax
    Breakpoint 1 at 0x8048e8c
    (gdb) run 's/\{/x/' x
    Starting program: /bin/sed 's/\{/x/' x
    (no debugging symbols found)...(no debugging symbols found)...Breakpoint 1 at 0x48917ed2

    Breakpoint 1, 0x48917ed2 in re_set_syntax () from /lib/libc.so.6
    (gdb) finish
    Run till exit from #0 0x48917ed2 in re_set_syntax () from /lib/libc.so.6
    0x0804dbd6 in ?? ()
    (gdb) print re_syntax_options
    $1 = 17105606

That gives us the following flag maps:

    sed-4.0.7-3       328390    0000:0000:0000:0101:0000:0010:1100:0110
    sed-4.0.7-8.EL3   17105606  0000:0001:0000:0101:0000:0010:1100:0110

They differ only in the 25th flag (bit 24 counting from zero; 17105606 - 328390 = 16777216 = 2^24). According to the glibc sources, that flag is:

    /* If this bit is set, then \{ cannot be first in an bre or
       immediately after an alternation or begin-group operator.  */
    #define RE_CONTEXT_INVALID_DUP (RE_CARET_ANCHORS_HERE << 1)

Here "bre" means a POSIX basic regular expression, i.e. the RE_SYNTAX_POSIX_BASIC syntax sed uses without -r.

My conclusion is that the change of behavior comes from glibc, probably from glibc being more pedantic about POSIX conformance. I'll ask Ulrich Drepper for his opinion on the matter, but I don't think I can do much about it myself.

Correction: not Ulrich Drepper, but Jakub Jelinek owns glibc. I'm NEEDINFO'ing him. Jakub, is my reasoning correct? If so, I suppose this should be closed as NOTABUG? Thanks.

Another correction: Uli is the upstream glibc maintainer; I'm just the RHEL/Fedora glibc maintainer. RE_CONTEXT_INVALID_DUP was added to RE_SYNTAX_POSIX_BASIC as part of a POSIX compliance fix. '\{' at the start of a BRE is invalid; see the BRE grammar at http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html. They should just use '{' without the backslash. That is a valid BRE and will do the right thing (match a literal brace).
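The sketch below double-checks both the bit arithmetic and the suggested workaround. It assumes a glibc system; `_GNU_SOURCE` is needed only so that `<regex.h>` exposes RE_CONTEXT_INVALID_DUP. The names `single_differing_bit()` and `first_bre_match()` are hypothetical helpers. The first confirms that the two `re_syntax_options` values differ in exactly one bit. The second uses the standard POSIX `regcomp()`/`regexec()` calls, which compile a basic regular expression when REG_EXTENDED is not passed.

```c
#define _GNU_SOURCE            /* only for the RE_CONTEXT_INVALID_DUP macro */
#include <regex.h>
#include <stdint.h>

/* 0-based index of the single bit in which A and B differ, or -1 if
   they differ in no bits or in more than one. */
static int single_differing_bit(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;
    if (diff == 0 || (diff & (diff - 1)) != 0)   /* not exactly one bit */
        return -1;
    int index = 0;
    while (diff >>= 1)
        index++;
    return index;
}

/* Offset of the first match of the POSIX basic regular expression BRE in
   SUBJECT; -1 if there is no match, -2 if regcomp() rejects the pattern. */
static int first_bre_match(const char *bre, const char *subject)
{
    regex_t re;
    regmatch_t m;
    if (regcomp(&re, bre, 0) != 0)   /* cflags 0: basic, not extended */
        return -2;
    int rc = regexec(&re, subject, 1, &m, 0);
    regfree(&re);
    return rc == 0 ? (int) m.rm_so : -1;
}
```

Under these assumptions, `single_differing_bit(328390, 17105606)` returns 24, and `1UL << 24` equals RE_CONTEXT_INVALID_DUP. `first_bre_match("{", "A{A")` returns 1 because the unescaped brace is an ordinary character in a BRE. The pattern `\{` is rejected by `regcomp()` on a glibc whose RE_SYNTAX_POSIX_BASIC includes that bit.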