Bug 218312

Summary: Change of behaviour in escapes
Product: Red Hat Enterprise Linux 3
Reporter: Bastien Nocera <bnocera>
Component: sed
Assignee: Petr Machata <pmachata>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Version: 3.0
CC: mnewsome, tao
Keywords: Regression
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-12-08 21:22:11 UTC

Description Bastien Nocera 2006-12-04 16:25:31 UTC
The behaviour of escapes in sed was changed between Update 5 and Update 8,
breaking existing scripts.

On RHEL3U5:
$ echo 'A{A' | sed 's/\{/x/'
AxA

On RHEL3U8:
$ echo  'A{A' | sed 's/\{/x/'
sed.rhel3u8: -e expression #1, char 8: Invalid preceding regular expression
$ echo 'A{A' | sed 's/{/x/'
AxA

On RHEL4, RHEL5 and FC6:
$ echo  'A{A' | sed 's/\{/x/'
sed.rhel3u8: -e expression #1, char 8: Invalid preceding regular expression

Comment 1 Petr Machata 2006-12-04 20:33:32 UTC
It looks like sed tries to treat { as the opening brace of an interval
(repetition count) expression.  Furthermore, with the -r option on the command
line, the behavior is inverted.  It seems that sed doesn't like the \{ escape
for some reason.  This does look odd; I'll look into it.
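
To see the inversion described above on a given machine, something like the following sketch can be used. The helper name try_sed is made up for this illustration and is not part of sed. It assumes a sed that accepts -r (GNU sed does), and it prints no expected results because, as this report shows, the outcome depends on the sed/glibc build.

```shell
# Hypothetical helper: run one sed script on the test input and print
# either the result or the word "error" if sed rejects the script.
try_sed() {
    # "$@" carries an optional -r flag followed by the sed script.
    out=$(printf 'A{A\n' | sed "$@" 2>/dev/null) || out=error
    printf '%s\n' "$out"
}

# The four combinations discussed in this report.
echo "BRE, plain brace:    $(try_sed 's/{/x/')"
echo "BRE, escaped brace:  $(try_sed 's/\{/x/')"
echo "ERE, plain brace:    $(try_sed -r 's/{/x/')"
echo "ERE, escaped brace:  $(try_sed -r 's/\{/x/')"
```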

Comment 2 Frank Hirtz 2006-12-07 22:10:51 UTC
Thank you. This is a seemingly trivial issue, but it's having a significant
impact on the client, as it's breaking a number of their scripts and is holding
up an update push. Would it be possible to expedite looking into this?

Thank you,

Frank.

Comment 3 Petr Machata 2006-12-08 17:40:47 UTC
Frank/Bastien, I traced the behavior change to the upstream update from 4.0.5
to 4.0.6.  But RHEL-3 should already have sed-4.0.7-3, and for RHEL-3U8 sed was
updated to sed-4.0.7-8.EL3.  In other words, the only changes between 3U5 and
3U8 are in our patches.  But with those turned off, the problem is still
reproducible.  So which version are we talking about here?

Comment 4 Petr Machata 2006-12-08 17:50:57 UTC
Aha, but when I install the two old packages, the behavior indeed changes.

Comment 5 Petr Machata 2006-12-08 18:13:22 UTC
The spec files are identical, apart from changelog entries and additional
patches.  It must have been something in the build system.  Unfortunately this
was built in beehive, so anything is possible :-/ Anyway, I'll try to track down
what happens, and check whether the spec is missing some prerequisite.

Comment 6 Frank Hirtz 2006-12-08 18:18:36 UTC
<snip>
[root@dl585 ~]# rpm -q sed
sed-4.0.7-3
[root@dl585 ~]# echo  'A{A' | sed 's/\{/x/'
AxA
[root@dl585 ~]# rpm -Uvh sed-4.0.7-5.EL3.x86_64.rpm
Preparing...                ########################################### [100%]
   1:sed                    ########################################### [100%]
[root@dl585 ~]# rpm -q sed
sed-4.0.7-5.EL3
[root@dl585 ~]# echo  'A{A' | sed 's/\{/x/'
sed: -e expression #1, char 7: Invalid preceding regular expression
</snip>


Now the interesting part is that if I rebuild the working package on RHEL4, it
breaks:

[root@dl585 redhat]# rpm -ivh ~/sed-4.0.7-3.src.rpm
   1:sed                    ########################################### [100%]
[root@dl585 redhat]# rpmbuild -bb SPECS/sed.spec
Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.89556
+ umask 022
+ cd /usr/src/redhat/BUILD
+ LANG=C
<snip>
Checking for unpackaged file(s): /usr/lib/rpm/check-files /var/tmp/sed-root
Wrote: /usr/src/redhat/RPMS/x86_64/sed-4.0.7-3.x86_64.rpm
Wrote: /usr/src/redhat/RPMS/x86_64/sed-debuginfo-4.0.7-3.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.30817
+ umask 022
+ cd /usr/src/redhat/BUILD
+ cd sed-4.0.7
+ rm -rf /var/tmp/sed-root
+ exit 0
[root@dl585 redhat]# rpm -Uvh --force /usr/src/redhat/RPMS/x86_64/sed-4.0.7-3.x86_64.rpm
Preparing...                ########################################### [100%]
   1:sed                    ########################################### [100%]
[root@dl585 redhat]# echo  'A{A' | sed 's/\{/x/'
sed: -e expression #1, char 7: Invalid preceding regular expression
[root@dl585 redhat]# rpm -q sed
sed-4.0.7-3


Comment 7 Petr Machata 2006-12-08 18:31:56 UTC
Yes, that's in accordance with my findings. Maybe some package was installed
at the time of the RHEL-3 sed build that the configure script picked up, and
some #ifdef was defined because of it.  Later seds were probably all built in
such a way that \{ reports "Invalid preceding regular expression".  I'm reading
through the sources.

Comment 8 Petr Machata 2006-12-08 20:56:12 UTC
sed uses glibc's regex matcher, whose flags are defined in regex.h.  Without -r,
sed uses RE_SYNTAX_POSIX_BASIC; with -r it uses RE_SYNTAX_POSIX_EXTENDED (plus
tweaking RE_ICASE, RE_NO_POSIX_BACKTRACKING, and RE_UNMATCHED_RIGHT_PAREN_ORD).
The flag sets are computed at compile time and hardwired into the binary.  What
I think happened is that the precise contents of those flag sets changed between
glibc releases.  Thus sed-4.0.7-3 uses syntax flags 328390 (dec), while
sed-4.0.7-8.EL3 uses flags 17105606 (dec).  This is how I found the value for
sed-4.0.7-8.EL3; the procedure is similar for the older sed:

# gdb /bin/sed
(gdb) break re_set_syntax
Breakpoint 1 at 0x8048e8c
(gdb) run 's/\{/x/' x
Starting program: /bin/sed 's/\{/x/' x
(no debugging symbols found)...(no debugging symbols found)...Breakpoint 1 at
0x48917ed2

Breakpoint 1, 0x48917ed2 in re_set_syntax () from /lib/libc.so.6
(gdb) finish
Run till exit from #0  0x48917ed2 in re_set_syntax () from /lib/libc.so.6
0x0804dbd6 in ?? ()
(gdb) print re_syntax_options
$1 = 17105606

That gives us the following flag maps:
sed-4.0.7-3         328390  0000:0000:0000:0101:0000:0010:1100:0110
sed-4.0.7-8.EL3   17105606  0000:0001:0000:0101:0000:0010:1100:0110

They differ in the 25th flag (bit 24, counting from zero), which, according to
the glibc sources, is:

/* If this bit is set, then \{ cannot be first in an bre or
   immediately after an alternation or begin-group operator.  */
#define RE_CONTEXT_INVALID_DUP (RE_CARET_ANCHORS_HERE << 1)

'b' in 'bre' here probably refers to RE_SYNTAX_POSIX_BASIC.
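
The bit comparison above can be double-checked with plain shell arithmetic. This is only a sketch using the two decimal values read out of gdb; the variable names are made up for the illustration.

```shell
# Syntax flag values observed in gdb, as quoted in this comment.
old_flags=328390      # sed-4.0.7-3
new_flags=17105606    # sed-4.0.7-8.EL3

# XOR keeps only the bits that differ between the two builds.
changed=$(( old_flags ^ new_flags ))

# Find the zero-based position of the lowest set bit in the difference.
bit=0
tmp=$changed
while [ "$tmp" -ne 0 ] && [ $(( tmp & 1 )) -eq 0 ]; do
    tmp=$(( tmp >> 1 ))
    bit=$(( bit + 1 ))
done

printf 'differing mask: 0x%X, zero-based bit: %d\n' "$changed" "$bit"
# prints: differing mask: 0x1000000, zero-based bit: 24
```

A single differing bit at position 24 is the 25th flag when counting from one, which matches the flag maps above.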

My conclusion is that the change of behavior comes from glibc, and that it's
probably glibc being more pedantic about POSIX conformance.  I'll ask Ulrich
Drepper for his opinion on the matter.  But I don't think I can do much
about it myself.

Comment 9 Petr Machata 2006-12-08 21:00:42 UTC
Correction: not Ulrich Drepper, but Jakub Jelinek owns glibc.  I'm NEEDINFO'ing
him.  Jakub, is my reasoning correct?  If so, I suppose this should be closed as
NOTABUG?  Thanks.

Comment 10 Jakub Jelinek 2006-12-08 21:22:11 UTC
Another correction: Uli is the upstream glibc maintainer, I'm just the
RHEL/Fedora glibc maintainer.
RE_SYNTAX_POSIX_BASIC was changed as part of a POSIX compliance fix. The BRE
'\{' is invalid, see the BRE grammar at
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html
They should just use '{', without the backslash, which is a valid BRE and will
DTRT.
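
For affected scripts, the suggested fix could look like the sketch below. The first form is the one already shown working in the description. The bracket-expression form is an additional suggestion that is not from this thread; it matches a literal brace with or without -r.

```shell
# '{' without a backslash is an ordinary character in a BRE (sed without -r).
plain=$(echo 'A{A' | sed 's/{/x/')

# A bracket expression matches a literal '{' in both BRE and ERE syntax.
bracket=$(echo 'A{A' | sed 's/[{]/x/')

echo "$plain $bracket"
# prints: AxA AxA
```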