Bug 109606

Summary: sed crashes on garbage multibyte sequence
Product: [Fedora] Fedora
Reporter: Behdad Esfahbod <behdad>
Component: sed
Assignee: Jakub Jelinek <jakub>
Status: CLOSED ERRATA
QA Contact: Ben Levenson <benl>
Severity: medium
Docs Contact:
Priority: medium
Version: 1
CC: florin
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 4.0.8-2 Doc Type: Bug Fix
Last Closed: 2003-12-30 09:47:37 UTC

Description Behdad Esfahbod 2003-11-10 05:28:43 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1)
Gecko/20031030 Epiphany/1.0.4

Description of problem:
Here's how you can get a SegFault on a random piece of garbage (if
it's big enough to contain a bad pattern):


[behdad@mces behdad]$ echo $LANG
en_US.UTF-8
[behdad@mces behdad]$ ll classicpoems.sql.gz
-rw-r--r--    1 behdad   behdad   14215947 Nov  9 23:39 classicpoems.sql.gz
[behdad@mces behdad]$ file classicpoems.sql.gz
classicpoems.sql.gz: gzip compressed data, was "classicpoems.sql", from Unix
[behdad@mces behdad]$ time sed -e 's/./x/g' classicpoems.sql.gz > /dev/null
Segmentation fault

real    0m10.059s
user    0m9.020s
sys     0m0.000s
[behdad@mces behdad]$ gunzip classicpoems.sql.gz
[behdad@mces behdad]$ file classicpoems.sql
classicpoems.sql: UTF-8 Unicode text
[behdad@mces behdad]$ ll classicpoems.sql
-rw-r--r--    1 behdad   behdad   46405046 Nov  9 23:39 classicpoems.sql
[behdad@mces behdad]$ time sed -e 's/./x/g' classicpoems.sql > /dev/null

real    1m27.675s
user    1m22.420s
sys     0m0.130s
[behdad@mces behdad]$

I could reproduce the same problem with other large pieces of garbage too.


(breaking the report, as bugzilla simply does not work with a long one)



Version-Release number of selected component (if applicable):
sed-4.0.8-1

How reproducible:
Always

Steps to Reproduce:
1.  Find a 100MB file of garbage (win32 binary, ...)
2.  Run sed -e s/./x/g myfile > /dev/null

    

Actual Results:  Crashes with SegFault

Expected Results:  Should not crash.
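
The steps above could be scripted roughly as follows. This is only a sketch and not part of the original report: it assumes /dev/urandom is an acceptable source of garbage, and SIZE_BYTES, GARBAGE and the 1 MiB default are illustrative names and values (the report suggests about 100 MB, so raise SIZE_BYTES to get closer to the original conditions). A shell normally reports a process killed by SIGSEGV as exit status 139 (128 + 11).

```shell
#!/bin/sh
# Sketch: feed random bytes to sed under a UTF-8 locale and report how it exits.
# SIZE_BYTES and GARBAGE are hypothetical names chosen for this example.
SIZE_BYTES=${SIZE_BYTES:-1048576}   # default 1 MiB; the report suggests ~100 MB
GARBAGE=$(mktemp)                   # scratch file for the garbage input

# Step 1: create a file of garbage.
head -c "$SIZE_BYTES" /dev/urandom > "$GARBAGE"
ACTUAL=$(($(wc -c < "$GARBAGE")))   # arithmetic expansion strips any padding

# Step 2: run the substitution from the report and discard the output.
LC_ALL=en_US.UTF-8 sed -e 's/./x/g' "$GARBAGE" > /dev/null
STATUS=$?

# 139 = 128 + 11, i.e. the process died from SIGSEGV.
if [ "$STATUS" -eq 139 ]; then
    echo "sed crashed with SIGSEGV on $ACTUAL bytes of garbage"
else
    echo "sed exited with status $STATUS on $ACTUAL bytes of garbage"
fi
rm -f "$GARBAGE"
```

Random data is not guaranteed to contain the problematic pattern, so a clean run on a fixed version or on a small file proves little; several runs with larger sizes may be needed on an affected version.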

Additional info:

BTW, what led me to this was doing performance tests.  See for yourself
how bad it currently is:

[behdad@mces behdad]$ echo $LANG
en_US.UTF-8
[behdad@mces behdad]$ ll /bin/ls
-rwxr-xr-x    1 root     root        73460 Oct 12 04:50 /bin/ls
[behdad@mces behdad]$ time sed -e 's/./x/g' /bin/ls > /dev/null

real    0m4.248s
user    0m3.800s
sys     0m0.000s
[behdad@mces behdad]$ time LANG=C sed -e 's/./x/g' /bin/ls > /dev/null

real    0m0.180s
user    0m0.050s
sys     0m0.000s




And /bin/ls is only 72 KB!!!
And I've got a few hundred MB of memory free, so it is NOT a
memory-exhaustion problem.
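
The comparison above could be repeated with a small script along these lines. It is a sketch under assumptions, not something from the report: it needs bash (for the `time` keyword and TIMEFORMAT), INPUT and time_sed are illustrative names, and it assumes the en_US.UTF-8 locale is installed. If that locale is missing, sed will most likely behave as in the C locale and the two numbers will not differ meaningfully.

```bash
#!/bin/bash
# Sketch: time the same sed substitution under a UTF-8 locale and the C locale.
# INPUT and time_sed are hypothetical names chosen for this example.
INPUT=${INPUT:-/bin/ls}
TIMEFORMAT='%R'     # make bash's `time` keyword print only elapsed seconds

# time_sed LOCALE FILE: print the wall-clock seconds of one sed run.
time_sed() {
    # sed's own stdout and stderr are discarded; the outer 2>&1 captures only
    # the timing line that `time` writes to the shell's stderr.
    # `env` sets the locale for sed alone, so bash itself never tries to switch.
    { time env LC_ALL="$1" sed -e 's/./x/g' "$2" > /dev/null 2>&1; } 2>&1
}

UTF8_SECS=$(time_sed en_US.UTF-8 "$INPUT")
C_SECS=$(time_sed C "$INPUT")

echo "en_US.UTF-8: ${UTF8_SECS}s"
echo "C:           ${C_SECS}s"
```

Running it as `INPUT=/some/other/file ./script` measures a different file; a single run is noisy, so repeating the measurement a few times gives a fairer picture.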

Comment 1 Behdad Esfahbod 2003-11-10 05:29:15 UTC
Here is a gdb session:

[behdad@mces behdad]$ gdb sed
GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
 
(gdb) set args -e s/./x/g classicpoems.sql.gz > /dev/null
(gdb) r
Starting program: /bin/sed -e s/./x/g classicpoems.sql.gz > /dev/null
 
Program received signal SIGSEGV, Segmentation fault.
0x004b877d in malloc_consolidate () from /lib/tls/libc.so.6
(gdb) bt
#0  0x004b877d in malloc_consolidate () from /lib/tls/libc.so.6
#1  0x004b8628 in _int_free () from /lib/tls/libc.so.6
#2  0x004b8983 in _int_realloc () from /lib/tls/libc.so.6
#3  0x004b73bf in realloc () from /lib/tls/libc.so.6
#4  0x0805a862 in extend_buffers (mctx=0x57e300) at regexec.c:3764
#5  0x0805821a in transit_state (err=0xbfefe218, preg=0x827fc18, mctx=0x30, state=0x827fe40, fl_search=0) at regexec.c:2088
#6  0x0805666e in check_matching (preg=0x827fc18, mctx=0xbfefe280, fl_search=0, fl_longest_match=1) at regexec.c:1009
#7  0x08055f2b in re_search_internal (preg=0x827fc18, string=0x82813d0 "XXXXXXXXXXSOMEGARBAGE", length=1192, start=463, range=729, stop=-161375941, nmatch=1, pmatch=0x827feb0, eflags=0) at regexec.c:744
#8  0x0805591a in re_search_stub (bufp=0x827fc18, string=0xf661993b <Address 0xf661993b out of bounds>, length=1192, start=463, range=-1, stop=-161375941, regs=0x805e0e0, ret_len=0) at regexec.c:411
#9  0x0805562e in re_search (bufp=0xf661993b, string=0xf661993b <Address 0xf661993b out of bounds>, length=-599462864, start=-161375941, range=-161375941, regs=0xf661993b) at regexec.c:281
#10 0x0804e0c5 in match_regex (regex=0x827fc18, buf=0xf661993b <Address 0xf661993b out of bounds>, buflen=1192, buf_start_offset=463, regarray=0x805e0e0, regsize=10) at regex.c:232
#11 0x0804d16b in do_subst (sub=0x827e868) at execute.c:976
#12 0x0804db08 in execute_program (vec=0x827e850, input=0xbff00490) at execute.c:1307
#13 0x0804de82 in process_files (the_program=0x827e850, argv=0x0) at execute.c:1524
#14 0x080499aa in main (argc=4, argv=0xbff00584) at sed.c:287
(gdb) f 7
#7  0x08055f2b in re_search_internal (preg=0x827fc18, string=0x82813d0 "XXXXXXXXXXSOMEGARBAGE", length=1192, start=463, range=729, stop=-161375941, nmatch=1, pmatch=0x827feb0, eflags=0) at regexec.c:744
(gdb) p strlen(string)
$8 = 69
(gdb) x/69b string
0x83ed420:      0x30    0xec    0x44    0xdc    0x3b    0x99    0x61    0xf6
0x83ed428:      0x1f    0xc5    0x7c    0x7f    0xff    0xd5    0x87    0x42
0x83ed430:      0x5d    0x4d    0xe7    0x3d    0xf7    0x07    0x21    0x64
0x83ed438:      0xfd    0x22    0x9a    0x3a    0x7e    0x52    0xa9    0xca
0x83ed440:      0x86    0xec    0xbb    0x03    0x7b    0x3d    0x7e    0xd6
0x83ed448:      0x0f    0x6d    0xf5    0x5a    0x84    0x8a    0xf5    0x2c
0x83ed450:      0x06    0x2c    0x39    0xb8    0xb2    0xf2    0x77    0x50
0x83ed458:      0xca    0xf0    0x2f    0x45    0xdf    0x0f    0xd2    0x04
0x83ed460:      0xe1    0x5f    0xfd    0x02    0x4c
(gdb)


So this should be the suspect string, which can probably serve as
test data now.
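
To try the dumped bytes as test data, something like the following sketch could rebuild them into a file and run the same substitution on it. BYTES, SUSPECT and the temporary file are illustrative names, and the en_US.UTF-8 locale is assumed to be installed. Whether these 69 bytes alone are enough to trigger the crash is not confirmed anywhere in this report, so a clean exit here does not rule out the bug.

```shell
#!/bin/sh
# Sketch: rebuild the 69 bytes from the gdb dump and feed them to sed.
# BYTES and SUSPECT are hypothetical names chosen for this example.
BYTES="30 ec 44 dc 3b 99 61 f6
1f c5 7c 7f ff d5 87 42
5d 4d e7 3d f7 07 21 64
fd 22 9a 3a 7e 52 a9 ca
86 ec bb 03 7b 3d 7e d6
0f 6d f5 5a 84 8a f5 2c
06 2c 39 b8 b2 f2 77 50
ca f0 2f 45 df 0f d2 04
e1 5f fd 02 4c"

SUSPECT=$(mktemp)

# Convert each hex pair to a three-digit octal escape, which printf accepts
# in its format string, and emit the corresponding raw byte.
for h in $BYTES; do
    printf "\\$(printf '%03o' "0x$h")"
done > "$SUSPECT"

SIZE=$(($(wc -c < "$SUSPECT")))
echo "wrote $SIZE bytes to $SUSPECT"

# Same substitution as in the report, under a UTF-8 locale.
LC_ALL=en_US.UTF-8 sed -e 's/./x/g' "$SUSPECT" > /dev/null
STATUS=$?
echo "sed exit status: $STATUS (139 would indicate SIGSEGV)"
```

The file is left in place so it can be attached to a report or run under gdb.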


Comment 2 Jakub Jelinek 2003-11-10 08:30:28 UTC
I think you have to provide some sample data on which you can get sed to
segfault reproducibly. I've tried /bin/ls and /usr/bin/* and did not get it
to segfault on anything.

Comment 3 Behdad Esfahbod 2003-11-10 09:46:19 UTC
I couldn't find a small test case.  The smallest I have is a few
megabytes.  So try it on a huge binary file (~50 MB).  I will find
some file available online that can play the role, and send the URL.
Also, the hexdump above should be the test case itself.  I will try to
confirm whether that is the case.

Comment 4 Behdad Esfahbod 2003-11-10 10:09:05 UTC
Well, my original test case (14 MB) is here:
http://www.cs.toronto.edu/~behdad/classicpoems.sql.gz

Comment 5 Florin Andrei 2003-11-18 02:34:09 UTC
Ah, yes! This looks a lot like the segfaults that plagued me when I
tested Fedora, so much so that I downgraded to RH9.
I'll re-install Fedora one of these days, apply the glibc update,
and test what's been reported in this bug report. I'll let you know.

Comment 6 Behdad Esfahbod 2003-12-30 09:47:37 UTC
Seems like it's already solved in sed-4.0.8-2.  Both the performance
problem and the crash have been fixed for me.

The changelog in RPM spec file says:

* Fri Nov 14 2003 Jakub Jelinek <jakub> 4.0.8-2
- enable --without-included-regex again
- use fastmap for regex searching


Closing as fixed.  Right?

Comment 7 John Flanagan 2004-05-12 01:28:23 UTC
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-212.html