Bug 109606 - sed crashes on garbage multibyte sequence
Product: Fedora
Classification: Fedora
Component: sed
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
Ben Levenson
Reported: 2003-11-10 00:28 EST by Behdad Esfahbod
Modified: 2007-11-30 17:10 EST
Fixed In Version: 4.0.8-2
Doc Type: Bug Fix
Last Closed: 2003-12-30 04:47:37 EST

Description Behdad Esfahbod 2003-11-10 00:28:43 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.1)
Gecko/20031030 Epiphany/1.0.4

Description of problem:
Here's how you can get a SegFault on a random piece of garbage (if
it's big enough to contain a bad pattern):

[behdad@mces behdad]$ echo $LANG
[behdad@mces behdad]$ ll classicpoems.sql.gz
-rw-r--r--    1 behdad   behdad   14215947 Nov  9 23:39
[behdad@mces behdad]$ file classicpoems.sql.gz
classicpoems.sql.gz: gzip compressed data, was
"classicpoems.sql", from Unix
[behdad@mces behdad]$ time sed -e 's/./x/g' classicpoems.sql.gz >
Segmentation fault
real    0m10.059s
user    0m9.020s
sys     0m0.000s
[behdad@mces behdad]$ gunzip classicpoems.sql.gz
[behdad@mces behdad]$ file classicpoems.sql
classicpoems.sql: UTF-8 Unicode text
[behdad@mces behdad]$ ll classicpoems.sql
-rw-r--r--    1 behdad   behdad   46405046 Nov  9 23:39
[behdad@mces behdad]$ time sed -e 's/./x/g' classicpoems.sql > /dev/null
real    1m27.675s
user    1m22.420s
sys     0m0.130s
[behdad@mces behdad]$
I could reproduce the same problem with other large pieces of
garbage too.

(breaking the report, as bugzilla simply does not work with a long one)

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.  Find a 100MB file of garbage (win32 binary, ...)
2.  Run sed -e s/./x/g myfile > /dev/null


Actual Results:  Crashes with SegFault

Expected Results:  Should not crash.
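
The two reproduction steps above can be sketched as a small script. This is only an illustration, not something from the original report: the helper names `make_garbage`, `run_sed`, and `crashed_with_segv` are hypothetical, random bytes stand in for "a file of garbage", and the locale `en_US.UTF-8` is an assumption, since the report does not show the value of `$LANG`.

```python
import os
import signal
import subprocess


def make_garbage(path, size, chunk=1 << 20):
    """Write `size` random bytes to `path` as a stand-in for a garbage file."""
    remaining = size
    with open(path, "wb") as out:
        while remaining > 0:
            n = min(chunk, remaining)
            out.write(os.urandom(n))
            remaining -= n


def run_sed(path, lang="en_US.UTF-8"):
    """Run sed -e 's/./x/g' on `path` with LANG set to `lang` (assumed locale).

    Returns the exit status reported by subprocess.
    """
    env = dict(os.environ)
    env.pop("LC_ALL", None)    # LC_ALL would take precedence over LANG
    env.pop("LC_CTYPE", None)  # so would LC_CTYPE, for character handling
    env["LANG"] = lang
    proc = subprocess.run(
        ["sed", "-e", "s/./x/g", path],
        stdout=subprocess.DEVNULL,  # same effect as "> /dev/null"
        env=env,
    )
    return proc.returncode


def crashed_with_segv(returncode):
    """On POSIX, subprocess reports death by signal N as return code -N."""
    return returncode == -signal.SIGSEGV
```

For example, `make_garbage("myfile", 100 * 1024 * 1024)` followed by `crashed_with_segv(run_sed("myfile"))` mirrors steps 1 and 2. Whether a crash actually occurs depends on the installed sed version and on the random data.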

Additional info:

BTW, what led me to this was doing performance tests.  See for
yourself how bad it currently is:

[behdad@mces behdad]$ echo $LANG
[behdad@mces behdad]$ ll /bin/ls
-rwxr-xr-x    1 root     root        73460 Oct 12 04:50 /bin/ls
[behdad@mces behdad]$ time sed -e 's/./x/g' /bin/ls > /dev/null
real    0m4.248s
user    0m3.800s
sys     0m0.000s
[behdad@mces behdad]$ time LANG=C sed -e 's/./x/g' /bin/ls >
real    0m0.180s
user    0m0.050s
sys     0m0.000s

And /bin/ls is only 72 KB!!!
And I've got a few hundred megabytes of memory free, so it is NOT a
memory exhaustion problem.
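
The locale comparison above could be automated along these lines. This is a rough sketch, not part of the original report: `time_sed` and `compare_locales` are hypothetical helper names, `en_US.UTF-8` is an assumed multibyte locale (the report's `$LANG` value is not shown), and the measurement is wall-clock time only, corresponding to the `real` line of `time`.

```python
import os
import subprocess
import time


def time_sed(path, lang, script="s/./x/g"):
    """Return wall-clock seconds taken by sed -e `script` on `path` under LANG=`lang`."""
    env = dict(os.environ)
    env.pop("LC_ALL", None)    # LC_ALL would take precedence over LANG
    env.pop("LC_CTYPE", None)
    env["LANG"] = lang
    start = time.perf_counter()
    subprocess.run(
        ["sed", "-e", script, path],
        stdout=subprocess.DEVNULL,  # discard output, as in the report
        env=env,
        check=False,                # a crash should not raise here
    )
    return time.perf_counter() - start


def compare_locales(path, locales=("en_US.UTF-8", "C")):
    """Time the same substitution under each locale; returns {locale: seconds}."""
    return {lang: time_sed(path, lang) for lang in locales}
```

Calling `compare_locales("/bin/ls")` gives one timing per locale. A large gap between the two values would match the slowdown described above.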
Comment 1 Behdad Esfahbod 2003-11-10 00:29:15 EST
Here is a gdb session:
[behdad@mces behdad]$ gdb sed
GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and
you are
welcome to change it and/or distribute copies of it under certain
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
This GDB was configured as "i386-redhat-linux-gnu"...Using host
libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) set args -e s/./x/g classicpoems.sql.gz > /dev/null
(gdb) r
Starting program: /bin/sed -e s/./x/g classicpoems.sql.gz > /dev/null
Program received signal SIGSEGV, Segmentation fault.
0x004b877d in malloc_consolidate () from /lib/tls/libc.so.6
(gdb) bt
#0  0x004b877d in malloc_consolidate () from /lib/tls/libc.so.6
#1  0x004b8628 in _int_free () from /lib/tls/libc.so.6
#2  0x004b8983 in _int_realloc () from /lib/tls/libc.so.6
#3  0x004b73bf in realloc () from /lib/tls/libc.so.6
#4  0x0805a862 in extend_buffers (mctx=0x57e300) at regexec.c:3764
#5  0x0805821a in transit_state (err=0xbfefe218, preg=0x827fc18,
mctx=0x30, state=0x827fe40, fl_search=0) at regexec.c:2088
#6  0x0805666e in check_matching (preg=0x827fc18, mctx=0xbfefe280,
fl_search=0, fl_longest_match=1) at regexec.c:1009
#7  0x08055f2b in re_search_internal (preg=0x827fc18, string=0x82813d0
    length=1192, start=463, range=729, stop=-161375941, nmatch=1,
pmatch=0x827feb0, eflags=0) at regexec.c:744
#8  0x0805591a in re_search_stub (bufp=0x827fc18, string=0xf661993b
<Address 0xf661993b out of bounds>, length=1192, start=463, range=-1,
stop=-161375941, regs=0x805e0e0, ret_len=0)
    at regexec.c:411
#9  0x0805562e in re_search (bufp=0xf661993b, string=0xf661993b
<Address 0xf661993b out of bounds>, length=-599462864,
start=-161375941, range=-161375941, regs=0xf661993b) at regexec.c:281
#10 0x0804e0c5 in match_regex (regex=0x827fc18, buf=0xf661993b
<Address 0xf661993b out of bounds>, buflen=1192, buf_start_offset=463,
regarray=0x805e0e0, regsize=10) at regex.c:232
#11 0x0804d16b in do_subst (sub=0x827e868) at execute.c:976
#12 0x0804db08 in execute_program (vec=0x827e850, input=0xbff00490) at
#13 0x0804de82 in process_files (the_program=0x827e850, argv=0x0) at
#14 0x080499aa in main (argc=4, argv=0xbff00584) at sed.c:287
(gdb) f 7
#7  0x08055f2b in re_search_internal (preg=0x827fc18, string=0x82813d0
    length=1192, start=463, range=729, stop=-161375941, nmatch=1,
pmatch=0x827feb0, eflags=0) at regexec.c:744
(gdb) p strlen(string)
$8 = 69
(gdb) x/69b string
0x83ed420:      0x30    0xec    0x44    0xdc    0x3b    0x99    0x61 
0x83ed428:      0x1f    0xc5    0x7c    0x7f    0xff    0xd5    0x87 
0x83ed430:      0x5d    0x4d    0xe7    0x3d    0xf7    0x07    0x21 
0x83ed438:      0xfd    0x22    0x9a    0x3a    0x7e    0x52    0xa9 
0x83ed440:      0x86    0xec    0xbb    0x03    0x7b    0x3d    0x7e 
0x83ed448:      0x0f    0x6d    0xf5    0x5a    0x84    0x8a    0xf5 
0x83ed450:      0x06    0x2c    0x39    0xb8    0xb2    0xf2    0x77 
0x83ed458:      0xca    0xf0    0x2f    0x45    0xdf    0x0f    0xd2 
0x83ed460:      0xe1    0x5f    0xfd    0x02    0x4c

So this should be the suspect string, which can probably serve as
test data now.
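
To try the bytes from the dump as test data, something like the following sketch could be used. It is not from the original report. Note that the dump as shown lists 7 bytes per row while the addresses advance by 8, so only 61 of the 69 requested bytes are visible; `SAMPLE` therefore holds just the visible bytes and is not claimed to be the exact crashing input. The helper name `first_invalid_utf8_offset` is hypothetical.

```python
# Bytes exactly as they appear in the gdb dump (61 visible bytes).
SAMPLE = bytes.fromhex(
    "30 ec 44 dc 3b 99 61 "
    "1f c5 7c 7f ff d5 87 "
    "5d 4d e7 3d f7 07 21 "
    "fd 22 9a 3a 7e 52 a9 "
    "86 ec bb 03 7b 3d 7e "
    "0f 6d f5 5a 84 8a f5 "
    "06 2c 39 b8 b2 f2 77 "
    "ca f0 2f 45 df 0f d2 "
    "e1 5f fd 02 4c"
)


def first_invalid_utf8_offset(data):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def write_sample(path, data=SAMPLE):
    """Write the sample bytes to `path` so they can be fed to sed."""
    with open(path, "wb") as out:
        out.write(data)
```

The second byte, 0xec, starts a three-byte UTF-8 sequence but is followed by 0x44, which is not a continuation byte, so the visible data is already invalid UTF-8 at offset 1. That is consistent with the "garbage multibyte sequence" in the bug title.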
Comment 2 Jakub Jelinek 2003-11-10 03:30:28 EST
I think you have to provide some sample data on which you can get sed
to segfault reproducibly. I've tried /bin/ls and /usr/bin/* and did not
get it to segfault on anything.
Comment 3 Behdad Esfahbod 2003-11-10 04:46:19 EST
I couldn't find a small test case.  The smallest I have is a few
megabytes.  So try it on a huge binary file (~50 MB).  I will try to
find some file available online that can play the role, and send the URL.
Also, the hexdump above should be the test case itself.  I will try to
confirm whether that is so.
Comment 4 Behdad Esfahbod 2003-11-10 05:09:05 EST
Well, my original test case is here: (14mb)
Comment 5 Florin Andrei 2003-11-17 21:34:09 EST
Ah, yes! This looks a lot like the segfaults that plagued me when I
tested Fedora, to the point that I downgraded to RH9.
I'll re-install Fedora one of these days, apply the glibc update,
and test what's been reported in this bug report. I'll let you know.
Comment 6 Behdad Esfahbod 2003-12-30 04:47:37 EST
Seems like it's already solved in sed-4.0.8-2.  Both the performance
problem and the crash have been fixed for me.

The changelog in RPM spec file says:

* Fri Nov 14 2003 Jakub Jelinek <jakub@redhat.com> 4.0.8-2
- enable --without-included-regex again
- use fastmap for regex searching

Closing as fixed.  Right?
Comment 7 John Flanagan 2004-05-11 21:28:23 EDT
An errata has been issued which should help the problem described in this bug report. 
This report is therefore being closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, please follow the link below. You may reopen 
this bug report if the solution does not work for you.

