Bug 487700 - double free or corruption detected in ps
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: procps
Version: 5.3
Hardware: All Linux
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 5.5
Assigned To: Daniel Novotny
QA Contact: BaseOS QE
Keywords: Patch
Depends On:
Blocks: 499522
Reported: 2009-02-27 10:00 EST by Olivier Fourdan
Modified: 2015-02-05 23:14 EST

Doc Type: Bug Fix
Last Closed: 2010-03-30 04:06:15 EDT

Attachments
reproducer program (1.16 KB, application/octet-stream)
2009-09-02 05:01 EDT, Olivier Fourdan
Proposed patch (655 bytes, patch)
2009-09-02 06:04 EDT, Olivier Fourdan


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 29770 None None None Never

Description Olivier Fourdan 2009-02-27 10:00:01 EST
Description of problem:

The customer uses "ps" within a script to monitor processes and noticed that "ps" died once, when glibc detected a double free or corruption.

Version-Release number of selected component (if applicable):

procps-3.2.7-8.1.el5 on x86_64

How reproducible:

Cannot reproduce

Steps to Reproduce:

1. run "ps -e -o user -o pid -o ppid -o args"
  
Actual results:

*** glibc detected *** ps: double free or corruption (out): 0x000000000bacf680 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3912e6f444]
/lib64/libc.so.6(cfree+0x8c)[0x3912e72a6c]
ps[0x402393]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3912e1d8a4]
ps[0x4019f9]

Expected results:

No error

Additional info:

We do not have a core file for that crash. As far as I know, the problem occurred only once for the customer, and we have not found a way to reproduce it.

I tried to reproduce the issue using very long command lines containing multibyte characters with the "ja_JP.UTF-8" locale, without success.

Customer suspected the length of the cmdline string could be the problem but:

1) The size of the cmdline in /proc/$PID/cmdline is limited to PAGE_SIZE in the kernel's "fs/proc/base.c":

    static int proc_pid_cmdline(struct task_struct *task, char * buffer)
    {
           int res = 0;
           unsigned int len;
           struct mm_struct *mm = get_task_mm(task);
           ...
           len = mm->arg_end - mm->arg_start;

           if (len > PAGE_SIZE)
                   len = PAGE_SIZE;

So the cmdline cannot exceed 4096 bytes (PAGE_SIZE on x86_64), and:

2) the code in proc/readproc.c can deal with any size of cmdline anyway.
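
The clamp in point 1 can be sketched as a standalone function. This is only an illustration of the arithmetic, not kernel code: the function name and its parameters are hypothetical, and the page size is passed in rather than taken from the kernel's PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: length that a PAGE_SIZE-limited reader would
 * report for a command line spanning [arg_start, arg_end).
 */
size_t clamped_cmdline_len(size_t arg_start, size_t arg_end, size_t page_size)
{
    size_t len;

    assert(arg_end >= arg_start);   /* the range must not be inverted */
    len = arg_end - arg_start;
    if (len > page_size)            /* same clamp as in the excerpt above */
        len = page_size;
    return len;
}
```

With a 4096-byte page, a 10000-byte argument area is reported as 4096 bytes, while a 200-byte one is reported unchanged.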

I also suspected the use of UTF-8 (as the locale used is "ja_JP.UTF-8") with a longer cmdline, but could not reproduce the problem that way either.

The backtrace in the log gives:

    *** glibc detected *** ps: double free or corruption (out): 0x000000000bacf680 ***
    ======= Backtrace: =========
    /lib64/libc.so.6[0x3912e6f444]
    /lib64/libc.so.6(cfree+0x8c)[0x3912e72a6c]
    ps[0x402393]
    /lib64/libc.so.6(__libc_start_main+0xf4)[0x3912e1d8a4]
    ps[0x4019f9]
    ======= Memory map: ========
    00400000-00413000 r-xp 00000000 08:02 3124268                            /bin/ps
    00613000-00614000 rw-p 00013000 08:02 3124268                            /bin/ps
    00614000-00634000 rw-p 00614000 00:00 0 
    0bacd000-0baee000 rw-p 0bacd000 00:00 0 
    3912a00000-3912a1a000 r-xp 00000000 08:02 1887841                        /lib64/ld-2.5.so
    3912c19000-3912c1a000 r--p 00019000 08:02 1887841                        /lib64/ld-2.5.so
    3912c1a000-3912c1b000 rw-p 0001a000 08:02 1887841                        /lib64/ld-2.5.so
    3912e00000-3912f46000 r-xp 00000000 08:02 1887842                        /lib64/libc-2.5.so
    3912f46000-3913146000 ---p 00146000 08:02 1887842                        /lib64/libc-2.5.so
    3913146000-391314a000 r--p 00146000 08:02 1887842                        /lib64/libc-2.5.so
    391314a000-391314b000 rw-p 0014a000 08:02 1887842                        /lib64/libc-2.5.so
    391314b000-3913150000 rw-p 391314b000 00:00 0 
    3913200000-3913202000 r-xp 00000000 08:02 1887843                        /lib64/libdl-2.5.so
    3913202000-3913402000 ---p 00002000 08:02 1887843                        /lib64/libdl-2.5.so
    3913402000-3913403000 r--p 00002000 08:02 1887843                        /lib64/libdl-2.5.so
    3913403000-3913404000 rw-p 00003000 08:02 1887843                        /lib64/libdl-2.5.so
    3913600000-391360d000 r-xp 00000000 08:02 1887857                        /lib64/libproc-3.2.7.so
    391360d000-391380d000 ---p 0000d000 08:02 1887857                        /lib64/libproc-3.2.7.so
    391380d000-391380e000 rw-p 0000d000 08:02 1887857                        /lib64/libproc-3.2.7.so
    391380e000-3913822000 rw-p 391380e000 00:00 0 
    3922400000-392240d000 r-xp 00000000 08:02 1887848                        /lib64/libgcc_s-4.1.2-20070626.so.1
    392240d000-392260d000 ---p 0000d000 08:02 1887848                        /lib64/libgcc_s-4.1.2-20070626.so.1
    392260d000-392260e000 rw-p 0000d000 08:02 1887848                        /lib64/libgcc_s-4.1.2-20070626.so.1
    2aaaaaaab000-2aaaaaaac000 rw-p 2aaaaaaab000 00:00 0 
    2aaaaaac7000-2aaaaaaeb000 rw-p 2aaaaaac7000 00:00 0 
    2aaaaaaeb000-2aaaaaaec000 ---p 2aaaaaaeb000 00:00 0 
    2aaaaaaec000-2aaaaaaed000 rw-p 2aaaaaaec000 00:00 0 
    2aaaaab07000-2aaaaab11000 r-xp 00000000 08:02 1887580                    /lib64/libnss_files-2.5.so
    2aaaaab11000-2aaaaad10000 ---p 0000a000 08:02 1887580                    /lib64/libnss_files-2.5.so
    2aaaaad10000-2aaaaad11000 r--p 00009000 08:02 1887580                    /lib64/libnss_files-2.5.so
    2aaaaad11000-2aaaaad12000 rw-p 0000a000 08:02 1887580                    /lib64/libnss_files-2.5.so
    2aaaac000000-2aaaac021000 rw-p 2aaaac000000 00:00 0 
    2aaaac021000-2aaab0000000 ---p 2aaaac021000 00:00 0 
    7fff37085000-7fff3709a000 rw-p 7fff37085000 00:00 0                      [stack]
    ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]


    Signal 6 (ABRT) caught by ps (procps version 3.2.7).
    Please send bug reports to <feedback@lists.sf.net> or <albert@users.sf.net>

Checking the address that led to the free() error gives:

    (gdb) list *0x402393
    0x402393 is in main (ps/display.c:343).
    338       case TF_show_proc:                   // normal non-thread output
    339         while(readproc(ptp,&buf)){
    340           if(want_this_proc(&buf)){
    341             show_one_proc(&buf, proc_format_list);
    342           }
    343           if(buf.cmdline) free((void*)*buf.cmdline); // ought to reuse
    344           if(buf.environ) free((void*)*buf.environ); // ought to reuse
    345         }
    346         break;


I checked the code but did not spot any obvious problem. Those buffers are allocated by readproc(), which resolves to simple_readproc() here; simple_readproc() in turn uses file2strvec() to allocate the memory and read the data from the proc file.
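
The call free((void*)*buf.cmdline) at display.c:343 frees the first string rather than the vector itself, which suggests that the strings and the pointer array live in one allocation. The following is a minimal sketch of such a layout, not the actual file2strvec(): the function names are hypothetical, and the assumption that the pointer array sits after the string data in the same block is mine.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: build a NULL-terminated vector of pointers to the
 * NUL-separated strings in data[0..tot). Strings and pointers share one
 * malloc() block with the strings first, so free(*vec) releases both.
 * Precondition: tot > 0 and data[tot-1] == '\0'.
 */
char **strvec_from_block(const char *data, size_t tot)
{
    size_t nstr = 0, align, i;
    char *block, **vec, **q;

    assert(tot > 0 && data[tot - 1] == '\0');

    for (i = 0; i < tot; i++)           /* one string per NUL byte */
        if (data[i] == '\0')
            nstr++;

    /* padding so that the pointer array starts on a pointer boundary */
    align = (sizeof(char *) - tot % sizeof(char *)) % sizeof(char *);

    block = malloc(tot + align + (nstr + 1) * sizeof(char *));
    if (block == NULL)
        return NULL;
    memcpy(block, data, tot);

    q = vec = (char **)(block + tot + align);
    *q++ = block;                       /* vec[0] is the start of the block */
    for (i = 0; i + 1 < tot; i++)       /* the final NUL starts no string */
        if (block[i] == '\0')
            *q++ = block + i + 1;
    *q = NULL;                          /* vector terminator */
    return vec;
}

/* Number of entries before the NULL terminator. */
size_t strvec_len(char **vec)
{
    size_t n = 0;

    while (vec[n] != NULL)
        n++;
    return n;
}
```

A caller would release such a vector with free(*vec), mirroring the call in display.c. Note that the pointer array is sized from the NUL count, so this layout only works when the data is NUL-terminated.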

I do not see want_this_proc() or show_one_proc() allocating or deallocating memory, so I don't think the problem comes from these functions. They do seem to use functions that are not multibyte-aware, such as strlen(), to compute output column alignment, but that should cause nothing worse than misaligned output.

I've now provided the customer with a slightly modified version of procps that does *not* catch the SIGSEGV and SIGABRT signals, so that we have a chance to capture a core file if the problem recurs. In parallel, I am escalating this issue to Bugzilla to get Engineering's opinion, as I have failed to find what could have caused the memory corruption reported by glibc in "ps".
Comment 1 RHEL Product and Program Management 2009-03-26 13:27:18 EDT
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".
Comment 3 Daniel Novotny 2009-05-11 07:09:36 EDT
hello, since the problem occurred only *once* and the issue tracker is closed, can I close this as WORKSFORME?
Comment 6 Olivier Fourdan 2009-09-02 05:01:10 EDT
Created attachment 359482
reproducer program

Attaching reproducer and procedure.

To reproduce:

1) Build the two executables create_zombie and dummy_sleep:

   $ make

2) Run "create_zombie" in a loop:

   $ for i in `seq 1 1 10000`; do ./create_zombie 2 & done

3) In a separate terminal/console, run ps -eo pid,args in a loop

   $ while $(ps -eo pid,args > log.txt); do /bin/true; done

Actual results:

ps will abort after a few seconds with:

  *** glibc detected *** ps: double free or corruption (out) ***

Expected results:

  ps does not abort

Additional info:

The problem is related to the patch from bug#134516 ("ps truncates line to 2048 characters") and more precisely to this change:

  https://bugzilla.redhat.com/show_bug.cgi?id=134516#c24

Using:

  while ((n = read(fd, buf, sizeof buf - 1)) > 0)

Instead of:

  while ((n = read(fd, buf, sizeof buf - 1)) >= 0)

does not trigger the corruption but I am not entirely sure why...
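
One observable difference between the two loop conditions is that only ">= 0" lets the loop body run with n == 0. The sketch below illustrates this under stated assumptions: it uses a pipe instead of a /proc file, a 16-byte buffer instead of the real one, and hypothetical function names. It is not the procps code.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Counts how often the loop body runs with n == 0.
 * inclusive != 0 models "while ((n = read(...)) >= 0)",
 * inclusive == 0 models "while ((n = read(...)) > 0)".
 */
int zero_length_iterations(int fd, int inclusive)
{
    char buf[16];
    ssize_t n;
    int zero_seen = 0;

    assert(fd >= 0);
    for (;;) {
        n = read(fd, buf, sizeof buf - 1);
        if (n < 0 || (n == 0 && !inclusive))
            break;                      /* loop condition is false */
        if (n == 0) {                   /* only reachable with ">= 0" */
            zero_seen++;
            break;
        }
        if ((size_t)n < sizeof buf - 1)
            break;                      /* short read: treated as end of file */
    }
    close(fd);
    return zero_seen;
}

/* Hypothetical helper: read end of a pipe holding exactly len bytes. */
int fd_with_bytes(size_t len)
{
    int p[2];
    char c = 'x';
    size_t i;

    if (pipe(p) != 0)
        return -1;
    for (i = 0; i < len; i++) {
        if (write(p[1], &c, 1) != 1) {
            close(p[0]);
            close(p[1]);
            return -1;
        }
    }
    close(p[1]);                        /* reader sees EOF after len bytes */
    return p[0];
}
```

In this model, the zero-length iteration only appears when the data fills the previous read completely (15 bytes here), so a full read is followed by a read that returns 0. That may be why the problem is rare in practice.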
Comment 8 Olivier Fourdan 2009-09-02 06:04:37 EDT
Created attachment 359498
Proposed patch

I think what happens is the following:

With "while ((n = read(fd, buf, sizeof buf - 1)) >= 0)", a read() that returns 0 still enters the loop body, where "end_of_file" is set to 1 by:

        if (n < (int)(sizeof buf - 1))
            end_of_file = 1;
 
At the same time, with n = 0, buf[n-1] is buf[-1], which lies outside the buffer and holds unpredictable data. When that byte happens to be zero, the test below is false:

        if (end_of_file && buf[n-1])            /* last read char not null */
            buf[n++] = '\0';                    /* so append null-terminator */

So no null-terminator is inserted. And that breaks the computation of the string array entries later in the code.
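
A minimal sketch of why a missing terminator matters, assuming (this report does not confirm it) that the later code sizes the pointer array by counting NUL bytes. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of strings implied by counting NUL bytes in data[0..tot). */
size_t count_by_nul(const char *data, size_t tot)
{
    size_t i, n = 0;

    assert(data != NULL);
    for (i = 0; i < tot; i++)
        if (data[i] == '\0')
            n++;
    return n;
}
```

For the three strings "a", "b" and "c", the terminated data "a\0b\0c\0" (6 bytes) yields a count of 3. The same data without its final NUL (5 bytes) yields 2, one entry too few for an array that must hold three string pointers.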

Adding a test for n == 0 avoids the problem:

        if (end_of_file && (n == 0 || buf[n-1]))/* last read char not null */
            buf[n++] = '\0';                    /* so append null-terminator */

The reproducer works fine with that patch.
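
The two versions of the test can be compared in isolation. In the sketch below, the two conditions are copied from this comment and everything else is a hypothetical harness: the buffer is placed one byte into a larger array so that buf[-1] is a defined "guard" byte instead of stray data.

```c
#include <assert.h>
#include <string.h>

/* Original test: looks at buf[n-1] even when n == 0. */
int terminate_original(char *buf, int n, int end_of_file)
{
    assert(n >= 0);
    if (end_of_file && buf[n - 1])              /* last read char not null */
        buf[n++] = '\0';                        /* so append null-terminator */
    return n;
}

/* Patched test: a zero-length read always gets a terminator. */
int terminate_patched(char *buf, int n, int end_of_file)
{
    assert(n >= 0);
    if (end_of_file && (n == 0 || buf[n - 1]))  /* last read char not null */
        buf[n++] = '\0';                        /* so append null-terminator */
    return n;
}

/*
 * Hypothetical harness: returns n after a zero-length final read, with
 * `guard` stored in the byte just before the buffer.
 */
int n_after_empty_read(char guard, int patched)
{
    char storage[8];
    char *buf = storage + 1;                    /* buf[-1] is storage[0] */

    memset(storage, 'x', sizeof storage);
    storage[0] = guard;
    return patched ? terminate_patched(buf, 0, 1)
                   : terminate_original(buf, 0, 1);
}
```

With the original test, the result depends on the guard byte: a zero guard leaves n at 0 (no terminator), while a non-zero guard appends one. With the patched test, n is 1 in both cases.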
Comment 10 Tomas Smetana 2009-09-04 05:41:10 EDT
Same problem present in RHEL-4 (bug #521200). Same patch fixes the problem.
Comment 14 Daniel Novotny 2009-11-19 09:39:15 EST
Fixed in procps-3.2.7-12.el5
Comment 18 errata-xmlrpc 2010-03-30 04:06:15 EDT
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0200.html
