+++ This bug was initially created as a clone of Bug #193808 +++

Escalated to Bugzilla from IssueTracker

-- Additional comment from tao on 2006-06-01 14:51 EST --
Date: Thu, 25 May 2006 16:29:47 -0700 (PDT)
Date-warning: Date header was inserted by norm.llnl.gov
From: Matt Wolfe <mwolfe>
Subject: strace broke
To: woodard9
Message-id: <0IZU00KCOH9NO1.gov>
Content-transfer-encoding: 7BIT

Ben,

Please look at the strace core file I've given you, and log a bug, or
tell me what I've done wrong with strace. This problem seems to be
reproducible with:

% srun -N1 -ppdebug -n2 -Il ircount_icc_9.0.030-g-O0

Also, please tell me exactly how to interpret the glibc message (it may
relate to a user problem I'm chasing). Then, how can I get strace to
append PID numbers to my -o file when I list more than one PID (-ff
doesn't seem to work)?

-Matt

P.S. I haven't forgotten that I owe you a response from last week's issue.

------------------------------------------------------------------------
alc498{mwolfe}34: ps -ef | grep mwolfe
root      5051  5050  0 15:54 ?        00:00:00 login -- mwolfe
mwolfe    5052  5051  0 15:54 pts/0    00:00:00 -csh
mwolfe    5302  5297 99 16:03 ?        00:00:06 ./ircount_icc_9.0.030-g-O0
mwolfe    5303  5297 99 16:03 ?        00:00:06 ./ircount_icc_9.0.030-g-O0
mwolfe    5306  5052  0 16:03 pts/0    00:00:00 ps -ef
mwolfe    5307  5052  0 16:03 pts/0    00:00:00 grep mwolfe
alc498{mwolfe}35: strace -ffi -ebrk -omybrkout -p5302 -p5303
Process 5302 attached - interrupt to quit
Process 5303 attached - interrupt to quit
Process 5303 detached
Process 5302 detached
*** glibc detected *** double free or corruption (top): 0x0807a5c8 ***
Abort (core dumped)
------------------------------------------------------------------------
alc0{mwolfe}30: give -l ben
ben has been given:
  229376 May 25 16:20 alc498-strace-5308.core
     194 May 25 16:24 ircount.c
    6017 May 25 16:23 ircount_icc_9.0.030-g-O0
3 files
You have given 3 files.

This event sent from IssueTracker by gavin [Support Engineering Group] issue 94541

-- Additional comment from tao on 2006-06-01 14:51 EST --
File uploaded: ircount.c

This event sent from IssueTracker by gavin [Support Engineering Group] issue 94541 it_file 61937

-- Additional comment from tao on 2006-06-01 14:51 EST --
File uploaded: ircount_icc_9.0.030-g-O0

This event sent from IssueTracker by gavin [Support Engineering Group] issue 94541 it_file 61938

-- Additional comment from tao on 2006-06-01 14:51 EST --
OK this reproduces easily and is obviously a bug and you don't need any
fancy setup. Pick two root processes. Then as a normal user do something
like:

[ben@quince tmp]$ strace -ff -o output -p 2715 -p 2738
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
*** glibc detected *** double free or corruption (top): 0x0951a7a8 ***
Aborted

The problem is tied to the -ff option. If you don't have the -ff option
it doesn't happen. Should be an easy fix. I think this has got to be one
of the cases where no one has ever tried to use double -p options with
-ff before.

This event sent from IssueTracker by gavin [Support Engineering Group] issue 94541

-- Additional comment from tao on 2006-06-01 14:51 EST --
[ben@quince tmp]$ gdb strace
GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

(gdb) set args -ff -o output -p 2715 -p 2738
(gdb) run
Starting program: /usr/bin/strace -ff -o output -p 2715 -p 2738
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
*** glibc detected *** double free or corruption (top): 0x081c67a8 ***

Program received signal SIGABRT, Aborted.
0x00aa47a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt
#0  0x00aa47a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1  0x00ae47f5 in raise () from /lib/tls/libc.so.6
#2  0x00ae6199 in abort () from /lib/tls/libc.so.6
#3  0x00b184ea in __libc_message () from /lib/tls/libc.so.6
#4  0x00b1ec6f in _int_free () from /lib/tls/libc.so.6
#5  0x00b1efea in free () from /lib/tls/libc.so.6
#6  0x00b0f516 in fclose@@GLIBC_2.1 () from /lib/tls/libc.so.6
#7  0x08049880 in droptcb (tcp=0x81c611c) at strace.c:1163
#8  0x0804b26e in main (argc=8, argv=0xbfe9cf24) at strace.c:473
#9  0x00ad1e23 in __libc_start_main () from /lib/tls/libc.so.6
#10 0x080494c1 in _start ()
(gdb)

This event sent from IssueTracker by gavin [Support Engineering Group] issue 94541

-- Additional comment from tao on 2006-06-01 14:51 EST --
The problem appears to be this section of code (source lines 368-372):

        else if ((outf = fopen(outfname, "w")) == NULL) {
                fprintf(stderr, "%s: can't fopen '%s': %s\n",
                        progname, outfname, strerror(errno));
                exit(1);
        }

It needs some logic to handle the multiple -f options, like the code
that hands the output off to a pipe does (source lines 354-359):

                if (followfork > 1) {
                        fprintf(stderr, "\
%s: piping the output and -ff are mutually exclusive options\n",
                                progname);
                        exit(1);
                }

This event sent from IssueTracker by gavin [Support Engineering Group] issue 94541

-- Additional comment from tao on 2006-06-01 14:51 EST --
That is not to say that it is sufficient to just bomb out like the pipe
option does. It means that the part of the code that opens up the file
handles and sticks the outf into the tcp structure needs to be iterated
through.

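To illustrate the point about opening one file handle per traced process, here is a minimal C sketch. It is not the attached patch and not strace's actual code. It assumes, as one reading of the backtrace (fclose() reached from droptcb()), that both -p entries ended up holding the same FILE * and each closed it on detach. The names tcb_sketch, open_per_pid_output and drop_sketch are hypothetical; the "<prefix>.<pid>" naming follows what -ff combined with -o is documented to produce.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical, much-simplified stand-in for strace's per-process tcb. */
struct tcb_sketch {
    pid_t pid;
    FILE *outf;   /* output stream used for this traced process */
};

/*
 * Open a private output file "<prefix>.<pid>" for one traced process,
 * so that no two entries ever share a FILE *.
 * Returns 0 on success, -1 if the name does not fit or fopen() fails.
 */
int open_per_pid_output(struct tcb_sketch *tcp, const char *prefix)
{
    char name[4096];
    int n;

    assert(tcp != NULL && prefix != NULL);
    n = snprintf(name, sizeof name, "%s.%ld", prefix, (long)tcp->pid);
    if (n < 0 || (size_t)n >= sizeof name)
        return -1;
    tcp->outf = fopen(name, "w");
    return tcp->outf != NULL ? 0 : -1;
}

/*
 * Release one entry. Clearing the pointer after fclose() makes a second
 * call on the same entry a no-op instead of a second fclose().
 */
void drop_sketch(struct tcb_sketch *tcp)
{
    assert(tcp != NULL);
    if (tcp->outf != NULL) {
        fclose(tcp->outf);
        tcp->outf = NULL;
    }
    tcp->pid = 0;
}
```

Calling open_per_pid_output() once for each -p entry gives every process its own stream, so dropping both entries closes two different FILE objects rather than the same one twice.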
This event sent from IssueTracker by gavin [Support Engineering Group] issue 94541

-- Additional comment from tao on 2006-06-01 14:51 EST --
File uploaded: strace-ff.patch

This event sent from IssueTracker by gavin [Support Engineering Group] issue 94541 it_file 62251

-- Additional comment from tao on 2006-06-01 14:51 EST --
Basically, the temptation to just implement a patch which says, "don't
do that" seemed so great that I felt that by writing the patch myself, I
could reduce the likelihood of that happening. Thus here is the patch. It
probably could use some additional testing but it does seem to work on
my system.

Can we send this up to engineering now? I think that there is a high
probability that upstream will need pretty much the same patch.

Status set to: Waiting on Tech

This event sent from IssueTracker by gavin [Support Engineering Group] issue 94541

-- Additional comment from woodard on 2006-06-01 15:05 EST --
Created an attachment (id=130359)
fixes the problem for me.

This fixes the problem for me. It probably could use a bit more testing
than I gave it though.

-- Additional comment from ezannoni on 2006-07-31 17:53 EST --
devel ack for rhel4.5

-- Additional comment from pm-rhel on 2006-08-18 11:44 EST --
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management
has requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux Update release for
currently deployed products. This request is not yet committed for
inclusion in an Update release.

-- Additional comment from borgan on 2006-09-05 18:09 EST --
QE ack for reproducer testcase ...

-- Additional comment from roland on 2006-12-04 22:46 EST --
Ben, pls clone as fedora bug that is public and attach your patch there.
Ideally, post upstream w/ChangeLog entry to strace-devel.
I appreciate help while working w/broken arm.

-- Additional comment from jakub on 2006-12-05 06:17 EST -- Done, as #218435.

*** This bug has been marked as a duplicate of 218435 ***