From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) Gecko/20030912 Firebird/0.6.1+ StumbleUpon/1.83

Description of problem:
In a gdb session attached to a threaded application, I've defined a macro which, among other things, runs the 'bt' command. I try to run

    thread apply all themacro

but the execution soon stops, since one of the backtraces seems bogus to gdb and it prints:

    Previous frame identical to this frame (corrupt stack?)

This is because some of the frames don't have debug info (since they are asm code). The above message should not cause gdb to stop processing the macro and the 'thread apply' loop. Not even the next command in the macro is executed; gdb just stops after printing an indication of the frame.

Version-Release number of selected component (if applicable):
gdb-5.3.90-0.20030710.3

How reproducible:
Always

Steps to Reproduce:
1. Create some threaded program which has, at runtime, stack frames which irritate gdb
2. Attach to the process or start it in gdb
3. Create a macro like

    define themacro
    bt
    x/4w $sp
    end

4. Run 'thread apply all themacro'

Actual Results:
Stops after the gdb message

Expected Results:
Continue with the next command in the macro and then proceed to the next thread

Additional info:
I haven't tried to artificially create some code which shows the problem. Might be tricky. I constantly have the problem when debugging nptl, though. Recreating this environment is easy if it is necessary.

It is probably not necessary to use a threaded application. Just define a macro where there is another command after bt. If the second command is executed even though the bt command produces the warning, I guess the thread apply loop will continue, too.
Please provide a way to reproduce this. If this involves hand generated debug info I suspect that the debug info for the stack is not correct. Where does it barf? Do you have a copy of the debug info, or whatever file creates the problem? If there is an error in the execution of a command set, gdb does bail out, yes. This is expected behavior.
Try:

    gdb -nx --quiet <program> <pid> < script

Potential enhancement requests include command sequences such as:

    forthread th
      try
        thread apply $th bt
      end
    end

and a mechanism to optionally not abort when an error occurs.
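The continue-on-error behaviour that the forthread/try sequence asks for can be sketched in Python. This is only an illustration under stated assumptions, not how gdb implements anything: `run_tolerant`, `apply_to_threads` and `fake_execute` are hypothetical names, and the executor is a stand-in that fails for one thread. In a gdb build with Python scripting (which the gdb version in this report does not have), `gdb.execute` could presumably be passed as the executor, with `gdb.error` being the exception caught.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Failure = Tuple[str, str]  # (command, error message)


def run_tolerant(commands: Iterable[str],
                 execute: Callable[[str], object]) -> List[Failure]:
    """Run every command in order; record failures instead of aborting."""
    failures: List[Failure] = []
    for command in commands:
        try:
            execute(command)
        except Exception as exc:  # assumption: inside gdb this would be gdb.error
            failures.append((command, str(exc)))
    return failures


def apply_to_threads(thread_ids: Iterable[int],
                     commands: List[str],
                     execute: Callable[[str], object]) -> Dict[int, List[Failure]]:
    """Hypothetical analogue of the 'forthread ... try ... end' loop."""
    return {
        tid: run_tolerant([f"thread apply {tid} {cmd}" for cmd in commands],
                          execute)
        for tid in thread_ids
    }


# Stand-in executor: pretends the backtrace of thread 2 hits a bad frame.
executed: List[str] = []


def fake_execute(command: str) -> None:
    executed.append(command)
    if command == "thread apply 2 bt":
        raise RuntimeError(
            "Previous frame identical to this frame (corrupt stack?)")


report = apply_to_threads([1, 2, 3], ["bt", "x/4w $sp"], fake_execute)
for tid, failures in report.items():
    print(tid, failures)
```

With this structure the failing bt is recorded for thread 2, while the x/4w $sp that follows it and the commands for thread 3 still run.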
Elena Zannoni wrote:
> See Andrew's reply. There are ways to address this enhancement
> request. However we need some examples of what's going wrong. And how
> this debug info (if any) gets generated. And a copy of/pointer to it,
> etc etc.

I'm not complaining about the broken backtrace; this is the application's fault. Yes, some day I might add debug info to the asm code, but even then it won't be perfect since some code cannot be annotated (long story).

I have below the backtrace of a problem plus the stack dump. Feel free to look at it. Reproducing is also possible. You need a test program IBM provided. I have a simplified version which I can upload. I haven't tried running it on 2-way machines. It might not stall for a long time. I'm using a 4p (8 virtual p) machine which causes the stall to appear rapidly.

Anyway, all I need is that if a macro contains the 'bt' command and this command aborts with

    Previous frame identical to this frame (corrupt stack?)

the macro execution is not also aborted. I see no reason for this; the following commands don't depend on the bt output. Maybe at least make it selectable via a gdb variable (set macro-stop-on-error no).

Here's some program detail. This is the complete backtrace as printed:

#0 0x00fc6bf2 in _dl_sysinfo_int80 () at rtld.c:274
#1 0x00f53d1b in __lll_mutex_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i486/lowlevelmutex.S:58
#2 0x0804b704 in std::__ioinit ()
#3 0x00f585fc in __JCR_LIST__ () from /home/drepper/local/glibc-build/20030916/nptl_test/../nptl/libpthread.so.0
#4 0x000007f8 in ?? ()
#5 0x00f512ae in _L_mutex_lock_27 () from /home/drepper/local/glibc-build/20030916/nptl_test/../nptl/libpthread.so.0
#6 0x0804b61c in __JCR_LIST__ ()
#7 0xfefcdc10 in ?? ()
#8 0x0538fa78 in ?? ()
#9 0x080495e2 in LDAP::Queue<WorkItem*>::deQueue(WorkItem**) (this=0x0, data=0xfffffffc) at Queue.hpp:127
Previous frame identical to this frame (corrupt stack?)
This is the stack content (first 256 words):

0x538fa3c: 0x00f53d1b 0x0804b704 0x00f585fc 0x000007f8
0x538fa4c: 0x00f512ae 0x0804b61c 0xfefcdc10 0x0538fa78
0x538fa5c: 0x080495e2 0x0804b704 0x0000000f 0x0538fa78
0x538fa6c: 0x007921c3 0x0804b61c 0x0000000f 0x0538faa8
0x538fa7c: 0x08049462 0xfefcdc10 0x0538fa98 0x00000000
0x538fa8c: 0x00000000 0x0538fa98 0x08049dda 0x00000000
0x538fa9c: 0x00f585fc 0x00000000 0x00000000 0x0538fac8
0x538faac: 0x080492a7 0x08b307f8 0x00000000 0x00000000
0x538fabc: 0x00000000 0x00f503ba 0x0538fae4 0x0538fb2c
0x538facc: 0x00f503d8 0x08b307f8 0x00000000 0x00000000
0x538fadc: 0x00000000 0x0538fbb0 0x00f585fc 0x00000000
0x538faec: 0x00000000 0x0538fb2c 0x0538fac4 0x00f503ba
0x538fafc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fb0c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fb1c: 0x00000000 0x00f5033c 0x00000000 0x00000000
0x538fb2c: 0x00000000 0x002b820a 0x0538fbb0 0x00000000
0x538fb3c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fb4c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fb5c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fb6c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fb7c: 0x00000000 0x00000000 0x00000000 0x00308560
0x538fb8c: 0x0538fdfc 0x00000000 0x002ef2c0 0x002efac0
0x538fb9c: 0x002f00c0 0x00000000 0x00000000 0x00000000
0x538fbac: 0x00000000 0x0538fbb0 0x08b30814 0x0538fbb0
0x538fbbc: 0x00000001 0x00fc6bf0 0x00000000 0x00000000
0x538fbcc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fbdc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fbec: 0x00000000 0x019dbbf0 0x0681dbf0 0x000007f8
0x538fbfc: 0x00000000 0x0538fae4 0x00000000 0x00000000
0x538fc0c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fc1c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fc2c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fc3c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fc4c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fc5c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fc6c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fc7c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fc8c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fc9c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fcac: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fcbc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fccc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fcdc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fcec: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fcfc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fd0c: 0x0538fc08 0x00000000 0x00000000 0x00000000
0x538fd1c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fd2c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fd3c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fd4c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fd5c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fd6c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fd7c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fd8c: 0x000007f6 0x00000001 0x00000001 0x31e46568
0x538fd9c: 0x000004ec 0x0538fbb0 0x00000001 0x00000000
0x538fdac: 0x00000000 0x00000000 0x08049298 0x08b307f8
0x538fdbc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fdcc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fddc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fdec: 0x00000000 0x0498f000 0x00a01000 0x00001000
0x538fdfc: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fe0c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fe1c: 0x00000000 0x00000000 0x00000000 0x00000000
0x538fe2c: 0x00000000 0x00000000 0x00000000 0x00000000

You can determine where bt gets its data from. Why it does so is another question. The correct backtrace is

#0 0x00fc6bf2 in _dl_sysinfo_int80 () at rtld.c:274
#1 0x00f53d1b in __lll_mutex_lock_wait ()
#5 0x00f512ae in _L_mutex_lock_27 ()
#9 0x080495e2 in LDAP::Queue<WorkItem*>::deQueue(WorkItem**) (this=0x0,

None of the first three has debug info. The _dl_sysinfo_int80 code has call frame info, so I guess gdb uses it.
Note that #9 is at address 0x538fa5c on the stack, while I can find a value for #4 only at address 0x538fbf8. No idea why the bt function jumps back and forth.

Here's the memory layout in case you need to know:

001f8000-00308000 r-xp 00000000 08:02 1633944 /home/drepper-local/glibc-build/20030916/libc.so
00308000-0030b000 rw-p 0010f000 08:02 1633944 /home/drepper-local/glibc-build/20030916/libc.so
0030b000-0030e000 rw-p 00000000 00:00 0
00708000-007b1000 r-xp 00000000 08:02 165545 /usr/lib/libstdc++.so.5.0.3
007b1000-007b6000 rw-p 000a8000 08:02 165545 /usr/lib/libstdc++.so.5.0.3
007b6000-007bb000 rw-p 00000000 00:00 0
00b4a000-00b52000 r-xp 00000000 08:02 620167 /lib/libgcc_s-3.2.3-20030829.so.1
00b52000-00b53000 rw-p 00007000 08:02 620167 /lib/libgcc_s-3.2.3-20030829.so.1
00e0a000-00e2b000 r-xp 00000000 08:02 620276 /lib/tls/libm-2.3.2.so
00e2b000-00e2c000 rw-p 00020000 08:02 620276 /lib/tls/libm-2.3.2.so
00f4b000-00f58000 r-xp 00000000 08:02 1634087 /home/drepper-local/glibc-build/20030916/nptl/libpthread.so
00f58000-00f59000 rw-p 0000c000 08:02 1634087 /home/drepper-local/glibc-build/20030916/nptl/libpthread.so
00f59000-00f5b000 rw-p 00000000 00:00 0
00fc6000-00fda000 r-xp 00000000 08:02 1273299 /home/drepper-local/glibc-build/20030916/elf/ld.so
00fda000-00fdb000 rw-p 00014000 08:02 1273299 /home/drepper-local/glibc-build/20030916/elf/ld.so
00fdb000-00fdc000 ---p 00000000 00:00 0
00fdc000-019dc000 rwxp 00001000 00:00 0
01a73000-01a74000 ---p 00000000 00:00 0
01a74000-02474000 rwxp 00001000 00:00 0
02474000-02475000 ---p 00a01000 00:00 0
02475000-02e75000 rwxp 00a02000 00:00 0
02e75000-02e76000 ---p 01402000 00:00 0
02e76000-03876000 rwxp 01403000 00:00 0
03e28000-03e29000 ---p 00000000 00:00 0
03e29000-04829000 rwxp 00001000 00:00 0
0498f000-04990000 ---p 00000000 00:00 0
04990000-05390000 rwxp 00001000 00:00 0
05390000-05391000 ---p 00a01000 00:00 0
05391000-05d91000 rwxp 00a02000 00:00 0
05e1d000-05e1e000 ---p 00000000 00:00 0
05e1e000-0681e000 rwxp 00001000 00:00 0
06b83000-06b84000 ---p 00000000 00:00 0
06b84000-07584000 rwxp 00001000 00:00 0
07584000-07585000 ---p 00a01000 00:00 0
07585000-07f85000 rwxp 00a02000 00:00 0
08048000-0804b000 r-xp 00000000 08:02 1829000 /home/drepper-local/glibc-build/20030916/nptl_test/nptl_test
0804b000-0804c000 rw-p 00002000 08:02 1829000 /home/drepper-local/glibc-build/20030916/nptl_test/nptl_test
0804c000-0804d000 ---p 00000000 00:00 0
0804d000-08a4d000 rwxp 00001000 00:00 0
08b2f000-08b51000 rw-p 00000000 00:00 0
08b51000-08b52000 ---p 00000000 00:00 0
08b52000-09552000 rwxp 00001000 00:00 0
09552000-09553000 ---p 00a01000 00:00 0
09553000-09f53000 rwxp 00a02000 00:00 0
09f53000-09f54000 ---p 01402000 00:00 0
09f54000-0a954000 rwxp 01403000 00:00 0
0a954000-0a955000 ---p 01e03000 00:00 0
0a955000-0b355000 rwxp 01e04000 00:00 0
0b355000-0b356000 ---p 02804000 00:00 0
0b356000-0bd56000 rwxp 02805000 00:00 0
f6400000-f6421000 rw-p 00012000 00:00 0
f6421000-f6500000 ---p 00033000 00:00 0
f65ee000-f65ef000 rw-p 00000000 00:00 0
f65fe000-f6600000 rw-p 00000000 00:00 0
fefcd000-ff000000 rw-p fffdc000 00:00 0

The application uses 17 threads, hence the numerous anonymous memory blocks.
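One way to check where bt gets its data from is to look up each reported frame address in the stack dump. The sketch below does that for the first three rows of the dump only, so matches further up the stack are not searched. The helper names (`parse_dump`, `first_address_of`) are made up for illustration, and the 4-byte word size is an assumption based on the i386 target.

```python
# First three rows of the stack dump from the comment above.
DUMP_EXCERPT = """\
0x538fa3c: 0x00f53d1b 0x0804b704 0x00f585fc 0x000007f8
0x538fa4c: 0x00f512ae 0x0804b61c 0xfefcdc10 0x0538fa78
0x538fa5c: 0x080495e2 0x0804b704 0x0000000f 0x0538fa78
"""

# Addresses gdb printed for frames #1 through #9 of the backtrace.
FRAME_PCS = [0x00f53d1b, 0x0804b704, 0x00f585fc, 0x000007f8, 0x00f512ae,
             0x0804b61c, 0xfefcdc10, 0x0538fa78, 0x080495e2]

WORD_SIZE = 4  # assumption: 32-bit words on i386


def parse_dump(text):
    """Map each stack address to the word stored there."""
    words = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        base_text, rest = line.split(":", 1)
        base = int(base_text, 16)
        for index, word in enumerate(rest.split()):
            words[base + index * WORD_SIZE] = int(word, 16)
    return words


def first_address_of(words, value):
    """Return the lowest address holding value, or None if absent."""
    for address in sorted(words):
        if words[address] == value:
            return address
    return None


words = parse_dump(DUMP_EXCERPT)
locations = [first_address_of(words, pc) for pc in FRAME_PCS]
for number, (pc, address) in enumerate(zip(FRAME_PCS, locations), start=1):
    print(f"#{number} {pc:#010x} first found at {address:#x}")
```

In this excerpt the values reported for #1 through #9 are exactly the nine consecutive words starting at 0x538fa3c, with #9 at 0x538fa5c as noted above. That is consistent with bt walking the stack word by word after frame #1, though the dump alone does not show why.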
A fix has been made for this problem as of gdb 43.2. Errors occurring in a backtrace will not cause a macro command or "thread apply" to stop prematurely.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-561.html