Bug 104632

Summary: thread apply all stops after message from command
Product: Red Hat Enterprise Linux 3 Reporter: Ulrich Drepper <drepper>
Component: gdb Assignee: Jeff Johnston <jjohnstn>
Status: CLOSED ERRATA QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0   
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-11-08 19:22:39 UTC Type: ---

Description Ulrich Drepper 2003-09-18 07:53:13 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) Gecko/20030912 Firebird/0.6.1+ StumbleUpon/1.83

Description of problem:
In a gdb session attached to a threaded application, I've defined a macro which,
among other things, runs the 'bt' command.  I try to run

  thread apply all themacro

but execution soon stops because one of the backtraces seems bogus to gdb, and
it prints:

Previous frame identical to this frame (corrupt stack?)

This is because some of the frames don't have debug info (they are asm
code).  The above message should not cause gdb to stop processing the macro and
the 'thread apply' loop.  Not even the next command in the macro is executed;
gdb just stops after printing an indication of the frame.

Version-Release number of selected component (if applicable):
gdb-5.3.90-0.20030710.3

How reproducible:
Always

Steps to Reproduce:
1. Create a threaded program that, at runtime, has stack frames which confuse gdb.
2. Attach to the process or start it in gdb.
3. Create a macro like:
  define themacro
  bt
  x/4w $sp
  end
4. Run 'thread apply all themacro'.

Actual Results:  Stops after the gdb message

Expected Results:  continue with the next command in the macro and then proceed
to the next thread

Additional info:

I haven't tried to create artificial code that shows the problem; that might
be tricky.  I constantly hit the problem when debugging nptl, though, and
recreating that environment is easy if necessary.

It is probably not necessary to use a threaded application.  Just define a macro
where there is another command after bt.  If the second command is executed even
though the bt command produces the warning, I guess the thread apply loop will
continue, too.

Comment 2 Elena Zannoni 2003-09-18 17:07:51 UTC
Please provide a way to reproduce this.

If this involves hand-generated debug info, I suspect that the debug info for the
stack is not correct. Where does it barf? Do you have a copy of the debug info,
or of whatever file creates the problem?

If there is an error in the execution of a command set, gdb does bail out, yes.
This is expected behavior.

Comment 3 Andrew Cagney 2003-09-18 20:23:12 UTC
try:

  gdb -nx --quiet <program> <pid> < script

Potential enhancement requests include command sequences such as:

  forthread th
    try
      thread apply $th bt
    end
  end

and a mechanism to optionally not abort when an error occurs.
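The `forthread`/`try` commands above are proposals, not existing gdb syntax. A rough equivalent can be sketched with the Python API of later GDB releases (the gdb-5.3 build in this report has no Python support). The sketch assumes that API (`gdb.selected_inferior().threads()`, `InferiorThread.switch()`, `InferiorThread.num`, `gdb.execute`, `gdb.error`); the helper names `apply_per_thread` and `apply_all_threads` are hypothetical. As in Andrew's example, the `try` wraps each thread's whole command list: an error skips the rest of that thread's commands but not the remaining threads.

```python
"""Sketch: run GDB commands on every thread, continuing past per-thread errors.

Hypothetical helpers. The core loop is GDB-independent so it can be exercised
with stand-in objects outside GDB.
"""
from typing import Callable, Dict, Iterable, List, Optional, Type


def apply_per_thread(
    threads: Iterable,
    commands: List[str],
    execute: Callable[[str], None],
    error_type: Type[BaseException],
) -> Dict[int, Optional[str]]:
    """Run `commands` in each thread; map thread number -> error text or None."""
    results: Dict[int, Optional[str]] = {}
    for th in threads:
        th.switch()  # make this thread the selected one
        try:
            for cmd in commands:
                execute(cmd)
            results[th.num] = None
        except error_type as exc:
            # Like the proposed `try ... end`: abandon this thread's remaining
            # commands, but keep going with the next thread.
            results[th.num] = str(exc)
    return results


try:
    import gdb  # only importable when this file is sourced inside GDB
except ImportError:
    gdb = None

if gdb is not None:

    def apply_all_threads(commands: List[str]) -> Dict[int, Optional[str]]:
        """Hypothetical entry point for use from GDB's `python` command."""
        original = gdb.selected_thread()
        try:
            return apply_per_thread(
                gdb.selected_inferior().threads(), commands, gdb.execute, gdb.error
            )
        finally:
            if original is not None:
                original.switch()  # restore the thread the user had selected
```

Inside GDB, after `source`-ing the file, something like `python print(apply_all_threads(["bt", "x/4w $sp"]))` should run both commands per thread. Restoring the originally selected thread in `finally` keeps the session state unchanged even if an unexpected exception escapes.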

Comment 4 Andrew Cagney 2003-09-19 19:21:27 UTC
Elena Zannoni wrote:


> See Andrew's reply. There are ways to address this enhancement
> request.  However we need some examples of what's going wrong. And how
> this debug info (if any) gets generated. And a copy of/pointer to it,
> etc etc.


I'm not complaining about the broken backtrace; that is the application's
fault.  Yes, some day I might add debug info to the asm code, but even
then it won't be perfect since some code cannot be annotated (long
story).  Below are the backtrace of a problem and the stack dump.
Feel free to look at them.  Reproducing is also possible: you need a test
program IBM provided, and I have a simplified version which I can upload.
I haven't tried running it on 2-way machines; there it might take a
long time to stall.  I'm using a 4-processor (8 virtual processors)
machine, on which the stall appears rapidly.

Anyway, all I need is that if a macro contains the 'bt' command and this
command aborts with

  Previous frame identical to this frame (corrupt stack?)

the macro execution is not aborted as well.  I see no reason for that:
the following commands don't depend on the bt output.  Maybe at least
make it selectable via a gdb variable (set macro-stop-on-error no).
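The difference between the reported and the requested behavior can be modelled in a few lines of Python. `macro-stop-on-error` is the reporter's suggestion, not an existing gdb setting; `run_macro` and `CommandError` are hypothetical stand-ins for gdb's execution of a user-defined command and for its error.

```python
from typing import Callable, List


class CommandError(Exception):
    """Stand-in for a gdb error such as 'Previous frame identical to this frame'."""


def run_macro(
    commands: List[str],
    execute: Callable[[str], None],
    stop_on_error: bool = True,
) -> List[str]:
    """Execute a macro's commands in order and return the errors seen.

    stop_on_error=True mirrors the behavior reported in this bug: the first
    error aborts the rest of the macro. False models the proposed
    'set macro-stop-on-error no': the error is recorded and the next
    command still runs.
    """
    errors: List[str] = []
    for cmd in commands:
        try:
            execute(cmd)
        except CommandError as exc:
            if stop_on_error:
                raise  # propagate, skipping the macro's remaining commands
            errors.append(f"{cmd}: {exc}")
    return errors


# Usage: 'bt' fails, as in the report; 'x/4w $sp' should still run.
ran = []

def execute(cmd: str) -> None:
    ran.append(cmd)
    if cmd == "bt":
        raise CommandError("Previous frame identical to this frame (corrupt stack?)")

print(run_macro(["bt", "x/4w $sp"], execute, stop_on_error=False))
# ['bt: Previous frame identical to this frame (corrupt stack?)']
print(ran)  # ['bt', 'x/4w $sp']
```

With the default `stop_on_error=True`, the same call raises `CommandError` after running only `bt`, which corresponds to the "Actual Results" above.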


Here are some program details.  This is the complete backtrace as printed:

#0  0x00fc6bf2 in _dl_sysinfo_int80 () at rtld.c:274
#1  0x00f53d1b in __lll_mutex_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/i386/i486/lowlevelmutex.S:58
#2  0x0804b704 in std::__ioinit ()
#3  0x00f585fc in __JCR_LIST__ ()
   from /home/drepper/local/glibc-build/20030916/nptl_test/../nptl/libpthread.so.0
#4  0x000007f8 in ?? ()
#5  0x00f512ae in _L_mutex_lock_27 ()
   from /home/drepper/local/glibc-build/20030916/nptl_test/../nptl/libpthread.so.0
#6  0x0804b61c in __JCR_LIST__ ()
#7  0xfefcdc10 in ?? ()
#8  0x0538fa78 in ?? ()
#9  0x080495e2 in LDAP::Queue<WorkItem*>::deQueue(WorkItem**) (this=0x0,
    data=0xfffffffc) at Queue.hpp:127
Previous frame identical to this frame (corrupt stack?)


This is the stack content (first 256 words):

0x538fa3c:      0x00f53d1b      0x0804b704      0x00f585fc    0x000007f8
0x538fa4c:      0x00f512ae      0x0804b61c      0xfefcdc10    0x0538fa78
0x538fa5c:      0x080495e2      0x0804b704      0x0000000f    0x0538fa78
0x538fa6c:      0x007921c3      0x0804b61c      0x0000000f    0x0538faa8
0x538fa7c:      0x08049462      0xfefcdc10      0x0538fa98    0x00000000
0x538fa8c:      0x00000000      0x0538fa98      0x08049dda    0x00000000
0x538fa9c:      0x00f585fc      0x00000000      0x00000000    0x0538fac8
0x538faac:      0x080492a7      0x08b307f8      0x00000000    0x00000000
0x538fabc:      0x00000000      0x00f503ba      0x0538fae4    0x0538fb2c
0x538facc:      0x00f503d8      0x08b307f8      0x00000000    0x00000000
0x538fadc:      0x00000000      0x0538fbb0      0x00f585fc    0x00000000
0x538faec:      0x00000000      0x0538fb2c      0x0538fac4    0x00f503ba
0x538fafc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fb0c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fb1c:      0x00000000      0x00f5033c      0x00000000    0x00000000
0x538fb2c:      0x00000000      0x002b820a      0x0538fbb0    0x00000000
0x538fb3c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fb4c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fb5c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fb6c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fb7c:      0x00000000      0x00000000      0x00000000    0x00308560
0x538fb8c:      0x0538fdfc      0x00000000      0x002ef2c0    0x002efac0
0x538fb9c:      0x002f00c0      0x00000000      0x00000000    0x00000000
0x538fbac:      0x00000000      0x0538fbb0      0x08b30814    0x0538fbb0
0x538fbbc:      0x00000001      0x00fc6bf0      0x00000000    0x00000000
0x538fbcc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fbdc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fbec:      0x00000000      0x019dbbf0      0x0681dbf0    0x000007f8
0x538fbfc:      0x00000000      0x0538fae4      0x00000000    0x00000000
0x538fc0c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fc1c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fc2c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fc3c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fc4c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fc5c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fc6c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fc7c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fc8c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fc9c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fcac:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fcbc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fccc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fcdc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fcec:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fcfc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fd0c:      0x0538fc08      0x00000000      0x00000000    0x00000000
0x538fd1c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fd2c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fd3c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fd4c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fd5c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fd6c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fd7c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fd8c:      0x000007f6      0x00000001      0x00000001    0x31e46568
0x538fd9c:      0x000004ec      0x0538fbb0      0x00000001    0x00000000
0x538fdac:      0x00000000      0x00000000      0x08049298    0x08b307f8
0x538fdbc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fdcc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fddc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fdec:      0x00000000      0x0498f000      0x00a01000    0x00001000
0x538fdfc:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fe0c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fe1c:      0x00000000      0x00000000      0x00000000    0x00000000
0x538fe2c:      0x00000000      0x00000000      0x00000000    0x00000000


You can determine where bt gets its data from.  Why it does so is
another question.

The correct backtrace is:

#0  0x00fc6bf2 in _dl_sysinfo_int80 () at rtld.c:274
#1  0x00f53d1b in __lll_mutex_lock_wait ()
#5  0x00f512ae in _L_mutex_lock_27 ()
#9  0x080495e2 in LDAP::Queue<WorkItem*>::deQueue(WorkItem**) (this=0x0,

None of the first three has debug info.  The _dl_sysinfo_int80 code has
call frame info, so I guess gdb uses it.  Note that #9 is at address
0x538fa5c on the stack, while I can find the value for #4 only at address
0x538fbf8.  I have no idea why the bt function jumps back and forth.
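Where bt gets its data from can be checked mechanically against the stack dump above. The sketch below parses four of the dump's rows (copied verbatim) into an address-to-word table; `parse_dump` and `addresses_of` are hypothetical helper names. It shows that the PCs gdb printed for frames #1 through #9 are exactly the nine consecutive words at 0x538fa3c through 0x538fa5c, which suggests (an interpretation, not a diagnosis) that the unwinder was reading plain stack words as return addresses. In the full dump, the value of #4 (0x000007f8) occurs at 0x538fa48, inside that run, as well as at 0x538fbf8.

```python
from typing import Dict, List

# Rows copied from the report's stack dump: an address, then four 32-bit words.
DUMP_EXCERPT = """\
0x538fa3c:      0x00f53d1b      0x0804b704      0x00f585fc    0x000007f8
0x538fa4c:      0x00f512ae      0x0804b61c      0xfefcdc10    0x0538fa78
0x538fa5c:      0x080495e2      0x0804b704      0x0000000f    0x0538fa78
0x538fbec:      0x00000000      0x019dbbf0      0x0681dbf0    0x000007f8
"""

# PCs gdb printed for frames #1..#9 of the bogus backtrace.
FRAME_PCS = [0x00f53d1b, 0x0804b704, 0x00f585fc, 0x000007f8, 0x00f512ae,
             0x0804b61c, 0xfefcdc10, 0x0538fa78, 0x080495e2]


def parse_dump(text: str) -> Dict[int, int]:
    """Map each stack address to the 32-bit word stored there."""
    words: Dict[int, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        addr_field, *values = line.split()
        base = int(addr_field.rstrip(":"), 16)
        for i, value in enumerate(values):
            words[base + 4 * i] = int(value, 16)  # one word every 4 bytes
    return words


def addresses_of(words: Dict[int, int], value: int) -> List[int]:
    """All addresses, in ascending order, at which `value` is stored."""
    return sorted(addr for addr, word in words.items() if word == value)


words = parse_dump(DUMP_EXCERPT)
for frame, pc in enumerate(FRAME_PCS, start=1):
    first = addresses_of(words, pc)[0]
    print(f"#{frame} 0x{pc:08x} first stored at 0x{first:x}")
```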

Here's the memory layout in case you need to know:

001f8000-00308000 r-xp 00000000 08:02 1633944    /home/drepper-local/glibc-build/20030916/libc.so
00308000-0030b000 rw-p 0010f000 08:02 1633944    /home/drepper-local/glibc-build/20030916/libc.so
0030b000-0030e000 rw-p 00000000 00:00 0
00708000-007b1000 r-xp 00000000 08:02 165545     /usr/lib/libstdc++.so.5.0.3
007b1000-007b6000 rw-p 000a8000 08:02 165545     /usr/lib/libstdc++.so.5.0.3
007b6000-007bb000 rw-p 00000000 00:00 0
00b4a000-00b52000 r-xp 00000000 08:02 620167     /lib/libgcc_s-3.2.3-20030829.so.1
00b52000-00b53000 rw-p 00007000 08:02 620167     /lib/libgcc_s-3.2.3-20030829.so.1
00e0a000-00e2b000 r-xp 00000000 08:02 620276     /lib/tls/libm-2.3.2.so
00e2b000-00e2c000 rw-p 00020000 08:02 620276     /lib/tls/libm-2.3.2.so
00f4b000-00f58000 r-xp 00000000 08:02 1634087    /home/drepper-local/glibc-build/20030916/nptl/libpthread.so
00f58000-00f59000 rw-p 0000c000 08:02 1634087    /home/drepper-local/glibc-build/20030916/nptl/libpthread.so
00f59000-00f5b000 rw-p 00000000 00:00 0
00fc6000-00fda000 r-xp 00000000 08:02 1273299    /home/drepper-local/glibc-build/20030916/elf/ld.so
00fda000-00fdb000 rw-p 00014000 08:02 1273299    /home/drepper-local/glibc-build/20030916/elf/ld.so
00fdb000-00fdc000 ---p 00000000 00:00 0
00fdc000-019dc000 rwxp 00001000 00:00 0
01a73000-01a74000 ---p 00000000 00:00 0
01a74000-02474000 rwxp 00001000 00:00 0
02474000-02475000 ---p 00a01000 00:00 0
02475000-02e75000 rwxp 00a02000 00:00 0
02e75000-02e76000 ---p 01402000 00:00 0
02e76000-03876000 rwxp 01403000 00:00 0
03e28000-03e29000 ---p 00000000 00:00 0
03e29000-04829000 rwxp 00001000 00:00 0
0498f000-04990000 ---p 00000000 00:00 0
04990000-05390000 rwxp 00001000 00:00 0
05390000-05391000 ---p 00a01000 00:00 0
05391000-05d91000 rwxp 00a02000 00:00 0
05e1d000-05e1e000 ---p 00000000 00:00 0
05e1e000-0681e000 rwxp 00001000 00:00 0
06b83000-06b84000 ---p 00000000 00:00 0
06b84000-07584000 rwxp 00001000 00:00 0
07584000-07585000 ---p 00a01000 00:00 0
07585000-07f85000 rwxp 00a02000 00:00 0
08048000-0804b000 r-xp 00000000 08:02 1829000    /home/drepper-local/glibc-build/20030916/nptl_test/nptl_test
0804b000-0804c000 rw-p 00002000 08:02 1829000    /home/drepper-local/glibc-build/20030916/nptl_test/nptl_test
0804c000-0804d000 ---p 00000000 00:00 0
0804d000-08a4d000 rwxp 00001000 00:00 0
08b2f000-08b51000 rw-p 00000000 00:00 0
08b51000-08b52000 ---p 00000000 00:00 0
08b52000-09552000 rwxp 00001000 00:00 0
09552000-09553000 ---p 00a01000 00:00 0
09553000-09f53000 rwxp 00a02000 00:00 0
09f53000-09f54000 ---p 01402000 00:00 0
09f54000-0a954000 rwxp 01403000 00:00 0
0a954000-0a955000 ---p 01e03000 00:00 0
0a955000-0b355000 rwxp 01e04000 00:00 0
0b355000-0b356000 ---p 02804000 00:00 0
0b356000-0bd56000 rwxp 02805000 00:00 0
f6400000-f6421000 rw-p 00012000 00:00 0
f6421000-f6500000 ---p 00033000 00:00 0
f65ee000-f65ef000 rw-p 00000000 00:00 0
f65fe000-f6600000 rw-p 00000000 00:00 0
fefcd000-ff000000 rw-p fffdc000 00:00 0


The application uses 17 threads, hence the numerous anonymous
memory blocks.
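The memory map above also makes it possible to test each frame's PC against the mapping it falls into. In this sketch, the helper names are hypothetical and the region list is an excerpt of the map, with paths shortened to file names. A PC is treated as plausible only if it lies in an executable, file-backed mapping; that heuristic is an assumption chosen for illustration, not how gdb's unwinder works. Applied to frames #0 through #9, it keeps exactly #0, #1, #5 and #9, the frames listed as the correct backtrace: #2, #3 and #6 point into writable data of `nptl_test` and `libpthread.so`, #4 is unmapped, and #7 and #8 lie in anonymous mappings.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    start: int   # inclusive
    end: int     # exclusive, as in /proc/<pid>/maps
    perms: str
    path: str    # "" for anonymous mappings


# Excerpt of the report's memory map (paths shortened to the file name).
REGIONS: List[Region] = [
    Region(0x00f4b000, 0x00f58000, "r-xp", "libpthread.so"),
    Region(0x00f58000, 0x00f59000, "rw-p", "libpthread.so"),
    Region(0x00fc6000, 0x00fda000, "r-xp", "ld.so"),
    Region(0x04990000, 0x05390000, "rwxp", ""),
    Region(0x08048000, 0x0804b000, "r-xp", "nptl_test"),
    Region(0x0804b000, 0x0804c000, "rw-p", "nptl_test"),
    Region(0xfefcd000, 0xff000000, "rw-p", ""),
]

# PCs of frames #0..#9 as printed in the bogus backtrace.
FRAME_PCS = [0x00fc6bf2, 0x00f53d1b, 0x0804b704, 0x00f585fc, 0x000007f8,
             0x00f512ae, 0x0804b61c, 0xfefcdc10, 0x0538fa78, 0x080495e2]


def find_region(addr: int) -> Optional[Region]:
    """Return the mapping containing `addr`, or None if it is unmapped."""
    for region in REGIONS:
        if region.start <= addr < region.end:
            return region
    return None


def plausible_pc(addr: int) -> bool:
    """Heuristic: a return address should point into executable file text."""
    region = find_region(addr)
    return region is not None and "x" in region.perms and region.path != ""


for frame, pc in enumerate(FRAME_PCS):
    region = find_region(pc)
    where = f"{region.perms} {region.path or '[anon]'}" if region else "unmapped"
    mark = "ok" if plausible_pc(pc) else "bogus?"
    print(f"#{frame} 0x{pc:08x} {where:<20} {mark}")

plausible = [frame for frame, pc in enumerate(FRAME_PCS) if plausible_pc(pc)]
print(plausible)  # [0, 1, 5, 9]
```

The file-backed condition matters here: the thread stacks in this process are mapped `rwxp`, so frame #8 (0x0538fa78) would otherwise pass a permissions-only check.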



Comment 5 Jeff Johnston 2004-11-08 19:22:39 UTC
A fix has been made for this problem as of gdb 43.2.  Errors occurring
in a backtrace will not cause a macro command or "thread apply" to
stop prematurely.

Comment 6 John Flanagan 2004-12-21 19:36:54 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-561.html