Red Hat Bugzilla – Bug 102535
hang in ptrace for gdb traceback
Last modified: 2007-11-30 17:06:57 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529
Description of problem:
The system hangs when asking gdb for a particular traceback while running a
particular process. Both from X11/Gnome desktop and from console virtual
terminal, there is no response to keyboard input, including <CTRL>C, attempts to
switch virtual terminal using <CTRL><ALT><Fn>, <CTRL><ALT><DEL>, and any
combination of <SHIFT> <CTRL> <ALT> <SysRq [PrintScreen]>. The system does
respond to a ping from eth0, but not to ssh login (and sshd is running and has
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Boot taroon, up2date as of 2003-08-16 0400 GMT (2100 PDT Friday).
2. Invoke gdb on proprietary application, plant breakpoint, run application, hit
breakpoint several times, ask for traceback "bt".
Actual Results: No response to keyboard. Does respond to ping, but not to ssh
login. On virtual console, screensaver timeout does activate [blanks screen],
and screen cannot be restored by pressing any key. Must hardware reset and reboot.
Expected Results: Ordinary traceback from gdb.
I also tried running "strace gdb my_app" on a text virtual console,
and was able to type the last 8 lines into another system; see attachment.
I didn't do anything special to activate SysRq, so please tell me if I need to
do something here.
Created attachment 93686
tail of output from "strace gdb my_app"
The underscore '_' on the last line marks the position of the text cursor.
[Hand typed from screen, but believed accurate.]
We will not be able to debug this without a test case that we can try ourselves.
Please try to reproduce the problem without requiring your proprietary binaries.
While I try to prepare a portable testcase, please say whether SysRq (activated
by me) would be helpful, and how to do so before I cause the hang.
Do not make state changes to bug reports, please.
Seeing the dump from SysRq-T may be helpful. If SysRq handling is enabled
(verify by using it before invoking the bug), then whether or not it still
works in the wedged state tells us something.
You might also try booting with nmi_watchdog=1.
I apologize [for a random click, I guess: click-to-focus-and-type disease]. I
had no intention of changing anything other than making Additional Comments as a
request for guidance on how to use SysRq. I do not find a straightforward
recipe for how to use SysRq.
I was doubly sure not to make any extraneous clicks in posting that last
Additional Comment, but I see that the bug is now in ASSIGNED state. So if I as
originator respond to NEEDINFO by making Additional Comments, then it looks to
me like the bug transitions to ASSIGNED automatically.
See Documentation/sysrq.txt in kernel sources.
You need to make sure it's enabled with:
echo 1 > /proc/sys/kernel/sysrq
and then on the console you can press Alt+SysRq+letter (all at once I think)
where the useful letters are p and t to print some info, h for help, and
b to reboot. On a serial console you send a break and then type the letter.
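Before triggering the hang, it may help to confirm that SysRq really is enabled. The following is only a minimal Python sketch, assuming the /proc/sys/kernel/sysrq path quoted above; the function name sysrq_enabled is made up for illustration. It treats any nonzero value as enabled (on later kernels the value is a bitmask, so nonzero may mean that only some SysRq functions are allowed).

```python
from pathlib import Path

# Path quoted in the comment above; everything else here is illustrative.
SYSRQ_PATH = "/proc/sys/kernel/sysrq"


def sysrq_enabled(path=SYSRQ_PATH):
    """Return True if the setting is nonzero, False if it is zero,
    and None if the file cannot be read or parsed."""
    try:
        return int(Path(path).read_text().strip()) != 0
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    state = sysrq_enabled()
    if state is None:
        print("could not read", SYSRQ_PATH)
    elif state:
        print("SysRq is enabled")
    else:
        print("SysRq is disabled; as root: echo 1 >", SYSRQ_PATH)
```

Running it before starting gdb shows whether the echo command above still needs to be issued.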
Created attachment 93742
last two processes shown by Alt+SysRq+t
gdb (current) and the process being traced. Hung after entering "bt" in gdb
while stopped at breakpoint.
This was hand typed from a text console virtual terminal into another system,
and hand verified by checking down the column of fields. I'll see if I can get
a null modem cable to enable machine copying.
I'm also looking into making the portable test case...
Created attachment 93755
portable testcase which hangs system when run under gdb
Using gdb to examine a page with protection "---p" causes the system to hang.
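For readers without the attachment, the condition can be sketched roughly as follows. This is not the attached testcase. It is a hypothetical Python illustration, assuming Linux with /proc mounted, that maps one anonymous page with no access rights (PROT_NONE, which /proc/<pid>/maps shows as "---p") and reports its address. The helper names map_prot_none_page and permissions_at are assumptions. The script never touches the page itself; examining such an address from gdb is the step reported here to hang.

```python
import ctypes
import mmap

PROT_NONE = 0  # no read, write, or execute permission


def map_prot_none_page():
    """Map one anonymous private page with PROT_NONE; return its address."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    addr = libc.mmap(None, mmap.PAGESIZE, PROT_NONE,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    # MAP_FAILED is (void *) -1; a None result would be a NULL pointer.
    if addr is None or addr == ctypes.c_void_p(-1).value:
        raise OSError(ctypes.get_errno(), "mmap failed")
    return addr


def permissions_at(addr, maps_path="/proc/self/maps"):
    """Return the permission string (e.g. '---p') of the mapping that
    contains addr, or None if no mapping contains it."""
    with open(maps_path) as maps:
        for line in maps:
            fields = line.split()
            start, end = (int(x, 16) for x in fields[0].split("-"))
            if start <= addr < end:
                return fields[1]
    return None


if __name__ == "__main__":
    page = map_prot_none_page()
    print("page at", hex(page), "has permissions", permissions_at(page))
```

A native program doing the equivalent mmap call and then stopping at a breakpoint would give gdb an address of this kind to examine.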
OK, see attachment of a few minutes ago for the crash a-la-carte.
This bug persists in
(Red Hat Enterprise Linux Version 3, 21-Oct-2003 16:01, rpmbuild --target i686,
booted on AMD Athlon, rest of system is up2date taroon-beta2.) In my
experience, the bug constitutes a local denial-of-service vulnerability. Any
authorized local user can run the testcase above, with the immediate result that
the system becomes unusable by all local users. My system responds to /bin/ping
over ethernet, but it seems to me that user-level process scheduling is hung (on
a uniprocessor, at least.)
A fix for this problem has been committed to the RHEL 3
Update 1 patch pool today. The first kernel build with
this fix (only available for internal Red Hat testing)
will be version 2.4.21-4.10.EL.
An erratum has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.