102535 – hang in ptrace for gdb traceback

Summary: hang in ptrace for gdb traceback

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	athlon
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Roland McGrath
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	101028

Reported:	2003-08-16 21:05 UTC by John Reiser
Modified:	2007-11-30 22:06 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-04-16 22:09:19 UTC
Target Upstream Version:
Embargoed:

Attachments
tail of output from "strace gdb my_app" (505 bytes, text/plain) 2003-08-16 21:07 UTC, John Reiser
last two processes shown by Alt+SysRq+t (808 bytes, text/plain) 2003-08-19 05:05 UTC, John Reiser
portable testcase which hangs system when run under gdb (336 bytes, text/plain) 2003-08-19 18:36 UTC, John Reiser

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2004:017	0	normal	SHIPPED_LIVE	Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 1	2004-01-13 05:00:00 UTC

Description John Reiser 2003-08-16 21:05:22 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529

Description of problem:
The system hangs when asking gdb for a particular traceback while running a
particular process.  Both from the X11/Gnome desktop and from a console virtual
terminal, there is no response to keyboard input, including <CTRL>C, attempts to
switch virtual terminals using <CTRL><ALT><Fn>, <CTRL><ALT><DEL>, and any
combination of <SHIFT> <CTRL> <ALT> <SysRq [PrintScreen]>.  The system does
respond to a ping on eth0, but not to an ssh login (although sshd is running and
responded earlier).

Version-Release number of selected component (if applicable):
kernel-2.4.21-1.1931.2.393.ent.athlon.rpm

How reproducible:
Always

Steps to Reproduce:
1. Boot taroon, up2date as of 2003-08-16 0400 GMT (2100 PDT Friday).
2. Invoke gdb on a proprietary application, set a breakpoint, run the application,
hit the breakpoint several times, then ask for a traceback with "bt".

Actual Results:  No response to keyboard.  Does respond to ping, but not to ssh
login. On a virtual console, the screensaver timeout does activate [blanks screen],
and the screen cannot be restored by pressing any key.  A hardware reset and
reboot are required.

Expected Results:  Ordinary traceback from gdb.


Additional info:

I also tried running "strace gdb my_app" on a text virtual console,
and was able to type the last 8 lines into another system;  see attachment. 

I didn't do anything special to activate SysRq, so please tell me if I need to
do something here.

Comment 1 John Reiser 2003-08-16 21:07:28 UTC

Created attachment 93686
tail of output from "strace gdb my_app"

The underscore '_' on the last line marks the position of the text cursor.

[Hand typed from screen, but believed accurate.]

Comment 3 Roland McGrath 2003-08-18 22:00:41 UTC

We will not be able to debug this without a test case that we can try ourselves.
Please try to reproduce the problem without requiring your proprietary binaries.

Comment 4 John Reiser 2003-08-18 22:14:18 UTC

While I try to prepare a portable testcase, please say whether SysRq (activated
by me) would be helpful, and how to do so before I cause the hang.

Comment 5 Roland McGrath 2003-08-18 22:28:15 UTC

Do not make state changes to bug reports, please.

Seeing the dump from SysRq-T may be helpful.  If SysRq handling is enabled
(verify by using it before invoking the bug), then whether or not it works in
the wedged state tells us something.
You might also try booting with nmi_watchdog=1.

Comment 6 John Reiser 2003-08-19 00:34:54 UTC

I apologize [for a random click, I guess: click-to-focus-and-type disease]. I
had no intention of changing anything other than making Additional Comments as a
request for guidance on how to use SysRq.  I do not find a straightforward
recipe for how to use SysRq.

Comment 7 John Reiser 2003-08-19 00:37:12 UTC

I was doubly sure not to make any extraneous clicks in posting that last
Additional Comment, but I see that the bug is now in ASSIGNED state.  So if I as
originator respond to NEEDINFO by making Additional Comments, then it looks to
me like the bug transitions to ASSIGNED automatically.

Comment 8 Roland McGrath 2003-08-19 00:42:16 UTC

See Documentation/sysrq.txt in kernel sources.
You need to make sure it's enabled with:
     echo 1 > /proc/sys/kernel/sysrq
and then on the console you can press Alt+SysRq+letter (all at once I think)
where the useful letters are p and t to print some info, h for help, and
b to reboot.  On a serial console you send a break and then type the letter.

Comment 9 John Reiser 2003-08-19 05:05:52 UTC

Created attachment 93742
last two processes shown by Alt+SysRq+t

gdb (current) and the process being traced.  Hung after entering "bt" in gdb
while stopped at breakpoint.

This was hand typed from a text console virtual terminal into another system,
and hand verified by checking down the column of fields.  I'll see if I can get
a null modem cable to enable machine copying.

I'm also looking into making the portable test case...

Comment 12 John Reiser 2003-08-19 18:36:15 UTC

Created attachment 93755
portable testcase which hangs system when run under gdb

Using gdb to examine a page with protection "---p" causes the system to hang.

Comment 13 John Reiser 2003-08-19 18:38:10 UTC

OK, see attachment of a few minutes ago for the crash a-la-carte.

Comment 14 John Reiser 2003-10-23 04:50:42 UTC

This bug persists in
http://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/os/i386/SRPMS/kernel-2.4.21-4.EL.src.rpm
(Red Hat Enterprise Linux Version 3, 21-Oct-2003 16:01, built with rpmbuild --target i686,
booted on an AMD Athlon; the rest of the system is up2date taroon-beta2).  In my
experience, the bug constitutes a local denial-of-service vulnerability: any
authorized local user can run the testcase above, and the system immediately
becomes unusable by all local users.  My system responds to /bin/ping
over ethernet, but user-level process scheduling appears to be hung (on
a uniprocessor, at least).

Comment 16 Ernie Petrides 2003-11-06 11:34:22 UTC

A fix for this problem has been committed to the RHEL 3
Update 1 patch pool today.  The first kernel build with
this fix (only available for internal Red Hat testing)
will be version 2.4.21-4.10.EL.

Comment 17 Ernie Petrides 2004-12-03 01:33:24 UTC

An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-017.html
