Bug 102535

Summary: hang in ptrace for gdb traceback
Product: Red Hat Enterprise Linux 3
Reporter: John Reiser <jreiser>
Component: kernel
Assignee: Roland McGrath <roland>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 3.0
CC: mingo, petrides, roland
Hardware: athlon
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-04-16 22:09:19 UTC
Bug Blocks: 101028
Attachments:
  tail of output from "strace gdb my_app" (flags: none)
  last two processes shown by Alt+SysRq+t (flags: none)
  portable testcase which hangs system when run under gdb (flags: none)

Description John Reiser 2003-08-16 21:05:22 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529

Description of problem:
The system hangs when I ask gdb for a particular traceback while running a
particular process.  From both the X11/Gnome desktop and a console virtual
terminal, there is no response to keyboard input, including <CTRL>C, attempts to
switch virtual terminal using <CTRL><ALT><Fn>, <CTRL><ALT><DEL>, and any
combination of <SHIFT> <CTRL> <ALT> <SysRq [PrintScreen]>.  The system does
respond to a ping on eth0, but not to an ssh login (sshd is running and has
responded before).

Version-Release number of selected component (if applicable):
kernel-2.4.21-1.1931.2.393.ent.athlon.rpm

How reproducible:
Always

Steps to Reproduce:
1. Boot taroon, up2date as of 2003-08-16 0400 GMT (2100 PDT Friday).
2. Invoke gdb on proprietary application, plant breakpoint, run application, hit
breakpoint several times, ask for traceback "bt".

Actual Results:  No response to keyboard.  Does respond to ping, but not to ssh
login. On virtual console, screensaver timeout does activate [blanks screen],
and screen cannot be restored by pressing any key.  Must hardware reset and reboot.

Expected Results:  Ordinary traceback from gdb.


Additional info:

I also tried running "strace gdb my_app" on a text virtual console,
and was able to type the last 8 lines into another system;  see attachment. 

I didn't do anything special to activate SysRq, so please tell me if I need to
do something here.

Comment 1 John Reiser 2003-08-16 21:07:28 UTC
Created attachment 93686
tail of output from "strace gdb my_app"

The underscore '_' on the last line marks the position of the text cursor.

[Hand typed from screen, but believed accurate.]

Comment 3 Roland McGrath 2003-08-18 22:00:41 UTC
We will not be able to debug this without a test case that we can try ourselves.
Please try to reproduce the problem without requiring your proprietary binaries.

Comment 4 John Reiser 2003-08-18 22:14:18 UTC
While I try to prepare a portable testcase, please say whether SysRq (activated
by me) would be helpful, and how to do so before I cause the hang.


Comment 5 Roland McGrath 2003-08-18 22:28:15 UTC
Do not make state changes to bug reports, please.

Seeing the dump from SysRq-T may be helpful.  If SysRq handling is enabled
(verify by using it before invoking the bug), then whether or not it still works
in the wedged state tells us something.
You might also try booting with nmi_watchdog=1.
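
After rebooting, it may be worth confirming that the option actually reached
the kernel.  A minimal sketch, assuming /proc/cmdline is available; the helper
name has_boot_param and its optional file argument are hypothetical and exist
only for illustration:

```shell
# Hypothetical helper: succeeds if the given parameter appears as a whole
# word on the kernel command line.  The optional second argument (the file
# to read) defaults to /proc/cmdline and is there only to ease testing.
has_boot_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Exit status 2 means the command line could not be read at all.
    [ -r "$cmdline_file" ] || return 2
    # Put each word on its own line, then require an exact whole-line match.
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep -qxF -- "$param"
}

if has_boot_param nmi_watchdog=1; then
    echo "nmi_watchdog=1 is on the kernel command line"
else
    echo "nmi_watchdog=1 not found on the kernel command line"
fi
```

This only shows that the parameter was passed at boot; it does not show
whether the watchdog is actually firing.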


Comment 6 John Reiser 2003-08-19 00:34:54 UTC
I apologize [for a random click, I guess: click-to-focus-and-type disease]. I
had no intention of changing anything other than making Additional Comments as a
request for guidance on how to use SysRq.  I do not find a straightforward
recipe for how to use SysRq.

Comment 7 John Reiser 2003-08-19 00:37:12 UTC
I was doubly sure not to make any extraneous clicks in posting that last
Additional Comment, but I see that the bug is now in ASSIGNED state.  So if I as
originator respond to NEEDINFO by making Additional Comments, then it looks to
me like the bug transitions to ASSIGNED automatically.

Comment 8 Roland McGrath 2003-08-19 00:42:16 UTC
See Documentation/sysrq.txt in kernel sources.
You need to make sure it's enabled with:
     echo 1 > /proc/sys/kernel/sysrq
and then on the console you can press Alt+SysRq+letter (all at once I think)
where the useful letters are p and t to print some info, h for help, and
b to reboot.  On a serial console you send a break and then type the letter.
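
To verify the setting before provoking the hang, the same /proc file can be
read back.  A minimal sketch; the helper name sysrq_state and its optional file
argument are hypothetical, and treating every non-zero value as "enabled" is an
assumption (a value other than 1 may enable only some functions):

```shell
# Hypothetical helper: report whether magic SysRq handling is enabled.
# The optional argument (the file to read) defaults to
# /proc/sys/kernel/sysrq and is there only to ease testing.
sysrq_state() {
    sysrq_file="${1:-/proc/sys/kernel/sysrq}"
    if [ ! -r "$sysrq_file" ]; then
        echo "unavailable"
        return 0
    fi
    value=$(cat "$sysrq_file")
    if [ "$value" = "0" ]; then
        echo "disabled"
    else
        # Assumption: any non-zero value is reported as enabled.
        echo "enabled ($value)"
    fi
}

sysrq_state
```

If this prints "disabled", run the echo command above as root and check again
before starting gdb.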


Comment 9 John Reiser 2003-08-19 05:05:52 UTC
Created attachment 93742
last two processes shown by Alt+SysRq+t

gdb (current) and the process being traced.  Hung after entering "bt" in gdb
while stopped at breakpoint.

This was hand typed from a text console virtual terminal into another system,
and hand verified by checking down the column of fields.  I'll see if I can get
a null modem cable to enable machine copying.

I'm also looking into making the portable test case...

Comment 12 John Reiser 2003-08-19 18:36:15 UTC
Created attachment 93755
portable testcase which hangs system when run under gdb

Using gdb to examine a page with protection "---p" causes the system to hang.
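
The "---p" string is the permission field as shown in /proc/<pid>/maps (a
private mapping with no read, write, or execute access).  To see which address
ranges of a process have that protection, something like the following can be
used.  This is only a sketch for locating such pages, not the attached
testcase; the helper name list_noaccess_regions is hypothetical:

```shell
# Hypothetical helper: print the address range of every mapping whose
# permission field is exactly "---p" in a /proc/<pid>/maps style file.
# The optional argument defaults to /proc/self/maps.
list_noaccess_regions() {
    maps_file="${1:-/proc/self/maps}"
    # Field 1 is the address range, field 2 is the permission string.
    awk '$2 == "---p" { print $1 }' "$maps_file"
}

# Example: inspect the current shell's own mappings, if readable.
if [ -r "/proc/$$/maps" ]; then
    list_noaccess_regions "/proc/$$/maps"
fi
```

For a process stopped in gdb, pass /proc/<pid>/maps for that pid instead.  On
an affected kernel, do not examine the listed addresses from gdb unless you are
prepared for the machine to hang.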

Comment 13 John Reiser 2003-08-19 18:38:10 UTC
OK, see attachment of a few minutes ago for the crash a-la-carte.


Comment 14 John Reiser 2003-10-23 04:50:42 UTC
This bug persists in
http://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/os/i386/SRPMS/kernel-2.4.21-4.EL.src.rpm
(Red Hat Enterprise Linux Version 3, 21-Oct-2003 16:01, rpmbuild --target i686,
booted on AMD athlon, rest of system is up2date taroon-beta2).  In my
experience, the bug constitutes a local denial-of-service vulnerability.  Any
authorized local user can run the testcase above, with the immediate result that
the system becomes unusable by all local users.  My system responds to /bin/ping
over ethernet, but it seems to me that user-level process scheduling is hung (on
a uniprocessor, at least).


Comment 16 Ernie Petrides 2003-11-06 11:34:22 UTC
A fix for this problem has been committed to the RHEL 3
Update 1 patch pool today.  The first kernel build with
this fix (only available for internal Red Hat testing)
will be version 2.4.21-4.10.EL.


Comment 17 Ernie Petrides 2004-12-03 01:33:24 UTC
An erratum has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2004-017.html