From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529

Description of problem:
The system hangs when asking gdb for a particular traceback while running a particular process. Both from the X11/Gnome desktop and from a console virtual terminal, there is no response to keyboard input, including <CTRL>C, attempts to switch virtual terminals using <CTRL><ALT><Fn>, <CTRL><ALT><DEL>, and any combination of <SHIFT> <CTRL> <ALT> <SysRq [PrintScreen]>. The system does respond to a ping on eth0, but not to an ssh login (sshd is running and has responded before).

Version-Release number of selected component (if applicable):
kernel-2.4.21-1.1931.2.393.ent.athlon.rpm

How reproducible:
Always

Steps to Reproduce:
1. Boot taroon, up2date as of 2003-08-16 0400 GMT (2100 PDT Friday).
2. Invoke gdb on proprietary application, plant breakpoint, run application, hit breakpoint several times, ask for traceback "bt".
3.

Actual Results:
No response to keyboard. Does respond to ping, but not to ssh login. On a virtual console, the screensaver timeout does activate [blanks screen], and the screen cannot be restored by pressing any key. Must hardware reset and reboot.

Expected Results:
Ordinary traceback from gdb.

Additional info:
I also tried running "strace gdb my_app" on a text virtual console, and was able to type the last 8 lines into another system; see attachment. I didn't do anything special to activate SysRq, so please tell me if I need to do something here.
Created attachment 93686
tail of output from "strace gdb my_app"

The underscore '_' on the last line marks the position of the text cursor. [Hand typed from screen, but believed accurate.]
We will not be able to debug this without a test case that we can try ourselves. Please try to reproduce the problem without requiring your proprietary binaries.
While I try to prepare a portable testcase, please say whether SysRq (activated by me) would be helpful, and how to do so before I cause the hang.
Do not make state changes to bug reports, please. Seeing the dump from SysRq-T may be helpful. If SysRq handling is enabled (verify by using it before invoking the bug) then it tells us something whether it works in the wedge state or not. You might also try booting with nmi_watchdog=1.
I apologize [for a random click, I guess: click-to-focus-and-type disease]. I had no intention of changing anything other than making Additional Comments as a request for guidance on how to use SysRq. I do not find a straightforward recipe for how to use SysRq.
I was doubly sure not to make any extraneous clicks in posting that last Additional Comment, but I see that the bug is now in ASSIGNED state. So if I as originator respond to NEEDINFO by making Additional Comments, then it looks to me like the bug transitions to ASSIGNED automatically.
See Documentation/sysrq.txt in kernel sources. You need to make sure it's enabled with:

    echo 1 > /proc/sys/kernel/sysrq

and then on the console you can press Alt+SysRq+letter (all at once I think) where the useful letters are p and t to print some info, h for help, and b to reboot. On a serial console you send a break and then type the letter.
Created attachment 93742
last two processes shown by Alt+SysRq+t

gdb (current) and the process being traced. Hung after entering "bt" in gdb while stopped at breakpoint. This was hand typed from a text console virtual terminal into another system, and hand verified by checking down the column of fields. I'll see if I can get a null modem cable to enable machine copying. I'm also looking into making the portable test case...
Created attachment 93755
portable testcase which hangs system when run under gdb

Using gdb to examine a page with protection "---p" causes the system to hang.
OK, see attachment of a few minutes ago for the crash a-la-carte.
This bug persists in http://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/os/i386/SRPMS/kernel-2.4.21-4.EL.src.rpm (RedHat Enterprise Linux Version 3, 21-Oct-2003 16:01, rpmbuild --target i686, booted on AMD athlon, rest of system is up2date taroon-beta2.) In my experience, the bug constitutes a local denial-of-service vulnerability. Any authorized local user can run the testcase above, with the immediate result that the system becomes unusable by all local users. My system responds to /bin/ping over ethernet, but it seems to me that user-level process scheduling is hung (on a uniprocessor, at least.)
A fix for this problem has been committed to the RHEL 3 Update 1 patch pool today. The first kernel build with this fix (only available for internal Red Hat testing) will be version 2.4.21-4.10.EL.
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2004-017.html