Bug 189607

Summary: pstack can cause process to suspend
Product: Red Hat Enterprise Linux 3
Reporter: Bastien Nocera <bnocera>
Component: gdb
Assignee: Jan Kratochvil <jan.kratochvil>
Status: CLOSED ERRATA
QA Contact: Jay Turner <jturner>
Severity: medium
Priority: medium
Version: 3.0
CC: cagney, srevivo
Target Milestone: ---
Keywords: Regression, Reopened
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: RHSA-2007-0469
Doc Type: Bug Fix
Last Closed: 2007-06-11 17:50:57 UTC
Bug Blocks: 233746, 233852, 233853    
Attachments:
  Description            Flags
  signal_test.c          none
  a modified testcase    none

Description Bastien Nocera 2006-04-21 15:45:15 UTC
kernel-2.4.21-32.ELsmp

First terminal:
- compile signal_test:
gcc -Wall -g -o signal_test signal_test.c
- run signal_test

Second terminal:
- killall -ALRM signal_test
- while true ; do pstack `pidof signal_test` > /dev/null ; done

Let it run for a while (about 5 minutes in my tests). As soon as you press
Ctrl+C in the second terminal, signal_test ends up in the "stopped" state.

The first terminal then shows:
[1]+  Stopped                 ./signal_test

This also happens on RHEL4.
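The attached signal_test.c is not reproduced in this report. As a rough, hypothetical stand-in, the sketch below assumes a program that prints an increasing counter and, once it receives SIGALRM, never returns from its handler. The names on_alarm, install_alarm_handler and print_sequence are illustrative only and are not taken from the attachment.

```c
/* Hypothetical stand-in for the attached signal_test.c; not the original. */
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* SIGALRM handler: re-arm a 1-second alarm and wait for it. Because the
   handler is installed with SA_NODEFER, the next SIGALRM is delivered while
   we are still blocked in pause(), so the handler re-enters itself about once
   per second and never returns to the main loop. */
static void on_alarm(int sig)
{
    (void)sig;
    alarm(1);
    pause();
}

/* Install on_alarm() for SIGALRM. Returns 0 on success, -1 on error. */
int install_alarm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Main loop: print an increasing counter once per second until SIGALRM
   arrives (for example from `killall -ALRM signal_test`). */
void print_sequence(void)
{
    unsigned long n;

    for (n = 0;; n++) {
        printf("%lu\n", n);
        fflush(stdout);
        sleep(1);
    }
}
```

A main() that calls install_alarm_handler() and then print_sequence() gives a program that can be driven with the same killall and pstack loop. It is only meant to approximate the behavior described in this report, not to match the attachment line for line.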

Comment 1 Bastien Nocera 2006-04-21 15:45:16 UTC
Created attachment 128091 [details]
signal_test.c

Comment 2 Ernie Petrides 2006-04-21 20:13:34 UTC
RHEL3 is now closed.

Comment 3 Daniel Riek 2006-12-18 23:17:21 UTC
According to Samsung in IT 96713, this is a regression introduced in U7, so I am
setting the Regression keyword.

The report seems to predate U8, and we suspect it might be a duplicate of bug
196938, which is fixed in RHSA-2006-0437. Can someone please verify that?

Comment 4 RHEL Program Management 2006-12-18 23:48:30 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 7 Michal Schmidt 2007-02-07 13:06:12 UTC
I can easily reproduce the bug on RHEL3 U8. It occurs only on SMP; it never
triggered on UP.
The bug is reproducible with every older RHEL3 kernel I installed (kernel-smp
packages from RHEL3 GOLD, U4, U5, U6 and U7). I also tried the testcase with
older versions of the gdb and pstack packages (from GOLD, U1, U2, ..., U7), and
the bug triggers with these too.
The bug is reproducible on RHEL4 and FC6 as well.
So it is not a duplicate of 196938, and I don't think it's a regression.
It seems to be fixed in current RHEL5. Further testing showed that what matters
is gdb, not the kernel: on RHEL5 the bug is reproducible with gdb-6.5-13.el5 and
seems to be fixed with gdb-6.5-14.el5.

Comment 8 Jan Kratochvil 2007-02-07 13:55:11 UTC

*** This bug has been marked as a duplicate of 197584 ***

Comment 9 Michal Schmidt 2007-03-08 16:41:33 UTC
Created attachment 149583 [details]
a modified testcase

Comment 10 Michal Schmidt 2007-03-08 16:43:06 UTC
Jan, if I'm reading bug 197584 correctly, this is supposed to be fixed in gdb
6.3.0.0-1.135. However, I can reproduce this bug even with that gdb version, so
either 197584 was not fixed completely or this is not really its duplicate.
I use a somewhat modified version of Bastien's signal_test testcase (attached
just before this comment).
Compile with:
gcc -Wall -g -o signal_test signal_test.c
In the first terminal run:
./signal_test
Watch it print out a sequence of numbers.
In the second terminal run:
killall -ALRM signal_test
Notice that the sequence stops being printed. That's expected: signal_test is
now running the signal handler, which keeps respawning itself over and over. Run:
pstack `/sbin/pidof signal_test`
See whether you get:
[1]+  Stopped                 ./signal_test
on the first terminal. On my system there's about a 50% chance of that
happening; if you're unlucky, run pstack a few more times.
ps confirms that signal_test is now in the 'T' (stopped) state. Interestingly,
'fg' causes signal_test to leave the signal handler and resume printing the
number sequence.
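Instead of watching the first terminal for the job-control message, a reproduction loop could poll the target's state directly. The sketch below is an illustrative helper, not part of either testcase: it assumes a Linux /proc filesystem and reads the state letter from the third field of /proc/<pid>/stat (the same 'T' that ps shows for a stopped process). The hypothetical resume_if_stopped() sends SIGCONT, which is the signal the shell's fg sends to a stopped job (fg also moves the job to the foreground, which this helper does not do).

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the one-letter state of process `pid` ('R', 'S', 'T', ...) as
   reported in the third field of /proc/<pid>/stat, or '?' on error. */
char proc_state(pid_t pid)
{
    char path[64];
    char buf[512];
    char *rparen;
    size_t n;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 is the command name in parentheses and may itself contain
       spaces or ')', so locate the last ')' and read the state after it. */
    rparen = strrchr(buf, ')');
    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '?';
    return rparen[2];
}

/* Hypothetical helper: if `pid` is stopped ('T'), send it SIGCONT.
   Returns 1 if SIGCONT was sent, 0 if the process was not stopped,
   -1 if kill() failed. */
int resume_if_stopped(pid_t pid)
{
    if (proc_state(pid) != 'T')
        return 0;
    return kill(pid, SIGCONT) == 0 ? 1 : -1;
}
```

For example, the shell loop could be replaced by a driver that runs pstack, then calls proc_state() on the signal_test PID and stops as soon as it returns 'T'.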

This is not reproducible with gdb-6.5-16 from RHEL5 recompiled on RHEL3.

Comment 11 Jan Kratochvil 2007-03-08 17:11:54 UTC
Thanks for the updated testcase; this needs to be rechecked. It may be an
incomplete fix for bug 197584.


Comment 18 Red Hat Bugzilla 2007-06-11 17:50:57 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2007-0469.html