Bug 708346 - sosreport hangs the system when multiple SIGTERMs received
Summary: sosreport hangs the system when multiple SIGTERMs received
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: sos
Version: 5.5
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Bryn M. Reeves
QA Contact: David Kutálek
URL:
Whiteboard:
Depends On:
Blocks: 782064
 
Reported: 2011-05-27 11:21 UTC by Pavel Moravec
Modified: 2018-11-26 18:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: sosreport 1.7 by default uses a threaded execution model to invoke plugin modules.
Consequence: This can lead to situations in which a keyboard interrupt (Ctrl-C) fails to terminate the program due to improper synchronization between the main and child threads.
Fix: The sosreport command now defaults to running in single threaded mode (previously enabled by running the command with --no-multithread). A new option "--multithread" is now present that will restore the old behavior.
Result: As a result sosreport now behaves more consistently when keyboard interrupts or other signals are received.
NOTE: Not sure of the best place to mention this but we do still have some problems in this area (unfortunately the 1.7 branch of sos has some pretty big problems here but we are tied to this release now for RHEL4/5). There is a simple workaround - if sos does not respond to a Ctrl-C the user may issue (in an interactive shell running sosreport):
Ctrl-Z
kill %N  [ N is the number of the sosreport job - usually 1]
If this does not work:
kill -9 %N  [ N is the number of the sosreport job - usually 1]
Should this be documented as a separate known issue (CCWR)?
Clone Of:
Environment:
Last Closed: 2012-02-21 03:25:06 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Legacy) | 55275 | 0 | None | None | None | Never
Red Hat Product Errata | RHSA-2012:0153 | 0 | normal | SHIPPED_LIVE | Low: sos security, bug fix, and enhancement update | 2012-02-21 07:25:08 UTC

Description Pavel Moravec 2011-05-27 11:21:41 UTC
Description of problem:
When a user runs sosreport and wants to halt it, pressing Ctrl+C does not terminate it. Pressing it multiple times causes extra threads to be spawned, which can end up congesting the operating system with sosreport processes.


Version-Release number of selected component (if applicable):
sos-1.7-9_49_el5
(fixed in sos-2.2)


How reproducible:
almost 100%


Steps to Reproduce:
1. Run sosreport (regardless of whether the --no-multithread option is set)
2. Once the plugins start executing, press Ctrl+C
3. sosreport continues to run; press Ctrl+C again (several times)
4. Monitor the number of "sosreport" processes: it keeps increasing

  
Actual results:
sosreport usually does not terminate, and the number of sosreport processes increases every time Ctrl+C is pressed. In an extreme case, the whole machine freezes while managing the sosreport processes.


Expected results:
After the very first (or, let's say, after several) Ctrl+C presses, sosreport should terminate. No sosreport process should be left running in the background afterwards.


Additional info:
This is FIXED in RHEL 6.0. The fix is apparently based on removing the multithreading option (the default in 5.x), since Python processes signals in the main thread only. But that alone does not suffice.
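The remark above that Python handles signals in the main thread only can be illustrated with a minimal sketch. It is not taken from sos; the function name `try_register_from_worker` is hypothetical, and the behaviour shown is that of CPython's standard `signal` module, which refuses to install a handler from any thread other than the main one.

```python
import signal
import threading


def try_register_from_worker():
    """Attempt to install a SIGINT handler from a non-main thread.

    Returns the error message raised by the signal module, or None if
    the registration unexpectedly succeeded.
    """
    outcome = {"error": None}

    def worker():
        try:
            # CPython only allows signal.signal() in the main thread.
            signal.signal(signal.SIGINT, signal.default_int_handler)
        except ValueError as exc:
            outcome["error"] = str(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["error"]


if __name__ == "__main__":
    print(try_register_from_worker())
```

Because handlers also execute in the main thread, worker threads never see the interrupt directly; the main thread has to tell them to stop.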

Comment 2 Bryn M. Reeves 2011-06-15 16:54:47 UTC
I don't necessarily think this is related to threading (and I'd be cautious about saying what the "fix" was - I suspect this behaviour just "went away" at some point upstream).

I'll need to look at this to see what we can do about RHEL5's behaviour. In the meantime I think we need to advise users to be cautious about sending multiple SIGINTs (which is what Ctrl-C sends iirc).

Comment 3 Bryn M. Reeves 2011-10-31 17:00:15 UTC
Actually I think there may be a simple fix for this. I've got some changes pending (waiting for test env. to finish some other work right now) to the signal handling for sos.

I think the problem here is that sos's termination handling is currently only wired up to SIGTERM but as I mentioned in comment #2 that is not the signal that ctrl-c actually sends to processes. This is most likely also why ctrl-z kill %1 works (kill delivers SIGTERM by default).

Although I'm still unable to reproduce the "hang" I think that this should take care of that case also.
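The idea in this comment, wiring termination handling to SIGINT as well as SIGTERM, can be sketched as follows. This is an illustration only and not the actual sos change; the names `Terminated`, `install_handlers` and `restore_handlers` are hypothetical, and the sketch assumes it is called from the main thread of a POSIX Python process.

```python
import signal


class Terminated(Exception):
    """Raised in the main thread when a termination signal arrives (hypothetical)."""


def _on_signal(signum, frame):
    # Turn the signal into an exception so normal cleanup code can run.
    raise Terminated(signum)


def install_handlers(signals=(signal.SIGINT, signal.SIGTERM)):
    """Route both Ctrl-C (SIGINT) and plain `kill` (SIGTERM) to one handler.

    Returns the previous handlers so that they can be restored later.
    """
    previous = {}
    for sig in signals:
        previous[sig] = signal.signal(sig, _on_signal)
    return previous


def restore_handlers(previous):
    """Put back the handlers that install_handlers() replaced."""
    for sig, handler in previous.items():
        signal.signal(sig, handler)
```

With both signals routed through the same path, a caller could wrap its main loop in `try: ... except Terminated:` and perform the same cleanup whether the user pressed Ctrl-C or ran `kill`.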

Comment 4 Bryn M. Reeves 2011-11-09 12:10:08 UTC
Still can't reproduce the hang, and fixing this is a lot more complicated than I'd hoped; all of the exception handling in the plugin code has to be reviewed and much of it needs to be rewritten. I don't think that is suitable for a RHEL5 update.

At this point I'm considering deleting the multithreading code entirely from RHEL5's sos (leaving --no-multithread as a noop). Benchmarking consistently shows a very small benefit to the threading anyway (~20s gain over a 5-6m run).

Comment 5 Pavel Moravec 2011-11-09 12:34:12 UTC
I have failed to reproduce it as well now on RHEL 5.7. Previous (successful) reproductions were done on some older RHEL (5.4 most probably), where I was able to overload the system very easily. So most probably this has already been fixed coincidentally.

Together with Bryn's last comment, I agree with closing this BZ with a "can't reproduce" / "works for me" resolution.

Comment 6 Bryn M. Reeves 2011-11-09 12:40:49 UTC
I'd still consider deleting multithreading in RHEL5 sos; I don't see any advantage to it and it causes real problems for users. It would also give us an opportunity to fix cleanup handling (which appears to be broken right now: temporary locations are not reliably cleaned up on abnormal termination).

Comment 18 Bryn M. Reeves 2012-01-25 16:47:08 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: sosreport 1.7 by default uses a threaded execution model to invoke plugin
modules. 

Consequence: This can lead to situations in which a keyboard interrupt (Ctrl-C) fails to
terminate the program due to improper synchronization between the main and
child threads.

Fix: The sosreport command now defaults to running in single threaded mode
(previously enabled by running the command with --no-multithread). A new option
"--multithread" is now present that will restore the old behavior.

Result: As a result sosreport now behaves more consistently when keyboard interrupts
or other signals are received.


NOTE: Not sure of the best place to mention this but we do still have some
problems in this area (unfortunately the 1.7 branch of sos has some pretty big
problems here but we are tied to this release now for RHEL4/5). There is a
simple workaround - if sos does not respond to a Ctrl-C the user may issue (in
an interactive shell running sosreport):

Ctrl-Z
kill %N  [ N is the number of the sosreport job - usually 1]
If this does not work:
kill -9 %N  [ N is the number of the sosreport job - usually 1]

Should this be documented as a separate known issue (CCWR)?
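The escalation in the workaround above (a plain `kill`, which sends SIGTERM, followed by `kill -9` if that fails) can be sketched for a child process started from Python. This is a rough analogue under stated assumptions, not part of sos: `spawn_sleeper` and `stop_process` are hypothetical helpers, the sleeping child merely stands in for an unresponsive job, and shell job control (Ctrl-Z, `%N`) is not reproduced.

```python
import subprocess
import sys


def spawn_sleeper():
    """Start a child that just sleeps (hypothetical stand-in for a stuck job)."""
    return subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])


def stop_process(proc, grace=5.0):
    """Ask proc to exit with SIGTERM, escalating to SIGKILL after `grace` seconds."""
    proc.terminate()  # SIGTERM, the default signal sent by `kill`
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL, the equivalent of `kill -9`
        proc.wait()
        return "killed"
```

Trying SIGTERM first gives the process a chance to clean up; SIGKILL cannot be caught, so temporary files may be left behind when the fallback is needed.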

Comment 19 errata-xmlrpc 2012-02-21 03:25:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0153.html

