Bug 708346
Summary: | sosreport hangs the system when multiple SIGTERMs received | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Pavel Moravec <pmoravec> |
Component: | sos | Assignee: | Bryn M. Reeves <bmr> |
Status: | CLOSED ERRATA | QA Contact: | David Kutálek <dkutalek> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5.5 | CC: | agk, bmr, gavin, lmiksik, prc, rdassen |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: sosreport 1.7 by default uses a threaded execution model to invoke plugin
modules.
Consequence: A keyboard interrupt (Ctrl-C) can fail to terminate the program
because of improper synchronization between the main and child threads.
Fix: The sosreport command now defaults to running in single-threaded mode
(previously enabled by running the command with --no-multithread). A new option,
"--multithread", restores the old behavior.
Result: sosreport now behaves more consistently when keyboard interrupts
or other signals are received.
NOTE: Not sure of the best place to mention this, but we do still have some
problems in this area (unfortunately the 1.7 branch of sos has some pretty big
problems here, but we are tied to this release now for RHEL4/5). There is a
simple workaround: if sos does not respond to Ctrl-C, the user may issue the
following (in the interactive shell running sosreport):
Ctrl-Z
kill %N [ N is the number of the sosreport job - usually 1]
If this does not work:
kill -9 %N [ N is the number of the sosreport job - usually 1]
Should this be documented as a separate known issue (CCWR)?
|
Last Closed: | 2012-02-21 03:25:06 UTC | Type: | --- |
Bug Blocks: | 782064 |
Description
Pavel Moravec
2011-05-27 11:21:41 UTC
I don't necessarily think this is related to threading (and I'd be cautious about saying what the "fix" was; I suspect this behaviour just "went away" at some point upstream). I'll need to look at this to see what we can do about RHEL5's behaviour. In the meantime I think we need to advise users to be cautious about sending multiple SIGINTs (which is what Ctrl-C sends, iirc).

Actually I think there may be a simple fix for this. I've got some changes pending (waiting for the test environment to finish some other work right now) to the signal handling for sos. I think the problem here is that sos's termination handling is currently only wired up to SIGTERM, but as I mentioned in comment #2 that is not the signal that Ctrl-C actually sends to processes. This is most likely also why Ctrl-Z followed by kill %1 works (kill delivers SIGTERM by default). Although I'm still unable to reproduce the "hang", I think that this should take care of that case also.

Still can't reproduce the hang, and fixing this is a lot more complicated than I'd hoped; all the exception use in the plugin code has to be reviewed and much of it needs to be rewritten. I don't think that is suitable for a RHEL5 update. At this point I'm considering deleting the multithreading code entirely from RHEL5's sos (leaving --no-multithread as a no-op). Benchmarking consistently shows a very small benefit to the threading anyway (~20s gain over a 5-6m run).

I failed to reproduce it as well now on RHEL 5.7. Previous (successful) reproductions were done on some older RHEL (5.4 most probably) where I was able to overflow the system very easily. So most probably this has already been fixed coincidentally. Together with Bryn's last comment, I agree to close this BZ with a "can't reproduce" / "works for me" reason.

I'd still consider deleting multithreading in RHEL5 sos; I don't see any advantage to it and it causes real problems for users. It would also give us an opportunity to fix cleanup handling (which appears to be broken right now: temporary locations are not reliably cleaned up on abnormal termination).

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHSA-2012-0153.html
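
For readers following the signal-handling discussion above, here is a minimal sketch of the idea raised in the comments: wire the same termination path to both SIGINT (what Ctrl-C sends) and SIGTERM (what kill sends by default), and remove the temporary working directory however the run ends. This is not the sos code and not the change that shipped in the erratum. The names Terminated, install_handlers and run_report are hypothetical, and the sketch assumes Python 3 and a single-threaded run.

```python
import shutil
import signal
import tempfile


class Terminated(Exception):
    """Hypothetical exception raised when SIGINT or SIGTERM arrives."""


def _on_signal(signum, frame):
    # Python executes signal handlers in the main thread, so raising here
    # unwinds the main thread's stack and lets its finally blocks run.
    raise Terminated(signum)


def install_handlers():
    # Route Ctrl-C (SIGINT) and the default `kill` signal (SIGTERM)
    # through the same termination path.
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, _on_signal)


def run_report(collect):
    """Call collect(workdir), then always remove workdir.

    Returns 0 on success, or 128 + signal number if interrupted
    (the usual shell convention for death by signal).
    """
    workdir = tempfile.mkdtemp(prefix="report-sketch-")
    try:
        collect(workdir)
        return 0
    except Terminated as exc:
        return 128 + exc.args[0]
    finally:
        # Runs on normal exit and on Terminated alike.
        shutil.rmtree(workdir, ignore_errors=True)
```

A caller would run install_handlers() once at startup and then pass its collection function to run_report(). A process stopped with Ctrl-Z only runs the handler once it is resumed, and kill -9 (SIGKILL) cannot be caught at all, so no cleanup runs in that case.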