Description of problem:
When a user runs sosreport and wants to halt it, pressing Ctrl+C does not terminate it. Pressing it multiple times causes extra threads to be spawned, which can end up congesting the operating system with sosreport processes.
Version-Release number of selected component (if applicable):
(fixed in sos-2.2)
Steps to Reproduce:
1. Run sosreport (regardless of whether the --no-multithread option is set)
2. Once plugins start executing, press Ctrl+C
3. sosreport continues to run, press Ctrl+C again (several times)
4. Monitor number of "sosreport" processes - they are increasing
sosreport usually does not terminate, and the number of sosreport processes increases every time Ctrl+C is pressed. In an extreme case, the whole machine freezes while managing the sosreport processes.
After the first press of Ctrl+C (or, let's say, after several), sosreport should terminate. No sosreport process should keep running in the background afterwards.
This is FIXED in RHEL 6.0. The fix is apparently based on removing the multithreading option (the default in 5.x), as Python processes signals in the main thread only. But that alone does not suffice.
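To illustrate the point about Python running signal handlers in the main thread only, here is a minimal sketch. It is not sos code: the `demo` function and the "plugin-worker" thread name are made up for the example, and it assumes a POSIX system and that it is called from the main thread.

```python
import os
import signal
import threading
import time


def demo():
    """Send SIGINT to this process and report which thread ran the handler."""
    handled_in = []

    def on_sigint(signum, frame):
        # Record the name of the thread executing the Python-level handler.
        handled_in.append(threading.current_thread().name)

    def worker(stop):
        # Hypothetical stand-in for a plugin thread; it just idles.
        while not stop.is_set():
            time.sleep(0.01)

    stop = threading.Event()
    # signal.signal() may only be called from the main thread.
    previous = signal.signal(signal.SIGINT, on_sigint)
    t = threading.Thread(target=worker, args=(stop,), name="plugin-worker")
    t.start()
    try:
        # SIGINT is the signal the terminal sends on Ctrl+C.
        os.kill(os.getpid(), signal.SIGINT)
        deadline = time.time() + 2.0
        while not handled_in and time.time() < deadline:
            time.sleep(0.01)
    finally:
        stop.set()
        t.join()
        signal.signal(signal.SIGINT, previous)
    return handled_in
```

The worker thread never sees the signal itself, so a threaded program has to arrange for the main thread to tell its workers to stop.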
I don't necessarily think this is related to threading (and I'd be cautious about saying what the "fix" was - I suspect this behaviour just "went away" at some point upstream).
I'll need to look at this to see what we can do about RHEL5's behaviour. In the meantime I think we need to advise users to be cautious about sending multiple SIGINTs (which is what Ctrl-C sends iirc).
Actually I think there may be a simple fix for this. I've got some changes pending (waiting for test env. to finish some other work right now) to the signal handling for sos.
I think the problem here is that sos's termination handling is currently only wired up to SIGTERM but, as I mentioned in comment #2, that is not the signal that ctrl-c actually sends to processes. This is most likely also why ctrl-z followed by kill %1 works (kill delivers SIGTERM by default).
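A rough sketch of what wiring the same termination path to both signals could look like. This is not the pending sos change: `Terminated`, `install_handlers`, `restore_handlers` and `run_until_signalled` are hypothetical names used only for illustration, and a POSIX system is assumed.

```python
import os
import signal
import time


class Terminated(Exception):
    """Raised in the main thread when a termination signal arrives."""

    def __init__(self, signum):
        super().__init__(signum)
        self.signum = signum


def _raise_terminated(signum, frame):
    raise Terminated(signum)


def install_handlers(signals=(signal.SIGINT, signal.SIGTERM)):
    # Route Ctrl+C (SIGINT) and plain `kill` (SIGTERM) to the same handler.
    return {s: signal.signal(s, _raise_terminated) for s in signals}


def restore_handlers(previous):
    for s, handler in previous.items():
        signal.signal(s, handler)


def run_until_signalled(sig):
    """Deliver `sig` to this process and return the signal number caught."""
    previous = install_handlers()
    try:
        os.kill(os.getpid(), sig)
        time.sleep(1)  # stand-in for plugin work; the handler interrupts it
        return None
    except Terminated as exc:
        return exc.signum
    finally:
        restore_handlers(previous)
```

With both signals routed to one exception, the caller has a single place to stop work and clean up, whichever signal arrived.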
Although I'm still unable to reproduce the "hang" I think that this should take care of that case also.
Still can't reproduce the hang, and fixing this is a lot more complicated than I'd hoped; all of the exception handling in the plugin code has to be reviewed and much of it needs to be rewritten. I don't think that is suitable for a RHEL5 update.
At this point I'm considering deleting the multithreading code entirely from RHEL5's sos (leaving --no-multithread as a noop). Benchmarking consistently shows a very small benefit to the threading anyway (~20s gain over a 5-6m run).
I failed to reproduce it as well, now on RHEL 5.7. Previous (successful) reproductions were done on some older RHEL (most probably 5.4), where I was able to overload the system very easily. So most probably this has already been fixed coincidentally.
Together with Bryn's last comment, I agree to close this BZ with a "can't reproduce" / "works for me" resolution.
I'd still consider deleting multithreading in RHEL5 sos; I don't see any advantage to it and it causes real problems for users. It would also give us an opportunity to fix cleanup handling (which appears to be broken right now: temporary locations are not reliably cleaned up on abnormal termination).
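For the cleanup side, something along these lines is what I have in mind. It is only a sketch, not the sos implementation: `collect` and `run_report` are hypothetical stand-ins, and the temporary directory prefix is arbitrary.

```python
import os
import shutil
import tempfile


def collect(workdir, fail=False):
    # Hypothetical stand-in for plugin collection writing into workdir.
    with open(os.path.join(workdir, "data.txt"), "w") as f:
        f.write("collected\n")
    if fail:
        # Simulate the user pressing Ctrl+C part-way through the run.
        raise KeyboardInterrupt


def run_report(fail=False):
    """Run a collection and always remove the temporary directory."""
    workdir = tempfile.mkdtemp(prefix="sos-sketch-")
    try:
        collect(workdir, fail)
    except KeyboardInterrupt:
        return workdir, "interrupted"
    finally:
        # Runs on normal exit and on interruption alike.
        shutil.rmtree(workdir, ignore_errors=True)
    return workdir, "ok"
```

The `finally` clause only helps if the interrupt reaches the main thread as an exception, which is one more reason to prefer the single-threaded model here.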
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Cause: sosreport 1.7 by default uses a threaded execution model to invoke plugin
Consequence: This can lead to situations in which a keyboard interrupt (Ctrl-C) fails to
terminate the program due to improper synchronization between the main and
Fix: The sosreport command now defaults to running in single threaded mode
(previously enabled by running the command with --no-multithread). A new option
"--multithread" is now present that will restore the old behavior.
Result: As a result, sosreport now behaves more consistently when keyboard interrupts
or other signals are received.
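For illustration only, the option behaviour described in the Fix could be sketched as follows. This uses argparse and is not the actual sos option handling; `build_parser` and `use_threads` are hypothetical names, and treating "--no-multithread" as an accepted no-op is an assumption based on the earlier discussion in this bug.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="sosreport-sketch")
    # Single-threaded is the default; threading must be requested explicitly.
    parser.add_argument("--multithread", action="store_true", default=False,
                        help="run plugins in threads (old default behaviour)")
    # Assumption: the old flag is still accepted but has no effect.
    parser.add_argument("--no-multithread", action="store_true",
                        dest="no_multithread_ignored",
                        help=argparse.SUPPRESS)
    return parser


def use_threads(argv):
    """Return True only when threading was explicitly requested."""
    return build_parser().parse_args(argv).multithread
```

Accepting the old flag silently would keep existing scripts that pass "--no-multithread" working.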
NOTE: Not sure of the best place to mention this but we do still have some
problems in this area (unfortunately the 1.7 branch of sos has some pretty big
problems here but we are tied to this release now for RHEL4/5). There is a
simple workaround - if sos does not respond to a Ctrl-C the user may issue (in
an interactive shell running sosreport):
kill %N [ N is the number of the sosreport job - usually 1]
If this does not work:
kill -9 %N [ N is the number of the sosreport job - usually 1]
Should this be documented as a separate known issue (CCWR)?
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.