Bug 368261

Summary: sos do not get stuck on dead-IO
Product: Red Hat Enterprise Linux 5 Reporter: Navid Sheikhol-Eslami <navid>
Component: sos Assignee: Adam Stokes <astokes>
Status: CLOSED DUPLICATE QA Contact:
Severity: low Docs Contact:
Priority: high    
Version: 5.5 CC: agk, bmr
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-14 17:01:46 UTC Type: ---

Description Navid Sheikhol-Eslami 2007-11-06 14:10:06 UTC
Description of problem:

A common problem with sysreport and SoS is that they hang whenever a process
requests I/O on a dead device.

SoS now has a configurable time-out for each command, after which it will stop
waiting for a process and continue (trying to SIGKILL the child process, if
possible).

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:

SoS waits (sometimes forever) for its child.

Expected results:

SoS continues and generates a report, even if partial.

Additional info:

Comment 1 Navid Sheikhol-Eslami 2007-11-22 10:16:08 UTC
*** Bug 394761 has been marked as a duplicate of this bug. ***

Comment 2 Navid Sheikhol-Eslami 2007-11-22 10:25:39 UTC
*** Bug 374751 has been marked as a duplicate of this bug. ***

Comment 3 Bryn M. Reeves 2007-11-26 10:01:32 UTC
This bug is talking about dead devices but there are numerous other ways that
sos or the tools it runs can end up blocking. E.g. bug 394761 is about lsof &
stuck NFS mounts - depending on the mount options that are used, a SIGKILL is
not going to help there.
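
A process blocked on a hard NFS mount or a dead device typically sits in
uninterruptible sleep (state "D"), which is why SIGKILL may have no visible
effect until the I/O completes. As a rough illustration only (not part of sos),
the state can be read from /proc/<pid>/stat on Linux. The helper names
proc_state and is_uninterruptible are hypothetical.

```python
def proc_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the last closing parenthesis.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def is_uninterruptible(pid):
    """True if pid is currently in state 'D' (uninterruptible sleep)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return proc_state(f.read()) == "D"
    except (FileNotFoundError, ProcessLookupError):
        # The process has already exited (or /proc is not available).
        return False
```

A caller could use such a check to decide whether waiting any longer for a
killed child is worthwhile.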

Comment 4 Bryn M. Reeves 2007-11-26 10:06:28 UTC
When sosreport is stuck like this, issuing Ctrl-C produces a message that
suggests it will terminate, but this again hangs:

 Progress [###################100%##################][6682:11/6682:11]
SIGTERM received, multiple threads detected, waiting for all threads to exit


Comment 5 Navid Sheikhol-Eslami 2007-11-26 12:58:17 UTC
The fix mentioned in this BZ implements a time-out: when it is reached, a
SIGKILL is sent to the child process and the plugin continues. Even if SIGKILL
is unsuccessful, the plugin returns, allowing sos to continue.

-- Navid
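
The time-out behaviour described in this comment could be sketched roughly as
follows. This is an illustration in modern Python 3, not the actual sos code.
The function name run_with_timeout and the timeout and grace parameters are
assumptions made for the example.

```python
import subprocess


def run_with_timeout(argv, timeout=300.0, grace=2.0):
    """Run argv and return (returncode, output, timed_out).

    returncode is None if the child could not be reaped even after SIGKILL,
    for example because it is stuck in uninterruptible I/O.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    try:
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out, False
    except subprocess.TimeoutExpired:
        pass

    # Time-out reached: send SIGKILL, then wait only briefly for the child.
    proc.kill()
    try:
        out, _ = proc.communicate(timeout=grace)
        return proc.returncode, out, True
    except subprocess.TimeoutExpired:
        # SIGKILL did not take effect in time. Give up on this command and
        # return, so the caller can carry on with the rest of the report.
        return None, b"", True
```

With this shape, a caller never blocks for longer than timeout plus grace per
command, so a report can still be produced, even if partial.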

Comment 9 Bryn M. Reeves 2011-01-14 17:01:46 UTC
Closing this as a dup of bug 657372; the other bugzilla is more general (although the underlying problem is more general still) and has more information.

*** This bug has been marked as a duplicate of bug 657372 ***