
Bug 916937

Summary: Set global{locking_type=0} when calling lvm2 commands
Product: Red Hat Enterprise Linux 5 Reporter: Miguel Perez Colino <miguel>
Component: sos Assignee: Bryn M. Reeves <bmr>
Status: CLOSED ERRATA QA Contact: David Kutálek <dkutalek>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 5.8 CC: agk, bmr, dkutalek, gavin, lmiksik
Target Milestone: rc Keywords: FutureFeature
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sos-1.7-9.72.el5 Doc Type: Enhancement
Doc Text:
no docs needed
Story Points: ---
Clone Of:
: 1102282 (view as bug list) Environment:
Last Closed: 2014-09-16 00:31:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1102282    

Description Miguel Perez Colino 2013-03-01 09:43:37 UTC
Description of problem:
The "sosreport" tool cannot be run safely in batch mode, because it continues even if a plugin reports an error.
Adding a "--check" option would allow scripts to run a pre-flight check of the plugins automatically.
Adding an "exit 1" to batch mode when the plugin check fails would make it safer.

Version-Release number of selected component (if applicable):
# rpm -q sos
sos-1.7-9.62.el5

How reproducible:
Run "sosreport" in batch mode when a process is in a D state

Steps to Reproduce:
1. Create a daily cron job in order to have periodic sosreports
2. Wait until a process in the machine enters a state D
3. See the collision of sosreport against the process
  
Actual results:
sosreport hang or service crash

Expected results:
sosreport exits with errorlevel 0

Additional info:

# diff sosreport-check /usr/sbin/sosreport
160,162d159
< __cmdParser__.add_option("--check", action="store_true", \
<                      dest="check", default=False, \
<                      help="perform plugin check only")
572,578d568
<         else:
<             print _("Exiting")
<             sys.exit(1)
<     else:
<         print _("Plugin Test OK")
<         if __cmdLineOpts__.check:
<             sys.exit(0)
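
For readers who want to see the proposed behaviour in isolation, here is a minimal sketch of the exit-status logic the diff above appears to aim for. It is not the sosreport code: `build_parser` and `preflight_exit_code` are hypothetical names, and the rule that a failed check also exits with 1 when only "--check" is given is an assumption, since the diff does not show the surrounding lines.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser with only the two options relevant here."""
    parser = OptionParser()
    parser.add_option("--batch", action="store_true",
                      dest="batch", default=False,
                      help="do not ask any questions (batch mode)")
    parser.add_option("--check", action="store_true",
                      dest="check", default=False,
                      help="perform plugin check only")
    return parser


def preflight_exit_code(opts, failed_plugins):
    """Return the exit status to use after the plugin check.

    Returns None when the run should simply continue (or, in an
    interactive run, when the user should be asked what to do).
    """
    unattended = opts.batch or opts.check
    if failed_plugins and unattended:
        return 1  # assumption: never continue unattended after a failure
    if opts.check:
        return 0  # check-only run, every plugin passed
    return None


if __name__ == "__main__":
    opts, _args = build_parser().parse_args(["--batch"])
    print(preflight_exit_code(opts, ["hypothetical_plugin"]))
```

A cron job could then run the check first and only start the real collection when the status is 0.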

Comment 1 Bryn M. Reeves 2013-03-01 10:54:52 UTC
Running sos with a process in D state does not cause plugins to fail. It causes a warning to be printed that is mostly misleading and unhelpful to customers (and that was removed from later versions many years ago).

These steps:

  Steps to Reproduce:
  1. Create a daily cron job in order to have periodic sosreports
  2. Wait until a process in the machine enters a state D
  3. See the collision of sosreport against the process
  
  Actual results:
  sosreport hang or service crash

Do not result in any problems for me. Please be more specific about what you are trying to solve here; e.g. what processes you observe causing such problems.

Comment 2 Miguel Perez Colino 2013-03-01 17:10:21 UTC
In RHEL 5, with sosreport 1.7 (no higher version available), and when running in batch mode, no warning is printed, and sosreport still runs.

My customer claims that, when they launch sosreport because they are having an issue, and the warning about a process in state D appears, after they choose to continue, the program hangs and can sometimes crash the machine. (They are running SAP and Oracle 11g.)

They do not want to have a cron job that runs "/usr/sbin/sosreport -a -v --no-progressbar --no-multithread --batch --name=XXXXX --tmp-dir=/var/log/sosreport" because it may cause problems with the current behavior.

In my humble opinion the problem to be solved here is to have a "batch" mode that behaves in a safe way, which means that if there is a problem with one plugin during checks, the program exits and the report does not get generated. 

> Do not result in any problems for me.
It is clear that we are not running the program under the same circumstances. I'll try to gather more information and add it to this RFE, even though what I want to resolve with this bug is not the system crash, but the behavior of sosreport in batch mode.

> Please be more specific about what you are trying to solve here
As I wrote before, I want to solve the behavior of sosreport when running in batch mode.

Comment 3 Bryn M. Reeves 2013-03-04 10:24:41 UTC
Please include logs (ps ax --forest when the problem is happening, sosreport -vvv output and any panic/oops/warn/bug messages generated during a "system crash") and steps to reproduce (the steps in comment #0 are not effective so some important detail has been omitted).

You haven't yet demonstrated that there is a problem with the behaviour of sosreport when run in batch mode (as evidenced by the fact that the steps do not reproduce the problem when run on a typical RHEL installation when one or more processes are in un-interruptible sleep).
 
If there is a problem with some process when sos runs then we should fix it and not paper over it with hacks.

Comment 8 Bryn M. Reeves 2013-03-06 16:42:20 UTC
Warnings about D state processes are just that - warnings, not errors. They should never prevent the tool from running (and have been removed upstream and in RHEL6 because of the level of confusion they have caused).

So this bug actually appears to be a very specific case: LVM2 tools hanging when run under sos due to cluster locking problems when clvmd is in use.

We can address that by changing the manner in which sos invokes the LVM2 tools. sos never modifies metadata, so there is no need for the tools to request any locks at all; as your customer has seen, requesting them can cause problems for sos and potentially for other users of the clustered volume manager. This is a change we probably should have made some time ago.

I will implement this upstream and clone the bug for RHEL6.

If the customer is able to reproduce I'd be happy to provide packages for testing.
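
As an illustration of the approach described in this comment, the sketch below builds an LVM2 reporting command line that passes the setting from the bug summary, global{locking_type=0}, through the LVM2 "--config" option. It only constructs the argument list and does not run anything. The helper name `lvm_report_command` is hypothetical, and this is not a copy of the upstream change; see the commit linked in comment 21 for what was actually implemented.

```python
# Setting taken from the bug summary: disable locking for read-only calls.
LVM_NO_LOCKING = "global{locking_type=0}"


def lvm_report_command(tool, *args):
    """Return an argv list for a read-only LVM2 tool with locking disabled.

    'tool' is an LVM2 reporting command such as "vgdisplay" or "pvs";
    any extra arguments are appended after the --config override.
    """
    return [tool, "--config", LVM_NO_LOCKING] + list(args)


if __name__ == "__main__":
    # Passing a list to subprocess avoids shell quoting of the braces.
    print(lvm_report_command("vgdisplay", "-vv"))
```

Because the tools are only used to read metadata for the report, skipping lock requests should not affect what is collected, which is the reasoning given above.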

Comment 11 Miguel Perez Colino 2013-03-07 10:11:40 UTC
> Warnings about D state processes are just that - warnings, not errors.
OK. Understood. This makes complete sense. Thanks a lot.

May I propose a "--safe-batch" option that exits in case of warnings? :-)

> I will implement this upstream and clone the bug for RHEL6.
Great, thanks again! I'll keep the customer informed.

Comment 18 RHEL Program Management 2013-07-26 17:03:43 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 21 Bryn M. Reeves 2014-05-28 14:21:14 UTC
Upstream: https://github.com/sosreport/sos/commit/dd478c2

Comment 25 errata-xmlrpc 2014-09-16 00:31:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1200.html