Bug 916937 - Set global{locking_type=0} when calling lvm2 commands
Summary: Set global{locking_type=0} when calling lvm2 commands
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: sos
Version: 5.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Bryn M. Reeves
QA Contact: David Kutálek
URL:
Whiteboard:
Depends On:
Blocks: 1102282
Reported: 2013-03-01 09:43 UTC by Miguel Perez Colino
Modified: 2014-09-16 00:31 UTC (History)

Fixed In Version: sos-1.7-9.72.el5
Doc Type: Enhancement
Doc Text:
no docs needed
Clone Of:
: 1102282 (view as bug list)
Environment:
Last Closed: 2014-09-16 00:31:25 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2014:1200
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: sos bug fix update
Last Updated: 2014-09-16 04:17:05 UTC

Description Miguel Perez Colino 2013-03-01 09:43:37 UTC
Description of problem:
"sosreport" cannot be run safely in batch mode, because it continues even if a plugin reports an error.
Adding a "--check" option would let scripts run a pre-flight check of the plugins automatically.
Making batch mode "exit 1" when the plugin check fails would make it safer.

Version-Release number of selected component (if applicable):
# rpm -q sos
sos-1.7-9.62.el5

How reproducible:
Run "sosreport" in batch mode when a process is in a D state

Steps to Reproduce:
1. Create a daily cron job in order to have periodic sosreports
2. Wait until a process in the machine enters a state D
3. See the collision of sosreport against the process
  
Actual results:
sosreport hangs or a service crashes

Expected results:
sosreport exits with errorlevel 0

Additional info:

# diff sosreport-check /usr/sbin/sosreport
160,162d159
< __cmdParser__.add_option("--check", action="store_true", \
<                      dest="check", default=False, \
<                      help="perform plugin check only")
572,578d568
<         else:
<             print _("Exiting")
<             sys.exit(1)
<     else:
<         print _("Plugin Test OK")
<         if __cmdLineOpts__.check:
<             sys.exit(0)

Comment 1 Bryn M. Reeves 2013-03-01 10:54:52 UTC
Running sos with a process in D state does not cause plugins to fail. It causes a warning to be printed that is mostly misleading and unhelpful to customers (and that was removed from later versions many years ago).

These steps:

  Steps to Reproduce:
  1. Create a daily cron job in order to have periodic sosreports
  2. Wait until a process in the machine enters a state D
  3. See the collision of sosreport against the process
  
  Actual results:
  sosreport hangs or a service crashes

Do not result in any problems for me. Please be more specific about what you are trying to solve here; e.g. what processes you observe causing such problems.

Comment 2 Miguel Perez Colino 2013-03-01 17:10:21 UTC
In RHEL 5, with sosreport 1.7 (no higher version available), and when running in batch mode, no warning is printed, and sosreport still runs.

My customer reports that when they run sosreport while troubleshooting an issue and the warning about a process in state D appears, the program hangs after they choose to continue, and it can sometimes crash the machine (which runs SAP and Oracle 11g).

They do not want to have a cron job that runs "/usr/sbin/sosreport -a -v --no-progressbar --no-multithread --batch --name=XXXXX --tmp-dir=/var/log/sosreport" because it may cause problems with the current behavior.

In my humble opinion the problem to be solved here is to have a "batch" mode that behaves in a safe way, which means that if there is a problem with one plugin during checks, the program exits and the report does not get generated. 

> Do not result in any problems for me.
It is clear that we are not running the program under the same circumstances. I'll try to gather more information and add it to this RFE, although what I want to resolve with this bug is not the system crash but the behavior of sosreport in batch mode.

> Please be more specific about what you are trying to solve here
As I wrote before, I want to solve the behavior of sosreport when running in batch mode.

Comment 3 Bryn M. Reeves 2013-03-04 10:24:41 UTC
Please include logs (ps ax --forest when the problem is happening, sosreport -vvv output and any panic/oops/warn/bug messages generated during a "system crash") and steps to reproduce (the steps in comment #0 are not effective so some important detail has been omitted).

You haven't yet demonstrated that there is a problem with the behaviour of sosreport when run in batch mode (as evidenced by the fact that the steps do not reproduce the problem on a typical RHEL installation when one or more processes are in uninterruptible sleep).
 
If there is a problem with some process when sos runs then we should fix it and not paper over it with hacks.

Comment 8 Bryn M. Reeves 2013-03-06 16:42:20 UTC
Warnings about D state processes are just that - warnings, not errors. They should never prevent the tool from running (and have been removed upstream and in RHEL6 because of the level of confusion they have caused).

So this bug actually appears to be a very specific case; LVM2 tools hanging when run under sos due to cluster locking problems when clvmd is in use.

We can address that by changing the manner in which sos invokes the LVM2 tools. We never modify metadata, so there is no need for the tools to request any locks at all; as your customer has seen, requesting them can cause problems for sos and potentially for other users of the clustered volume manager. This is a change we probably should have made some time ago.

I will implement this upstream and clone the bug for RHEL6.

If the customer is able to reproduce I'd be happy to provide packages for testing.
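
As a minimal sketch of the approach described in this comment (not the code from the eventual fix), the snippet below builds an LVM2 command line that passes --config 'global{locking_type=0}', so the tool does not request any locks. The helper name lvm_command and the constant LVM_NO_LOCKING are hypothetical; only the --config option and the global/locking_type setting come from LVM2 itself. Turning locking off is only reasonable here because the commands are read-only reporting commands that never modify metadata.

```python
# Hypothetical helper for illustration; this is not the actual sos plugin code.

# LVM2 configuration override that turns locking off for a single invocation.
LVM_NO_LOCKING = "global{locking_type=0}"


def lvm_command(tool, *args):
    """Return an argv list that runs a read-only LVM2 tool without locking.

    Only use this for reporting commands (vgdisplay, lvs, pvs, ...);
    commands that change metadata need locking to stay enabled.
    """
    return [tool, "--config", LVM_NO_LOCKING] + list(args)


if __name__ == "__main__":
    # Show the command line that would be executed.
    print(" ".join(lvm_command("vgdisplay", "-vv")))
```

Returning an argv list rather than a single string means the braces in the --config value need no shell quoting when the list is passed to something like subprocess.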

Comment 11 Miguel Perez Colino 2013-03-07 10:11:40 UTC
> Warnings about D state processes are just that - warnings, not errors.
OK. Understood. This makes complete sense. Thanks a lot.

May I propose a "--safe-batch" option that exits in case of warnings? :-)

> I will implement this upstream and clone the bug for RHEL6.
Great, thanks again! I'll keep the customer informed.

Comment 18 RHEL Program Management 2013-07-26 17:03:43 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 21 Bryn M. Reeves 2014-05-28 14:21:14 UTC
Upstream: https://github.com/sosreport/sos/commit/dd478c2

Comment 25 errata-xmlrpc 2014-09-16 00:31:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1200.html

