Bug 722754 - [vdsm][error-handling][lvm-conf]vdsm should add disable_after_error_count in lvm.conf
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Federico Simoncelli
QA Contact: Moran Goldboim
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-07-17 11:31 UTC by Moran Goldboim
Modified: 2011-12-06 07:31 UTC
CC List: 7 users

Fixed In Version: vdsm-4.9-87
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 07:31:45 UTC
Target Upstream Version:




Links
Red Hat Product Errata RHEA-2011:1782 | Priority: normal | Status: SHIPPED_LIVE | Summary: new packages: vdsm | Last Updated: 2011-12-06 11:55:51 UTC

Description Moran Goldboim 2011-07-17 11:31:52 UTC
Description of problem:
In case of a multipath path error, lvm commands take a long time (4.5 minutes in my deployment). Adding the disable_after_error_count parameter to lvm.conf should mark a device as disabled after a specified number of errors, making lvm and vdsm respond faster.
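As a minimal sketch, assuming the override is applied per command rather than by editing the host's lvm.conf: in lvm.conf the setting belongs to the `devices { }` section, and lvm commands also accept a `--config` string with the same syntax. The helper names `lvm_config_override` and `build_lvm_command` are hypothetical, and the value 3 is just the one later verified in this bug; this is not vdsm's actual code.

```python
import shlex

# Hypothetical default for illustration (the value later verified in this bug).
ERROR_COUNT = 3


def lvm_config_override(disable_after_error_count: int) -> str:
    """Return an lvm --config string setting devices/disable_after_error_count.

    A value of 0 turns the per-device error counting off.
    """
    if disable_after_error_count < 0:
        raise ValueError("count must be non-negative")
    return "devices { disable_after_error_count=%d }" % disable_after_error_count


def build_lvm_command(subcommand: str, count: int = ERROR_COUNT) -> list:
    """Build argv for e.g. `lvm pvs` with the override passed on the command line."""
    return ["lvm", subcommand, "--config", lvm_config_override(count)]


if __name__ == "__main__":
    # Prints: lvm pvs --config 'devices { disable_after_error_count=3 }'
    print(shlex.join(build_lvm_command("pvs")))
```

Passing the value with `--config` only affects the commands that vdsm itself runs, while writing it into lvm.conf would change behaviour for every lvm user on the host; which trade-off is preferable is a judgment call.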

Version-Release number of selected component (if applicable):
vdsm-4.9-81.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Make a multipath path faulty.
2. Run lvm pvs / vgs: they take around 4.5 to 5 minutes (with 50 SDs connected).
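To compare the duration of step 2 with and without the setting, a small timing helper may be enough. This is a sketch: `time_command` is a hypothetical name, and the `lvm pvs` invocations in the comments are only meant to be run on a test host that has a faulty path; the runnable part times a trivial Python process instead.

```python
import subprocess
import sys
import time


def time_command(argv, timeout=None):
    """Run argv with output discarded; return (elapsed_seconds, returncode)."""
    start = time.monotonic()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    return time.monotonic() - start, completed.returncode


if __name__ == "__main__":
    # On a test host with a faulty path, one could compare:
    #   time_command(["lvm", "pvs"])
    #   time_command(["lvm", "pvs", "--config",
    #                 "devices { disable_after_error_count=3 }"])
    elapsed, rc = time_command([sys.executable, "-c", "pass"])
    print("%.2fs rc=%d" % (elapsed, rc))
```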
  
Actual results:


Expected results:


Additional info:
Changing this config value in lvm.conf makes operations run much faster.

Comment 2 Dan Kenigsberg 2011-07-17 11:59:22 UTC
How and when would the device be re-enabled?

Comment 3 Federico Simoncelli 2011-07-20 13:41:11 UTC
Some additional notes: I just talked with Moran, and "multipath path error" should be understood as "the storage is unreachable through all its paths".
Looking at the disable_after_error_count code introduced with the patch:

http://sources.redhat.com/git/gitweb.cgi?p=lvm2.git;a=commitdiff;h=74b228ee945934c3b979cbb70a29b3a721f5c683

The error_count is a one-shot value for each lvm command: it is not carried over from one command to the next.
Summarizing: using disable_after_error_count has no side effects (e.g. it does not permanently disable a device) and would improve lvm responsiveness when a storage is completely unreachable.
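The per-command behaviour described above can be illustrated with a toy model. This is a Python simulation, not LVM's C implementation: `Device` and `run_lvm_command` are hypothetical names, every I/O to an unreachable device is assumed to fail, and cutting off the device when the counter reaches (rather than exceeds) the limit is an assumption of the model.

```python
class Device:
    """Hypothetical stand-in for a block device seen by an lvm command."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable


def run_lvm_command(devices, reads_per_device, max_errors):
    """Simulate one lvm command; return the number of I/O attempts per device.

    Error counters are created fresh inside this call, so they never outlive
    the command. max_errors == 0 means counting is off (every read is tried).
    """
    error_count = {d.name: 0 for d in devices}  # one-shot: local to this command
    attempts = {d.name: 0 for d in devices}
    for dev in devices:
        for _ in range(reads_per_device):
            # Model assumption: skip once the counter has reached the limit.
            if max_errors and error_count[dev.name] >= max_errors:
                break  # skipped for the rest of THIS command only
            attempts[dev.name] += 1
            if not dev.reachable:
                error_count[dev.name] += 1
    return attempts


if __name__ == "__main__":
    devs = [Device("good"), Device("bad", reachable=False)]
    print(run_lvm_command(devs, 10, 3))  # bad device tried 3 times
    print(run_lvm_command(devs, 10, 3))  # tried 3 times again: not disabled for good
    print(run_lvm_command(devs, 10, 0))  # counting off: tried 10 times
```

Running two commands in a row shows that the unreachable device is probed again by the second command, which matches the "no permanent disabling" conclusion.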

Comment 4 Dan Kenigsberg 2011-07-21 07:00:15 UTC
Which value should we put in disable_after_error_count so that we do not get too many false negatives? Moran, this question is also directed at your team :-)

Comment 5 Federico Simoncelli 2011-07-21 07:55:30 UTC
BZ#722754 Limit lvm retries to broken devices
Change-Id: I74dfdea05943f72c7b89eba42246fc8f26bf0035

http://gerrit.usersys.redhat.com/730

Comment 7 Tomas Dosek 2011-08-10 06:12:16 UTC
Verified - vdsm-4.9-91 - disable_after_error_count parameter is now set to 3.
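If the value was written to lvm.conf, a check like the one in this verification could be scripted. This is a simplified sketch, not a full lvm.conf parser: `read_disable_after_error_count` is a hypothetical helper that strips `#` comments and assumes the `devices` section contains no nested braces. If the setting is instead passed on the lvm command line, it will not appear in the file at all.

```python
import re


def read_disable_after_error_count(conf_text):
    """Return devices/disable_after_error_count from lvm.conf text, or None.

    Simplified: drops '#' comments (including any inside quoted strings) and
    assumes the devices section has no nested braces.
    """
    text = re.sub(r"#[^\n]*", "", conf_text)
    section = re.search(r"\bdevices\s*\{([^{}]*)\}", text)
    if section is None:
        return None
    setting = re.search(r"\bdisable_after_error_count\s*=\s*(\d+)", section.group(1))
    return int(setting.group(1)) if setting else None


if __name__ == "__main__":
    sample = "devices {\n    write_cache_state = 0\n    disable_after_error_count = 3\n}\n"
    print(read_disable_after_error_count(sample))
```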

Comment 8 errata-xmlrpc 2011-12-06 07:31:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html

