Description of problem:
When a multipath path fails, lvm commands take a long time (4.5 minutes in my deployment). Adding the disable_after_error_count parameter to lvm.conf should mark the device as disabled after a specified number of errors and make lvm and vdsm respond more quickly.

Version-Release number of selected component (if applicable):
vdsm-4.9-81.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Make a multipath path faulty.
2. Run lvm pvs and vgs; they take around 4.5-5 minutes (50 SDs connected).

Actual results:

Expected results:

Additional info:
Changing this config value in lvm.conf makes operations run much faster.
How and when would the device be re-enabled?
Some additional notes: I just talked with Moran, and "multipath path error" should be read as "the storage is unreachable through all its paths".

Looking at the disable_after_error_count code introduced by the patch:
http://sources.redhat.com/git/gitweb.cgi?p=lvm2.git;a=commitdiff;h=74b228ee945934c3b979cbb70a29b3a721f5c683
the error_count is a one-shot value for each lvm command.

To summarize: using disable_after_error_count has no side effects (e.g. it does not permanently disable a device) and would improve lvm responsiveness when a storage is completely unreachable.
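To illustrate the "one-shot per command" behaviour described above, here is a minimal Python model. It is not the lvm2 code: the class and function names are hypothetical, and the exact boundary (disabling when the count reaches the limit, and treating 0 as "never disable") is an assumption made for the sketch.

```python
class CommandDeviceState:
    """Hypothetical per-command view of one device.

    A fresh instance is created for every simulated lvm command, so the
    error counter never carries over between commands.
    """

    def __init__(self, name, disable_after_error_count):
        self.name = name
        self.limit = disable_after_error_count  # assumption: 0 = never disable
        self.error_count = 0
        self.attempts = 0  # number of I/O requests actually issued

    @property
    def disabled(self):
        return self.limit > 0 and self.error_count >= self.limit

    def read(self, do_io):
        if self.disabled:
            return None  # skip I/O for the rest of this command
        self.attempts += 1
        try:
            return do_io()
        except OSError:
            self.error_count += 1
            return None


def failing_io():
    # Stands in for a read on storage that is unreachable through all paths.
    raise OSError("I/O error: all paths down")


def run_command(reads, limit):
    """Simulate one command issuing `reads` reads against a dead device."""
    dev = CommandDeviceState("/dev/mapper/example", limit)
    for _ in range(reads):
        dev.read(failing_io)
    return dev.attempts


if __name__ == "__main__":
    print(run_command(10, 3))  # I/O attempts with a limit of 3
    print(run_command(10, 0))  # I/O attempts with the counter switched off
    print(run_command(10, 3))  # a new command starts counting from zero
```

In this model, each command stops sending I/O to the dead device once the limit is hit, and the next command tries the device again, which is why nothing needs to be re-enabled explicitly.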
Which value should we put in disable_after_error_count so that we do not get too many false negatives? Moran, this question is also directed to your team :-)
BZ#722754 Limit lvm retries to broken devices

Change-Id: I74dfdea05943f72c7b89eba42246fc8f26bf0035
http://gerrit.usersys.redhat.com/730
Verified - vdsm-4.9-91 - disable_after_error_count parameter is now set to 3.
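For anyone wanting to try the same value on a single command without editing lvm.conf, a rough sketch follows. It only builds the argument list and relies on lvm's `--config` option; the helper names are hypothetical and this is not necessarily how vdsm passes the setting.

```python
import shlex


def lvm_config_override(error_count):
    # Hypothetical helper: builds a config string for the devices section.
    return "devices { disable_after_error_count=%d }" % error_count


def build_lvm_command(subcommand, error_count=3):
    # Returns an argv list; nothing is executed here.
    return ["lvm", subcommand, "--config", lvm_config_override(error_count)]


if __name__ == "__main__":
    cmd = build_lvm_command("pvs")
    # Print a shell-quoted form that can be pasted into a terminal.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The argv list can be handed to subprocess.run() as-is; the default of 3 simply mirrors the value verified above.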
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2011-1782.html