Bug 481227

Summary: multipath.conf: polling_interval is misleading in the docs
Product: Red Hat Enterprise Linux 5
Component: device-mapper-multipath
Version: 5.3
Status: CLOSED ERRATA
Severity: medium
Priority: low
Reporter: Shane Bradley <sbradley>
Assignee: Ben Marzinski <bmarzins>
QA Contact: Cluster QE <mspqa-list>
Docs Contact:
CC: agk, bmarzins, bmr, christophe.varoqui, cward, dwysocha, edamato, egoggin, heinzm, junichi.nomura, kueda, lmb, mbroz, prockai, sghosh, tao, tranlan
Keywords: Documentation, FutureFeature
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Enhancement
Last Closed: 2009-09-02 11:48:15 UTC
Attachments:
patch to update docs on polling_interval (flags: none)

Description Shane Bradley 2009-01-22 21:27:07 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121622 Fedora/3.0.5-1.fc10 Firefox/3.0.5

After reviewing the code and doing some testing, I noticed that
polling_interval did not work as I expected. The description of the
option for multipath.conf conflicted with the results I got when
testing device-mapper-multipath on RHEL4/RHEL5.

$ cat /usr/share/doc/device-mapper-multipath-0.4.7/multipath.conf.annotated
#       # name    : polling_interval
#       # scope   : multipathd
#       # desc    : interval between two path checks in seconds
#       # default : 5
#       #
#       polling_interval 10

---------

The behaviour that I had expected based on the option's description above:
check path 1
wait polling_interval
check path 2
wait polling_interval
check path 1
wait polling_interval
check path 2
wait polling_interval

However, the results I actually got in testing (with multipathd -v4) were,
for example:
check path 1
check path 2
wait polling_interval
check path 1
check path 2
wait polling_interval
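
The difference between the two orderings above can be made concrete with a small simulation. This is only an illustrative sketch, not multipathd code: the function names are made up, each check is assumed to take zero time, and both paths are assumed to start at time 0.

```python
def expected_schedule(paths, polling_interval, rounds):
    """Reading suggested by the old description: wait after every single check."""
    events = []
    t = 0
    for _ in range(rounds):
        for path in paths:
            events.append((t, path))
            t += polling_interval  # wait after each individual path check
    return events


def observed_schedule(paths, polling_interval, rounds):
    """Ordering seen in testing: check every path, then wait once."""
    events = []
    t = 0
    for _ in range(rounds):
        for path in paths:
            events.append((t, path))
        t += polling_interval  # one wait per pass over all paths
    return events


if __name__ == "__main__":
    paths = ["path 1", "path 2"]
    for name, fn in (("expected", expected_schedule), ("observed", observed_schedule)):
        print(name)
        for t, path in fn(paths, 10, 2):
            print(f"  t={t:>3}s  check {path}")
```

With polling_interval 10 and two paths, the first reading checks each path every 20 seconds, while the observed ordering checks each path every 10 seconds.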

---------

The behaviour I saw in RHEL4 and RHEL5 is working as designed, which I
confirmed by reviewing the code and talking to a couple of engineers.

The problem, it seems, is how I was reading the description of the
option. Most users read the word "path" as a single path to a multipath
device, not as all possible paths to all possible mpaths.

From my results in testing and talking with some engineers the
"polling_interval" option actually means:

"The interval between checking all possible paths for all multipath
paths"

I believe the man page and sample config files need to be updated to
reflect a more accurate and simpler description.


Reproducible: Always

Steps to Reproduce:
None
Actual Results:  
None

Expected Results:  
None

Comment 1 Shane Bradley 2009-01-29 16:53:27 UTC
Thread with patch attached:
https://www.redhat.com/archives/dm-devel/2009-January/msg00197.html

Comment 2 Shane Bradley 2009-01-29 16:54:17 UTC
Created attachment 330376 [details]
patch to update docs on polling_interval

Comment 3 Ben Marzinski 2009-05-05 18:50:07 UTC
I can see how the original wording could confuse a person; however, the wording in your attachment is not correct. multipathd doesn't always check all the paths every polling interval. A path is checked every polling_interval seconds after it was added. If paths were added at different times, they may not be checked at the same time.

Also, if a path is usable, the time between path checks will gradually increase to (4 * polling_interval). This is because it is much more important to recover a failed path than it is to preemptively fail an active path. If any IO attempts to use a path that is broken but marked active, the kernel will automatically switch the path to a failed state. However, the kernel is not able to try a failed path until multipathd has marked it as active again (well, that is not completely true, but it's close enough). This change in the actual time between checks is yet another reason why different paths won't always be checked at the same time.

Here is the change I made. Let me know if you think it is still problematic.

Index: multipath-tools-rhel5_4/multipath.conf.annotated
===================================================================
--- multipath-tools-rhel5_4.orig/multipath.conf.annotated
+++ multipath-tools-rhel5_4/multipath.conf.annotated
@@ -28,7 +28,9 @@
 #      #
 #      # name    : polling_interval
 #      # scope   : multipathd
-#      # desc    : interval between two path checks in seconds
+#      # desc    : How often a path's state is checked, in seconds.  For
+#      #           paths that are usable, the time between checks will
+#      #           gradually increase to (4 * polling_interval).
 #      # default : 5
 #      #
 #      polling_interval 10
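
To illustrate the per-path timing described in this comment, here is a rough sketch in Python. It is not taken from the multipathd source. The comment only says the interval "gradually" increases to (4 * polling_interval); the doubling step, the reset to polling_interval when a check finds the path unusable, and the first check landing polling_interval seconds after the path is added are all assumptions made for the example, as are the function names.

```python
def next_check_interval(current, polling_interval, path_usable):
    """Return the delay before the next check of one path.

    Assumptions (not confirmed by the bug report): the interval doubles
    after each check that finds the path usable, and drops back to
    polling_interval as soon as a check finds the path unusable.
    """
    max_interval = 4 * polling_interval  # upper bound stated in the comment
    if not path_usable:
        return polling_interval
    return min(2 * current, max_interval)


def check_times(polling_interval, added_at, check_results):
    """List the times at which one path is checked.

    check_results holds one boolean per check: True if the path was usable.
    """
    times = []
    interval = polling_interval
    t = added_at + interval  # assumed: first check one interval after the add
    for usable in check_results:
        times.append(t)
        interval = next_check_interval(interval, polling_interval, usable)
        t += interval
    return times


if __name__ == "__main__":
    # Two healthy paths added 3 seconds apart never line up again.
    print(check_times(5, 0, [True] * 5))
    print(check_times(5, 3, [True] * 5))
    # A path that fails on its third and fourth checks is polled faster.
    print(check_times(5, 0, [True, True, False, False, True]))
```

Under these assumptions, a healthy path settles at one check every 4 * polling_interval seconds, while a failed path keeps being checked every polling_interval seconds so that it can be restored quickly.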

Comment 5 Chris Ward 2009-07-03 18:21:49 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 7 errata-xmlrpc 2009-09-02 11:48:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1377.html