Bug 651389

Summary: [NetApp 6.1 bug] Unable to set dev_loss_tmo to more than 600 through multipath.conf without setting fast_io_fail_tmo
Product: Red Hat Enterprise Linux 6
Reporter: Rajashekhar M A <rajashekhar.a>
Component: device-mapper-multipath
Assignee: Ben Marzinski <bmarzins>
Status: CLOSED ERRATA
QA Contact: Storage QE <storage-qe>
Severity: medium
Priority: medium
Version: 6.0
CC: agk, bdonahue, bmarzins, christophe.varoqui, coughlan, cward, dwysocha, egoggin, heinzm, junichi.nomura, kueda, lmb, mbroz, mgoodwin, prockai, tranlan, xdl-redhat-bugzilla
Target Milestone: rc
Target Release: 6.1
Keywords: OtherQA
Hardware: Unspecified
OS: Unspecified
Fixed In Version: device-mapper-multipath-0.4.9-32.el6
Doc Type: Bug Fix
Doc Text:
Previously, if you set dev_loss_tmo to a value greater than 600 in multipath.conf without setting the fast_io_fail_tmo value, the multipathd daemon failed to apply the setting. With this update, the multipathd daemon sets dev_loss_tmo for values over 600 correctly, as long as fast_io_fail_tmo is also set in the /etc/multipath.conf file.
Clones: 705854 (view as bug list)
Last Closed: 2011-05-19 14:12:34 UTC
Bug Blocks: 705854    

Description Rajashekhar M A 2010-11-09 13:34:35 UTC
Description of problem:

When dev_loss_tmo is set to a value greater than 600 in multipath.conf without fast_io_fail_tmo also being set, restarting the multipathd daemon fails to apply the parameter. Setting fast_io_fail_tmo and dev_loss_tmo together in the conf file does not help either, and running "multipath" after restarting the daemon has no effect.

To make this work, fast_io_fail_tmo alone must first be set in multipath.conf and the daemon restarted. Only then can dev_loss_tmo be added to multipath.conf and the daemon restarted again, as sketched below.
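A minimal sketch of that two-step workaround (the multipath.conf edits themselves are done by hand; the values are the ones from the reproducer below):

    # Step 1: add only "fast_io_fail_tmo 10" to the defaults section of
    # /etc/multipath.conf, then restart the daemon:
    service multipathd restart

    # Step 2: now also add "dev_loss_tmo 601" to the same section and
    # restart once more:
    service multipathd restart

    # Confirm the timeout reached the FC remote ports:
    cat /sys/class/fc_remote_ports/rport-*/dev_loss_tmo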

This issue is already known; see https://bugzilla.redhat.com/show_bug.cgi?id=634589#c30 .

Secondly, if dev_loss_tmo cannot be set beyond 600 seconds without fast_io_fail_tmo also being set, then multipath or the multipathd daemon should emit a warning/error message or log the failure to syslog.


Version-Release number of selected component (if applicable):

RHEL6.0 GA (2.6.32-71.el6)
device-mapper-multipath-0.4.9-31.el6


How reproducible:
Always


Steps to Reproduce:
1. Configure the following settings in defaults section of multipath.conf:
       fast_io_fail_tmo 10
       dev_loss_tmo    601
2. Restart the multipathd daemon (or reboot the host).
3. Check the dev_loss_tmo values for the remote FC ports in sysfs:

    # ls /sys/class/fc_remote_ports/rport-*\:*-*/dev_loss_tmo
    /sys/class/fc_remote_ports/rport-0:0-0/dev_loss_tmo    
    /sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo    
    /sys/class/fc_remote_ports/rport-0:0-1/dev_loss_tmo    
    /sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo

    # cat /sys/class/fc_remote_ports/rport-*\:*-*/dev_loss_tmo
    30
    30
    30
    30
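For context, the 600-second ceiling appears to come from the kernel's FC transport class rather than from multipath itself: on these kernels, writing a larger value directly into sysfs is rejected unless fast_io_fail_tmo is set on the rport first. A hedged illustration (the rport name is host-specific):

    # Rejected while fast_io_fail_tmo is unset:
    echo 601 > /sys/class/fc_remote_ports/rport-0:0-0/dev_loss_tmo
    # -bash: echo: write error: Invalid argument

    # Accepted once fast_io_fail_tmo has been set:
    echo 10  > /sys/class/fc_remote_ports/rport-0:0-0/fast_io_fail_tmo
    echo 601 > /sys/class/fc_remote_ports/rport-0:0-0/dev_loss_tmo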

Actual results:
The dev_loss_tmo parameter is not applied: the remote ports stay at the default of 30 seconds instead of the configured 601.

Expected results:
dev_loss_tmo should be set to 601 on all FC remote ports.

Additional info:

The full multipath.conf looks like below:

defaults {
        fast_io_fail_tmo  10
        dev_loss_tmo      601
}

blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z]"
    devnode "^dcssblk[0-9]*"
    wwid    35000cca0070ad359
}

devices {
        device {
               vendor "NETAPP"
               path_grouping_policy group_by_prio
               features "1 queue_if_no_path"
               prio "ontap"
               path_checker directio
               failback immediate
               hardware_handler "0"
               rr_weight uniform
               rr_min_io   128
               product "LUN.*"
               getuid_callout "/lib/udev/scsi_id -g -u -d /dev/%n"
        }
}

Comment 2 Ben Marzinski 2010-11-22 04:51:46 UTC
multipath will now set dev_loss_tmo for values over 600 correctly, as long as fast_io_fail_tmo is also set in /etc/multipath.conf
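With the fix, both options set together are applied on a single restart; assuming the same defaults section as in the reproducer, the following now takes effect directly:

    defaults {
            fast_io_fail_tmo  10
            dev_loss_tmo      601
    }

    service multipathd restart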

Comment 5 Chris Ward 2011-04-06 11:07:30 UTC
~~ Partners and Customers ~~

This bug was included in RHEL 6.1 Beta. Please confirm the status of this request as soon as possible.

If you're having problems accessing 6.1 bits, are delayed in your test execution or find in testing that the request was not addressed adequately, please let us know.

Thanks!

Comment 6 Eva Kopalova 2011-05-02 13:51:38 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, if you set dev_loss_tmo to a value greater than 600 in multipath.conf without setting the fast_io_fail_tmo value, the multipathd daemon failed to apply the setting. With this update, the multipathd daemon sets dev_loss_tmo for values over 600 correctly, as long as fast_io_fail_tmo is also set in the /etc/multipath.conf file.

Comment 7 Rajashekhar M A 2011-05-10 14:24:05 UTC
Verified this with RC1. Setting dev_loss_tmo to more than 600 now works fine if fast_io_fail_tmo is also set in /etc/multipath.conf. 

But if we don't set fast_io_fail_tmo in multipath.conf, dev_loss_tmo is silently configured with the default value (I think it's 30), without the user being warned or notified. Can multipathd log a warning so that the user does not get confused?
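A quick check of the verified behavior (a sketch; rport names vary by host, and both options must be present in /etc/multipath.conf):

    service multipathd restart
    cat /sys/class/fc_remote_ports/rport-*/dev_loss_tmo
    # expect 601 on every port; with fast_io_fail_tmo absent, the value
    # silently falls back to the 30-second default noted above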

Comment 8 errata-xmlrpc 2011-05-19 14:12:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0725.html

Comment 9 Mark Goodwin 2012-11-28 04:53:58 UTC
*** Bug 816790 has been marked as a duplicate of this bug. ***