Bug 994277
| Summary: | multipath: fix handling of transport-offline states | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | mchristie |
| Component: | device-mapper-multipath | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED ERRATA | QA Contact: | yanfu,wang <yanwang> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.5 | CC: | abisogia, acathrow, agk, bdonahue, bmarzins, dwysocha, heinzm, jraju, loberman, msnitzer, prajnoha, prockai, sauchter, xiaoli, yanwang, zkabelac |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | device-mapper-multipath-0.4.9-69.el6 | Doc Type: | Bug Fix |
| Doc Text: |
Cause: Multipath wasn't reserving enough space to hold the "transport-offline" value when it checked the path's sysfs state. It was also running the checker on paths in the "quiesce" state.
Consequence: Multipath would issue a warning message that it couldn't read the sysfs file for paths in the "transport-offline" state, and would unnecessarily fail paths in the "quiesce" state.
Fix: Multipath allocates enough space for the "transport-offline" state, and sets paths in the "quiesce" state to the pending state (a sketch of the buffer-sizing issue follows the summary table below).
Result: Multipath no longer issues warning messages for paths in the "transport-offline" state, and no longer fails paths in the "quiesce" state.
| Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-11-21 07:51:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
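
The Doc Text above traces the warning to a buffer that was too small to hold the longest sysfs state string. Below is a minimal, hypothetical C sketch of that reading pattern; it is not the device-mapper-multipath source, and the names SYSFS_STATE_SIZE and read_sysfs_attr are invented for illustration. The point is simply that the buffer handed to the sysfs read must be large enough for "transport-offline" (plus a trailing newline and terminator); otherwise the caller can only report an overflow, which is the warning seen in the logs.

```c
/*
 * Hedged sketch of reading a sysfs "state" attribute. Not the actual
 * multipath code: SYSFS_STATE_SIZE and read_sysfs_attr are made-up names.
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

/* Fits "transport-offline\n" (18 bytes) plus the terminating NUL. */
#define SYSFS_STATE_SIZE 20

/* Read a sysfs attribute into buf and strip the trailing newline. */
static int read_sysfs_attr(const char *path, char *buf, size_t len)
{
	ssize_t n;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	n = read(fd, buf, len - 1);
	close(fd);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	if (n > 0 && buf[n - 1] == '\n')
		buf[n - 1] = '\0';
	/* Value filled the whole buffer: it was probably truncated. */
	if ((size_t)n == len - 1)
		return -2;
	return 0;
}

int main(int argc, char **argv)
{
	char state[SYSFS_STATE_SIZE];
	int ret;

	if (argc < 2) {
		fprintf(stderr, "usage: %s /sys/.../state\n", argv[0]);
		return 1;
	}
	ret = read_sysfs_attr(argv[1], state, sizeof(state));
	if (ret == -2) {
		/* The situation behind the "overflow in attribute" warning. */
		fprintf(stderr, "overflow in attribute '%s'\n", argv[1]);
		return 1;
	}
	if (ret < 0) {
		fprintf(stderr, "could not read %s\n", argv[1]);
		return 1;
	}
	printf("state: %s\n", state);
	return 0;
}
```

Compiled and pointed at a path's state file (for example, the /sys/devices/platform/.../state file shown in the reproduction below), this prints either the state or an overflow-style warning when the buffer is too small for the value.
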
Description
mchristie
2013-08-06 23:39:17 UTC
Patch applied. Thanks.

QA, to test this just log in to an iscsi target, create a multipath device using the iscsi paths, then pull a cable for longer than the iscsi replacement/recovery timeout setting (the default is 2 minutes, but it is modifiable in /etc/iscsi/iscsid.conf and with the iscsiadm -m node -o update command for existing targets). When the iscsi replacement/recovery timeout has expired you should see "session recovery timed out after %d secs" in /var/log/messages, and if you cat /sys/devices/platform/host3/session1/target3:0:0/3:0:0:0/state it will say transport-offline. In /var/log/messages you will then see these messages start to appear:

Jul 17 10:00:52 IONr8RED2950 multipathd: overflow in attribute '/sys/devices/platform/host3/session1/target3:0:0/3:0:0:0/state'

With the fix those messages should not appear.

Reproduced on device-mapper-multipath-0.4.9-64.el6:

Test setting up a multipath device on top of an iscsi device:

[root@storageqe-17 ~]# multipath -l
mpathc (1IET 00010001) dm-6 IET,VIRTUAL-DISK
size=500M features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  `- 10:0:0:1 sdd 8:48 active undef running
...
[root@storageqe-17 ~]# cat /sys/devices/platform/host10/session3/target10\:0\:0/10\:0\:0\:1/state
running

Update the iscsi replacement/recovery timeout setting:

[root@storageqe-17 ~]# iscsiadm -m node -T iqn.2013-09.com.redhat:target1 |grep timeout
node.session.timeo.replacement_timeout = 120
[root@storageqe-17 ~]# iscsiadm -m node -T iqn.2013-09.com.redhat:target1 -o update -n node.session.timeo.replacement_timeout -v 180
[root@storageqe-17 ~]# iscsiadm -m node -T iqn.2013-09.com.redhat:target1 |grep timeout
node.session.timeo.replacement_timeout = 180

Take down the network on the iscsi target:

[root@storageqe-19 ~]# /etc/init.d/network stop
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]

When the iscsi replacement/recovery timeout has expired, got the expected messages below:

Oct 14 03:09:04 storageqe-17 kernel: session3: session recovery timed out after 180 secs
Oct 14 03:09:05 storageqe-17 iscsid: connect to 10.16.67.51:3260 failed (No route to host)
Oct 14 03:09:05 storageqe-17 kernel: sd 10:0:0:1: rejecting I/O to offline device
Oct 14 03:09:05 storageqe-17 kernel: device-mapper: multipath: Failing path 8:48.
Oct 14 03:09:05 storageqe-17 multipathd: overflow in attribute '/sys/devices/platform/host10/session3/target10:0:0/10:0:0:1/state'
Oct 14 03:09:05 storageqe-17 multipathd: mpathc: sdd - directio checker reports path is down
Oct 14 03:09:05 storageqe-17 multipathd: checker failed path 8:48 in map mpathc
Oct 14 03:09:05 storageqe-17 multipathd: mpathc: remaining active paths: 0
Oct 14 03:09:10 storageqe-17 kernel: sd 10:0:0:1: rejecting I/O to offline device
Oct 14 03:09:10 storageqe-17 multipathd: overflow in attribute '/sys/devices/platform/host10/session3/target10:0:0/10:0:0:1/state'

[root@storageqe-17 ~]# cat /sys/devices/platform/host10/session3/target10\:0\:0/10\:0\:0\:1/state
transport-offline

Verified on the fixed version without the above problem.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1574.html
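
The verification logs above show the directio checker failing the path once its sysfs state reaches transport-offline; the other half of the fix, per the Doc Text, is that paths reporting the "quiesce" state are set to pending instead of being failed. The sketch below is a hypothetical C illustration of that mapping only; the enum and function names are invented and this is not the device-mapper-multipath checker code.

```c
/*
 * Hedged illustration of mapping a sysfs state string to a checker
 * verdict: "quiesce" defers the decision (pending) instead of failing
 * the path, while "transport-offline" is treated as down. Names here
 * are hypothetical, not taken from the multipath source.
 */
#include <stdio.h>
#include <string.h>

enum path_check { CHECK_UP, CHECK_DOWN, CHECK_PENDING };

static enum path_check check_sysfs_state(const char *state)
{
	if (strcmp(state, "quiesce") == 0)
		return CHECK_PENDING;	/* don't fail the path; recheck later */
	if (strcmp(state, "transport-offline") == 0)
		return CHECK_DOWN;	/* fail the path */
	return CHECK_UP;		/* e.g. "running" */
}

int main(void)
{
	const char *samples[] = { "running", "quiesce", "transport-offline" };
	const char *names[] = { "up", "down", "pending" };
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%-18s -> %s\n", samples[i],
		       names[check_sysfs_state(samples[i])]);
	return 0;
}
```

The design point, as the Doc Text puts it, is that deferring the verdict for "quiesce" avoids unnecessarily failing paths that are not actually gone.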