Bug 994277 - multipath: fix handling of transport-offline states
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: device-mapper-multipath
Version: 6.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: yanfu,wang
Docs Contact:
Depends On:
Blocks:
Reported: 2013-08-06 19:39 EDT by mchristie
Modified: 2014-03-02 18:41 EST (History)
CC: 16 users

See Also:
Fixed In Version: device-mapper-multipath-0.4.9-69.el6
Doc Type: Bug Fix
Doc Text:
Cause: Multipath was not reserving enough space to hold the "transport-offline" value when it checked a path's sysfs state, and it was running the path checker on paths in the "quiesce" state.
Consequence: Multipath issued a warning message that it could not read the sysfs file for paths in the "transport-offline" state, and unnecessarily failed paths in the "quiesce" state.
Fix: Multipath allocates enough space for the "transport-offline" state, and sets paths in the "quiesce" state to pending.
Result: Multipath no longer issues warning messages for paths in the "transport-offline" state, and no longer fails paths in the "quiesce" state.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 02:51:07 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description mchristie 2013-08-06 19:39:17 EDT
Description of problem:

The iSCSI layer uses a long device state string, "transport-offline", and the multipath-tools code that reads the sysfs state does not allocate a buffer large enough to hold it. This is a request to bring in this patch:

https://www.redhat.com/archives/dm-devel/2013-February/msg00058.html

from upstream.

Without this patch, the logs fill up with messages about multipathd being unable to read the sysfs state file whenever the path is down.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Comment 2 Ben Marzinski 2013-08-13 12:26:02 EDT
Patch applied. Thanks.
Comment 3 mchristie 2013-08-13 12:59:26 EDT
QA,

To test this, log in to an iSCSI target, create a multipath device using the iSCSI paths, then pull a cable for longer than the iSCSI replacement/recovery timeout setting (the default is 2 minutes, but it is modifiable in /etc/iscsi/iscsid.conf, or with the iscsiadm -m node -o update command for existing targets).

When the iSCSI replacement/recovery timeout has expired, you should see

session recovery timed out after %d secs

in /var/log/messages, and if you cat

/sys/devices/platform/host3/session1/target3:0:0/3:0:0:0/state

it will say transport-offline.


In /var/log/messages you will then see these messages start to appear:

Jul 17 10:00:52 IONr8RED2950 multipathd: overflow in attribute '/sys/devices/platform/host3/session1/target3:0:0/3:0:0:0/state'

With the fix those messages should not appear.
Comment 5 yanfu,wang 2013-10-14 03:23:33 EDT
Reproduced on device-mapper-multipath-0.4.9-64.el6:
test setting up a multipath device on top of an iscsi device:
[root@storageqe-17 ~]# multipath -l
mpathc (1IET     00010001) dm-6 IET,VIRTUAL-DISK
size=500M features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  `- 10:0:0:1 sdd 8:48 active undef running
...
[root@storageqe-17 ~]# cat /sys/devices/platform/host10/session3/target10\:0\:0/10\:0\:0\:1/state 
running

update the iscsi replacement/recovery timeout setting:
[root@storageqe-17 ~]# iscsiadm -m node -T iqn.2013-09.com.redhat:target1 |grep timeout
node.session.timeo.replacement_timeout = 120
[root@storageqe-17 ~]# iscsiadm -m node -T iqn.2013-09.com.redhat:target1  -o update -n node.session.timeo.replacement_timeout -v 180
[root@storageqe-17 ~]# iscsiadm -m node -T iqn.2013-09.com.redhat:target1 |grep timeout
node.session.timeo.replacement_timeout = 180

Down network in iscsi target:
[root@storageqe-19 ~]# /etc/init.d/network stop
Shutting down interface eth0:  [  OK  ]
Shutting down loopback interface:  [  OK  ]

When the iscsi replacement/recovery timeout has expired, got below expected message:
Oct 14 03:09:04 storageqe-17 kernel: session3: session recovery timed out after 180 secs
Oct 14 03:09:05 storageqe-17 iscsid: connect to 10.16.67.51:3260 failed (No route to host)
Oct 14 03:09:05 storageqe-17 kernel: sd 10:0:0:1: rejecting I/O to offline device
Oct 14 03:09:05 storageqe-17 kernel: device-mapper: multipath: Failing path 8:48.
Oct 14 03:09:05 storageqe-17 multipathd: overflow in attribute '/sys/devices/platform/host10/session3/target10:0:0/10:0:0:1/state'
Oct 14 03:09:05 storageqe-17 multipathd: mpathc: sdd - directio checker reports path is down
Oct 14 03:09:05 storageqe-17 multipathd: checker failed path 8:48 in map mpathc
Oct 14 03:09:05 storageqe-17 multipathd: mpathc: remaining active paths: 0
Oct 14 03:09:10 storageqe-17 kernel: sd 10:0:0:1: rejecting I/O to offline device
Oct 14 03:09:10 storageqe-17 multipathd: overflow in attribute '/sys/devices/platform/host10/session3/target10:0:0/10:0:0:1/state'

[root@storageqe-17 ~]# cat /sys/devices/platform/host10/session3/target10\:0\:0/10\:0\:0\:1/state 
transport-offline

Verified on the fixed version; the above problem no longer occurs.
Comment 9 errata-xmlrpc 2013-11-21 02:51:07 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1574.html
