Description of problem:
dm-multipath with multibus path grouping and iSCSI devices over bnx2
fails high-availability tests. The test simply shuts down one of the
two network links (with ifdown), and checks that I/O connectivity
is still working on the surviving path. Config diagram:

              +------+
              | Box1 |  Host with iscsi-initiator (server)
              +------+
  192.168.1.2 |    | 192.168.2.2
              |    |
 192.168.1.20 |    | 192.168.2.20
              +------+
              | Box2 |  Host with iscsi-target (storage)
              +------+
/dev/mapper/mpath2p1 (sda/sdb), configured for "multibus"
When the HA test is run repeatedly, it eventually fails. Investigations
so far have shown the tests do NOT fail if ANY of the following are used
instead of the above config :
(a) Intel NICs are used instead of Broadcom, or
(b) Broadcom NICs are used and the cable is pulled (rather than ifdown), or
(c) "failover" path_grouping_policy is used instead of "multibus", or
(d) bonding is used instead of dm-multipath+iSCSI
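For context on workaround (c), the only multipath.conf difference between the failing and passing configurations is the path grouping policy. A minimal sketch of the relevant stanza follows; whether the site sets this in the defaults section or per-device is an assumption:

```
defaults {
        # failing config: both paths in one priority group, I/O spread
        # across both NICs
        path_grouping_policy    multibus

        # workaround (c): one active path, the other held in reserve
        # path_grouping_policy  failover
}
```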
Despite the abundant workarounds available, the customer wants the bug
identified and fixed. It's unclear so far whether this is a bnx2 bug,
or a dm-multipath bug, or an iSCSI bug, or some kind of pathological
interaction between them.

Version-Release number of selected component (if applicable):

How reproducible:
The bug is apparently timing related, but always reproducible with
Broadcom NICs and dm-multipath "multibus" path grouping. It is
not reproducible with any of the workarounds (a)-(d) listed above.
Steps to Reproduce:
Customer has provided detailed config and test recipes - I can
include them here if needed, though the description above should
suffice.

Actual results:
alternate path to iSCSI device not available when one NIC interface is ifdown'ed

Expected results:
alternate path to iSCSI device available
There are some scsi reservation conflict errors present in some of
the logs - still trying to determine if these are related somehow.
Some of the reports show Lifekeeper HA is installed, which may be
related - customer claims the problem still occurs even if Lifekeeper
is not installed.
Can you please post /etc/multipath.conf and the result of running
# multipath -ll
When you reproduce this again, can you please attach the output from
/var/log/messages, as well as the commands you ran to reproduce this?
(The timestamps from when you ran the various commands would be
helpful, too.)
Are you using Broadcom's offload driver bnx2i or just bnx2+iscsi_tcp?
When you perform the tests could you also get the output of
iscsiadm -m session -P 3
so we can see if the iscsi sessions logged in properly.
(In reply to comment #2)
> Are you using broadcom's offload driver bnx2i or just bnx2+iscsi_tcp?
In the only sosreport I have so far, it looks like they're using
the offload driver :
# egrep -i 'iscsi|bnx' lsmod
iscsi_tcp 58305 42
libiscsi 63553 2 ib_iser,iscsi_tcp
bnx2i 102176 0
scsi_transport_iscsi 67153 5 ib_iser,iscsi_tcp,libiscsi,bnx2i
cnic 74648 1 bnx2i
bnx2 214408 0
scsi_mod 196569 13 mptctl,usb_storage,ib_iser,iscsi_tcp,libiscsi,bnx2i,scsi_transport_iscsi,scsi_dh,sr_mod,sg,libata,megaraid_sas,sd_mod
> When you perform the tests could you also get the output of
> iscsiadm -m session -P 3
> so we can see if the iscsi sessions logged in properly.
Masaki-san, can you please ask the customer for the iscsiadm info too,
in addition to the sosreport, thanks.
For Ben's request for multipath -ll and /var/log/messages after having
run the tests and reproduced the bug: the sosreport will already have
all of this information.
It appears that the problem here is that iscsi isn't failing back soon enough.
Looking at the multipath messages, I see:
Jun 23 19:16:19 nabtesco multipathd: sdc: readsector0 checker reports path is
Jun 23 19:16:24 nabtesco multipathd: sdc: readsector0 checker reports path is
Jun 23 19:16:29 nabtesco multipathd: sdc: readsector0 checker reports path is
Jun 23 19:18:29 nabtesco multipathd: sdb: readsector0 checker reports path is
So there's a 2 minute gap where we don't do any checking. That is keeping multipathd from reinitializing sdc when it should. This is happening because multipath is stuck waiting for sdb to fail.
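The gap can be modeled with a toy serial-checker loop (a simplification for illustration, not multipathd's actual code): if one path's check blocks for the 120-second iSCSI session replacement timeout, every other path's next check is pushed out by the same amount.

```python
def checker_times(rounds_before_block=3, interval=5, block_secs=120):
    """Simulated times (seconds) at which a serial path checker runs.

    One check call is assumed to block for block_secs -- the iSCSI
    session replacement timeout -- stalling the whole loop.
    """
    t, times = 0, []
    for _ in range(rounds_before_block):
        times.append(t)          # normal 5-second checker rounds
        t += interval
    times.append(t)              # this round's check hangs on the dead path
    t += block_secs              # ...until session recovery times out
    times.append(t)              # only now can other paths be re-checked
    return times

print(checker_times())  # [0, 5, 10, 15, 135] -- a 120 s hole in checking
```

This matches the log pattern above: checks every 5 seconds, then nothing for two minutes until the session recovery timeout fires.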
Again, looking at the logs:
Jun 23 19:18:29 nabtesco kernel: session1: iscsi: session recovery timed out
after 120 secs
Jun 23 19:18:29 nabtesco kernel: iscsi: cmd 0x28 is not queued (8)
Jun 23 19:18:29 nabtesco kernel: iscsi: cmd 0x2a is not queued (8)
Jun 23 19:18:29 nabtesco last message repeated 2 times
Jun 23 19:18:29 nabtesco kernel: sd 5:0:0:0: SCSI error: return code = 0x00010000
Jun 23 19:18:29 nabtesco kernel: end_request: I/O error, dev sdb, sector 69
Jun 23 19:18:29 nabtesco kernel: device-mapper: multipath: Failing path 8:16.
Looking at the first kernel message above, you can see that the iscsi session waited 120 seconds before failing.
To fix this, you need to edit /etc/iscsi/iscsid.conf to change
node.session.timeo.replacement_timeout = 120
to something shorter, say
node.session.timeo.replacement_timeout = 10
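As a sketch, the edit can be scripted with sed. The demo below works on a temporary copy rather than the live file; on the real host you would point it at /etc/iscsi/iscsid.conf, restart iscsid, and re-login the sessions (and the exact path may differ by release):

```shell
# Demo on a temp copy; substitute /etc/iscsi/iscsid.conf on a real host.
conf=$(mktemp)
echo 'node.session.timeo.replacement_timeout = 120' > "$conf"

# Lower the replacement timeout so a dead path fails fast enough for
# multipathd's checker to move on and reinstate the other path.
sed -i 's/^node\.session\.timeo\.replacement_timeout = .*/node.session.timeo.replacement_timeout = 10/' "$conf"

grep replacement_timeout "$conf"
```

Note that already-discovered nodes keep their recorded timeout; the new value only applies to sessions logged in after the node records are updated.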
For more information on configuring iscsi devices to work well with multipath,
see section 8.1 of /usr/share/doc/iscsi-initiator-utils-18.104.22.1688/README