Bug 606544 - dm-multipath multibus with iSCSI over bnx2 fails ifdown HA tests
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: device-mapper-multipath
Version: 5.3
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ben Marzinski
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks:
Reported: 2010-06-21 18:02 EDT by Mark Goodwin
Modified: 2011-02-23 23:55 EST
CC List: 15 users

Doc Type: Bug Fix
Last Closed: 2011-02-23 23:55:47 EST


Attachments: None
Description Mark Goodwin 2010-06-21 18:02:49 EDT
Description of problem:

dm-multipath with multibus path grouping and iSCSI devices over bnx2
fails high-availability tests. The test simply shuts down one of the
two network links (with ifdown), and checks that I/O connectivity
is still working on the surviving path. Config diagram:

          +----------+
          |   Box1   | Host with iscsi-initiator (server)
          +--+----+--+
192.168.1.2  |    | 192.168.2.2
             |    |
             |    |
192.168.1.20 |    | 192.168.2.20
          +--+----+--+
          |   Box2   | Host with iscsi-target (storage)
          +----------+
          /dev/mapper/mpath2p1 (sda/sdb), configured for "multibus"
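
The test loop amounts to something like the following sketch (interface names
are illustrative, not the customer's exact recipe):

  # on Box1 (the initiator): drop one link, verify I/O on the survivor
  ifdown eth1          # takes down the 192.168.2.x path in the diagram
  dd if=/dev/mapper/mpath2p1 of=/dev/null bs=4k count=256 iflag=direct
  multipath -ll        # the remaining path should still show active/ready
  ifup eth1            # restore the link, then repeat with the other side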

When the HA test is run repeatedly, it eventually fails. Investigations
so far have shown the tests do NOT fail if ANY of the following are used
instead of the above config:

(a) Intel NICs are used instead of Broadcom, or
(b) Broadcom NICs are used and the cable is pulled (rather than ifdown), or
(c) "failover" path_grouping_policy is used instead of "multibus", or
(d) bonding is used instead of dm-multipath+iSCSI

Despite the abundant workarounds available, the customer wants the bug
identified and fixed. It's unclear so far whether this is a bnx2 bug,
or a dm-multipath bug, or an iSCSI bug, or some kind of pathological
combination.
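
For reference, workaround (c) is a one-line change in /etc/multipath.conf; a
minimal sketch (not the customer's actual file):

  defaults {
          user_friendly_names yes
          path_grouping_policy multibus   # the failing config; (c) switches this to "failover"
  }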

Version-Release number of selected component (if applicable):
	RHEL5.3 2.6.18-128.7.1.el5
	device-mapper-1.02.28-2.el5-x86_64
	device-mapper-multipath-0.4.7-30.el5-x86_64
	iscsi-initiator-utils-6.2.0.868-0.18.el5-x86_64

How reproducible:
The bug is apparently timing-related, but always reproducible with
Broadcom NICs and dm-multipath "multibus" path grouping. It is
not reproducible with any of the workarounds (a)-(d) listed above.

Steps to Reproduce:
Customer has provided detailed config and test recipes - I can
include them here if needed, though the description above should
be sufficient.
  
Actual results:
alternate path to iSCSI device not available when one NIC interface is ifdown'ed

Expected results:
alternate path to iSCSI device available

Additional info:
There are some SCSI reservation conflict errors present in some of
the logs; still trying to determine if these are related somehow.
Some of the reports show LifeKeeper HA is installed, which may be
related, though the customer claims the problem still occurs even if
LifeKeeper is not installed.
Comment 1 Ben Marzinski 2010-06-22 11:09:46 EDT
Can you please post /etc/multipath.conf and the result of running

# multipath -ll

When you reproduce this again, can you please attach the output from /var/log/messages, as well as the commands you ran to reproduce this? (The timestamps from when you ran the various commands would be
very helpful.)
Comment 2 Mike Christie 2010-06-22 14:44:02 EDT
Are you using Broadcom's offload driver bnx2i, or just bnx2+iscsi_tcp?

When you perform the tests could you also get the output of

iscsiadm -m session -P 3

so we can see if the iscsi sessions logged in properly.
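
For reference, the transport in use shows up in that output; an illustrative
check:

  iscsiadm -m session -P 1 | grep -i 'iface transport'
  # "Iface Transport: tcp"   -> bnx2 + iscsi_tcp (software iSCSI)
  # "Iface Transport: bnx2i" -> Broadcom offload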
Comment 3 Mark Goodwin 2010-06-22 20:19:59 EDT
(In reply to comment #2)
> Are you using broadcom's offload driver bnx2i or just bnx2+iscsi_tcp?
> 

In the only sosreport I have so far, both bnx2i and iscsi_tcp are loaded;
the use counts below suggest the sessions are going over iscsi_tcp rather
than the offload driver:

# egrep -i 'iscsi|bnx' lsmod
iscsi_tcp              58305  42 
libiscsi               63553  2 ib_iser,iscsi_tcp
bnx2i                 102176  0 
scsi_transport_iscsi    67153  5 ib_iser,iscsi_tcp,libiscsi,bnx2i
cnic                   74648  1 bnx2i
bnx2                  214408  0 
scsi_mod              196569  13 mptctl,usb_storage,ib_iser,iscsi_tcp,libiscsi,bnx2i,scsi_transport_iscsi,scsi_dh,sr_mod,sg,libata,megaraid_sas,sd_mod

> When you perform the tests could you also get the output of
> iscsiadm -m session -P 3
> so we can see of the iscsi sessions logged in properly.    

Masaki-san, can you please ask the customer for the iscsiadm info too,
in addition to the sosreport, thanks. 

For Ben's request for multipath -ll and /var/log/messages after having
run the tests and reproduced the bug: the sosreport will already have
all of this information.

Cheers
-- Mark
Comment 21 Ben Marzinski 2010-07-01 17:43:55 EDT
It appears that the problem here is that iscsi isn't failing the dead path soon enough.


Looking at the multipath messages, I see:

Jun 23 19:16:19 nabtesco multipathd: sdc: readsector0 checker reports path is down
Jun 23 19:16:24 nabtesco multipathd: sdc: readsector0 checker reports path is down
Jun 23 19:16:29 nabtesco multipathd: sdc: readsector0 checker reports path is down
Jun 23 19:18:29 nabtesco multipathd: sdb: readsector0 checker reports path is down

So there's a 2-minute gap where we don't do any path checking.  That is keeping multipathd from reinitializing sdc when it should.  This is happening because multipathd is stuck waiting for sdb to fail.

Again, looking at the logs:

Jun 23 19:18:29 nabtesco kernel:  session1: iscsi: session recovery timed out after 120 secs
Jun 23 19:18:29 nabtesco kernel: iscsi: cmd 0x28 is not queued (8)
Jun 23 19:18:29 nabtesco kernel: iscsi: cmd 0x2a is not queued (8)
Jun 23 19:18:29 nabtesco last message repeated 2 times
Jun 23 19:18:29 nabtesco kernel: sd 5:0:0:0: SCSI error: return code = 0x00010000
Jun 23 19:18:29 nabtesco kernel: end_request: I/O error, dev sdb, sector 69
Jun 23 19:18:29 nabtesco kernel: device-mapper: multipath: Failing path 8:16.

Comparing the timestamps in the two log excerpts above, you can see that the iscsi session waited 120 seconds to fail.

To fix this, you need to edit /etc/iscsi/iscsid.conf to change

node.session.timeo.replacement_timeout = 120

to something shorter, say

node.session.timeo.replacement_timeout = 10

For more information on configuring iscsi devices to work well with multipath,
see section 8.1 of /usr/share/doc/iscsi-initiator-utils-6.2.0.868/README
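
Note that the iscsid.conf defaults are only copied into node records at
discovery time; for targets that are already discovered, the records need
updating in place and the sessions re-logged in. A sketch (target and portal
are placeholders):

  iscsiadm -m node -T <target_iqn> -p <portal_ip> -o update \
      -n node.session.timeo.replacement_timeout -v 10
  iscsiadm -m node -T <target_iqn> -p <portal_ip> --logout
  iscsiadm -m node -T <target_iqn> -p <portal_ip> --login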
