Bug 589174 - fix iscsi/iser functioning and failover time under DM multipath
Summary: fix iscsi/iser functioning and failover time under DM multipath
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Mike Christie
QA Contact: Barry Donahue
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 534151
Reported: 2010-05-05 15:08 UTC by Or Gerlitz
Modified: 2010-11-11 16:13 UTC (History)
Clone Of:
Last Closed: 2010-11-11 16:13:49 UTC


Attachments
patch that fixes the problem (15.13 KB, patch)
2010-05-06 08:56 UTC, Or Gerlitz

Description Or Gerlitz 2010-05-05 15:08:02 UTC
Description of problem:

The current iser disconnection flow blocks the iscsid daemon until
a disconnect or timeout event is delivered by the underlying IB CM, which can take up to 100 seconds when the target isn't reachable. As a result, the whole iscsi stack is unresponsive during that time, and DM multipath failover time includes waiting for the timeout.

The proposed patch at http://marc.info/?l=linux-rdma&m=127306994909954 fixes that. For clarity, and following upstream kernel conventions, it was sent as a patch series, but this patch is the core of it. For RHEL6 I will attach the series here as a single patch.

Version-Release number of selected component (if applicable): 

please apply to RHEL6

How reproducible:

Always!

Steps to Reproduce:

1. Discover the same disk/LUN through two different paths.
2. Start the DM multipath daemon.
3. Issue I/O over the multipath device (e.g. dm-0).
4. Take down a hardware element port (e.g. an HCA or switch port) used by one of the paths.
  
Actual results:

Failover is slow: about 130 seconds.

Expected results:

Under the settings recommended by the multipath section of the README provided by the iscsi-initiator-utils rpm, failover should take about 30 seconds. This is what happens when the patch is applied.
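
As a rough illustration of where the two figures come from, here is a minimal sketch of the failover-time budget. It is a simplified model, not a measurement. The three iSCSI timer values are assumptions (typical multipath-oriented settings, not values quoted in this report); only the 100-second IB CM worst case is taken from the description above.

```python
# Rough failover-time budget (simplified model, not a measurement).
# The three iSCSI timer values are assumed, not quoted from this report.
NOOP_OUT_INTERVAL = 5      # seconds between iSCSI NOP-Out pings (assumed)
NOOP_OUT_TIMEOUT = 5       # seconds to wait for the ping reply (assumed)
REPLACEMENT_TIMEOUT = 15   # seconds before I/O is failed up to DM multipath (assumed)
IB_CM_WORST_CASE = 100     # seconds iscsid may block on the IB CM (from this report)


def failover_budget(cm_blocking_delay):
    """Worst-case seconds from path failure until I/O moves to the other path."""
    detection = NOOP_OUT_INTERVAL + NOOP_OUT_TIMEOUT
    return detection + REPLACEMENT_TIMEOUT + cm_blocking_delay


patched = failover_budget(0)                   # disconnect no longer blocks iscsid
unpatched = failover_budget(IB_CM_WORST_CASE)  # iscsid waits for the IB CM event
print(f"patched: ~{patched}s, unpatched: ~{unpatched}s")
```

Under these assumed timers the model gives roughly 25 seconds with the patch and 125 seconds without it, in line with the "about 30" and "about 130" seconds reported here.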

Additional info:

See the patch description for example failover times before and after the patch, and for some more details.

Comment 2 RHEL Product and Program Management 2010-05-05 16:14:34 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Or Gerlitz 2010-05-06 08:56:56 UTC
Created attachment 411882 [details]
patch that fixes the problem

Validated with RHEL6 beta (Santiago), kernel 2.6.32-19.el6.x86_64, and DM multipath
device-mapper-multipath-0.4.9-12.el6.x86_64.

I combined the proposed upstream patches into one patch. It's 99% the 3rd patch; the other two are really tiny.

[PATCH 1/3] ib/iser: add event handler
http://marc.info/?l=linux-rdma&m=127306981509687
[PATCH 2/3] ib/iser: remove buggy back-pointer setting
http://marc.info/?l=linux-rdma&m=127306983909726
[PATCH 3/3] ib/iser: enhance disconnection logic for multi-pathing
http://marc.info/?l=linux-rdma&m=127306994909954

Comment 5 Barry Donahue 2010-05-24 13:02:21 UTC
We can set up multipath to the EqualLogic array. This will have to be done manually on a system with multiple NICs. We can't actually bounce the switch port, but we could do a cable pull at the NIC. So, I guess we could do it.

Comment 6 Mike Christie 2010-05-24 20:17:59 UTC
You actually need InfiniBand to test it, so the EQL target will not work. You need scsi-target-utils and then 2 boxes with IB cards.

We are working closely with the Voltaire guys. They have been testing iser in RHEL6, and this patch was made by them as a result of their testing.

They have said they will re-test (they made and tested the patch before they sent it) the patch when it gets merged.

Comment 7 Or Gerlitz 2010-05-25 07:41:34 UTC
(In reply to comment #6)
> They have said they will re-test (they made and tested the patch before they
> sent it) the patch when it gets merged

Sure, once we have access to a RHEL kernel that has the patch merged, I will test multipathing and update here.

Comment 10 Or Gerlitz 2010-05-25 17:54:31 UTC
(In reply to comment #8)
> Patch(es) available on kernel-2.6.32-29.el6    

The latest kernel @ http://people.redhat.com/arozansk/el6 is 2.6.32-19.el6 ... how do I get the -29 kernel?

Comment 12 Barry Donahue 2010-09-24 18:28:37 UTC
Any feedback from Voltaire on this BZ?

Comment 13 Or Gerlitz 2010-09-25 21:57:26 UTC
(In reply to comment #12)
> Any feedback from Voltaire on this BZ?

Yes, I tested with -30 (i.e. 2.6.32-30.el6) and multipathing worked well with iser.

Comment 14 releng-rhel@redhat.com 2010-11-11 16:13:49 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.

