Bug 589174 - fix iscsi/iser functioning and failover time under DM multipath
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Mike Christie
QA Contact: Barry Donahue
Depends On:
Blocks: 534151
Reported: 2010-05-05 11:08 EDT by Or Gerlitz
Modified: 2010-11-11 11:13 EST (History)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Last Closed: 2010-11-11 11:13:49 EST
Attachments
patch that fixes the problem (15.13 KB, patch)
2010-05-06 04:56 EDT, Or Gerlitz

Description Or Gerlitz 2010-05-05 11:08:02 EDT
Description of problem:

The current iser disconnection flow blocks the iscsid daemon until a disconnect or timeout event is delivered by the underlying IB CM, which can take up to 100 seconds when the target isn't reachable. As a result, the whole iscsi stack is unresponsive during that time, and the DM multipath failover time includes waiting for the timeout.

The proposed patch at http://marc.info/?l=linux-rdma&m=127306994909954 fixes that. For clarity, and following upstream kernel conventions, it was sent as a patch series, of which this patch is the core. For RHEL6 I will attach the series here as a single patch.

Version-Release number of selected component (if applicable): 

please apply to RHEL6

How reproducible:

Always!

Steps to Reproduce:

1. Discover the same disk/LUN through two different paths.
2. Start the DM multipath daemon.
3. Issue I/O over the multipath device (e.g. dm-0).
4. Take down a hardware element (e.g. an HCA or switch port) used by one of the paths.
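
One way to put a number on the failover time in step 4 is to keep issuing small reads against the multipath device and look for the longest gap between completions. The following is only a sketch under assumptions: the names `probe` and `longest_stall`, the 4096-byte block size, and the polling interval are illustrative and not part of the patch or of any RHEL tool, and `/dev/dm-0` is just the example device from step 3. Direct I/O (`O_DIRECT`) is used so that reads are not served from the page cache.

```python
import mmap
import os
import time


def longest_stall(timestamps):
    """Return the largest gap, in seconds, between consecutive completions."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return max(gaps, default=0.0)


def probe(device, duration_s=300.0, interval_s=0.5, block_size=4096):
    """Read the first block of `device` repeatedly; return completion times.

    Hypothetical helper: Linux only, needs read access to the device.
    """
    completions = []
    # An anonymous mmap is page aligned, which O_DIRECT reads require.
    buf = mmap.mmap(-1, block_size)
    fd = os.open(device, os.O_RDONLY | os.O_DIRECT)
    try:
        deadline = time.monotonic() + duration_s
        while time.monotonic() < deadline:
            # This call blocks while I/O is stalled waiting for failover.
            os.preadv(fd, [buf], 0)
            completions.append(time.monotonic())
            time.sleep(interval_s)
    finally:
        os.close(fd)
        buf.close()
    return completions
```

Running something like `longest_stall(probe("/dev/dm-0"))` while the port is taken down would report the longest I/O stall seen, which approximates the failover time described under the results below.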
  
Actual results:

Failover is slow, taking about 130 seconds.

Expected results:

With the settings recommended by the multipath section of the README provided by the iscsi-initiator-utils rpm, failover should take about 30 seconds. This is what happens when the patch is applied.
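
The figures in this report fit a simple additive picture, sketched below. This is an assumption made for illustration: it treats the observed failover time as the expected failover time plus the time iscsid spends blocked waiting for the IB CM event. The `FailoverBudget` class and its field names are hypothetical; only the numbers (about 30, up to 100, about 130 seconds) come from this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverBudget:
    # Time for failover with the recommended settings (from this report).
    expected_failover_s: float
    # Extra time iscsid is blocked waiting for the IB CM event.
    cm_wait_s: float

    def total(self) -> float:
        """Total failover time under the additive assumption."""
        return self.expected_failover_s + self.cm_wait_s


# Without the patch: the CM wait of up to 100 seconds is added on top.
before_patch = FailoverBudget(expected_failover_s=30.0, cm_wait_s=100.0)
# With the patch: iscsid no longer waits for the CM event.
after_patch = FailoverBudget(expected_failover_s=30.0, cm_wait_s=0.0)

print(before_patch.total(), after_patch.total())
```

Under that assumption the two totals match the reported 130 and 30 seconds.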

Additional info:

See the patch description for example failover times before and after the patch, and for some more details.
Comment 2 RHEL Product and Program Management 2010-05-05 12:14:34 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.
Comment 3 Or Gerlitz 2010-05-06 04:56:56 EDT
Created attachment 411882 [details]
patch that fixes the problem

Validated with RHEL6 beta (Santiago), kernel 2.6.32-19.el6.x86_64, and DM multipath device-mapper-multipath-0.4.9-12.el6.x86_64.

I combined the proposed upstream patches into one patch; it's 99% the 3rd patch, the other two are really tiny.

[PATCH 1/3] ib/iser: add event handler
http://marc.info/?l=linux-rdma&m=127306981509687
[PATCH 2/3] ib/iser: remove buggy back-pointer setting
http://marc.info/?l=linux-rdma&m=127306983909726
[PATCH 3/3] ib/iser: enhance disconnection logic for multi-pathing
http://marc.info/?l=linux-rdma&m=127306994909954
Comment 5 Barry Donahue 2010-05-24 09:02:21 EDT
We can set up multipath to the EqualLogic array. This will have to be done manually on a system with multiple NICs. We can't actually bounce the switch port, but we could do a cable pull at the NIC. So, I guess we could do it.
Comment 6 Mike Christie 2010-05-24 16:17:59 EDT
You actually need InfiniBand to test it, so the EQL target will not work. You need scsi-target-utils and then 2 boxes with IB cards.

We are working closely with the Voltaire guys. They have been testing iser in RHEL6, and this patch was made by them as a result of their testing.

They have said they will re-test (they made and tested the patch before they sent it) the patch when it gets merged.
Comment 7 Or Gerlitz 2010-05-25 03:41:34 EDT
(In reply to comment #6)
> They have said they will re-test (they made and tested the patch before they
> sent it) the patch when it gets merged

Sure, once we have access to a RHEL kernel that has the patch merged, I will test multipathing and update here.
Comment 10 Or Gerlitz 2010-05-25 13:54:31 EDT
(In reply to comment #8)
> Patch(es) available on kernel-2.6.32-29.el6    

The latest kernel at http://people.redhat.com/arozansk/el6 is 2.6.32-19.el6. How do I get the -29 kernel?
Comment 12 Barry Donahue 2010-09-24 14:28:37 EDT
Any feedback from Voltaire on this BZ?
Comment 13 Or Gerlitz 2010-09-25 17:57:26 EDT
(In reply to comment #12)
> Any feedback from Voltaire on this BZ?

Yes, I tested with -30 (i.e. 2.6.32-30.el6), and multipathing worked well with iser.
Comment 14 releng-rhel@redhat.com 2010-11-11 11:13:49 EST
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.
