Bug 589174

Summary: fix iscsi/iser functioning and failover time under DM multipath
Product: Red Hat Enterprise Linux 6 Reporter: Or Gerlitz <ogerlitz>
Component: kernel Assignee: Mike Christie <mchristi>
Status: CLOSED CURRENTRELEASE QA Contact: Barry Donahue <bdonahue>
Severity: high Docs Contact:
Priority: low    
Version: 6.0 CC: bdonahue, mchristi, qcai, yuji.furui
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-11 16:13:49 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 534151    
Attachments:
Description Flags
patch that fixes the problem none

Description Or Gerlitz 2010-05-05 15:08:02 UTC
Description of problem:

The current iSER disconnection flow blocks the iscsid daemon until the
underlying IB CM delivers a disconnect or timeout event, which can take up to 100 seconds when the target isn't reachable. As a result, the whole iSCSI stack is unresponsive during that time, and DM multipath failover time includes waiting for the timeout.
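
To illustrate why a blocking wait stalls the caller, here is a minimal user-space sketch. It is not the patch and not iscsid or kernel code: `CM_TIMEOUT`, `wait_for_cm_event`, `blocking_disconnect` and `deferred_disconnect` are hypothetical names, and the roughly 100 second IB CM timeout is scaled down to 0.2 seconds so the demo finishes quickly.

```python
import threading
import time
from typing import Tuple

# Assumption: stand-in for the ~100 s IB CM timeout, scaled down for the demo.
CM_TIMEOUT = 0.2


def wait_for_cm_event(cm_event: threading.Event) -> bool:
    """Wait for a simulated CM disconnect event; return False on timeout."""
    return cm_event.wait(timeout=CM_TIMEOUT)


def blocking_disconnect(cm_event: threading.Event) -> float:
    """Wait in the caller's thread, so the caller stalls for the whole wait."""
    start = time.monotonic()
    wait_for_cm_event(cm_event)
    return time.monotonic() - start


def deferred_disconnect(cm_event: threading.Event) -> Tuple[float, threading.Thread]:
    """Hand the wait to a worker thread so the caller returns right away."""
    start = time.monotonic()
    worker = threading.Thread(target=wait_for_cm_event, args=(cm_event,))
    worker.start()
    return time.monotonic() - start, worker


# The target is unreachable in this scenario, so the event is never set.
unreachable = threading.Event()
blocked_for = blocking_disconnect(unreachable)
returned_after, worker = deferred_disconnect(unreachable)
worker.join()
print(f"blocking caller stalled {blocked_for:.2f}s, "
      f"deferred caller returned after {returned_after:.4f}s")
```

In the blocking variant the caller cannot service anything else until the timeout expires, which is the behaviour described for iscsid here.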

The proposed patch @ http://marc.info/?l=linux-rdma&m=127306994909954 fixes that. For clarity, and following upstream kernel conventions, it was sent as a patch series, but this patch is the core. For RHEL6 I will attach it here as one patch.

Version-Release number of selected component (if applicable): 

please apply to RHEL6

How reproducible:

Always!

Steps to Reproduce:

1. discover the same disk/LUN through two different paths
2. start the DM multipath daemon
3. issue IO over the multipath device (e.g. dm-0)
4. take down a HW element (e.g. HCA/switch) port used by one of the paths
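
One way to put a number on the failover time while running these steps is to time the longest gap between completed writes. The sketch below is an assumption about how that could be measured, not the method used for this report. `longest_stall` is a hypothetical helper, and the demo runs against a scratch file rather than a real multipath device.

```python
import os
import tempfile
import time


def longest_stall(path: str, duration: float, block: bytes = b"\0" * 4096) -> float:
    """Write `block` repeatedly for `duration` seconds and return the longest
    gap, in seconds, between two consecutive completed writes."""
    worst = 0.0
    deadline = time.monotonic() + duration
    with open(path, "wb", buffering=0) as dev:
        last = time.monotonic()
        while last < deadline:
            dev.seek(0)
            dev.write(block)
            os.fsync(dev.fileno())  # push the write down to the device
            now = time.monotonic()
            worst = max(worst, now - last)
            last = now
    return worst


# Demo against a scratch file. On a test box the path would be the multipath
# device instead, and this loop WILL overwrite its first block.
with tempfile.NamedTemporaryFile() as scratch:
    stall = longest_stall(scratch.name, duration=0.2)
    print(f"longest stall: {stall:.3f}s")
```

If IO is queued while the path is down, the write in flight at step 4 should not complete until failover finishes, so the reported stall approximates the failover time.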
  
Actual results:

Failover is slow, about 130 seconds.

Expected results:

Under the settings recommended by the multipath section of the README provided by the iscsi-initiator-utils rpm, failover should take about 30 seconds. This is what happens when the patch is applied.
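
A rough accounting of the figures quoted in this report, assuming the extra delay without the patch is entirely the IB CM wait from the problem description. The variable names are illustrative only.

```python
# Figures quoted in this report, in seconds.
CM_WAIT_MAX = 100          # upper bound on the IB CM disconnect/timeout wait
FAILOVER_WITH_PATCH = 30   # failover with the README multipath settings
FAILOVER_OBSERVED = 130    # failover seen without the patch

# Assumption: unpatched failover = patched failover + the blocking CM wait.
estimated_unpatched = FAILOVER_WITH_PATCH + CM_WAIT_MAX
print(estimated_unpatched)
```

Under that assumption the estimate matches the observed value of about 130 seconds.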

Additional info:

See example failover times before/after the patch, and some more details, in the patch description.

Comment 2 RHEL Program Management 2010-05-05 16:14:34 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 3 Or Gerlitz 2010-05-06 08:56:56 UTC
Created attachment 411882 [details]
patch that fixes the problem

Validated with RHEL6 beta (Santiago), kernel 2.6.32-19.el6.x86_64, and DM multipath
device-mapper-multipath-0.4.9-12.el6.x86_64.

I combined the proposed upstream patches into one patch. It's 99% the 3rd patch; the other two are really tiny.

[PATCH 1/3] ib/iser: add event handler
http://marc.info/?l=linux-rdma&m=127306981509687
[PATCH 2/3] ib/iser: remove buggy back-pointer setting
http://marc.info/?l=linux-rdma&m=127306983909726
[PATCH 3/3] ib/iser: enhance disconnection logic for multi-pathing
http://marc.info/?l=linux-rdma&m=127306994909954

Comment 5 Barry Donahue 2010-05-24 13:02:21 UTC
We can set up multipath to the EqualLogic array. This will have to be done manually on a system with multiple NICs. We can't actually bounce the switch port, but we could do a cable pull at the NIC. So, I guess we could do it.

Comment 6 Mike Christie 2010-05-24 20:17:59 UTC
You actually need InfiniBand to test it, so the EQL target will not work. You need scsi-target-utils and then 2 boxes with IB cards.

We are working closely with the Voltaire guys. They have been testing iser in RHEL6, and this patch was made by them as a result of their testing.

They have said they will re-test (they made and tested the patch before they sent it) the patch when it gets merged.

Comment 7 Or Gerlitz 2010-05-25 07:41:34 UTC
(In reply to comment #6)
> They have said they will re-test (they made and tested the patch before they
> sent it) the patch when it gets merged

Sure, once we have access to a RHEL kernel that has the patch merged, I will test multipathing and update here.

Comment 10 Or Gerlitz 2010-05-25 17:54:31 UTC
(In reply to comment #8)
> Patch(es) available on kernel-2.6.32-29.el6    

The latest kernel @ http://people.redhat.com/arozansk/el6 is 2.6.32-19.el6 ... how do I get the -29 kernel?

Comment 12 Barry Donahue 2010-09-24 18:28:37 UTC
Any feedback from Voltaire on this BZ?

Comment 13 Or Gerlitz 2010-09-25 21:57:26 UTC
(In reply to comment #12)
> Any feedback from Voltaire on this BZ?

Yes, I tested with -30 (i.e. 2.6.32-30.el6) and multipathing worked well with iSER.

Comment 14 releng-rhel@redhat.com 2010-11-11 16:13:49 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.