Description of problem:
The current iser disconnection flow blocks the iscsid daemon until
a disconnect or timeout event is delivered by the underlying IB CM, which can take up to 100 seconds when the target isn't reachable. As a result, the whole iscsi stack is unresponsive during that time, and the DM multipath failover time includes waiting for the timeout.
The proposed patch at http://marc.info/?l=linux-rdma&m=127306994909954 fixes that. For clarity, and following upstream kernel conventions, it was sent as a patch series, but this patch is the core of it. For RHEL6 I will attach it here as one patch.
Version-Release number of selected component (if applicable):
please apply to RHEL6
Steps to Reproduce:
1. discover the same disk/lun through two different paths
2. start the DM multipath daemon
3. issue IO over the multipath device (e.g. dm-0)
4. take down a HW element (e.g. HCA/switch) port used by one of the paths
Actual results:
Failover is slow, taking about 130 seconds.
Expected results:
With the patch applied, and under the settings recommended by the multipath section of the README provided by the iscsi-initiator-utils rpm, failover takes about 30 seconds.
See the patch description for example failover times before/after the patch and some more details.
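As a rough sanity check on the numbers in this report, here is a minimal sketch of how the two failover times could relate. It assumes a simple additive model: the unpatched failover time is the patched failover time plus the worst-case wait on the IB CM. That model, and every name in the code, is an assumption made for illustration; it is not taken from the patch or from iscsid.

```python
# Toy arithmetic only: the names and the additive model are assumptions,
# not taken from the patch or from iscsid.
CM_WAIT_WORST_CASE_S = 100    # "up to 100 seconds" waiting on the IB CM
FAILOVER_WITH_PATCH_S = 30    # "about 30 seconds" with the README settings


def estimated_failover_s(blocking_disconnect: bool) -> int:
    """Return a rough failover estimate in seconds under the additive model."""
    extra = CM_WAIT_WORST_CASE_S if blocking_disconnect else 0
    return FAILOVER_WITH_PATCH_S + extra


if __name__ == "__main__":
    print("without patch:", estimated_failover_s(True), "s")
    print("with patch:   ", estimated_failover_s(False), "s")
```

Under this assumption the blocking case comes out at the reported figure of about 130 seconds, which is consistent with the blocked disconnect accounting for most of the delay.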
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release. Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release. This request is not yet committed for
Created attachment 411882 [details]
patch that fixes the problem
Validated with RHEL6 beta (Santiago), kernel 2.6.32-19.el6.x86_64, and DM multipath.
I combined the proposed upstream patches into one patch; it's 99% the 3rd patch, the other two are really tiny.
[PATCH 1/3] ib/iser: add event handler
[PATCH 2/3] ib/iser: remove buggy back-pointer setting
[PATCH 3/3] ib/iser: enhance disconnection logic for multi-pathing
We can set up multipath to the Equallogic array. This will have to be done manually on a system with multiple NICs. We can't actually bounce the switch port but we could do a cable pull at the NIC. So, I guess we could do it.
You actually need InfiniBand to test it, so the EQL target will not work. You need scsi-target-utils and then 2 boxes with IB cards.
We are working closely with the Voltaire guys. They have been testing iser in RHEL6, and this patch was made by them as a result of their testing.
They have said they will re-test (they made and tested the patch before they sent it) the patch when it gets merged.
(In reply to comment #6)
> They have said they will re-test (they made and tested the patch before they
> sent it) the patch when it gets merged
Sure, once we have access to a RHEL kernel that has the patch merged, I will test multipathing and update here.
(In reply to comment #8)
> Patch(es) available on kernel-2.6.32-29.el6
The latest kernel @ http://people.redhat.com/arozansk/el6 is 2.6.32-19.el6 ... how do I get the -29 kernel?
Any feedback from Voltaire on this BZ?
(In reply to comment #12)
> Any feedback from Voltaire on this BZ?
Yes, I tested with -30 (i.e. 2.6.32-30.el6) and multipathing worked well with iser.
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.