Description of problem:
Kernel panics after invoking "multipath -v2" with 64 iSCSI paths established via 8 Ethernet ports to 8 volumes on an EqualLogic array.

Version-Release number of selected component (if applicable):
RHEL5U3

How reproducible:
Always

Steps to Reproduce:
1. Create 8 volumes on the EqualLogic array
2. Log in to each volume from each port (iscsiadm)
3. Invoke multipath using "multipath -v2"

Actual results:
Kernel panic.

Expected results:
Multiple paths are managed without a kernel panic.

Additional info:
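The reproduce steps above can be sketched as a shell loop. The group IP, target IQNs, and iface names below are hypothetical stand-ins (8 ifaces bound to the 8 NICs), and `echo` keeps the loop inert so it can be inspected without touching real hardware; drop the `echo` for a live run:

```shell
# Hypothetical EqualLogic group address and volume naming -- substitute
# the values for your array. echo prints the commands instead of running them.
GROUP_IP=10.0.0.10

for vol in $(seq 1 8); do
  for iface in $(seq 0 7); do
    echo iscsiadm -m node \
      -T iqn.2001-05.com.equallogic:vol${vol} \
      -p ${GROUP_IP} -I iface${iface} --login
  done
done

# Once all 64 sessions are established, map the paths:
echo multipath -v2
```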
Created attachment 340994 [details] multipath config file. Adding multipath config for reference.
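For readers without access to the attachment, a multipath.conf fragment for an EqualLogic target typically looks like the following; these values are illustrative, not a reproduction of the attached config:

```
# Hypothetical multipath.conf fragment for an EqualLogic array (illustrative
# only -- see attachment 340994 for the configuration actually used here).
defaults {
        user_friendly_names yes
}
devices {
        device {
                vendor "EQLOGIC"
                product "100E-00"
                path_grouping_policy multibus
                failback immediate
        }
}
```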
Created attachment 340996 [details] console log of panic
Created attachment 341466 [details] do not bug when nop is sent while cleaning up session

The session recovery code would set the stop bits and then suspend the recv path, so if a nop was being sent in response to a nop from the recv path, we could hit the BUG_ON in the conn send-pdu path.

Please try the kernel here:
http://people.redhat.com/dzickus/el5/141.el5/
with this patch (this is the proposed update to the iscsi layer for 5.4):
http://people.redhat.com/mchristi/iscsi/rhel5.4/v3/0001-RHEL-5.4-update-iscsi-layer-and-drivers.patch
and then apply the patch that I attached to this bugzilla, handle-nops-that-race-with-recovery.patch.
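Applying the attached patch on top of the iscsi-layer patch uses ordinary `patch -p1` mechanics. A toy demonstration, with a stand-in directory, file, and diff rather than the real kernel source:

```shell
# Toy demonstration of the 'patch -p1' workflow used to apply
# handle-nops-that-race-with-recovery.patch to a source tree.
set -e
mkdir -p tree
printf 'old line\n' > tree/file.c
printf 'new line\n' > file.c.new

# Build a unified diff laid out the way kernel patches are (a/ and b/ prefixes):
diff -u tree/file.c file.c.new \
  | sed -e 's|tree/file.c|a/file.c|' -e 's|file.c.new|b/file.c|' > fix.patch

# -p1 strips the leading a/ (or b/) path component; -d applies inside the tree:
patch -p1 -d tree < fix.patch
cat tree/file.c
```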
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Thanks for the quick turnaround. I won't be able to get to it until probably the end of the week; the arrays are tied up with something else right now. Rob
(In reply to comment #3) I have the same problem, but on a different architecture: I am using x86_64, and the system panics with exactly the same message. The hardware I'm using is EqualLogic arrays and Dell R610 servers. I tried to build a kernel RPM with the patches supplied here, but it failed while testing the ABI whitelists. So I installed the compiled kernel by hand and rebooted into it. Since then the login process to the arrays no longer panics.
Thanks for testing. Last night I made some RPMs here: http://people.redhat.com/mchristi/iscsi/rhel5.4/test/kernels/ They include this fix and some other fixes for issues we found upstream that I plan on submitting for RHEL 5.4.
(In reply to comment #7)
> Last night I made some rpms here:
> http://people.redhat.com/mchristi/iscsi/rhel5.4/test/kernels/
> It has this fix and some other fixes for issues we found upstream that I plan
> on submitting for RHEL 5.4.
Well, the kernel boots; I let it run over the last weekend on the machines and everything is still working fine so far. But there was a panic during the first reboot of the machines that I could not capture (looping panic messages from the OOM killer), and I do not know where it originated. After a second reboot all was fine, and I was not able to reproduce the panic.
Looks good here Mike. I'm able to multipath 64 paths using 8 nics and iface.
Is 2.6.18-157.el5PAE the recommended fix for this? I'm currently experiencing the same problem on 2.6.18-128.1.10.el5PAE with PE1950s and an EL-PS70E.
Joey, RHEL5.4 will have this fix (2.6.18-157.el is a test kernel for 5.4). As for released errata, this was included in kernel-2.6.18-128.1.14: http://rhn.redhat.com/errata/RHSA-2009-1106.html Can you try updating to this and let us know if the problem persists?
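A quick way to check whether a given kernel is at or past the fixed release is a version-sort comparison; this helper is a sketch using GNU `sort -V`, not Red Hat tooling:

```shell
# Sketch: report whether a kernel version string is >= the fixed release
# (2.6.18-128.1.14, per the errata above). Requires GNU coreutils 'sort -V'.
fixed=2.6.18-128.1.14.el5

have_fix() {
    # Succeeds when $1 sorts at or after $fixed under version ordering.
    [ "$(printf '%s\n' "$fixed" "$1" | sort -V | head -n1)" = "$fixed" ]
}

have_fix 2.6.18-128.1.10.el5 || echo "2.6.18-128.1.10.el5: still affected"
have_fix 2.6.18-128.1.16.el5 && echo "2.6.18-128.1.16.el5: contains the fix"
```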
I can confirm that 2.6.18-128.1.14.el5PAE fixes this problem (else it wouldn't have been in the errata, right? ;) ). 2.6.18-128.1.16.el5PAE also tests good.
I've tested an EqualLogic array against several U4 kernels. I have also tested the U4 iSCSI patches and iscsi-utils from U4 on a RHEL5U3 kernel. The issue I was experiencing has been resolved.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html