Bug 497411 - kernel BUG at drivers/scsi/libiscsi.c:301!
Summary: kernel BUG at drivers/scsi/libiscsi.c:301!
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: i686
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Mike Christie
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 502916
 
Reported: 2009-04-23 19:21 UTC by rob_thomas
Modified: 2018-10-20 02:02 UTC
CC List: 10 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 08:09:55 UTC
Target Upstream Version:
Embargoed:


Attachments
multipath config file. (1.09 KB, application/octet-stream)
2009-04-23 19:23 UTC, rob_thomas
console log of panic (46.19 KB, application/octet-stream)
2009-04-23 19:24 UTC, rob_thomas
do not bug when nop is sent while cleaning up session (1.36 KB, patch)
2009-04-27 18:06 UTC, Mike Christie


Links
System ID: Red Hat Product Errata RHSA-2009:1243
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update
Last Updated: 2009-09-01 08:53:34 UTC

Description rob_thomas 2009-04-23 19:21:13 UTC
Description of problem:
The kernel panics after invoking "multipath -v2" with 64 iSCSI paths established via 8 Ethernet ports to 8 volumes on an EqualLogic array.

Version-Release number of selected component (if applicable):
RHEL5U3

How reproducible:
Always

Steps to Reproduce:
1.  Create 8 volumes on EqualLogic array
2.  Log in to each volume from each port (iscsiadm)
3.  Invoke multipath using "multipath -v2"
  
Actual results:
kernel panic

Expected results:
manage multiple paths without kernel panic.

Additional info:

Comment 1 rob_thomas 2009-04-23 19:23:12 UTC
Created attachment 340994 [details]
multipath config file.

Adding multipath config for reference.

Comment 2 rob_thomas 2009-04-23 19:24:17 UTC
Created attachment 340996 [details]
console log of panic

console log of panic.

Comment 3 Mike Christie 2009-04-27 18:06:49 UTC
Created attachment 341466 [details]
do not bug when nop is sent while cleaning up session

The session recovery code sets the stop bits and only then suspends the recv path. If a nop arrives from the target in that window, the recv path tries to send a nop in response, and we can hit the BUG_ON in the conn send PDU path.
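
To illustrate the window described above, here is a minimal userspace sketch. It is not the libiscsi code and not the attached patch: every name in it (conn_stage, struct conn, recovery_set_stop, recovery_suspend_recv, send_pdu_strict, send_pdu_tolerant, recv_nop_in) is hypothetical, and the -1 return value is just a placeholder error. It only models the ordering problem: recovery marks the connection stopped before the recv path is suspended, so a send that asserts on the stopped state can fire, while a send that returns an error cannot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified connection state; not the kernel's definitions. */
enum conn_stage { CONN_STARTED, CONN_STOPPED };

struct conn {
    enum conn_stage stage;  /* set to CONN_STOPPED by recovery, step 1 */
    bool recv_suspended;    /* set by recovery, step 2 */
    int pdus_sent;          /* number of PDUs actually queued */
};

/* Recovery step 1: mark the connection stopped. recv is still live here. */
void recovery_set_stop(struct conn *c) { c->stage = CONN_STOPPED; }

/* Recovery step 2: suspend the recv path. */
void recovery_suspend_recv(struct conn *c) { c->recv_suspended = true; }

/* Strict send: treats a stopped connection as a fatal error.
 * The assert stands in for a BUG_ON-style check. */
int send_pdu_strict(struct conn *c)
{
    assert(c->stage != CONN_STOPPED);
    c->pdus_sent++;
    return 0;
}

/* Tolerant send: a stopped connection is an expected condition during
 * recovery, so fail the send instead of crashing. */
int send_pdu_tolerant(struct conn *c)
{
    if (c->stage == CONN_STOPPED)
        return -1;
    c->pdus_sent++;
    return 0;
}

/* recv path: the target sent a nop that wants a reply, so answer it
 * through the supplied send function. */
int recv_nop_in(struct conn *c, int (*send)(struct conn *))
{
    if (c->recv_suspended)
        return -1;          /* recv path no longer processes PDUs */
    return send(c);
}
```

Calling recovery_set_stop() and then recv_nop_in() with send_pdu_strict, before recovery_suspend_recv() has run, trips the assert; the same sequence with send_pdu_tolerant just returns -1 and leaves pdus_sent unchanged.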


Please try the kernel here
http://people.redhat.com/dzickus/el5/141.el5/

with this patch (this is the proposed update to the iscsi layer for 5.4):
http://people.redhat.com/mchristi/iscsi/rhel5.4/v3/0001-RHEL-5.4-update-iscsi-layer-and-drivers.patch

And then apply the patch that I attached to this bugzilla
handle-nops-that-race-with-recovery.patch

Comment 4 RHEL Program Management 2009-04-27 18:29:05 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 5 rob_thomas 2009-04-27 19:28:14 UTC
Thanks for the quick turnaround.  I probably won't be able to get to it until the end of the week.  The arrays are tied up with something else right now.

Rob

Comment 6 Daniel Schlenzig 2009-05-14 07:34:42 UTC
(In reply to comment #3)

I have the same problem, but on a different architecture. I am using x86_64 and the system panics with exactly the same message. The hardware I'm using is EqualLogic arrays and Dell R610 servers.

I tried to build a kernel RPM with the patches supplied here, but the build failed while testing the ABI whitelists. So I installed the compiled kernel by hand and rebooted into it. Since then the login process to the arrays no longer panics.

Comment 7 Mike Christie 2009-05-14 14:42:57 UTC
Thanks for testing.

Last night I made some rpms here:
http://people.redhat.com/mchristi/iscsi/rhel5.4/test/kernels/
It has this fix and some other fixes for issues we found upstream that I plan on submitting for RHEL 5.4.

Comment 8 Daniel Schlenzig 2009-05-18 08:19:26 UTC
(In reply to comment #7)

> Last night I made some rpms here:
> http://people.redhat.com/mchristi/iscsi/rhel5.4/test/kernels/
> It has this fix and some other fixes for issues we found upstream that I plan
> on submitting for RHEL 5.4.  
Well, the kernel boots; I let it run on the machines over the weekend and everything is still working fine so far. However, there was a panic during the first reboot of the machines that I could not capture (looping panic messages from the OOM killer), and I do not know where it originated. After a second reboot all was fine, and I was not able to reproduce the panic.

Comment 9 rob_thomas 2009-05-19 20:44:20 UTC
Looks good here, Mike.  I'm able to multipath 64 paths using 8 NICs and iface.

Comment 20 Joey Trungale 2009-07-07 18:27:23 UTC
Is 2.6.18-157.el5PAE the recommended fix for this?  I'm currently experiencing this same problem on 2.6.18-128.1.10.el5PAE with PE1950's and an EL-PS70E.

Comment 21 Evan McNabb 2009-07-07 18:44:31 UTC
Joey,

RHEL5.4 will have this fix (2.6.18-157.el is a test kernel for 5.4). As for released errata, this was included in kernel-2.6.18-128.1.14:

http://rhn.redhat.com/errata/RHSA-2009-1106.html

Can you try updating to this and let us know if the problem persists?

Comment 22 Joey Trungale 2009-07-07 20:29:18 UTC
I can confirm that 2.6.18-128.1.14.el5PAE fixes this problem (else it wouldn't have been in the errata, right? ;) ). 2.6.18-128.1.16.el5PAE also tests good.

Comment 23 rob_thomas 2009-07-31 14:08:12 UTC
I've tested an EqualLogic array against several U4 kernels.  I have also tested the U4 iSCSI patches and iscsi-utils from U4 on a RHEL5U3 kernel.  The issue that I was experiencing has been resolved.

Comment 25 errata-xmlrpc 2009-09-02 08:09:55 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

