Bug 497411 - kernel BUG at drivers/scsi/libiscsi.c:301!
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: i686 Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Mike Christie
QA Contact: Red Hat Kernel QE team
Keywords: ZStream
Depends On:
Blocks: 502916
Reported: 2009-04-23 15:21 EDT by rob_thomas
Modified: 2010-10-23 05:11 EDT (History)

Doc Type: Bug Fix
Last Closed: 2009-09-02 04:09:55 EDT

Attachments
multipath config file. (1.09 KB, application/octet-stream)
2009-04-23 15:23 EDT, rob_thomas
console log of panic (46.19 KB, application/octet-stream)
2009-04-23 15:24 EDT, rob_thomas
do not bug when nop is sent while cleaning up session (1.36 KB, patch)
2009-04-27 14:06 EDT, Mike Christie

Description rob_thomas 2009-04-23 15:21:13 EDT
Description of problem:
kernel panics after invoking "multipath -v2" with 64 iSCSI paths established via 8 ethernet ports to 8 volumes on an EqualLogic array.

Version-Release number of selected component (if applicable):
RHEL5U3

How reproducible:
Always

Steps to Reproduce:
1.  Create 8 volumes on EqualLogic array
2.  Log in to each volume from each port (iscsiadm)
3.  Invoke multipath using "multipath -v2"
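
The steps above can be sketched as a dry-run script that only prints the commands it would run. This is a minimal sketch under assumptions: the iface names, target IQNs, and portal address are hypothetical placeholders (none appear in this report), each volume is assumed to be exposed as its own target, and the iface and node records are assumed to exist already from an earlier discovery.

```python
from itertools import product

# Hypothetical placeholders: none of these values come from the bug report.
IFACES = [f"iface{i}" for i in range(8)]  # one iSCSI iface per ethernet port
TARGETS = [f"iqn.2001-05.com.example:vol{i}" for i in range(8)]  # one per volume
PORTAL = "192.0.2.10:3260"  # documentation-range address, not a real array


def login_commands(ifaces, targets, portal):
    """Build one iscsiadm login command per (iface, target) pair."""
    return [
        ["iscsiadm", "-m", "node", "-T", target, "-p", portal, "-I", iface, "--login"]
        for iface, target in product(ifaces, targets)
    ]


def reproduction_plan():
    """Return the 64 login commands followed by the multipath invocation."""
    return login_commands(IFACES, TARGETS, PORTAL) + [["multipath", "-v2"]]


if __name__ == "__main__":
    for cmd in reproduction_plan():
        print(" ".join(cmd))  # dry run: print only, do not execute
```

With 8 ifaces and 8 targets the plan contains 64 logins, matching the 64 paths described in this report, followed by the single "multipath -v2" call.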
  
Actual results:
kernel panic

Expected results:
manage multiple paths without kernel panic.

Additional info:
Comment 1 rob_thomas 2009-04-23 15:23:12 EDT
Created attachment 340994
multipath config file.

Adding multipath config for reference.
Comment 2 rob_thomas 2009-04-23 15:24:17 EDT
Created attachment 340996
console log of panic

console log of panic.
Comment 3 Mike Christie 2009-04-27 14:06:49 EDT
Created attachment 341466
do not bug when nop is sent while cleaning up session

The session recovery code sets the stop bits and only then suspends the recv path. So if a nop was being sent in response to a nop that arrived on the recv path during that window, we could hit the BUG_ON in the conn send pdu path.
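
The ordering problem described above can be replayed in a toy user-space model. This is only an illustrative sketch, not the attached patch and not libiscsi code: the class, its fields, and the `tolerate_stopped` switch are hypothetical names, and the assertion merely stands in for the BUG_ON.

```python
class ToyConnection:
    """Toy stand-in for an iSCSI connection; all names here are hypothetical."""

    def __init__(self, tolerate_stopped):
        self.stop_bit_set = False    # set first by session recovery
        self.recv_suspended = False  # set second by session recovery
        self.tolerate_stopped = tolerate_stopped
        self.sent = []

    def send_pdu(self, pdu):
        """Send path: the strict model asserts, the tolerant model fails softly."""
        if self.stop_bit_set:
            if not self.tolerate_stopped:
                # Stands in for the BUG_ON in the send path (a panic in the kernel).
                raise AssertionError("send_pdu called on a stopping connection")
            return False  # drop the PDU and leave cleanup to recovery
        self.sent.append(pdu)
        return True

    def recv_nop_in(self):
        """Recv path: answer a nop-in with a nop-out unless recv is suspended."""
        if self.recv_suspended:
            return None
        return self.send_pdu("nop-out")


def race(tolerate_stopped):
    """Replay the problematic interleaving deterministically."""
    conn = ToyConnection(tolerate_stopped)
    conn.stop_bit_set = True     # recovery step 1: set the stop bits
    result = conn.recv_nop_in()  # a nop-in arrives before step 2
    conn.recv_suspended = True   # recovery step 2: suspend the recv path
    return result
```

In this model `race(False)` raises `AssertionError`, mirroring the panic, while `race(True)` returns `False` because the nop is dropped instead of tripping the assertion.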


Please try the kernel here
http://people.redhat.com/dzickus/el5/141.el5/

with this patch (this is the proposed update to the iscsi layer for 5.4):
http://people.redhat.com/mchristi/iscsi/rhel5.4/v3/0001-RHEL-5.4-update-iscsi-layer-and-drivers.patch

And then apply the patch that I attached to this bugzilla
handle-nops-that-race-with-recovery.patch
Comment 4 RHEL Product and Program Management 2009-04-27 14:29:05 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 5 rob_thomas 2009-04-27 15:28:14 EDT
Thanks for the quick turn.  I won't be able to get to it until probably the end of the week.  The arrays are tied up with something else right now.

Rob
Comment 6 Daniel Schlenzig 2009-05-14 03:34:42 EDT
(In reply to comment #3)

I have the same problem, but on a different architecture: I am using x86_64, and the system panics with exactly the same message. The hardware is EqualLogic arrays and Dell R610 servers.

I tried to build a kernel RPM with the patches supplied here, but the build failed while testing the ABI whitelists. So I installed the compiled kernel by hand and rebooted into it. Since then, the login process to the arrays no longer panics.
Comment 7 Mike Christie 2009-05-14 10:42:57 EDT
Thanks for testing.

Last night I made some rpms here:
http://people.redhat.com/mchristi/iscsi/rhel5.4/test/kernels/
It has this fix and some other fixes for issues we found upstream that I plan on submitting for RHEL 5.4.
Comment 8 Daniel Schlenzig 2009-05-18 04:19:26 EDT
(In reply to comment #7)

> Last night I made some rpms here:
> http://people.redhat.com/mchristi/iscsi/rhel5.4/test/kernels/
> It has this fix and some other fixes for issues we found upstream that I plan
> on submitting for RHEL 5.4.  
Well, the kernel boots, and I let it run on the machines over the last weekend; everything is still working fine so far. However, there was a panic during the first reboot of the machines that I could not capture (looping panic messages from the OOM killer), and I do not know where it originated. After a second reboot all was fine, and I was not able to reproduce the panics.
Comment 9 rob_thomas 2009-05-19 16:44:20 EDT
Looks good here Mike.  I'm able to multipath 64 paths using 8 nics and iface.
Comment 20 Joey Trungale 2009-07-07 14:27:23 EDT
Is 2.6.18-157.el5PAE the recommended fix for this?  I'm currently experiencing this same problem on 2.6.18-128.1.10.el5PAE with PE1950's and an EL-PS70E.
Comment 21 Evan McNabb 2009-07-07 14:44:31 EDT
Joey,

RHEL5.4 will have this fix (2.6.18-157.el is a test kernel for 5.4). As for released errata, this was included in kernel-2.6.18-128.1.14:

http://rhn.redhat.com/errata/RHSA-2009-1106.html

Can you try updating to this and let us know if the problem persists?
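
A quick way to check whether a given kernel release string is at or past the errata kernel named above is sketched below. It is a simplified numeric comparison, not RPM's own version-comparison algorithm, and it assumes el5-style release strings like the ones quoted in this report; the function names are made up for this example.

```python
import re

FIXED_RELEASE = "2.6.18-128.1.14"  # errata kernel named in comment 21


def release_key(release):
    """Turn a kernel release string into a tuple of its numeric fields."""
    # Keep only the part before ".el" so "el5" and "PAE" add no extra numbers.
    numeric_part = release.split(".el", 1)[0]
    return tuple(int(n) for n in re.findall(r"\d+", numeric_part))


def has_fix(release, fixed=FIXED_RELEASE):
    """Return True if `release` sorts at or after the fixed release."""
    return release_key(release) >= release_key(fixed)
```

For example, `has_fix("2.6.18-128.1.10.el5PAE")` is `False`, while `has_fix("2.6.18-128.1.14.el5PAE")` is `True`. On a live system the string could come from `os.uname().release`.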
Comment 22 Joey Trungale 2009-07-07 16:29:18 EDT
I can confirm that 2.6.18-128.1.14.el5PAE fixes this problem (else it wouldn't have been in the errata, right? ;) ). 2.6.18-128.1.16.el5PAE also tests good.
Comment 23 rob_thomas 2009-07-31 10:08:12 EDT
I've tested an EqualLogic array against several U4 kernels.  I have also tested the U4 iSCSI patches and iscsi-utils from U4 on a RHEL5U3 kernel.  The issue that I was experiencing has been resolved.
Comment 25 errata-xmlrpc 2009-09-02 04:09:55 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
