651287 – [Broadcom 5.6 bug] cnic: Panic in uio_release()

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.6
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	rc
Target Release:	5.6
Assignee:	Mike Christie
QA Contact:	Network QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-11-09 07:07 UTC by Michael Chan
Modified:	2011-01-13 22:00 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2011-01-13 22:00:10 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments
[PATCH 1/4] cnic: Fine-tune ring init code (3.46 KB, patch) 2010-11-18 09:46 UTC, Michael Chan	no flags
[PATCH 2/4] cnic: Add cnic_free_uio() (1.83 KB, patch) 2010-11-18 09:47 UTC, Michael Chan	no flags
[PATCH 3/4] cnic: Add cnic_uio_dev struct (17.19 KB, patch) 2010-11-18 09:48 UTC, Michael Chan	no flags
[PATCH 4/4] cnic: Decouple uio close from cnic shutdown (5.28 KB, patch) 2010-11-18 09:49 UTC, Michael Chan	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2011:0017	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update	2011-01-13 10:37:42 UTC

Description Michael Chan 2010-11-09 07:07:10 UTC

Description of problem: Panic in uio_release() when doing repeated ifup/ifdown in a loop with iSCSI offload connections


Version-Release number of selected component (if applicable):


How reproducible: Happens in about 30 minutes


Steps to Reproduce:
1. Log in to 40 or so iSCSI targets
2. Run a continuous ifup/ifdown script

Actual results:

PID: 10559  TASK: ffff81024eb1f0c0  CPU: 0   COMMAND: "brcm_iscsiuio"
 #0 [ffff8102289e5cc0] crash_kexec at ffffffff800af83a
 #1 [ffff8102289e5d80] __die at ffffffff80065117
 #2 [ffff8102289e5dc0] die at ffffffff8006c73a
 #3 [ffff8102289e5df0] do_general_protection at ffffffff8006555f
 #4 [ffff8102289e5e30] error_exit at ffffffff8005dde9
    [exception RIP: uio_release+25]
    RIP: ffffffff88501240  RSP: ffff8102289e5ee8  RFLAGS: 00010246
    RAX: ffffffff88501227  RBX: ffff81022a066d80  RCX: 0000000000000000
    RDX: ffff81023e655458  RSI: ffff8102328b6480  RDI: 00010102464c457f
    RBP: ffff8102408df520   R8: 0000000000000000   R9: 000000004c672940
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000000
    R13: ffff81023e655458  R14: ffff81024eac6d80  R15: ffff81022ea34d20
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff8102289e5f00] __fput at ffffffff80012b17
 #6 [ffff8102289e5f40] filp_close at ffffffff80023c46
 #7 [ffff8102289e5f60] sys_close at ffffffff8001e126
 #8 [ffff8102289e5f80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 0000003f73a0d987  RSP: 000000004c671eb0  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 0000000000000002  RSI: 0000000000000000  RDI: 0000000000000009
    RBP: 0000000009753bf0   R8: 000000000000000a   R9: 000000004c672940
    R10: 0000000000000000  R11: 0000000000000202  R12: 00002aaaaaac1000
    R13: 00000000097534e0  R14: ffffffff8001e126  R15: ffff8102328b6480
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
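
The register dump above can be read a little further. The following is a minimal sketch, assuming x86_64 little-endian byte order; the helper names (`reg_to_bytes`, `looks_like_elf_header`, `is_canonical`) are hypothetical and not part of the report or the driver. It shows that the RDI value at the faulting instruction, 00010102464c457f, is a non-canonical address (consistent with the general protection fault, if RDI was the pointer being dereferenced) and that its bytes spell out the start of an ELF header. That pattern suggests the memory had been freed and reused for unrelated data, although the report itself does not state this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 64-bit register value into the 8 bytes it would occupy in
 * little-endian (x86_64) memory, lowest address first. */
void reg_to_bytes(uint64_t reg, unsigned char out[8])
{
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++)
        out[i] = (unsigned char)(reg >> (8 * i));
}

/* Return 1 if the value, viewed as memory contents, begins with the
 * ELF magic bytes 0x7f 'E' 'L' 'F'. */
int looks_like_elf_header(uint64_t reg)
{
    unsigned char b[8];

    reg_to_bytes(reg, b);
    return b[0] == 0x7f && b[1] == 'E' && b[2] == 'L' && b[3] == 'F';
}

/* With 48-bit virtual addresses, an x86_64 address is canonical only
 * when bits 63..47 are all zeros or all ones. */
int is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the 17 most significant bits */

    return top == 0 || top == 0x1ffff;
}
```

For the RDI value in the trace, `reg_to_bytes` yields 7f 45 4c 46 02 01 01 00, and `is_canonical` returns 0. By contrast, RBX (ffff81022a066d80) is a canonical kernel address.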




Expected results:


Additional info:

The panic is caused by cnic unregistering the uio device before brcm_iscsiuio has closed the uio device in userspace.  Userspace can run slowly, so the wait in the cnic driver may not be long enough.

Three moderately sized upstream patches should fix this issue:

commit	a3ceeeb8f11d74f26e3dfca40ded911a82402db5

    cnic: Decouple uio close from cnic shutdown 

commit	cd801536c236e287f1d3eeee428abf9ffd523ede

    cnic: Add cnic_uio_dev struct

commit	c06c0462250a5dbc9e58d00caab4cd7e6675128c

    cnic: Add cnic_free_uio()
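
The race described under "Additional info" can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: every name here (`toy_udev`, `toy_shutdown_coupled`, and so on) is hypothetical, the code is not taken from the cnic driver or the attached patches, and the "decoupled" variant reflects only what the patch title "Decouple uio close from cnic shutdown" suggests, namely that shutdown no longer tears down the uio device while userspace may still hold it open.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a uio device; all names are hypothetical. */
struct toy_udev {
    int  open_count;   /* opens by userspace that are still outstanding */
    bool registered;   /* still registered with the (modelled) uio core */
    bool freed;        /* marks the object as torn down, instead of free() */
};

void toy_open(struct toy_udev *u)
{
    assert(!u->freed);
    u->open_count++;
}

/* Models the release path. Returns false when it would touch a device
 * that has already been torn down, which is the panic in this report. */
bool toy_release(struct toy_udev *u)
{
    if (u->freed)
        return false;
    assert(u->open_count > 0);
    u->open_count--;
    return true;
}

/* Coupled shutdown: the bounded wait for userspace has expired, and the
 * device is torn down whether or not userspace has closed it. */
void toy_shutdown_coupled(struct toy_udev *u)
{
    u->registered = false;
    u->freed = true;
}

/* Decoupled shutdown (assumed behaviour): the network side goes down,
 * but the uio device is left registered and untouched. */
void toy_shutdown_decoupled(struct toy_udev *u)
{
    (void)u;
}

/* Teardown happens only at unload, and only once nothing holds the
 * device open. Returns false if the device is still in use. */
bool toy_module_unload(struct toy_udev *u)
{
    if (u->open_count > 0)
        return false;
    u->registered = false;
    u->freed = true;
    return true;
}
```

In this model, a slow userspace daemon that opens the device, is overtaken by `toy_shutdown_coupled`, and then closes will see `toy_release` return false. With `toy_shutdown_decoupled`, the same sequence releases cleanly, and `toy_module_unload` succeeds only after the last close.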

Comment 1 Andrius Benokraitis 2010-11-15 18:33:32 UTC

Mike - if this is agreeable to you, can you give it a devel_ack?

Comment 2 Mike Christie 2010-11-16 00:06:41 UTC

Michael,

Please attach a tested patchset (one patch per change) to this bz. I will send it for 5.6. Thanks.

Comment 3 RHEL Program Management 2010-11-16 00:09:14 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Michael Chan 2010-11-18 09:46:07 UTC

Created attachment 461250 [details]
[PATCH 1/4] cnic: Fine-tune ring init code

Comment 5 Michael Chan 2010-11-18 09:47:40 UTC

Created attachment 461251 [details]
[PATCH 2/4] cnic: Add cnic_free_uio()

Comment 6 Michael Chan 2010-11-18 09:48:36 UTC

Created attachment 461252 [details]
[PATCH 3/4] cnic: Add cnic_uio_dev struct

Comment 7 Michael Chan 2010-11-18 09:49:53 UTC

Created attachment 461253 [details]
[PATCH 4/4] cnic: Decouple uio close from cnic shutdown

Comment 13 Jarod Wilson 2010-11-23 17:05:57 UTC

The patches are in kernel-2.6.18-233.el5.
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 15 edwardn 2010-11-23 22:53:56 UTC

The issue has been verified with kernel-2.6.18-233.el5.

Comment 17 errata-xmlrpc 2011-01-13 22:00:10 UTC

An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html
