Description of problem:
Panic in uio_release() when doing repeated ifup/ifdown in a loop with iSCSI offload connections

Version-Release number of selected component (if applicable):

How reproducible:
Happens in about 30 minutes

Steps to Reproduce:
1. Log in to 40 or so iSCSI targets
2. Run continuous ifup/ifdown script
3.

Actual results:
PID: 10559  TASK: ffff81024eb1f0c0  CPU: 0  COMMAND: "brcm_iscsiuio"
 #0 [ffff8102289e5cc0] crash_kexec at ffffffff800af83a
 #1 [ffff8102289e5d80] __die at ffffffff80065117
 #2 [ffff8102289e5dc0] die at ffffffff8006c73a
 #3 [ffff8102289e5df0] do_general_protection at ffffffff8006555f
 #4 [ffff8102289e5e30] error_exit at ffffffff8005dde9
    [exception RIP: uio_release+25]
    RIP: ffffffff88501240  RSP: ffff8102289e5ee8  RFLAGS: 00010246
    RAX: ffffffff88501227  RBX: ffff81022a066d80  RCX: 0000000000000000
    RDX: ffff81023e655458  RSI: ffff8102328b6480  RDI: 00010102464c457f
    RBP: ffff8102408df520   R8: 0000000000000000   R9: 000000004c672940
    R10: 0000000000000000  R11: 0000000000000202  R12: 0000000000000000
    R13: ffff81023e655458  R14: ffff81024eac6d80  R15: ffff81022ea34d20
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff8102289e5f00] __fput at ffffffff80012b17
 #6 [ffff8102289e5f40] filp_close at ffffffff80023c46
 #7 [ffff8102289e5f60] sys_close at ffffffff8001e126
 #8 [ffff8102289e5f80] tracesys at ffffffff8005d28d (via system_call)
    RIP: 0000003f73a0d987  RSP: 000000004c671eb0  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: ffffffff8005d28d  RCX: ffffffffffffffff
    RDX: 0000000000000002  RSI: 0000000000000000  RDI: 0000000000000009
    RBP: 0000000009753bf0   R8: 000000000000000a   R9: 000000004c672940
    R10: 0000000000000000  R11: 0000000000000202  R12: 00002aaaaaac1000
    R13: 00000000097534e0  R14: ffffffff8001e126  R15: ffff8102328b6480
    ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b

Expected results:

Additional info:
The panic is caused by cnic unregistering the uio device before brcm_iscsiuio has closed the uio device in userspace.
Userspace can run slowly and the wait in the cnic driver may not be long enough.

3 moderately sized upstream patches should fix this issue:

commit a3ceeeb8f11d74f26e3dfca40ded911a82402db5
    cnic: Decouple uio close from cnic shutdown

commit cd801536c236e287f1d3eeee428abf9ffd523ede
    cnic: Add cnic_uio_dev struct

commit c06c0462250a5dbc9e58d00caab4cd7e6675128c
    cnic: Add cnic_free_uio()
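For illustration only, the ordering problem described above can be modeled in a few lines of userspace C. This is not the cnic or uio code, and it is not claimed to be what the upstream commits do internally. All names here (model_uio_dev, model_open, model_shutdown, model_release) are hypothetical. The sketch assumes one common way of avoiding this class of race: instead of waiting a fixed time for userspace to close the device, the shutdown path defers freeing the per-device state until the last open handle is released.

```c
#include <assert.h>

/* Hypothetical stand-in for per-device state shared between a driver
 * and a userspace daemon that holds the device open. */
struct model_uio_dev {
    int open_count;    /* number of userspace handles currently open */
    int unregistered;  /* driver shutdown has been requested */
    int freed;         /* set instead of calling free(), so state can be inspected */
};

/* Marks the state as torn down. A real driver would free memory here. */
static void model_free(struct model_uio_dev *d)
{
    assert(!d->freed);  /* freeing twice would be a bug in the model */
    d->freed = 1;
}

/* Userspace opens the device. */
static void model_open(struct model_uio_dev *d)
{
    assert(!d->unregistered);  /* no new opens once shutdown has started */
    d->open_count++;
}

/* Driver shutdown path: tear down immediately only if nobody has the
 * device open; otherwise leave the teardown to the last close. */
static void model_shutdown(struct model_uio_dev *d)
{
    d->unregistered = 1;
    if (d->open_count == 0)
        model_free(d);
}

/* Userspace closes the device. The state is still valid here because
 * model_shutdown() did not free it while a handle was open. */
static void model_release(struct model_uio_dev *d)
{
    assert(d->open_count > 0);
    d->open_count--;
    if (d->open_count == 0 && d->unregistered)
        model_free(d);
}
```

In this model a slow userspace close is harmless: calling model_shutdown() while a handle is open leaves freed at 0, and the later model_release() performs the teardown. With a fixed wait in place of the open_count check, the release could run against state that is already gone, which is the kind of failure the backtrace shows in uio_release().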
Mike - if this is agreeable to you, can you give it a devel_ack?
Michael, please attach a tested patchset (one patch per change) to this bz. I will send it for 5.6. Thanks.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 461250 [PATCH 1/4] cnic: Fine-tune ring init code
Created attachment 461251 [PATCH 2/4] cnic: Add cnic_free_uio()
Created attachment 461252 [PATCH 3/4] cnic: Add cnic_uio_dev struct
Created attachment 461253 [PATCH 4/4] cnic: Decouple uio close from cnic shutdown
in kernel-2.6.18-233.el5

You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.
The issue has been verified with kernel-2.6.18-233.el5.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html