Bug 476897 - kernel panics when attempting to rmmod the bnx2 module while it is in use.
Summary: kernel panics when attempting to rmmod the bnx2 module while it is in use.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.4
Assignee: Andy Gospodarek
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On: 475567
Blocks: RHEL5u3_relnotes 458757 483701 483784 485920 502021
Reported: 2008-12-17 20:17 UTC by Mike Gahagan
Modified: 2014-06-29 23:00 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If jumbo frames are enabled on your system, a kernel panic will occur if you attempt to unload the bnx2 module.
Clone Of:
Environment:
Last Closed: 2009-09-02 08:14:22 UTC
Target Upstream Version:
Embargoed:


Attachments
bnx2 patch (598 bytes, patch)
2008-12-18 02:17 UTC, Michael Chan


Links
System: Red Hat Product Errata
ID: RHSA-2009:1243
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update
Last Updated: 2009-09-01 08:53:34 UTC

Description Mike Gahagan 2008-12-17 20:17:37 UTC
Description of problem:
From a problem reported during verification of bz 470625:
https://bugzilla.redhat.com/show_bug.cgi?id=470625#c18


We found an issue while under Xen. A kernel panic will occur if the following
conditions are met:

1) Jumbo frames are enabled on both ports of a 5709C
2) Both interfaces are brought up.
3) Driver is unloaded from memory (rmmod)
   Kernel panic will occur
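
The conditions above can be scripted roughly as follows. This is a minimal sketch, not a tested reproducer: the interface names are assumptions (only eth4 appears in this report), and the helper names `bnx2_repro` and `run_cmd` are hypothetical. The commands themselves (`ifconfig <if> mtu 9000 up`, `rmmod bnx2`) are the ones used in the console session in comment 3. By default the function only prints the commands, because running them on an affected kernel is expected to panic the machine.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the description.
# Prints the commands by default; set RUN=1 to actually execute them.
# WARNING: with RUN=1 on an affected kernel this is expected to panic the host.

# Either echo a command (dry run) or execute it (RUN=1).
run_cmd() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Enable jumbo frames and bring up each interface given as an argument,
# then unload the driver.
bnx2_repro() {
    for ifc in "$@"; do
        run_cmd ifconfig "$ifc" mtu 9000 up
    done
    run_cmd rmmod bnx2
}
```

For example, `bnx2_repro eth4 eth5` prints the two `ifconfig` commands followed by `rmmod bnx2`, which can be reviewed before rerunning with `RUN=1` on a disposable test system.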

Version-Release number of selected component (if applicable):
RHEL 5.3 snapshot 6 (likely others as well)

How reproducible:
Not certain; likely to be fairly reproducible.

Steps to Reproduce:
1. See description.
  
Actual results:
Attempting to unload the module while it is in use panics the system.

Expected results:
An attempt to unload a module that is in use should fail, and the system should remain up and running.

Additional info:
Very likely a module refcounting bug in this driver, according to Neil.

Comment 1 Neil Horman 2008-12-17 20:23:25 UTC
Joe, as per your last comment in bz 470625, do you have a backtrace of the panic that you saw there?

Comment 2 Andy Gospodarek 2008-12-17 20:45:48 UTC
I'd like to see that backtrace, too.  I was thinking this was related to the freeing of dummy_netdevs, but don't see any immediate problems.

Comment 3 Joe T 2008-12-17 22:34:05 UTC
We didn't have the trace captured, so it had to be reproduced:

Red Hat Enterprise Linux Server release 5.3 Beta (Tikanga)
Kernel 2.6.18-126.el5xen on an x86_64


login: root
Password: 
Last login: Wed Dec 17 21:29:49 on tty1
[root@RHEL53b ~]# ethtool -i eth4
driver: bnx2
version: 1.7.9-1
firmware-version: 1.9.6
bus-info: 0000:04:00.0
[root@RHEL53b ~]# ifconfig eth4 mtu 9000 up
[root@RHEL53b ~]# dhclient eth4
Internet Systems Consortium DHCP Client V3.0.5-RedHat
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/

Listening on LPF/eth4/00:1f:29:e6:d8:56
Sending on   LPF/eth4/00:1f:29:e6:d8:56
Sending on   Socket/fallback
DHCPREQUEST on eth4 to 255.255.255.255 port 67
DHCPREQUEST on eth4 to 255.255.255.255 port 67
DHCPACK from 172.16.10.100
bound to 172.16.99.158 -- renewal in 40276 seconds.
[root@RHEL53b ~]# lspci | grep 04:00
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
[root@RHEL53b ~]# rmmod bnx2
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
 [<ffffffff8027cc53>] xen_destroy_contiguous_region+0x83/0x3d6
PGD 557e4067 PUD 56859067 PMD 0 
Oops: 0002 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:03.0/class
CPU 0 
Modules linked in: ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand powernow_k8 freq_table dm_mirror dm_log dm_multipath scsi_dh dm_mod video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport pcspkr serio_raw hpilo i2c_piix4 serial_core i2c_core bnx2 ide_cd cdrom shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 3414, comm: rmmod Not tainted 2.6.18-126.el5xen #1
RIP: e030:[<ffffffff8027cc53>]  [<ffffffff8027cc53>] xen_destroy_contiguous_region+0x83/0x3d6
RSP: e02b:ffff880057849cf8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff8068fa40 R09: 0000000000000000
R10: ffff880057849cf8 R11: 0000000000000048 R12: 0000000000000001
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS:  00002af1a13aa6e0(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process rmmod (pid: 3414, threadinfo ffff880057848000, task ffff880069b54100)
Stack:  ffff880057849d40  0000000000000001  0000000000000000  0000000000007ff0 
 ffffffff8068ea40  0000000000000001  0000000000000000  0000000000007ff0 
 0000000000000000  ffffffff804eac00 
Call Trace:
 [<ffffffff80271292>] dma_free_coherent+0x69/0x77
 [<ffffffff8810b054>] :bnx2:bnx2_free_mem+0x12d/0x228
 [<ffffffff8810cc70>] :bnx2:bnx2_close+0x59/0x7a
 [<ffffffff80410d11>] dev_close+0x53/0x72
 [<ffffffff80410db9>] unregister_netdevice+0x89/0x21b
 [<ffffffff80410f5c>] unregister_netdev+0x11/0x17
 [<ffffffff8810e072>] :bnx2:bnx2_remove_one+0x30/0x8e
 [<ffffffff80346fc8>] pci_device_remove+0x24/0x3a
 [<ffffffff803a08a4>] __device_release_driver+0x9f/0xc3
 [<ffffffff803a0c44>] driver_detach+0xad/0x101
 [<ffffffff8039fe62>] bus_remove_driver+0x6d/0x90
 [<ffffffff803a0ccb>] driver_unregister+0xd/0x16
 [<ffffffff80347157>] pci_unregister_driver+0x10/0x5f
 [<ffffffff8029f9c2>] sys_delete_module+0x196/0x1c5
 [<ffffffff8025f2f9>] tracesys+0xab/0xb6


Code: f3 aa 48 c7 c7 00 31 53 80 e8 8f 6d fe ff 49 89 c3 48 b8 ff 
RIP  [<ffffffff8027cc53>] xen_destroy_contiguous_region+0x83/0x3d6
 RSP <ffff880057849cf8>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception
 (XEN) Domain 0 crashed: rebooting machine in 5 seconds.
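
To see which driver functions are on the failing path, the module's frames can be pulled out of the call trace. The following is a small sketch under the assumption that the trace uses the 2.6.18-style frame format shown above (` [<address>] :module:function+0xoff/0xlen`); the helper name `module_frames` is hypothetical.

```shell
#!/bin/sh
# Print the function names that belong to one module from a call trace
# read on stdin. Only frames of the form
#   [<ffffffff8810b054>] :bnx2:bnx2_free_mem+0x12d/0x228
# are matched; frames from the core kernel (no ":module:" prefix) are skipped.
module_frames() {
    mod=$1
    sed -n "s/.*] :${mod}:\([A-Za-z0-9_]*\)+0x.*/\1/p"
}
```

Fed the trace above, `module_frames bnx2` lists `bnx2_free_mem`, `bnx2_close` and `bnx2_remove_one`, innermost first, which matches the unload path that ends in `dma_free_coherent`.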

Comment 4 Michael Chan 2008-12-18 02:17:49 UTC
Created attachment 327295
bnx2 patch

This problem is likely caused by a bug in bnx2's bnx2_free_rx_mem() and the attached patch should fix it.  I'll do more testing and will post the patch upstream.  Thanks.

Comment 6 Linda Wang 2008-12-18 03:57:34 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
On RHEL5.3, when removing the bnx2 driver module,a kernel panic will occur.

Comment 7 Andy Gospodarek 2008-12-18 15:38:06 UTC
That release note language seems a bit scary.  Can we narrow this down to something that only happens with jumbo frames?

Comment 8 Michael Chan 2008-12-18 18:08:05 UTC
Agreed with Andy.  The patch has been verified to fix the issue and upstream patch has also been accepted.  Thanks.

Comment 9 Andy Gospodarek 2008-12-18 18:51:23 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-On RHEL5.3, when removing the bnx2 driver module,a kernel panic will occur.
+On RHEL5.3, when removing the bnx2 driver module while using jumbo frames, a kernel panic will occur.

Comment 10 Linda Wang 2008-12-18 21:24:56 UTC
Thanks Andy, Mike. I was just trying to get the release notes going.
FWIW, since the patch in comment #4 hasn't been posted for review and integrated into
the kernel, we can't really move this bug to VERIFIED.
So I am moving this bug back to ASSIGNED for 5.4 processing.

Thanks again.

Comment 11 Don Domingo 2009-01-14 03:18:33 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-On RHEL5.3, when removing the bnx2 driver module while using jumbo frames, a kernel panic will occur.
+If jumbo frames are enabled on your system, a kernel panic will occur if you attempt to unload the bnx2 module.

Comment 13 RHEL Program Management 2009-02-16 15:19:06 UTC
Updating PM score.

Comment 15 Don Zickus 2009-04-27 15:58:45 UTC
The fix is in kernel-2.6.18-141.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 17 Joe T 2009-04-29 19:42:52 UTC
Issue no longer seen in kernel-xen-2.6.18-141.el5.x86_64.rpm (bnx2 v1.9.3)
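
A quick way to check whether a host runs a build at or after the one verified here is to compare the build number in the kernel release string. This is a sketch only: the function name `kernel_has_bnx2_fix` is hypothetical, it assumes RHEL 5 release strings of the form `2.6.18-N.el5[variant]`, and it does not account for any backport of the fix to earlier update streams.

```shell
#!/bin/sh
# Return 0 if the given (or running) RHEL 5 kernel build number is >= 141,
# 1 if it is older, and 2 if the release string is not in the expected form.
kernel_has_bnx2_fix() {
    rel=${1:-$(uname -r)}
    case $rel in
        2.6.18-*) ;;
        *) return 2 ;;
    esac
    build=${rel#2.6.18-}   # e.g. "141.el5xen"
    build=${build%%.*}     # e.g. "141"
    case $build in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$build" -ge 141 ]
}
```

With no argument the function inspects `uname -r`; passing a string such as `2.6.18-126.el5xen` checks that release instead.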

Comment 20 errata-xmlrpc 2009-09-02 08:14:22 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

