Description of problem:
From a problem reported during verification of bz 470625:
https://bugzilla.redhat.com/show_bug.cgi?id=470625#c18

We found an issue while running under Xen. A kernel panic occurs if the following conditions are met:
1) Jumbo frames are enabled on both ports of a 5709C.
2) Both interfaces are brought up.
3) The driver is unloaded from memory (rmmod).

Version-Release number of selected component (if applicable):
RHEL 5.3 snapshot 6 (likely others as well)

How reproducible:
Not certain, but likely to be readily reproducible.

Steps to Reproduce:
1. See description.

Actual results:
Kernel panic. An attempt to unload a module that is in use should fail rather than panic the system.

Expected results:
The attempt to unload a module that is in use fails, and the system remains up and running.

Additional info:
Very likely a module refcounting bug in this driver, according to Neil.
Joe, as per your last comment in bz 470625, do you have a backtrace of the panic that you saw there?
I'd like to see that backtrace, too. I was thinking this was related to the freeing of dummy_netdevs, but don't see any immediate problems.
We didn't have the trace captured, so it had to be reproduced:

Red Hat Enterprise Linux Server release 5.3 Beta (Tikanga)
Kernel 2.6.18-126.el5xen on an x86_64

login: root
Password:
Last login: Wed Dec 17 21:29:49 on tty1
[root@RHEL53b ~]# ethtool -i eth4
driver: bnx2
version: 1.7.9-1
firmware-version: 1.9.6
bus-info: 0000:04:00.0
[root@RHEL53b ~]# ifconfig eth4 mtu 9000 up
[root@RHEL53b ~]# dhclient eth4
Internet Systems Consortium DHCP Client V3.0.5-RedHat
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/

Listening on LPF/eth4/00:1f:29:e6:d8:56
Sending on LPF/eth4/00:1f:29:e6:d8:56
Sending on Socket/fallback
DHCPREQUEST on eth4 to 255.255.255.255 port 67
DHCPREQUEST on eth4 to 255.255.255.255 port 67
DHCPACK from 172.16.10.100
bound to 172.16.99.158 -- renewal in 40276 seconds.
[root@RHEL53b ~]# lspci | grep 04:00
04:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
[root@RHEL53b ~]# rmmod bnx2
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff8027cc53>] xen_destroy_contiguous_region+0x83/0x3d6
PGD 557e4067 PUD 56859067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:03.0/class
CPU 0
Modules linked in: ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc cpufreq_ondemand powernow_k8 freq_table dm_mirror dm_log dm_multipath scsi_dh dm_mod video hwmon backlight sbs i2c_ec button battery asus_acpi ac parport_pc lp parport pcspkr serio_raw hpilo i2c_piix4 serial_core i2c_core bnx2 ide_cd cdrom shpchp cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 3414, comm: rmmod Not tainted 2.6.18-126.el5xen #1
RIP: e030:[<ffffffff8027cc53>] [<ffffffff8027cc53>] xen_destroy_contiguous_region+0x83/0x3d6
RSP: e02b:ffff880057849cf8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffff8068fa40 R09: 0000000000000000
R10: ffff880057849cf8 R11: 0000000000000048 R12: 0000000000000001
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
FS: 00002af1a13aa6e0(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process rmmod (pid: 3414, threadinfo ffff880057848000, task ffff880069b54100)
Stack: ffff880057849d40 0000000000000001 0000000000000000 0000000000007ff0
 ffffffff8068ea40 0000000000000001 0000000000000000 0000000000007ff0
 0000000000000000 ffffffff804eac00
Call Trace:
 [<ffffffff80271292>] dma_free_coherent+0x69/0x77
 [<ffffffff8810b054>] :bnx2:bnx2_free_mem+0x12d/0x228
 [<ffffffff8810cc70>] :bnx2:bnx2_close+0x59/0x7a
 [<ffffffff80410d11>] dev_close+0x53/0x72
 [<ffffffff80410db9>] unregister_netdevice+0x89/0x21b
 [<ffffffff80410f5c>] unregister_netdev+0x11/0x17
 [<ffffffff8810e072>] :bnx2:bnx2_remove_one+0x30/0x8e
 [<ffffffff80346fc8>] pci_device_remove+0x24/0x3a
 [<ffffffff803a08a4>] __device_release_driver+0x9f/0xc3
 [<ffffffff803a0c44>] driver_detach+0xad/0x101
 [<ffffffff8039fe62>] bus_remove_driver+0x6d/0x90
 [<ffffffff803a0ccb>] driver_unregister+0xd/0x16
 [<ffffffff80347157>] pci_unregister_driver+0x10/0x5f
 [<ffffffff8029f9c2>] sys_delete_module+0x196/0x1c5
 [<ffffffff8025f2f9>] tracesys+0xab/0xb6

Code: f3 aa 48 c7 c7 00 31 53 80 e8 8f 6d fe ff 49 89 c3 48 b8 ff
RIP [<ffffffff8027cc53>] xen_destroy_contiguous_region+0x83/0x3d6
 RSP <ffff880057849cf8>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
Created attachment 327295
bnx2 patch

This problem is likely caused by a bug in bnx2's bnx2_free_rx_mem(), and the attached patch should fix it. I'll do more testing and will post the patch upstream. Thanks.
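The attached patch is not reproduced in this report, so the following is only a rough userspace sketch of the class of bug the trace points to: a free routine that hands a NULL or stale address down to the DMA free call. All names here (struct rx_ring, fake_dma_free, alloc_rx_mem, free_rx_mem, demo_free_count, the ring and page counts) are hypothetical stand-ins and are not taken from the bnx2 source. The wrong-loop-index scenario mentioned in the comments is an assumption for illustration; the thread does not say what the patch changed.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_RINGS      2 /* hypothetical: number of rx rings */
#define PAGES_PER_RING 4 /* hypothetical: jumbo page-ring slots per rx ring */

/* Hypothetical stand-in for a driver's per-ring state. */
struct rx_ring {
    void *pg_desc[PAGES_PER_RING]; /* stand-ins for DMA-coherent buffers */
};

static int frees_done; /* counts buffers actually released */

/* Stand-in for dma_free_coherent(). As in the Xen path shown in the
 * trace, it must never be handed a NULL address. */
static void fake_dma_free(void *addr)
{
    assert(addr != NULL);
    free(addr);
    frees_done++;
}

/* Allocate every page slot of every ring; returns 0 on success. */
static int alloc_rx_mem(struct rx_ring *rings)
{
    for (int i = 0; i < NUM_RINGS; i++) {
        for (int j = 0; j < PAGES_PER_RING; j++) {
            rings[i].pg_desc[j] = malloc(64);
            if (!rings[i].pg_desc[j])
                return -1;
        }
    }
    return 0;
}

/* Free every page slot. Two defensive habits are shown:
 *  - the inner array is indexed with the inner loop variable j
 *    (using i there would free the wrong slot repeatedly and leave
 *    others behind, which is the assumed failure mode);
 *  - each slot is checked before the free and set to NULL after it,
 *    so a second call is harmless. */
static void free_rx_mem(struct rx_ring *rings)
{
    for (int i = 0; i < NUM_RINGS; i++) {
        for (int j = 0; j < PAGES_PER_RING; j++) {
            if (rings[i].pg_desc[j]) {
                fake_dma_free(rings[i].pg_desc[j]);
                rings[i].pg_desc[j] = NULL;
            }
        }
    }
}

/* Allocate, free twice, and report how many buffers were released. */
static int demo_free_count(void)
{
    struct rx_ring rings[NUM_RINGS] = {0};

    frees_done = 0;
    if (alloc_rx_mem(rings) != 0) {
        free_rx_mem(rings); /* release whatever was allocated */
        return -1;
    }
    free_rx_mem(rings);
    free_rx_mem(rings); /* second pass finds only NULL slots */
    return frees_done;
}
```

Calling demo_free_count() exercises the close path twice; with the guard in place the second pass does nothing, so each buffer is released exactly once and fake_dma_free() never sees a NULL address.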
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
On RHEL5.3, when removing the bnx2 driver module,a kernel panic will occur.
That release note language seems a bit scary. Can we narrow this down to something that only happens with jumbo frames?
Agreed with Andy. The patch has been verified to fix the issue, and the upstream patch has also been accepted. Thanks.
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-On RHEL5.3, when removing the bnx2 driver module,a kernel panic will occur.
+On RHEL5.3, when removing the bnx2 driver module while using jumbo frames, a kernel panic will occur.
Thanks Andy, Mike. I was just trying to get the release notes going. FWIW, since the patch from comment #4 hasn't been posted for review and integrated into the kernel, we can't really move this bug to VERIFIED. Moving this bug back to ASSIGNED for 5.4 processing. Thanks again.
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-On RHEL5.3, when removing the bnx2 driver module while using jumbo frames, a kernel panic will occur.
+If jumbo frames are enabled on your system, a kernel panic will occur if you attempt to unload the bnx2 module.
Updating PM score.
in kernel-2.6.18-141.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
Issue no longer seen in kernel-xen-2.6.18-141.el5.x86_64.rpm (bnx2 v1.9.3)
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html