Description of problem:
My host crashed during removal of Bond (with VLAN tagging). See additional information for call trace.

Version-Release number of selected component (if applicable):
Linux silver-vdsa.qa.lab.tlv.redhat.com 2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 03:27:37 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
2010-01-11 22:23:04,483 sw2: port 9(e1000_14_4) entering disabled state
2010-01-11 22:25:24,644 bonding: bond0: Removing slave eth2
2010-01-11 22:25:24,644 bonding: bond0: Warning: the permanent HWaddr of eth2 - 00:1D:09:68:71:4E - is still in use by bond0. Set the HWaddr of eth2 to a different address to avoid conflicts.
2010-01-11 22:25:24,644 bonding: bond0: releasing active interface eth2
2010-01-11 22:25:24,644 BUG: scheduling while atomic: ifdown-eth/0x00000100/21775
2010-01-11 22:25:24,644
2010-01-11 22:25:24,644 Call Trace:
2010-01-11 22:25:24,644 [<ffffffff8006240d>] __sched_text_start+0x7d/0xbd6
2010-01-11 22:25:24,644 [<ffffffff8009fdd8>] autoremove_wake_function+0x9/0x2e
2010-01-11 22:25:24,644 [<ffffffff8008a9ae>] __wake_up_common+0x3e/0x68
2010-01-11 22:25:24,644 [<ffffffff80063137>] wait_for_completion+0x79/0xa2
2010-01-11 22:25:24,644 [<ffffffff8008c584>] default_wake_function+0x0/0xe
2010-01-11 22:25:24,644 [<ffffffff8027d66b>] klist_next+0xf/0x56
2010-01-11 22:25:24,644 [<ffffffff8009e13f>] synchronize_rcu+0x30/0x36
2010-01-11 22:25:24,644 [<ffffffff8009dc7b>] wakeme_after_rcu+0x0/0x9
2010-01-11 22:25:24,644 [<ffffffff8851beac>] :cnic:cnic_stop_hw+0x38/0xa6
2010-01-11 22:25:24,644 [<ffffffff8851da62>] :cnic:cnic_ctl+0x2b/0x50
2010-01-11 22:25:24,644 [<ffffffff881f988d>] :bnx2:bnx2_netif_stop+0x3a/0xdf
2010-01-11 22:25:24,644 [<ffffffff881fc1e2>] :bnx2:bnx2_vlan_rx_register+0x19/0x4e
2010-01-11 22:25:24,644 [<ffffffff88750942>] :bonding:bond_del_vlans_from_slave+0xa6/0xb9
2010-01-11 22:25:24,644 [<ffffffff887527ae>] :bonding:bond_release+0x294/0x39a
2010-01-11 22:25:24,644 [<ffffffff8006457b>] __down_write_nested+0x12/0x92
2010-01-11 22:25:24,644 [<ffffffff8875a3f5>] :bonding:bonding_store_slaves+0x25c/0x2f7
2010-01-11 22:25:24,644 [<ffffffff8010ae88>] sysfs_write_file+0xb9/0xe8
2010-01-11 22:25:24,644 [<ffffffff80016942>] vfs_write+0xce/0x174
2010-01-11 22:25:24,644 [<ffffffff800171fa>] sys_write+0x45/0x6e
2010-01-11 22:25:24,644 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
2010-01-11 22:25:24,644
2010-01-11 22:25:34,644 BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
2010-01-11 22:25:34,644 CPU 0:
2010-01-11 22:25:34,644
2010-01-11 22:25:34,644 Modules linked in: nfs fscache nfs_acl netconsole bonding tun autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc bridge ip_conntrack_netbios_ns ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_round_robin dm_multipath scsi_dh video hwmon backlight sbs i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ksm(U) kvm_intel(U) kvm(U) sg ide_cd i5000_edac serio_raw edac_mc e1000e cdrom bnx2 pcspkr dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
2010-01-11 22:25:34,660
2010-01-11 22:25:34,660 Pid: 0, comm: swapper Tainted: G 2.6.18-164.9.1.el5 #1
2010-01-11 22:25:34,660 RIP: 0010:[<ffffffff8006216d>]
2010-01-11 22:25:34,660 [<ffffffff8006216d>] __read_lock_failed+0x5/0x14
2010-01-11 22:25:34,660 RSP: 0018:ffffffff8043dc10 EFLAGS: 00000297
2010-01-11 22:25:34,660 RAX: ffffffff804efa50 RBX: 0000000000000001 RCX: ffff81022a938000
2010-01-11 22:25:34,660 RDX: ffff810225ee3678 RSI: ffff810225ee3000 RDI: ffff810225ee352c
2010-01-11 22:25:34,660 RBP: ffffffff8043db90 R08: 0000000000000000 R09: ffff81022f43e070
I managed to reproduce this issue on this kernel (5.5):

[root@silver-vdsb ~]# uname -a
Linux silver-vdsb.qa.lab.tlv.redhat.com 2.6.18-183.el5 #1 SMP Mon Dec 21 18:37:42 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

Network adapter information:

[root@silver-vdsb ~]# ethtool -i eth2
driver: bnx2
version: 2.0.2
firmware-version: 3.5.12 UMP 1.1.8
bus-info: 0000:03:00.0
[root@silver-vdsb ~]# ethtool -i eth3
driver: bnx2
version: 2.0.2
firmware-version: 3.5.12 UMP 1.1.8
bus-info: 0000:07:00.0
This also looks to be specific to the use of the bnx2i and cnic drivers. Can you provide some more details about your configuration so we can try to reproduce it? We need the bonding mode, the VLAN configuration, and any iSCSI usage by the bnx2 devices. It would also help to know whether the use of VLANs is required to reproduce this failure.
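The details requested above can be gathered from /proc on the affected host. The following is only a minimal sketch: the function name collect_bond_info and the section titles are made up for illustration, and it assumes the bond is called bond0 as in the report.

```shell
#!/bin/sh
# Hypothetical helper (not part of the report): print the bonding mode,
# VLAN configuration and relevant loaded modules for one bond device.
collect_bond_info() {
    bond="${1:-bond0}"

    echo "== bonding state ($bond) =="
    # The bonding driver exposes mode, miimon and slave state here.
    cat "/proc/net/bonding/$bond" 2>/dev/null || echo "(no /proc/net/bonding/$bond)"

    echo "== VLAN configuration =="
    # Present only while the 8021q module is loaded.
    cat /proc/net/vlan/config 2>/dev/null || echo "(8021q not loaded)"

    echo "== relevant modules =="
    if [ -r /proc/modules ]; then
        grep -E '^(bonding|8021q|bnx2|bnx2i|cnic) ' /proc/modules || echo "(none loaded)"
    else
        echo "(no /proc/modules)"
    fi
}

collect_bond_info bond0
```

Attaching this output together with the output of brctl show covers the bonding mode, the VLAN setup and whether cnic or bnx2i are loaded.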
Mike, do you have the equipment to set something like this up?
I just reproduced this bug with the following bnx2 adapters:

[root@silver-vdsa ~]# ethtool -i eth2
driver: bnx2
version: 2.0.2
firmware-version: 3.5.12 ipms 1.6.0
bus-info: 0000:03:00.0
[root@silver-vdsa ~]# ethtool -i eth3
driver: bnx2
version: 2.0.2
firmware-version: 3.5.12 ipms 1.6.0
bus-info: 0000:07:00.0
[root@silver-vdsa ~]#

Bonding mode: mode 4 (CISCO mode)

Bridge configuration:

[root@silver-vdsa ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
rhevm           8000.001517a76a4c       no              eth0
sw1             8000.001517a76a4d       no              eth1
sw2             8000.001d09687150       no              bond0.162
[root@silver-vdsa ~]#

No iSCSI usage by the bnx2 devices.

The error occurred when I tried to remove the bond interface (with the VLAN tag).
The bond interface is on NICs eth2 and eth3, on bridge sw2. The VLAN tag is 162.
I didn't see anything about a bridge configuration earlier. I will put the bond in a bridge and see if I can make this fail. Were you using initscripts to get it into the bridge, or doing that manually after the bond0.162 interface was up?
I still cannot reproduce this without loading the cnic and bnx2i modules. I'm not at all surprised based on the backtrace. Here is the important info from my config.

::::::::::::::
ifcfg-bond0
::::::::::::::
DEVICE=bond0
BOOTPROTO=none
ONBOOT=no
BONDING_OPTS="mode=4 miimon=100"
::::::::::::::
ifcfg-bond0.100
::::::::::::::
DEVICE=bond0.100
BOOTPROTO=none
ONBOOT=yes
::::::::::::::
ifcfg-eth2
::::::::::::::
DEVICE=eth2
ONBOOT=yes
HWADDR=00:10:18:36:0a:d4
MASTER=bond0
SLAVE=yes
::::::::::::::
ifcfg-eth3
::::::::::::::
DEVICE=eth3
ONBOOT=yes
HWADDR=00:10:18:36:0a:d6
MASTER=bond0
SLAVE=yes

# ifup bond0
# ifup bond0.100
Added VLAN with VID == 100 to IF -:bond0:-
# brctl addbr br0
# brctl addif br0 bond0.100
# brctl delif br0 bond0.100
# ifdown bond0
bonding: bond0: Warning: the permanent HWaddr of eth2 - 00:10:18:36:0A:D4 - is still in use by bond0. Set the HWaddr of eth2 to a diffe.
bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
# ifup bond0
# ifup bond0.100
Added VLAN with VID == 100 to IF -:bond0:-
# brctl addbr br1
# brctl addif br1 bond0.100
# ifdown bond0
bonding: bond0: Warning: the permanent HWaddr of eth2 - 00:10:18:36:0A:D4 - is still in use by bond0. Set the HWaddr of eth2 to a diffe.
bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
bonding: bond0: Warning: clearing HW address of bond0 while it still has VLANs.
bonding: bond0: When re-adding slaves, make sure the bond's HW address matches its VLANs'.
# ifdown bond0.100
Removed VLAN -:bond0.100:-
# ifup bond0
# ifup bond0.100
Added VLAN with VID == 100 to IF -:bond0:-
# brctl addbr br2
# brctl addif br2 bond0.100
# rmmod bonding
bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
bonding: bond0: Warning: clearing HW address of bond0 while it still has VLANs.
bonding: bond0: When re-adding slaves, make sure the bond's HW address matches its VLANs'.
# uname -a
Linux xw4400 2.6.18-185.el5 #1 SMP Thu Jan 14 16:44:40 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
# ethtool -i eth2
driver: bnx2
version: 2.0.2
firmware-version: 4.4.14
bus-info: 0000:10:00.0
# ethtool -i eth3
driver: bnx2
version: 2.0.2
firmware-version: 4.4.14
bus-info: 0000:10:00.1

None of these produced the hang or deadlock described in this bug until I insmod'ed cnic and bnx2i. I set everything up as I did before and then did this:

# rmmod bonding
bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond
BUG: scheduling while atomic: rmmod/0x00000100/7310

Call Trace:
 [<ffffffff8006343d>] __sched_text_start+0x7d/0xbd6
 [<ffffffff80093131>] vprintk+0x2cb/0x317
 [<ffffffff80064167>] wait_for_completion+0x79/0xa2
 [<ffffffff8008dd0d>] default_wake_function+0x0/0xe
 [<ffffffff80283971>] klist_next+0xf/0x56
 [<ffffffff8009faa6>] synchronize_rcu+0x30/0x36
 [<ffffffff8009f5e2>] wakeme_after_rcu+0x0/0x9
 [<ffffffff88674d74>] :cnic:cnic_stop_hw+0x38/0xa6
 [<ffffffff88679a2e>] :cnic:cnic_ctl+0x35/0xac
 [<ffffffff8822588f>] :bnx2:bnx2_netif_stop+0x3a/0xea
 [<ffffffff88229be5>] :bnx2:bnx2_vlan_rx_register+0x20/0x61
 [<ffffffff887b4a37>] :bonding:bond_del_vlans_from_slave+0xa6/0xb9
 [<ffffffff887b6728>] :bonding:bond_release_all+0xb3/0x21c
 [<ffffffff887b68c0>] :bonding:bond_free_all+0x2f/0xb5
 [<ffffffff887bff88>] :bonding:bonding_exit+0x30/0x36
 [<ffffffff800a7394>] sys_delete_module+0x196/0x1c5
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

BUG: scheduling while atomic: rmmod/0x00000100/7310

Call Trace:
 [<ffffffff8006343d>] __sched_text_start+0x7d/0xbd6
 [<ffffffff8003de05>] lock_timer_base+0x1b/0x3c
 [<ffffffff8001cc14>] __mod_timer+0x100/0x10f
 [<ffffffff800648ab>] schedule_timeout+0x8a/0xad
 [<ffffffff80098a2a>] process_timeout+0x0/0x5
 [<ffffffff800990f3>] msleep+0x21/0x2c
 [<ffffffff887a0332>] :bnx2i:bnx2i_start+0x1f/0x32
 [<ffffffff88674a18>] :cnic:cnic_ulp_start+0x6b/0x87
 [<ffffffff88679a48>] :cnic:cnic_ctl+0x4f/0xac
 [<ffffffff88221480>] :bnx2:bnx2_netif_start+0xab/0xbe
 [<ffffffff88221641>] :bnx2:bnx2_fw_sync+0x34/0xc8
 [<ffffffff88229c15>] :bnx2:bnx2_vlan_rx_register+0x50/0x61
 [<ffffffff887b4a37>] :bonding:bond_del_vlans_from_slave+0xa6/0xb9
 [<ffffffff887b6728>] :bonding:bond_release_all+0xb3/0x21c
 [<ffffffff887b68c0>] :bonding:bond_free_all+0x2f/0xb5
 [<ffffffff887bff88>] :bonding:bonding_exit+0x30/0x36
 [<ffffffff800a7394>] sys_delete_module+0x196/0x1c5
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

BUG: scheduling while atomic: rmmod/0x00000100/7310

Call Trace:
 [<ffffffff8006343d>] __sched_text_start+0x7d/0xbd6
 [<ffffffff800a1746>] autoremove_wake_function+0x9/0x2e
 [<ffffffff8008c137>] __wake_up_common+0x3e/0x68
 [<ffffffff80064167>] wait_for_completion+0x79/0xa2
 [<ffffffff8008dd0d>] default_wake_function+0x0/0xe
 [<ffffffff80283971>] klist_next+0xf/0x56
 [<ffffffff8009faa6>] synchronize_rcu+0x30/0x36
 [<ffffffff8009f5e2>] wakeme_after_rcu+0x0/0x9
 [<ffffffff88674d74>] :cnic:cnic_stop_hw+0x38/0xa6
 [<ffffffff88679a2e>] :cnic:cnic_ctl+0x35/0xac
 [<ffffffff8822588f>] :bnx2:bnx2_netif_stop+0x3a/0xea
 [<ffffffff88229be5>] :bnx2:bnx2_vlan_rx_register+0x20/0x61
 [<ffffffff887b4a37>] :bonding:bond_del_vlans_from_slave+0xa6/0xb9
 [<ffffffff887b6728>] :bonding:bond_release_all+0xb3/0x21c
 [<ffffffff887b68c0>] :bonding:bond_free_all+0x2f/0xb5
 [<ffffffff887bff88>] :bonding:bonding_exit+0x30/0x36
 [<ffffffff800a7394>] sys_delete_module+0x196/0x1c5
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

BUG: scheduling while atomic: rmmod/0x00000100/7310

Call Trace:
 [<ffffffff8006343d>] __sched_text_start+0x7d/0xbd6
 [<ffffffff8003de05>] lock_timer_base+0x1b/0x3c
 [<ffffffff8001cc14>] __mod_timer+0x100/0x10f
 [<ffffffff800648ab>] schedule_timeout+0x8a/0xad
 [<ffffffff80098a2a>] process_timeout+0x0/0x5
 [<ffffffff800990f3>] msleep+0x21/0x2c
 [<ffffffff887a0332>] :bnx2i:bnx2i_start+0x1f/0x32
 [<ffffffff88674a18>] :cnic:cnic_ulp_start+0x6b/0x87
 [<ffffffff88679a48>] :cnic:cnic_ctl+0x4f/0xac
 [<ffffffff88221480>] :bnx2:bnx2_netif_start+0xab/0xbe
 [<ffffffff88221641>] :bnx2:bnx2_fw_sync+0x34/0xc8
 [<ffffffff88229c15>] :bnx2:bnx2_vlan_rx_register+0x50/0x61
 [<ffffffff887b4a37>] :bonding:bond_del_vlans_from_slave+0xa6/0xb9
 [<ffffffff887b6728>] :bonding:bond_release_all+0xb3/0x21c
 [<ffffffff887b68c0>] :bonding:bond_free_all+0x2f/0xb5
 [<ffffffff887bff88>] :bonding:bonding_exit+0x30/0x36
 [<ffffffff800a7394>] sys_delete_module+0x196/0x1c5
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

bonding: bond0: Warning: clearing HW address of bond0 while it still has VLANs.
bonding: bond0: When re-adding slaves, make sure the bond's HW address matches its VLANs'.

It's not a panic though, just ugly noise on the console.
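All four traces above pass through the same driver frames, which is easier to see once the module-qualified entries are pulled out of the console log. A rough sketch follows; the function name module_frames is invented here, and it assumes the 2.6.18 trace format (":module:symbol+offset") with one frame per line, as the console prints it.

```shell
#!/bin/sh
# Hypothetical helper: read a saved console log on stdin and print
# "module symbol" for every module-qualified call-trace frame.
# Matching on "] :" keeps MAC addresses and timestamps from matching.
module_frames() {
    sed -n 's/.*] :\([a-z][a-z0-9_]*\):\([A-Za-z_][A-Za-z0-9_]*\)+.*/\1 \2/p'
}

# Sample input copied from the first trace in this comment.
sample='BUG: scheduling while atomic: rmmod/0x00000100/7310
 [<ffffffff8009faa6>] synchronize_rcu+0x30/0x36
 [<ffffffff88674d74>] :cnic:cnic_stop_hw+0x38/0xa6
 [<ffffffff8822588f>] :bnx2:bnx2_netif_stop+0x3a/0xea
 [<ffffffff887b4a37>] :bonding:bond_del_vlans_from_slave+0xa6/0xb9'

printf '%s\n' "$sample" | module_frames
```

For the sample this prints the cnic, bnx2 and bonding frames in call order, innermost first, which shows the sleep happening in cnic while the call originated in bond_del_vlans_from_slave.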
Now that I can reproduce this I was able to scale it down:

- The bnx2i driver does not even need to be loaded (only cnic).
- The VLAN interface doesn't need to be in the bridge.
- The failure also reproduces with active-backup bonding (no fancy switch needed).

The 'scheduling while atomic' messages are ugly, but not showstoppers. The fact that I can deadlock the system is bad.

Use config files like these:

::::::::::::::
ifcfg-bond0
::::::::::::::
DEVICE=bond0
BOOTPROTO=none
ONBOOT=no
BONDING_OPTS="mode=1 miimon=100"
::::::::::::::
ifcfg-bond0.100
::::::::::::::
DEVICE=bond0.100
BOOTPROTO=none
ONBOOT=yes
::::::::::::::
ifcfg-eth2
::::::::::::::
DEVICE=eth2
ONBOOT=yes
HWADDR=00:10:18:36:0a:d4
MASTER=bond0
SLAVE=yes
::::::::::::::
ifcfg-eth3
::::::::::::::
DEVICE=eth3
ONBOOT=yes
HWADDR=00:10:18:36:0a:d6
MASTER=bond0
SLAVE=yes

(obviously with different MAC addresses) and type these commands:

# ifup bond0
# ifup bond0.100
# rmmod bonding
BUG: scheduling while atomic: rmmod/0x00000100/8590

Call Trace:
 [<ffffffff8006343d>] __sched_text_start+0x7d/0xbd6
 [<ffffffff80150c87>] __next_cpu+0x19/0x28
 [<ffffffff8008c850>] find_busiest_group+0x20d/0x621
 [<ffffffff8008dd0d>] default_wake_function+0x0/0xe
 [<ffffffff80064167>] wait_for_completion+0x79/0xa2
 [<ffffffff8008dd0d>] default_wake_function+0x0/0xe
 [<ffffffff8009faa6>] synchronize_rcu+0x30/0x36
 [<ffffffff8009f5e2>] wakeme_after_rcu+0x0/0x9
 [<ffffffff88674d74>] :cnic:cnic_stop_hw+0x38/0xa6
 [<ffffffff88679a2e>] :cnic:cnic_ctl+0x35/0xac
 [<ffffffff8822588f>] :bnx2:bnx2_netif_stop+0x3a/0xea
 [<ffffffff88229be5>] :bnx2:bnx2_vlan_rx_register+0x20/0x61
 [<ffffffff887a0a37>] :bonding:bond_del_vlans_from_slave+0xa6/0xb9
 [<ffffffff887a2728>] :bonding:bond_release_all+0xb3/0x21c
 [<ffffffff887a28c0>] :bonding:bond_free_all+0x2f/0xb5
 [<ffffffff887abf88>] :bonding:bonding_exit+0x30/0x36
 [<ffffffff800a7394>] sys_delete_module+0x196/0x1c5
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

BUG: soft lockup - CPU#0 stuck for 10s! [avahi-daemon:3183]
CPU 0:
Modules linked in: bonding libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi cnic uio ipt_MASQUERADE iptable_nat ip_nat bridge autofd
Pid: 3183, comm: avahi-daemon Not tainted 2.6.18-185.el5 #1
RIP: 0010:[<ffffffff8006319d>] [<ffffffff8006319d>] __read_lock_failed+0x5/0x14
RSP: 0018:ffff81002f99b918 EFLAGS: 00000297
RAX: 0000000000000056 RBX: ffff81002cea2280 RCX: ffff81002cae7980
RDX: ffffffff80350500 RSI: ffff81002d03d000 RDI: ffff81002d03d530
RBP: ffff81002f99b860 R08: ffff810038ddb2e0 R09: ffff81002cea2080
R10: ffff8100380c1ea8 R11: 000000408009f78d R12: ffffffff8027edd2
R13: d40a36feff181002 R14: 00000000000080fe R15: fb00000000000000
FS: 00002b74b9604ff0(0000) GS:ffffffff803c9000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000003a7eed3280 CR3: 000000002f993000 CR4: 00000000000006e0

Call Trace:
 [<ffffffff80065b75>] _read_lock+0xb/0xc
 [<ffffffff887a4bb1>] :bonding:bond_xmit_activebackup+0x19/0x6c
 [<ffffffff88556acd>] :ipv6:ip6_output_finish+0x0/0xf8
 [<ffffffff8022f5e2>] dev_hard_start_xmit+0x1b7/0x28a
 [<ffffffff8002fcbd>] dev_queue_xmit+0x1c5/0x271
 [<ffffffff88556f28>] :ipv6:ip6_output2+0x2cb/0x33d
 [<ffffffff88557dd1>] :ipv6:ip6_output+0xbbe/0xbe2
 [<ffffffff80056f3d>] nf_hook_slow+0x58/0xbc
 [<ffffffff88556730>] :ipv6:dst_output+0x0/0xe
 [<ffffffff8855827b>] :ipv6:ip6_push_pending_frames+0x486/0x55f
 [<ffffffff8856b26f>] :ipv6:udp_v6_push_pending_frames+0x123/0x145
 [<ffffffff8856ca42>] :ipv6:udpv6_sendmsg+0x68c/0x8e0
 [<ffffffff8012a6c8>] avc_has_perm+0x46/0x58
 [<ffffffff800556ac>] sock_sendmsg+0xf8/0x14a
 [<ffffffff8008dd0d>] default_wake_function+0x0/0xe
 [<ffffffff800a173d>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8008dd0d>] default_wake_function+0x0/0xe
 [<ffffffff8012a6c8>] avc_has_perm+0x46/0x58
 [<ffffffff802266a2>] sys_sendmsg+0x217/0x28a
 [<ffffffff8000e2e9>] current_fs_time+0x3b/0x40
 [<ffffffff8002e586>] __wake_up+0x38/0x4f
 [<ffffffff8002a2ac>] file_update_time+0x30/0xdb
 [<ffffffff80029f36>] pipe_writev+0x448/0x4b0
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0

BUG: soft lockup - CPU#1 stuck for 10s! [swapper:0]
CPU 1:
Modules linked in: bonding libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi cnic uio ipt_MASQUERADE iptable_nat ip_nat bridge autofd
Pid: 0, comm: swapper Not tainted 2.6.18-185.el5 #1
RIP: 0010:[<ffffffff8006319d>] [<ffffffff8006319d>] __read_lock_failed+0x5/0x14
RSP: 0018:ffff81000176fcf0 EFLAGS: 00000297
RAX: 0000000000000056 RBX: ffff81002c988dc0 RCX: ffff81002fb7c280
RDX: ffffffff80350500 RSI: ffff81002d03d000 RDI: ffff81002d03d530
RBP: ffff81000176fc70 R08: ffff81002d03d000 R09: 0000000000000038
R10: 0000000080000000 R11: ffffffff8002faf8 R12: ffffffff8005ec8e
R13: ffff81002d03d500 R14: ffffffff80078f50 R15: ffff81000176fc70
FS: 0000000000000000(0000) GS:ffff8100017437c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b64f6d430a0 CR3: 000000002fb44000 CR4: 00000000000006e0

Call Trace:
 <IRQ> [<ffffffff80065b75>] _read_lock+0xb/0xc
 [<ffffffff887a4bb1>] :bonding:bond_xmit_activebackup+0x19/0x6c
 [<ffffffff8022f5e2>] dev_hard_start_xmit+0x1b7/0x28a
 [<ffffffff8002fcbd>] dev_queue_xmit+0x1c5/0x271
 [<ffffffff88556f3c>] :ipv6:ip6_output2+0x2df/0x33d
 [<ffffffff88557dd1>] :ipv6:ip6_output+0xbbe/0xbe2
 [<ffffffff80056f3d>] nf_hook_slow+0x58/0xbc
 [<ffffffff88567af0>] :ipv6:dst_output+0x0/0xe
 [<ffffffff8856a790>] :ipv6:ndisc_send_rs+0x3de/0x505
 [<ffffffff8855f4b4>] :ipv6:addrconf_rs_timer+0x0/0xe2
 [<ffffffff8855f55f>] :ipv6:addrconf_rs_timer+0xab/0xe2
 [<ffffffff8009880f>] run_timer_softirq+0x193/0x241
 [<ffffffff80012388>] __do_softirq+0x89/0x133
 [<ffffffff8005f2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006dba8>] do_softirq+0x2c/0x85
 [<ffffffff800575ff>] mwait_idle+0x0/0x4a
 [<ffffffff8005ec8e>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff80057635>] mwait_idle+0x36/0x4a
 [<ffffffff800497ef>] cpu_idle+0x95/0xb8
 [<ffffffff800786bc>] start_secondary+0x495/0x4a4
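The reduced reproducer in this comment can be kept as a small script. This is a sketch only: the DRY_RUN variable and the run and reproduce helpers are illustrative additions, not part of the report, and the script merely prints the commands unless DRY_RUN=0 is set, because on an affected kernel the real sequence can deadlock the host.

```shell
#!/bin/sh
# Hypothetical wrapper: echo each command by default, execute only
# when the caller explicitly sets DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps taken from the comment above; the ifcfg-* files must already exist.
reproduce() {
    run modprobe cnic      # cnic must be loaded; bnx2i is not required
    run ifup bond0         # bond over the two bnx2 ports
    run ifup bond0.100     # VLAN on top of the bond
    run rmmod bonding      # tears down the bond while the VLAN is still present
}

reproduce
```

Running it as-is lists the four commands; running it with DRY_RUN=0 as root on a test machine performs them.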
Problem exists on at least 2.6.32-rc8 as well:

[root@xw4400 ~]# modprobe cnic
[root@xw4400 ~]# rmmod bonding
BUG: sleeping function called from invalid context at kernel/mutex.c:280
in_atomic(): 1, irqs_disabled(): 0, pid: 4063, name: rmmod
2 locks held by rmmod/4063:
 #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff812aac4b>] rtnl_lock+0x12/0x14
 #1: (&bond->lock){++.?..}, at: [<ffffffffa049397a>] bond_del_vlans_from_slave+0x2e/0x109 [bonding]
Pid: 4063, comm: rmmod Not tainted 2.6.32-rc8 #204
Call Trace:
 [<ffffffff810690bf>] ? __debug_show_held_locks+0x22/0x24
 [<ffffffff81036908>] __might_sleep+0xe9/0xee
 [<ffffffff81330044>] mutex_lock_nested+0x32/0x2b8
 [<ffffffff810608bf>] ? sched_clock_cpu+0xbc/0xc7
 [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]
 [<ffffffffa049397a>] ? bond_del_vlans_from_slave+0x2e/0x109 [bonding]
 [<ffffffffa02bf752>] bnx2_vlan_rx_register+0x28/0x6a [bnx2]
 [<ffffffffa049397a>] ? bond_del_vlans_from_slave+0x2e/0x109 [bonding]
 [<ffffffffa0493a3d>] bond_del_vlans_from_slave+0xf1/0x109 [bonding]
 [<ffffffffa0494d04>] bond_release_all+0xc6/0x214 [bonding]
 [<ffffffff8104f457>] ? del_timer_sync+0x0/0x84
 [<ffffffffa0494e80>] bond_free_all+0x2e/0x84 [bonding]
 [<ffffffffa049e864>] bonding_exit+0x30/0x37 [bonding]
 [<ffffffff81076105>] sys_delete_module+0x1b3/0x222
 [<ffffffff81069cd5>] ? trace_hardirqs_on_caller+0x113/0x13e
 [<ffffffff81084a52>] ? audit_syscall_entry+0x1bb/0x1ee
 [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b
BUG: scheduling while atomic: rmmod/4063/0x10000100
2 locks held by rmmod/4063:
 #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff812aac4b>] rtnl_lock+0x12/0x14
 #1: (&bond->lock){++.?..}, at: [<ffffffffa049397a>] bond_del_vlans_from_slave+0x2e/0x109 [bonding]
Modules linked in: cnic uio ipt_REJECT bridge stp autofs4 i2c_dev i2c_core hidp rfcomm l2cap crc16 bluetooth rfkill sunrpc bonding(-) 8]
Pid: 4063, comm: rmmod Not tainted 2.6.32-rc8 #204
Call Trace:
 [<ffffffff810690bf>] ? __debug_show_held_locks+0x22/0x24
 [<ffffffff8103b4a1>] __schedule_bug+0x6d/0x72
 [<ffffffff8132ece2>] schedule+0x86/0x91e
 [<ffffffff8100f247>] ? show_trace+0x10/0x12
 [<ffffffff81040a2d>] __cond_resched+0x25/0x30
 [<ffffffff8132f77a>] _cond_resched+0x24/0x2f
 [<ffffffff81330049>] mutex_lock_nested+0x37/0x2b8
 [<ffffffff810608bf>] ? sched_clock_cpu+0xbc/0xc7
 [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]
 [<ffffffffa049397a>] ? bond_del_vlans_from_slave+0x2e/0x109 [bonding]
 [<ffffffffa02bf752>] bnx2_vlan_rx_register+0x28/0x6a [bnx2]
 [<ffffffffa049397a>] ? bond_del_vlans_from_slave+0x2e/0x109 [bonding]
 [<ffffffffa0493a3d>] bond_del_vlans_from_slave+0xf1/0x109 [bonding]
 [<ffffffffa0494d04>] bond_release_all+0xc6/0x214 [bonding]
 [<ffffffff8104f457>] ? del_timer_sync+0x0/0x84
 [<ffffffffa0494e80>] bond_free_all+0x2e/0x84 [bonding]
 [<ffffffffa049e864>] bonding_exit+0x30/0x37 [bonding]
 [<ffffffff81076105>] sys_delete_module+0x1b3/0x222
 [<ffffffff81069cd5>] ? trace_hardirqs_on_caller+0x113/0x13e
 [<ffffffff81084a52>] ? audit_syscall_entry+0x1bb/0x1ee
 [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b

======================================================
[ INFO: SOFTIRQ-READ-safe -> SOFTIRQ-READ-unsafe lock order detected ]
2.6.32-rc8 #204
------------------------------------------------------
rmmod/4063 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
 (&bp->cnic_lock){+.+...}, at: [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]

and this task is already holding:
 (&bond->lock){++.?..}, at: [<ffffffffa049397a>] bond_del_vlans_from_slave+0x2e/0x109 [bonding]
which would create a new lock dependency:
 (&bond->lock){++.?..} -> (&bp->cnic_lock){+.+...}

but this new dependency connects a SOFTIRQ-READ-irq-safe lock:
 (&bond->lock){++.?..}
... which became SOFTIRQ-READ-irq-safe at:
  [<ffffffff8106d378>] __lock_acquire+0x5fa/0x816
  [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
  [<ffffffff81331f66>] _read_lock+0x34/0x69
  [<ffffffffa0497437>] bond_start_xmit+0xed/0x37c [bonding]
  [<ffffffff812a050c>] dev_hard_start_xmit+0x260/0x316
  [<ffffffff812a37f7>] dev_queue_xmit+0x2e0/0x3e9
  [<ffffffff812a93ca>] neigh_resolve_output+0x2b7/0x2ec
  [<ffffffffa03fb40d>] ip6_output_finish+0x6f/0xd6 [ipv6]
  [<ffffffffa03fb9ab>] ip6_output2+0x271/0x27c [ipv6]
  [<ffffffffa03fc8aa>] ip6_output+0xd20/0xd45 [ipv6]
  [<ffffffffa04160a2>] mld_sendpack+0x29d/0x495 [ipv6]
  [<ffffffffa0417361>] mld_ifc_timer_expire+0x1d6/0x20f [ipv6]
  [<ffffffff8104f18b>] run_timer_softirq+0x1d0/0x284
  [<ffffffff81049961>] __do_softirq+0xdb/0x1ab
  [<ffffffff8100cb5c>] call_softirq+0x1c/0x34
  [<ffffffff8100e1b3>] do_softirq+0x38/0x85
  [<ffffffff81049884>] irq_exit+0x45/0x47
  [<ffffffff81020ed4>] smp_apic_timer_interrupt+0x89/0x99
  [<ffffffff8100c533>] apic_timer_interrupt+0x13/0x20

to a SOFTIRQ-READ-irq-unsafe lock:
 (&bp->cnic_lock){+.+...}
... which became SOFTIRQ-READ-irq-unsafe at:
...  [<ffffffff8106d3e6>] __lock_acquire+0x668/0x816
  [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
  [<ffffffff8133006f>] mutex_lock_nested+0x5d/0x2b8
  [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]
  [<ffffffffa02bf752>] bnx2_vlan_rx_register+0x28/0x6a [bnx2]
  [<ffffffffa0496a30>] bond_vlan_rx_register+0x48/0x5f [bonding]
  [<ffffffffa04874e1>] register_vlan_dev+0x216/0x295 [8021q]
  [<ffffffffa0487df8>] vlan_ioctl_handler+0x36b/0x403 [8021q]
  [<ffffffff81292b53>] sock_ioctl+0x198/0x231
  [<ffffffff810ebb08>] vfs_ioctl+0x2a/0x77
  [<ffffffff810ec051>] do_vfs_ioctl+0x484/0x4d5
  [<ffffffff810ec0f9>] sys_ioctl+0x57/0x7a
  [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b

other info that might help us debug this:

2 locks held by rmmod/4063:
 #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff812aac4b>] rtnl_lock+0x12/0x14
 #1: (&bond->lock){++.?..}, at: [<ffffffffa049397a>] bond_del_vlans_from_slave+0x2e/0x109 [bonding]

the dependencies between SOFTIRQ-READ-irq-safe lock and the holding lock:
-> (&bond->lock){++.?..} ops: 146 {
   HARDIRQ-ON-W at:
     [<ffffffff8106d3c4>] __lock_acquire+0x646/0x816
     [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
     [<ffffffff81331c84>] _write_lock_bh+0x36/0x6b
     [<ffffffffa04982ab>] bond_close+0x50/0x131 [bonding]
     [<ffffffff812a10ed>] dev_close+0x81/0x9c
     [<ffffffff812a0ace>] dev_change_flags+0xa8/0x168
     [<ffffffff812ebaac>] devinet_ioctl+0x269/0x5da
     [<ffffffff812ecc22>] inet_ioctl+0x8a/0xa2
     [<ffffffff81292bc3>] sock_ioctl+0x208/0x231
     [<ffffffff810ebb08>] vfs_ioctl+0x2a/0x77
     [<ffffffff810ec051>] do_vfs_ioctl+0x484/0x4d5
     [<ffffffff810ec0f9>] sys_ioctl+0x57/0x7a
     [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b
   HARDIRQ-ON-R at:
     [<ffffffff8106d39b>] __lock_acquire+0x61d/0x816
     [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
     [<ffffffff81331e18>] _read_lock_bh+0x39/0x6c
     [<ffffffffa0496d7f>] bond_get_stats+0x4a/0x17d [bonding]
     [<ffffffff8129d56e>] dev_get_stats+0x19/0x7d
     [<ffffffff812aa556>] rtnl_fill_ifinfo+0x302/0x553
     [<ffffffff812aaa61>] rtmsg_ifinfo+0x66/0xca
     [<ffffffff812aab05>] rtnetlink_event+0x40/0x44
     [<ffffffff813343ad>] notifier_call_chain+0x33/0x5b
     [<ffffffff8105fcfd>] __raw_notifier_call_chain+0x9/0xb
     [<ffffffff8105fd0e>] raw_notifier_call_chain+0xf/0x11
     [<ffffffff812a0797>] call_netdevice_notifiers+0x16/0x18
     [<ffffffff812a198a>] register_netdevice+0x2a9/0x2f5
     [<ffffffffa0494065>] bond_create+0xa8/0xf1 [bonding]
     [<ffffffffa04aa7c0>] 0xffffffffa04aa7c0
     [<ffffffff81009060>] do_one_initcall+0x5a/0x14f
     [<ffffffff81078dfa>] sys_init_module+0xcd/0x22b
     [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b
   IN-SOFTIRQ-R at:
     [<ffffffff8106d378>] __lock_acquire+0x5fa/0x816
     [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
     [<ffffffff81331f66>] _read_lock+0x34/0x69
     [<ffffffffa0497437>] bond_start_xmit+0xed/0x37c [bonding]
     [<ffffffff812a050c>] dev_hard_start_xmit+0x260/0x316
     [<ffffffff812a37f7>] dev_queue_xmit+0x2e0/0x3e9
     [<ffffffff812a93ca>] neigh_resolve_output+0x2b7/0x2ec
     [<ffffffffa03fb40d>] ip6_output_finish+0x6f/0xd6 [ipv6]
     [<ffffffffa03fb9ab>] ip6_output2+0x271/0x27c [ipv6]
     [<ffffffffa03fc8aa>] ip6_output+0xd20/0xd45 [ipv6]
     [<ffffffffa04160a2>] mld_sendpack+0x29d/0x495 [ipv6]
     [<ffffffffa0417361>] mld_ifc_timer_expire+0x1d6/0x20f [ipv6]
     [<ffffffff8104f18b>] run_timer_softirq+0x1d0/0x284
     [<ffffffff81049961>] __do_softirq+0xdb/0x1ab
     [<ffffffff8100cb5c>] call_softirq+0x1c/0x34
     [<ffffffff8100e1b3>] do_softirq+0x38/0x85
     [<ffffffff81049884>] irq_exit+0x45/0x47
     [<ffffffff81020ed4>] smp_apic_timer_interrupt+0x89/0x99
     [<ffffffff8100c533>] apic_timer_interrupt+0x13/0x20
   SOFTIRQ-ON-R at:
     [<ffffffff8106d3e6>] __lock_acquire+0x668/0x816
     [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
     [<ffffffff81331f66>] _read_lock+0x34/0x69
     [<ffffffffa0495570>] bond_mii_monitor+0x27/0x4d8 [bonding]
     [<ffffffff81058dbe>] worker_thread+0x1af/0x2ae
     [<ffffffff8105c5a4>] kthread+0x7d/0x85
     [<ffffffff8100ca5a>] child_rip+0xa/0x20
   INITIAL USE at:
     [<ffffffff8106d431>] __lock_acquire+0x6b3/0x816
     [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
     [<ffffffff81331e18>] _read_lock_bh+0x39/0x6c
     [<ffffffffa0496d7f>] bond_get_stats+0x4a/0x17d [bonding]
     [<ffffffff8129d56e>] dev_get_stats+0x19/0x7d
     [<ffffffff812aa556>] rtnl_fill_ifinfo+0x302/0x553
     [<ffffffff812aaa61>] rtmsg_ifinfo+0x66/0xca
     [<ffffffff812aab05>] rtnetlink_event+0x40/0x44
     [<ffffffff813343ad>] notifier_call_chain+0x33/0x5b
     [<ffffffff8105fcfd>] __raw_notifier_call_chain+0x9/0xb
     [<ffffffff8105fd0e>] raw_notifier_call_chain+0xf/0x11
     [<ffffffff812a0797>] call_netdevice_notifiers+0x16/0x18
     [<ffffffff812a198a>] register_netdevice+0x2a9/0x2f5
     [<ffffffffa0494065>] bond_create+0xa8/0xf1 [bonding]
     [<ffffffffa04aa7c0>] 0xffffffffa04aa7c0
     [<ffffffff81009060>] do_one_initcall+0x5a/0x14f
     [<ffffffff81078dfa>] sys_init_module+0xcd/0x22b
     [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b
 }
 ... key at: [<ffffffffa04a3a98>] __key.43926+0x0/0xffffffffffffadd3 [bonding]
 ... acquired at:
   [<ffffffff8106c02e>] check_irq_usage+0xb3/0xc5
   [<ffffffff8106c7b6>] validate_chain+0x776/0xd3e
   [<ffffffff8106d52e>] __lock_acquire+0x7b0/0x816
   [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
   [<ffffffff8133006f>] mutex_lock_nested+0x5d/0x2b8
   [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]
   [<ffffffffa02bf752>] bnx2_vlan_rx_register+0x28/0x6a [bnx2]
   [<ffffffffa0493a3d>] bond_del_vlans_from_slave+0xf1/0x109 [bonding]
   [<ffffffffa0494d04>] bond_release_all+0xc6/0x214 [bonding]
   [<ffffffffa0494e80>] bond_free_all+0x2e/0x84 [bonding]
   [<ffffffffa049e864>] bonding_exit+0x30/0x37 [bonding]
   [<ffffffff81076105>] sys_delete_module+0x1b3/0x222
   [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b

the dependencies between the lock to be acquired and SOFTIRQ-READ-irq-unsafe lock:
-> (&bp->cnic_lock){+.+...} ops: 5 {
   HARDIRQ-ON-W at:
     [<ffffffff8106d3c4>] __lock_acquire+0x646/0x816
     [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
     [<ffffffff8133006f>] mutex_lock_nested+0x5d/0x2b8
     [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]
     [<ffffffffa02bf752>] bnx2_vlan_rx_register+0x28/0x6a [bnx2]
     [<ffffffffa0496a30>] bond_vlan_rx_register+0x48/0x5f [bonding]
     [<ffffffffa04874e1>] register_vlan_dev+0x216/0x295 [8021q]
     [<ffffffffa0487df8>] vlan_ioctl_handler+0x36b/0x403 [8021q]
     [<ffffffff81292b53>] sock_ioctl+0x198/0x231
     [<ffffffff810ebb08>] vfs_ioctl+0x2a/0x77
     [<ffffffff810ec051>] do_vfs_ioctl+0x484/0x4d5
     [<ffffffff810ec0f9>] sys_ioctl+0x57/0x7a
     [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b
   SOFTIRQ-ON-W at:
     [<ffffffff8106d3e6>] __lock_acquire+0x668/0x816
     [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
     [<ffffffff8133006f>] mutex_lock_nested+0x5d/0x2b8
     [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]
     [<ffffffffa02bf752>] bnx2_vlan_rx_register+0x28/0x6a [bnx2]
     [<ffffffffa0496a30>] bond_vlan_rx_register+0x48/0x5f [bonding]
     [<ffffffffa04874e1>] register_vlan_dev+0x216/0x295 [8021q]
     [<ffffffffa0487df8>] vlan_ioctl_handler+0x36b/0x403 [8021q]
     [<ffffffff81292b53>] sock_ioctl+0x198/0x231
     [<ffffffff810ebb08>] vfs_ioctl+0x2a/0x77
     [<ffffffff810ec051>] do_vfs_ioctl+0x484/0x4d5
     [<ffffffff810ec0f9>] sys_ioctl+0x57/0x7a
     [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b
   INITIAL USE at:
     [<ffffffff8106d431>] __lock_acquire+0x6b3/0x816
     [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
     [<ffffffff8133006f>] mutex_lock_nested+0x5d/0x2b8
     [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]
     [<ffffffffa02bf752>] bnx2_vlan_rx_register+0x28/0x6a [bnx2]
     [<ffffffffa0496a30>] bond_vlan_rx_register+0x48/0x5f [bonding]
     [<ffffffffa04874e1>] register_vlan_dev+0x216/0x295 [8021q]
     [<ffffffffa0487df8>] vlan_ioctl_handler+0x36b/0x403 [8021q]
     [<ffffffff81292b53>] sock_ioctl+0x198/0x231
     [<ffffffff810ebb08>] vfs_ioctl+0x2a/0x77
     [<ffffffff810ec051>] do_vfs_ioctl+0x484/0x4d5
     [<ffffffff810ec0f9>] sys_ioctl+0x57/0x7a
     [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b
 }
 ... key at: [<ffffffffa02c4790>] __key.50251+0x0/0xffffffffffffd7b5 [bnx2]
 ... acquired at:
   [<ffffffff8106c02e>] check_irq_usage+0xb3/0xc5
   [<ffffffff8106c7b6>] validate_chain+0x776/0xd3e
   [<ffffffff8106d52e>] __lock_acquire+0x7b0/0x816
   [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
   [<ffffffff8133006f>] mutex_lock_nested+0x5d/0x2b8
   [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]
   [<ffffffffa02bf752>] bnx2_vlan_rx_register+0x28/0x6a [bnx2]
   [<ffffffffa0493a3d>] bond_del_vlans_from_slave+0xf1/0x109 [bonding]
   [<ffffffffa0494d04>] bond_release_all+0xc6/0x214 [bonding]
   [<ffffffffa0494e80>] bond_free_all+0x2e/0x84 [bonding]
   [<ffffffffa049e864>] bonding_exit+0x30/0x37 [bonding]
   [<ffffffff81076105>] sys_delete_module+0x1b3/0x222
   [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b

stack backtrace:
Pid: 4063, comm: rmmod Not tainted 2.6.32-rc8 #204
Call Trace:
 [<ffffffff8106bf67>] check_usage+0x453/0x467
 [<ffffffff8100f345>] ? print_context_stack+0x91/0xa9
 [<ffffffff8106c02e>] check_irq_usage+0xb3/0xc5
 [<ffffffff8106c7b6>] validate_chain+0x776/0xd3e
 [<ffffffff81069d0d>] ? trace_hardirqs_on+0xd/0xf
 [<ffffffff81331841>] ? _spin_unlock_irq+0x2b/0x30
 [<ffffffff8106079e>] ? sched_clock_local+0x11/0x76
 [<ffffffff810608bf>] ? sched_clock_cpu+0xbc/0xc7
 [<ffffffff8106d52e>] __lock_acquire+0x7b0/0x816
 [<ffffffff8106d65b>] lock_acquire+0xc7/0xe4
 [<ffffffffa02bbbf0>] ? bnx2_netif_stop+0x25/0xfc [bnx2]
 [<ffffffffa02bbbf0>] ? bnx2_netif_stop+0x25/0xfc [bnx2]
 [<ffffffff8133006f>] mutex_lock_nested+0x5d/0x2b8
 [<ffffffffa02bbbf0>] ? bnx2_netif_stop+0x25/0xfc [bnx2]
 [<ffffffff810608bf>] ? sched_clock_cpu+0xbc/0xc7
 [<ffffffffa02bbbf0>] bnx2_netif_stop+0x25/0xfc [bnx2]
 [<ffffffffa049397a>] ? bond_del_vlans_from_slave+0x2e/0x109 [bonding]
 [<ffffffffa02bf752>] bnx2_vlan_rx_register+0x28/0x6a [bnx2]
 [<ffffffffa049397a>] ? bond_del_vlans_from_slave+0x2e/0x109 [bonding]
 [<ffffffffa0493a3d>] bond_del_vlans_from_slave+0xf1/0x109 [bonding]
 [<ffffffffa0494d04>] bond_release_all+0xc6/0x214 [bonding]
 [<ffffffff8104f457>] ? del_timer_sync+0x0/0x84
 [<ffffffffa0494e80>] bond_free_all+0x2e/0x84 [bonding]
 [<ffffffffa049e864>] bonding_exit+0x30/0x37 [bonding]
 [<ffffffff81076105>] sys_delete_module+0x1b3/0x222
 [<ffffffff81069cd5>] ? trace_hardirqs_on_caller+0x113/0x13e
 [<ffffffff81084a52>] ? audit_syscall_entry+0x1bb/0x1ee
 [<ffffffff8100b9ab>] system_call_fastpath+0x16/0x1b
bond_del_vlans_from_slave() holds bond->lock and calls ndo_vlan_rx_register(). That in turn calls bnx2_netif_stop() -> bnx2_cnic_stop(), which calls many functions that can sleep. Even without cnic loaded, bnx2_netif_stop() -> bnx2_disable_int_sync() -> synchronize_irq() -> wait_event() is potentially a problem. I think most of the time wait_event() never sleeps because the IRQ is not pending, so we don't see the problem. Assuming we cannot remove the spinlock in the bonding driver, I need to think about how to fix this. Thanks.
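The hexadecimal field in the "scheduling while atomic" line of the original report can be checked against this analysis. The following is a minimal sketch, assuming the 2.6.18-era preempt_count layout (8 preemption bits, 8 softirq bits, 12 hardirq bits) and the message format `comm/0x<preempt_count>/pid`; the function name `decode_atomic_bug` is hypothetical and only for illustration.

```python
import re

# Assumed preempt_count layout for 2.6.18-era kernels:
# bits 0-7 preemption depth, bits 8-15 softirq count, bits 16-27 hardirq count.
PREEMPT_BITS, SOFTIRQ_BITS, HARDIRQ_BITS = 8, 8, 12

# Matches e.g. "BUG: scheduling while atomic: ifdown-eth/0x00000100/21775"
BUG_RE = re.compile(
    r"BUG: scheduling while atomic: "
    r"(?P<comm>\S+)/0x(?P<count>[0-9a-fA-F]{8})/(?P<pid>\d+)"
)


def decode_atomic_bug(line):
    """Split the preempt_count printed in the BUG line into its fields.

    Returns None when the line does not contain the message.
    """
    match = BUG_RE.search(line)
    if match is None:
        return None
    count = int(match.group("count"), 16)
    return {
        "comm": match.group("comm"),
        "pid": int(match.group("pid")),
        "preempt": count & ((1 << PREEMPT_BITS) - 1),
        "softirq": (count >> PREEMPT_BITS) & ((1 << SOFTIRQ_BITS) - 1),
        "hardirq": (count >> (PREEMPT_BITS + SOFTIRQ_BITS))
        & ((1 << HARDIRQ_BITS) - 1),
    }


if __name__ == "__main__":
    line = "BUG: scheduling while atomic: ifdown-eth/0x00000100/21775"
    print(decode_atomic_bug(line))
```

Under that assumed layout, 0x00000100 decodes to a softirq count of 1 with the other fields at 0, meaning bottom halves were disabled when the scheduler was entered. That would be consistent with bond->lock having been taken with a _bh lock variant, although the thread itself only says that it is a spinlock.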
I'm still thinking about this as well, Michael. Just wanted to be sure you were aware of the problem.
I encountered errors while trying to add a bond interface to a bridge (without VLAN tagging). Whenever I try to add the bond interface, the host becomes unresponsive. This is what I found in the netconsole log: 2010-01-27 15:03:10,822 bonding: bond0: enslaving eth2 as a backup interface with a down link. 2010-01-27 15:03:10,822 bnx2i: iSCSI not supported, dev=eth2 2010-01-27 15:03:10,869 ADDRCONF(NETDEV_UP): bond0: link is not ready 2010-01-27 15:03:10,869 bonding: bond0: link status definitely up for interface eth3. 2010-01-27 15:03:10,947 device eth3 entered promiscuous mode 2010-01-27 15:03:10,947 device eth2 entered promiscuous mode 2010-01-27 15:03:10,947 device bond0 entered promiscuous mode 2010-01-27 15:03:11,980 bnx2: eth2 NIC Copper Link is Up, 2010-01-27 15:03:11,980 1000 Mbps 2010-01-27 15:03:11,980 full duplex 2010-01-27 15:03:11,980 2010-01-27 15:03:12,058 bonding: bond0: link status definitely up for interface eth2. On the host console I saw this error message: Unable to handle kernel null pointer dereference. Host and network adapter information: Linux silver-vdsa.qa.lab.tlv.redhat.com 2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 03:27:37 EST 2009 x86_64 x86_64 x86_64 GNU/Linux [root@silver-vdsa ~]# ethtool -i eth2 driver: bnx2 version: 2.0.2 firmware-version: 3.5.12 ipms 1.6.0 bus-info: 0000:03:00.0 [root@silver-vdsa ~]# ethtool -i eth3 driver: bnx2 version: 2.0.2 firmware-version: 3.5.12 ipms 1.6.0 bus-info: 0000:07:00.0 [root@silver-vdsa ~]#
(In reply to comment #14) > > > On the host console i saw this error message: > Unable to handle kernel null pointer dereference. > Without the rest of the message, unfortunately this isn't helpful.
Created attachment 387114 [details] screen shots
Created attachment 394626 [details] Patch to fix the issue. This patch should fix the issue. Please review and test to confirm. We'll do more testing before sending upstream. Thanks.
This patch seems fine to me as long as you do not feel that the cnic part actually needs to be reset when adding and removing VLANs. I don't have a way to test cnic functionality with this patch, but I will verify that it resolves the problem (I suspect it will).
Michael, I tested the patch in comment #17 on 2.6.33-rc8 and it seems to work. I still get some lockdep warnings (below), but these are not new. How is it looking based on your testing? ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.33-rc8 #2 ------------------------------------------------------- rmmod/4410 is trying to acquire lock: ((bond_dev->name)){+.+...}, at: [<ffffffff810514ce>] cleanup_workqueue_thread+0x1e/0xb8 but task is already holding lock: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103c7c4>] cpu_maps_update_begin+0x12/0x14 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (cpu_add_remove_lock){+.+.+.}: [<ffffffff81065a09>] validate_chain+0xa40/0xd38 [<ffffffff810664ae>] __lock_acquire+0x7ad/0x813 [<ffffffff810665db>] lock_acquire+0xc7/0xe4 [<ffffffff8133443b>] mutex_lock_nested+0x5d/0x2d0 [<ffffffff8103c7c4>] cpu_maps_update_begin+0x12/0x14 [<ffffffff810515c8>] destroy_workqueue+0x2b/0x9f [<ffffffffa04bc5de>] bond_uninit+0x30a/0x338 [bonding] [<ffffffff812a3e11>] rollback_registered_many+0xeb/0x16b [<ffffffff812a3ea5>] unregister_netdevice_many+0x14/0x3f [<ffffffff812adca0>] __rtnl_kill_links+0x5f/0x6a [<ffffffff812adcc9>] __rtnl_link_unregister+0x1e/0x46 [<ffffffff812adf92>] rtnl_link_unregister+0x19/0x22 [<ffffffffa04c3e36>] bonding_exit+0x32/0x40 [bonding] [<ffffffff8106fc9d>] sys_delete_module+0x1c5/0x236 [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b -> #2 (rtnl_mutex){+.+.+.}: [<ffffffff81065a09>] validate_chain+0xa40/0xd38 [<ffffffff810664ae>] __lock_acquire+0x7ad/0x813 [<ffffffff810665db>] lock_acquire+0xc7/0xe4 [<ffffffff8133443b>] mutex_lock_nested+0x5d/0x2d0 [<ffffffff812add5c>] rtnl_lock+0x12/0x14 [<ffffffffa04ba463>] bond_mii_monitor+0x27e/0x4d9 [bonding] [<ffffffff810520cd>] worker_thread+0x1af/0x2ae [<ffffffff81054d41>] kthread+0x7d/0x85 [<ffffffff81003794>] kernel_thread_helper+0x4/0x10 -> #1 
((&(&bond->mii_work)->work)){+.+...}: [<ffffffff81065a09>] validate_chain+0xa40/0xd38 [<ffffffff810664ae>] __lock_acquire+0x7ad/0x813 [<ffffffff810665db>] lock_acquire+0xc7/0xe4 [<ffffffff810520c7>] worker_thread+0x1a9/0x2ae [<ffffffff81054d41>] kthread+0x7d/0x85 [<ffffffff81003794>] kernel_thread_helper+0x4/0x10 -> #0 ((bond_dev->name)){+.+...}: [<ffffffff810656f5>] validate_chain+0x72c/0xd38 [<ffffffff810664ae>] __lock_acquire+0x7ad/0x813 [<ffffffff810665db>] lock_acquire+0xc7/0xe4 [<ffffffff810514f5>] cleanup_workqueue_thread+0x45/0xb8 [<ffffffff81051600>] destroy_workqueue+0x63/0x9f [<ffffffffa04bc5de>] bond_uninit+0x30a/0x338 [bonding] [<ffffffff812a3e11>] rollback_registered_many+0xeb/0x16b [<ffffffff812a3ea5>] unregister_netdevice_many+0x14/0x3f [<ffffffff812adca0>] __rtnl_kill_links+0x5f/0x6a [<ffffffff812adcc9>] __rtnl_link_unregister+0x1e/0x46 [<ffffffff812adf92>] rtnl_link_unregister+0x19/0x22 [<ffffffffa04c3e36>] bonding_exit+0x32/0x40 [bonding] [<ffffffff8106fc9d>] sys_delete_module+0x1c5/0x236 [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b other info that might help us debug this: 2 locks held by rmmod/4410: #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff812add5c>] rtnl_lock+0x12/0x14 #1: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8103c7c4>] cpu_maps_update_begin+0x12/0x14 stack backtrace: Pid: 4410, comm: rmmod Not tainted 2.6.33-rc8 #2 Call Trace: [<ffffffff810648c1>] print_circular_bug+0xb3/0xc1 [<ffffffff810656f5>] validate_chain+0x72c/0xd38 [<ffffffff810664ae>] __lock_acquire+0x7ad/0x813 [<ffffffff810665db>] lock_acquire+0xc7/0xe4 [<ffffffff810514ce>] ? cleanup_workqueue_thread+0x1e/0xb8 [<ffffffff810514f5>] cleanup_workqueue_thread+0x45/0xb8 [<ffffffff810514ce>] ? 
cleanup_workqueue_thread+0x1e/0xb8 [<ffffffff81051600>] destroy_workqueue+0x63/0x9f [<ffffffffa04bc5de>] bond_uninit+0x30a/0x338 [bonding] [<ffffffff812a3e11>] rollback_registered_many+0xeb/0x16b [<ffffffff812a3ea5>] unregister_netdevice_many+0x14/0x3f [<ffffffff812adca0>] __rtnl_kill_links+0x5f/0x6a [<ffffffff812adcc9>] __rtnl_link_unregister+0x1e/0x46 [<ffffffff812adf92>] rtnl_link_unregister+0x19/0x22 [<ffffffffa04c3e36>] bonding_exit+0x32/0x40 [bonding] [<ffffffff8106fc9d>] sys_delete_module+0x1c5/0x236 [<ffffffff813366e9>] ? retint_swapgs+0xe/0x13 [<ffffffff8107db7a>] ? audit_syscall_entry+0x1d0/0x203 [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b
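The circular dependency in the lockdep report above can be restated as a small graph. The following is a rough sketch of one reading of the chain (#3 through #0): each edge means "the second lock was acquired while the first was held". The edge list is an interpretation of the report, not kernel output, and `find_cycle` is a hypothetical helper.

```python
# Edges read off the lockdep chain: holder -> lock acquired while holding it.
LOCK_ORDER = {
    # worker_thread runs mii_work on the bond's workqueue
    "(bond_dev->name)": ["(&(&bond->mii_work)->work)"],
    # bond_mii_monitor takes rtnl_lock from inside the work item
    "(&(&bond->mii_work)->work)": ["rtnl_mutex"],
    # destroy_workqueue is reached under rtnl via rtnl_link_unregister
    "rtnl_mutex": ["cpu_add_remove_lock"],
    # cleanup_workqueue_thread flushes the workqueue under cpu_add_remove_lock
    "cpu_add_remove_lock": ["(bond_dev->name)"],
}


def find_cycle(graph):
    """Depth-first search; return one cycle as a list of nodes, or None."""
    path, finished = [], set()

    def visit(node):
        if node in finished:
            return None
        if node in path:
            # Back edge: the cycle is the path from the first visit to here.
            return path[path.index(node):] + [node]
        path.append(node)
        for nxt in graph.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        finished.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    print(" -> ".join(find_cycle(LOCK_ORDER)))
```

Removing any one edge breaks the cycle in this model, for example by not flushing the workqueue while rtnl_mutex is held. Whether that is an appropriate fix is a separate question from this bug.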
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch is still in progress, targeting RHEL 5.6 for submission.
New test kernels are available here: http://people.redhat.com/agospoda/#rhel5 Any feedback you can provide is greatly appreciated.
I installed the test kernel on my hosts and tried to reproduce this problem. To do so, I changed the interface removal order in the scripts back to the original order in which I found this issue: NICs, bond, VLAN, bridge. I used to encounter this issue whenever I tried to remove the bond interface (with or without VLAN tagging). I tried several times and could not reproduce it.
Michael, It looks like your patch works well. What is the target for upstream inclusion?
Andy, I can't reproduce this bug on kernel 2.6.18-183 or 2.6.18-185. Here are our test steps and configuration files. In my test, the firmware version of bnx2 is "4.0.3 ipms 1.6.0", which is different from the one in comment #9. Would you please help me clarify the verification steps? # uname -a Linux amd-8356-32-3 2.6.18-185.el5 #1 SMP Thu Jan 14 16:44:40 EST 2010 x86_64 x86_64 x86_64 GNU/Linux # ethtool -i eth4 driver: bnx2 version: 2.0.2 firmware-version: 4.0.3 ipms 1.6.0 bus-info: 0000:0d:00.0 # ethtool -i eth5 driver: bnx2 version: 2.0.2 firmware-version: 4.0.3 ipms 1.6.0 bus-info: 0000:18:00.0 ##The configuration files## ==/etc/sysconfig/network-scripts/ifcfg-bond0== DEVICE=bond0 BOOTPROTO=none ONBOOT=no BONDING_OPTS="mode=1 miimon=100" ============================= ==/etc/sysconfig/network-scripts/ifcfg-bond0.100== DEVICE=bond0.100 BOOTPROTO=none ONBOOT=yes VLAN=yes ============================= ==/etc/sysconfig/network-scripts/ifcfg-eth4== # Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet DEVICE=eth4 #HWADDR=00:14:5E:F4:95:F2 ONBOOT=no MASTER=bond0 SLAVE=yes ============================= ==/etc/sysconfig/network-scripts/ifcfg-eth5== # Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet DEVICE=eth5 #HWADDR=00:14:5E:F4:95:F4 ONBOOT=no MASTER=bond0 SLAVE=yes ============================= ==/etc/modprobe.conf== alias eth0 e1000e alias eth1 e1000e alias eth2 e1000e alias eth3 e1000e alias eth4 bnx2 alias eth5 bnx2 alias scsi_hostadapter aacraid alias scsi_hostadapter1 lpfc alias bond0 bonding ============================= ####Steps######### # ifup bond0 # ifup bond0.100 Added VLAN with VID == 100 to IF -:bond0:- # brctl addbr br0 # brctl addif br0 bond0.100 # brctl show bridge name bridge id STP enabled interfaces br0 8000.00145ef495f2 no bond0.100 # cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008) Bonding Mode: fault-tolerance (active-backup) Primary Slave: None Currently Active Slave: eth4 MII Status: up MII Polling Interval
(ms): 100 Up Delay (ms): 0 Down Delay (ms): 0 Slave Interface: eth4 MII Status: up Link Failure Count: 0 Permanent HW addr: 00:14:5e:f4:95:f2 Slave Interface: eth5 MII Status: up Link Failure Count: 0 Permanent HW addr: 00:14:5e:f4:95:f4 # rmmod bonding # cat /proc/net/bonding/bond0 cat: /proc/net/bonding/bond0: No such file or directory #####dmesg######## bonding: bond0: setting mode to active-backup (1). bonding: bond0: Setting MII monitoring interval to 100. ADDRCONF(NETDEV_UP): bond0: link is not ready bonding: bond0: Adding slave eth4. bnx2: eth4: using MSI bonding: bond0: enslaving eth4 as a backup interface with a down link. bonding: bond0: Adding slave eth5. bnx2: eth5: using MSI bonding: bond0: enslaving eth5 as a backup interface with a down link. bnx2: eth4 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON bonding: bond0: link status definitely up for interface eth4. bonding: bond0: making interface eth4 the new active one. bonding: bond0: first active interface up! ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready bnx2: eth5 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON bonding: bond0: link status definitely up for interface eth5. 802.1Q VLAN Support v1.8 Ben Greear <greearb> All bugs added by David S. Miller <davem> bond0: no IPv6 routers present bond0.100: no IPv6 routers present Bridge firewalling registered bond0.100: dev_set_promiscuity(master, 1) device eth4 entered promiscuous mode device bond0 entered promiscuous mode device bond0.100 entered promiscuous mode device eth4 left promiscuous mode bonding: bond0: Warning: clearing HW address of bond0 while it still has VLANs. bonding: bond0: When re-adding slaves, make sure the bond's HW address matches its VLANs'. bonding: bond0: released all slaves br0: port 1(bond0.100) entering disabled state
Comment#9 and comment#10 describe how to reproduce this. All you need to do on your setup is: # modprobe cnic # ifup bond0 # ifup bond0.100 # rmmod bonding I suspect your previous tests were not run with cnic loaded. It will not be loaded by default so it must be manually inserted.
Hi Andy, thanks for your help! I have now reproduced this bug by following those four steps.
I verified this bug on kernel 2.6.18-185.el5 and kernel 2.6.18-196.el5. There are no bug messages output on kernel 2.6.18-196.el5. ==On kernel 2.6.18-185.el5== # modprobe cnic # ifup bond0 # ifup bond0.100 # rmmod bonding # uname -a Linux amd-8356-32-3 2.6.18-185.el5 #1 SMP Thu Jan 14 16:44:40 EST 2010 x86_64 x86_64 x86_64 GNU/Linux And then got the messages in `dmesg`, #####dmesg######## cnic: Added CNIC device: eth4 cnic: Added CNIC device: eth5 bonding: bond0: setting mode to active-backup (1). bonding: bond0: Setting MII monitoring interval to 100. ADDRCONF(NETDEV_UP): bond0: link is not ready bonding: bond0: Adding slave eth4. bnx2: eth4: using MSI bonding: bond0: enslaving eth4 as a backup interface with a down link. bonding: bond0: Adding slave eth5. bnx2: eth5: using MSI bonding: bond0: enslaving eth5 as a backup interface with a down link. bnx2: eth4 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON bonding: bond0: link status definitely up for interface eth4. bonding: bond0: making interface eth4 the new active one. bonding: bond0: first active interface up! ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready bnx2: eth5 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON bonding: bond0: link status definitely up for interface eth5. 
bond0: no IPv6 routers present bond0.100: no IPv6 routers present BUG: scheduling while atomic: rmmod/0x00000100/6486 Call Trace: [<ffffffff8006343d>] __sched_text_start+0x7d/0xbd6 [<ffffffff80150c87>] __next_cpu+0x19/0x28 [<ffffffff8008c850>] find_busiest_group+0x20d/0x621 [<ffffffff8008dd0d>] default_wake_function+0x0/0xe [<ffffffff80064167>] wait_for_completion+0x79/0xa2 [<ffffffff8008dd0d>] default_wake_function+0x0/0xe [<ffffffff8009faa6>] synchronize_rcu+0x30/0x36 [<ffffffff8009f5e2>] wakeme_after_rcu+0x0/0x9 [<ffffffff88587d74>] :cnic:cnic_stop_hw+0x38/0xa6 [<ffffffff8858ca2e>] :cnic:cnic_ctl+0x35/0xac [<ffffffff8823b88f>] :bnx2:bnx2_netif_stop+0x3a/0xea [<ffffffff8823fbe5>] :bnx2:bnx2_vlan_rx_register+0x20/0x61 [<ffffffff88483a37>] :bonding:bond_del_vlans_from_slave+0xa6/0xb9 [<ffffffff88485728>] :bonding:bond_release_all+0xb3/0x21c [<ffffffff884858c0>] :bonding:bond_free_all+0x2f/0xb5 [<ffffffff8848ef88>] :bonding:bonding_exit+0x30/0x36 [<ffffffff800a7394>] sys_delete_module+0x196/0x1c5 [<ffffffff8005e28d>] tracesys+0xd5/0xe0 BUG: scheduling while atomic: rmmod/0x00000100/6486 Call Trace: [<ffffffff8006343d>] __sched_text_start+0x7d/0xbd6 [<ffffffff80064167>] wait_for_completion+0x79/0xa2 [<ffffffff8008dd0d>] default_wake_function+0x0/0xe [<ffffffff8009faa6>] synchronize_rcu+0x30/0x36 [<ffffffff8009f5e2>] wakeme_after_rcu+0x0/0x9 [<ffffffff88587d74>] :cnic:cnic_stop_hw+0x38/0xa6 [<ffffffff8858ca2e>] :cnic:cnic_ctl+0x35/0xac [<ffffffff8823b88f>] :bnx2:bnx2_netif_stop+0x3a/0xea [<ffffffff88426f78>] :ipv6:fib6_age+0x0/0x65 [<ffffffff8823fbe5>] :bnx2:bnx2_vlan_rx_register+0x20/0x61 [<ffffffff88483a37>] :bonding:bond_del_vlans_from_slave+0xa6/0xb9 [<ffffffff88485728>] :bonding:bond_release_all+0xb3/0x21c [<ffffffff884858c0>] :bonding:bond_free_all+0x2f/0xb5 [<ffffffff8848ef88>] :bonding:bonding_exit+0x30/0x36 [<ffffffff800a7394>] sys_delete_module+0x196/0x1c5 [<ffffffff8005e28d>] tracesys+0xd5/0xe0 bonding: bond0: Warning: clearing HW address of bond0 while it 
still has VLANs. bonding: bond0: When re-adding slaves, make sure the bond's HW address matches its VLANs'. bonding: bond0: released all slaves ==On kernel 2.6.18-196.el5== # modprobe cnic # ifup bond0 # ifup bond0.100 # rmmod bonding # uname -a Linux amd-8356-32-3 2.6.18-196.el5 #1 SMP Tue Apr 13 12:36:38 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux The following messages then appeared in `dmesg`: ==dmesg== cnic: Added CNIC device: eth4 cnic: Added CNIC device: eth5 bonding: bond0: setting mode to active-backup (1). bonding: bond0: Setting MII monitoring interval to 100. ADDRCONF(NETDEV_UP): bond0: link is not ready bonding: bond0: Adding slave eth4. bnx2: eth4: using MSI bonding: bond0: enslaving eth4 as a backup interface with a down link. bonding: bond0: Adding slave eth5. bnx2: eth5: using MSI bonding: bond0: enslaving eth5 as a backup interface with a down link. bnx2: eth4 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON bonding: bond0: link status definitely up for interface eth4. bonding: bond0: making interface eth4 the new active one. bonding: bond0: first active interface up! ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready bnx2: eth5 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON bonding: bond0: link status definitely up for interface eth5. bond0: no IPv6 routers present bond0.100: no IPv6 routers present bonding: bond0: Warning: clearing HW address of bond0 while it still has VLANs. bonding: bond0: When re-adding slaves, make sure the bond's HW address matches its VLANs'. bonding: bond0: released all slaves
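The comparison between the two kernels above can also be done mechanically on saved `dmesg` output. The following is a small sketch, assuming the RHEL 5 style of call-trace frame seen in this report (`[<address>] :module:symbol+0xoff/0xlen`, with the `:module:` part absent for core kernel symbols); the helper name `summarize_dmesg` is hypothetical.

```python
import re
from collections import Counter

# One call-trace frame, e.g. "[<ffffffff88587d74>] :cnic:cnic_stop_hw+0x38/0xa6".
# The ":module:" prefix is optional because core kernel symbols do not have it.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]{16}>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

BUG_TEXT = "BUG: scheduling while atomic"


def summarize_dmesg(text):
    """Return (number of BUG messages, Counter of modules seen in frames)."""
    bug_count = text.count(BUG_TEXT)
    modules = Counter(
        m.group("module") for m in FRAME_RE.finditer(text) if m.group("module")
    )
    return bug_count, modules


if __name__ == "__main__":
    import sys

    # Usage (file name is an example): dmesg > dmesg.txt; python scan.py dmesg.txt
    with open(sys.argv[1]) as handle:
        bugs, modules = summarize_dmesg(handle.read())
    print("scheduling-while-atomic messages:", bugs)
    for name, frames in modules.most_common():
        print("  %s: %d frame(s)" % (name, frames))
```

On a fixed kernel the count of BUG messages should be zero after running the four reproduction steps; a non-zero count with cnic, bnx2 and bonding frames would match the failure shown for 2.6.18-185.el5.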
in kernel-2.6.18-197.el5 You can download this test kernel from http://people.redhat.com/jwilson/el5 Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
(In reply to comment #25) > Michael, It looks like your patch works well. What is the target for upstream > inclusion? Sorry for taking so long. It was merged upstream yesterday.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Resets of the cnic part could cause a deadlock when the bnx2 device was enslaved to a bonding device and that device had an associated VLAN.
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-0017.html