
Bug 577407

Summary: Oops in bonding:bond_xmit_roundrobin when all bonded slaves are taken down
Product: Red Hat Enterprise Linux 5
Reporter: Corey Marthaler <cmarthal>
Component: kernel
Assignee: Andy Gospodarek <agospoda>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: high
Version: 5.5
CC: jpirko, peterm
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-05-18 14:42:50 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Corey Marthaler 2010-03-26 21:12:27 UTC
Description of problem:
I sequentially took down all 8 slave NICs in my bonded interface, and when the final one was taken down, the machine panicked.

[root@taft-01 ~]# netstat -i
Kernel Interface table
Iface  MTU  Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
bond0  1500   0     0      0      0      0     0      0      0      0 BMmU
eth0   1500   0   153      0      0      0   166      0      0      0 BMRU
eth2   1500   0     0      0      0      0     0      0      0      0 BMsU
eth3   1500   0     0      0      0      0     0      0      0      0 BMsU
eth4   1500   0     0      0      0      0     0      0      0      0 BMsU
eth5   1500   0     0      0      0      0     0      0      0      0 BMsU
eth6   1500   0     0      0      0      0     0      0      0      0 BMsU
eth7   1500   0     0      0      0      0     0      0      0      0 BMsU
eth8   1500   0     0      0      0      0     0      0      0      0 BMsU
eth9   1500   0     0      0      0      0     0      0      0      0 BMsU
lo     16436  0    70      0      0      0    70      0      0      0 LRU

[root@taft-01 ~]# cat /etc/modprobe.conf 
alias eth0 e1000
alias eth1 e1000
alias eth2 e1000
alias eth3 e1000
alias eth4 e1000
alias eth5 e1000
alias eth6 e1000
alias eth7 e1000
alias eth8 e1000
alias eth9 e1000
alias scsi_hostadapter megaraid_mbox
alias bond0 bonding
options bond0 miimon=100 mode=0


e1000: eth2 NIC Link is Down
bonding: bond0: link status definitely down for interface eth2, disabling it
e1000: eth3 NIC Link is Down
bonding: bond0: link status definitely down for interface eth3, disabling it
e1000: eth4 NIC Link is Down
bonding: bond0: link status definitely down for interface eth4, disabling it
e1000: eth5 NIC Link is Down
bonding: bond0: link status definitely down for interface eth5, disabling it
e1000: eth6 NIC Link is Down
bonding: bond0: link status definitely down for interface eth6, disabling it
e1000: eth7 NIC Link is Down
bonding: bond0: link status definitely down for interface eth7, disabling it
e1000: eth8 NIC Link is Down
bonding: bond0: link status definitely down for interface eth8, disabling it


Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
 [<ffffffff88510075>] :bonding:bond_xmit_roundrobin+0x8d/0xdf             
PGD 0                                                                     
Oops: 0000 [1] SMP                                                        
last sysfs file: /devices/pci0000:00/0000:00:06.0/0000:08:00.2/0000:0b:03.0/0000:0c:06.1/irq
CPU 1                                                                                       
Modules linked in: lock_dlm gfs2 dlm configfs autofs4 hidp rfcomm l2cap bluetooth dm_log_clustered(U) lockd sunrpc bonding ipv6 xfrm_nalgd
Pid: 0, comm: swapper Tainted: G      2.6.18-194.el5.gtest.86 #1                                                                          
RIP: 0010:[<ffffffff88510075>]  [<ffffffff88510075>] :bonding:bond_xmit_roundrobin+0x8d/0xdf                                              
RSP: 0018:ffff8101fff37e10  EFLAGS: 00010293                                                                                              
RAX: 0000000000000056 RBX: ffff8102188f1500 RCX: 0000000000000000                                                                         
RDX: ffffffff80352780 RSI: 0000000000000000 RDI: ffff8102188f1530                                                                         
RBP: ffff8102152f3780 R08: ffff81020bebd860 R09: ffffffff8024f1b0                                                                         
R10: 0000000080000000 R11: ffffffff80000000 R12: ffff810217c78810
R13: ffff8102105bd380 R14: ffff8102188f1000 R15: ffff810215296428
FS:  0000000000000000(0000) GS:ffff8101fff117c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000210ffe000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffff8101fff30000, task ffff8101fff15100)
Stack:  ffff8102188f1000 ffff8102152f3780 ffffffff804fac50 ffffffff8022ffa4
 ffff8102188f1000 ffff8102188f1200 ffff8102152f3780 ffff8102105bd380
 ffff8102188ffc38 ffffffff8002fc48 ffff81020bebd860 ffff8102152f3780
Call Trace:
 <IRQ>  [<ffffffff8022ffa4>] dev_hard_start_xmit+0x1b7/0x28a
 [<ffffffff8002fc48>] dev_queue_xmit+0x1c5/0x271
 [<ffffffff800322bb>] ip_output+0x29a/0x2dd
 [<ffffffff80265d2e>] igmpv3_sendpack+0x9a/0x9e
 [<ffffffff802662db>] igmp_ifc_timer_expire+0x1c6/0x1f0
 [<ffffffff80266115>] igmp_ifc_timer_expire+0x0/0x1f0
 [<ffffffff80098c76>] run_timer_softirq+0x193/0x241
 [<ffffffff80012409>] __do_softirq+0x89/0x133
 [<ffffffff8005f2fc>] call_softirq+0x1c/0x28
 [<ffffffff8006dba8>] do_softirq+0x2c/0x85
 [<ffffffff800575d0>] mwait_idle+0x0/0x4a
 [<ffffffff8005ec8e>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80057606>] mwait_idle+0x36/0x4a
 [<ffffffff800497be>] cpu_idle+0x95/0xb8
 [<ffffffff80078997>] start_secondary+0x498/0x4a7


Code: 48 8b 11 f6 82 a0 00 00 00 01 74 17 8b 42 40 a8 02 74 10 8b
RIP  [<ffffffff88510075>] :bonding:bond_xmit_roundrobin+0x8d/0xdf
 RSP <ffff8101fff37e10>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception



Version-Release number of selected component (if applicable):
[root@taft-01 ~]# uname -ar
Linux taft-01 2.6.18-194.el5.gtest.86 #1 SMP Wed Mar 24 14:50:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux


How reproducible:
Every time

Comment 1 Andy Gospodarek 2010-03-29 14:42:39 UTC
So this is easily reproducible?

The patch that ultimately went upstream for this fix had an additional check that I think will prevent this panic.  I will backport that patch and post here when it is available in my test kernels.
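
For readers following along, here is a minimal user-space sketch of the kind of guard being discussed. It is not the upstream patch and not the RHEL backport. The oops in comment #0 reports a fault address (CR2) of 0 with RCX = 0, and the first code bytes (48 8b 11) decode to a load through RCX, which is consistent with dereferencing a NULL pointer once no usable slave is left. All names below (model_slave, model_bond, model_rr_pick, model_xmit) are hypothetical stand-ins, not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bonded slave; NOT the kernel's struct slave. */
struct model_slave {
    const char *name;
    int link_up;                 /* 1 while the link is up, 0 once it is down */
};

/* Hypothetical stand-in for the bond; NOT the kernel's struct bonding. */
struct model_bond {
    struct model_slave *slaves;  /* array of enslaved interfaces */
    size_t slave_cnt;
    unsigned int rr_tx_counter;  /* advances once per transmit attempt */
};

/*
 * Pick the next slave in round-robin order, skipping slaves whose link
 * is down. Returns NULL when there is nothing usable to transmit on.
 */
static struct model_slave *model_rr_pick(struct model_bond *bond)
{
    size_t start, i;

    assert(bond != NULL);
    if (bond->slaves == NULL || bond->slave_cnt == 0)
        return NULL;

    start = bond->rr_tx_counter++ % bond->slave_cnt;
    for (i = 0; i < bond->slave_cnt; i++) {
        struct model_slave *s = &bond->slaves[(start + i) % bond->slave_cnt];

        if (s->link_up)
            return s;
    }
    return NULL;                 /* every slave is down */
}

/*
 * Transmit path: check the pick before touching it. Returns 0 and stores
 * the chosen slave name in *sent_on, or returns -1 (packet dropped) and
 * stores NULL when no slave is available.
 */
static int model_xmit(struct model_bond *bond, const char **sent_on)
{
    struct model_slave *s = model_rr_pick(bond);

    if (s == NULL) {
        *sent_on = NULL;
        return -1;
    }
    *sent_on = s->name;
    return 0;
}
```

In this model, taking the last slave down makes model_xmit() return -1 and drop the packet instead of dereferencing a NULL pointer. Whether the real fix takes this exact shape is an assumption.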

Comment 2 Corey Marthaler 2010-03-29 19:20:05 UTC
From comment #0:

How reproducible:
Every time

Comment 3 Andy Gospodarek 2010-03-29 19:23:52 UTC
OK, I'll see if I can reproduce it here.

Comment 4 Andy Gospodarek 2010-03-29 20:25:08 UTC
I was able to reproduce it as well and confirmed the upstream patch will resolve it.

Corey, if I build more test kernels can you verify the final fix for me?

Comment 5 Corey Marthaler 2010-03-29 20:29:30 UTC
Andy,

Sure thing.

Comment 6 Andy Gospodarek 2010-03-30 13:36:59 UTC
Corey, new kernels posted here:

http://people.redhat.com/agospoda/#rhel5

Thanks for testing these for me.

Comment 7 Corey Marthaler 2010-03-30 17:16:40 UTC
Andy,

This kernel looks good.

e1000: eth2 NIC Link is Down
bonding: bond0: link status definitely down for interface eth2, disabling it
e1000: eth3 NIC Link is Down
bonding: bond0: link status definitely down for interface eth3, disabling it
e1000: eth4 NIC Link is Down
bonding: bond0: link status definitely down for interface eth4, disabling it
e1000: eth5 NIC Link is Down
bonding: bond0: link status definitely down for interface eth5, disabling it
e1000: eth6 NIC Link is Down
bonding: bond0: link status definitely down for interface eth6, disabling it
e1000: eth7 NIC Link is Down
bonding: bond0: link status definitely down for interface eth7, disabling it
e1000: eth8 NIC Link is Down
bonding: bond0: link status definitely down for interface eth8, disabling it
e1000: eth9 NIC Link is Down
bonding: bond0: link status definitely down for interface eth9, disabling it
bonding: bond0: now running without any active interface !

Linux taft-02 2.6.18-194.el5.gtest.87 #1 SMP Mon Mar 29 17:33:42 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment 8 Andy Gospodarek 2010-05-18 14:42:50 UTC

*** This bug has been marked as a duplicate of bug 570645 ***