Bug 451939

Summary: bonding driver can leave rtnl_lock unbalanced
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Target Milestone: rc
Keywords: ZStream
Reporter: RHEL Program Management <pm-rhel>
Assignee: Jiri Pirko <jpirko>
QA Contact: Martin Jenner <mjenner>
CC: agospoda, ahecox, anton, dhoward, pm-eus, rkhan, robd, sfolkwil, skakar
Doc Type: Bug Fix
Last Closed: 2008-08-04 18:12:49 UTC
Bug Depends On: 450219

Description RHEL Program Management 2008-06-18 09:18:25 UTC
This bug has been copied from bug #450219 and has been proposed
to be backported to 5.2 z-stream (EUS).

Comment 4 Jiri Pirko 2008-07-14 08:30:03 UTC
Fixed in kernel-2.6.18-92.1.7.el5

Comment 7 David Mair 2008-07-30 20:41:13 UTC
*** Bug 451677 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2008-08-04 18:12:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0612.html

Comment 10 Robert J. Palmer 2008-09-04 20:50:29 UTC
Still having issues with this after applying the RHSA-2008-0612 patch.

Here's an example of what we're seeing on our hosts with bnx2 NIC cards:

Sep  3 04:15:34 lessno-cluster1 kernel: Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007)
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (0 msec) is incompatible with the forwarding delay time of the switch
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: MII link monitoring set to 1000 ms
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: Adding slave eth0.
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth0: using MSI
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: enslaving eth0 as an active interface with a down link.
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: Adding slave eth1.
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth1: using MSI
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: enslaving eth1 as an active interface with a down link.
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: link status definitely up for interface eth0.
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: link status definitely up for interface eth1.
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: making interface eth0 the new active one.
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff8021caa5>] fib_rules_event+0x3d/0xff
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862516b>] :bonding:alb_swap_mac_addr+0x95/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/ipv4/devinet.c (984)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff80241dee>] inetdev_event+0x48/0x282
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8003cc8d>] rt_run_flush+0x7f/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862516b>] :bonding:alb_swap_mac_addr+0x95/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff8021caa5>] fib_rules_event+0x3d/0xff
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862517d>] :bonding:alb_swap_mac_addr+0xa7/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/ipv4/devinet.c (984)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff80241dee>] inetdev_event+0x48/0x282
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8003cc8d>] rt_run_flush+0x7f/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862517d>] :bonding:alb_swap_mac_addr+0xa7/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: first active interface up!
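
The excerpt above repeats the same two assertions several times. A minimal sketch of how such a log could be summarized, assuming the "RTNL: assertion failed at FILE (LINE)" format seen in these lines; the function name and the sample list are illustrative and not part of any Red Hat tool:

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
RTNL_ASSERT = re.compile(r"RTNL: assertion failed at (\S+) \((\d+)\)")


def count_rtnl_assertions(lines):
    """Count RTNL assertion failures, keyed by (source file, line number)."""
    hits = Counter()
    for line in lines:
        match = RTNL_ASSERT.search(line)
        if match:
            hits[(match.group(1), int(match.group(2)))] += 1
    return hits


# Sample input: the four assertion lines from the log above plus one
# unrelated line that the pattern is expected to ignore.
sample = [
    "Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: making interface eth0 the new active one.",
    "Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)",
    "Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/ipv4/devinet.c (984)",
    "Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)",
    "Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/ipv4/devinet.c (984)",
]

for (source_file, line_no), count in sorted(count_rtnl_assertions(sample).items()):
    print(f"{source_file}:{line_no} x{count}")
```

On a real host the same function could be fed the lines of the system log file instead of the sample list.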

Comment 11 Andy Gospodarek 2008-09-04 21:28:27 UTC
What base kernel are you using?  It looks like something from RHEL5.1.

You cannot take the patch from RHSA-2008-0612, apply it to a RHEL5.1 kernel, and expect the rtnl messages in comment #10 to disappear.  Without the changes from the RHEL5.2 kernel (which uses bonding driver version 3.2.4), you will still get the messages shown in comment #10 whether you have the patch for RHSA-2008-0612 or not.
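
One quick way to tell which bonding driver a host loaded is the banner the driver prints at load time ("Ethernet Channel Bonding Driver: v3.1.2" in comment #10). A minimal sketch, assuming that banner format; the helper names are hypothetical, and treating 3.2.4 as a minimum version is an assumption drawn only from the version number mentioned above:

```python
import re

# Banner printed by the bonding driver at load time, e.g.
#   kernel: Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007)
BANNER = re.compile(r"Ethernet Channel Bonding Driver: v(\d+(?:\.\d+)*)")

# Assumption: 3.2.4 (the RHEL5.2 driver version named above) is used
# here as the lowest acceptable version.
REQUIRED = (3, 2, 4)


def bonding_version(line):
    """Return the driver version as a tuple of ints, or None if absent."""
    match = BANNER.search(line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_required_bonding(line, required=REQUIRED):
    """True if the banner on this line reports at least the required version."""
    version = bonding_version(line)
    return version is not None and version >= required


banner = ("Sep  3 04:15:34 lessno-cluster1 kernel: "
          "Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007)")
print(bonding_version(banner), has_required_bonding(banner))
```

Comparing tuples of integers avoids the string-comparison trap where "3.10.0" would sort before "3.2.4".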

Comment 12 Robert J. Palmer 2008-09-04 23:39:16 UTC
Andy,

Thanks for catching the version number. As it turns out, grub wasn't updated to boot the new kernel, which explains why nothing changed after the update. All is well with 2.6.18-92.el5.
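
A stale boot entry like this can be caught by comparing the running kernel release with the one that was expected after the update. A minimal sketch, assuming an exact string match is good enough; the function name is hypothetical, and `platform.release()` reports the same string as `uname -r`:

```python
import platform


def running_expected_kernel(expected, running=None):
    """Return True if the running kernel release equals the expected one.

    `running` defaults to platform.release(); pass it explicitly to check
    a release string collected from another host.
    """
    if running is None:
        running = platform.release()
    return running.strip() == expected.strip()


# Check the local machine against the release named in this comment.
if not running_expected_kernel("2.6.18-92.el5"):
    print("running kernel is", platform.release(),
          "- check the default boot entry and reboot")
```

An exact match is deliberately strict: any mismatch, including an unexpected newer kernel, is reported for a person to review.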

Thanks,
Rob

Comment 13 Andy Gospodarek 2008-09-05 01:26:03 UTC
Excellent -- glad to hear it's working.