Bug 451939 - bonding driver can leave rtnl_lock unbalanced
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Jiri Pirko
QA Contact: Martin Jenner
Keywords: ZStream
Duplicates: 451677
Depends On: 450219
Reported: 2008-06-18 05:18 EDT by RHEL Product and Program Management
Modified: 2015-05-04 21:15 EDT
Doc Type: Bug Fix
Last Closed: 2008-08-04 14:12:49 EDT
Description RHEL Product and Program Management 2008-06-18 05:18:25 EDT
This bug has been copied from bug #450219 and has been proposed
to be backported to 5.2 z-stream (EUS).
Comment 4 Jiri Pirko 2008-07-14 04:30:03 EDT
in kernel-2.6.18-92.1.7.el5
Comment 7 David Mair 2008-07-30 16:41:13 EDT
*** Bug 451677 has been marked as a duplicate of this bug. ***
Comment 9 errata-xmlrpc 2008-08-04 14:12:49 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0612.html
Comment 10 Robert J. Palmer 2008-09-04 16:50:29 EDT
Still having issues with this after applying the RHSA-2008-0612 patch.

Here's an example of what we're seeing on our hosts with bnx2 NICs:

Sep  3 04:15:34 lessno-cluster1 kernel: Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007)
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (0 msec) is incompatible with the forwarding delay time of the switch
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: MII link monitoring set to 1000 ms
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: Adding slave eth0.
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth0: using MSI
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: enslaving eth0 as an active interface with a down link.
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: Adding slave eth1.
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth1: using MSI
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: enslaving eth1 as an active interface with a down link.
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: link status definitely up for interface eth0.
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: link status definitely up for interface eth1.
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: making interface eth0 the new active one.
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff8021caa5>] fib_rules_event+0x3d/0xff
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862516b>] :bonding:alb_swap_mac_addr+0x95/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/ipv4/devinet.c (984)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff80241dee>] inetdev_event+0x48/0x282
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8003cc8d>] rt_run_flush+0x7f/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862516b>] :bonding:alb_swap_mac_addr+0x95/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff8021caa5>] fib_rules_event+0x3d/0xff
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862517d>] :bonding:alb_swap_mac_addr+0xa7/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/ipv4/devinet.c (984)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff80241dee>] inetdev_event+0x48/0x282
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8003cc8d>] rt_run_flush+0x7f/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862517d>] :bonding:alb_swap_mac_addr+0xa7/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: first active interface up!
Comment 11 Andy Gospodarek 2008-09-04 17:28:27 EDT
What base kernel are you using?  It looks like something from RHEL5.1.

You cannot take the patch from RHSA-2008-0612, apply it to a RHEL5.1 kernel, and expect the rtnl messages in comment #10 to disappear.  Without the changes from the RHEL5.2 kernel (which uses bonding driver version 3.2.4), you will still get the messages shown in comment #10, whether you have the patch for RHSA-2008-0612 or not.
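
A quick way to tell which bonding driver a host actually loaded is to read the version from the module banner in the kernel log, as in the first line of the paste in comment #10. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the banner always has the form "Ethernet Channel Bonding Driver: vX.Y.Z" shown in that log.

```python
import re

# Bonding driver version that this comment attributes to the RHEL5.2 kernel.
REQUIRED_VERSION = (3, 2, 4)

# Matches the banner the bonding module logs when it loads
# (format assumed from the log in comment #10).
BANNER_RE = re.compile(r"Ethernet Channel Bonding Driver: v(\d+(?:\.\d+)*)")


def bonding_version(log_text):
    """Return the last bonding driver version found in log_text as a tuple, or None."""
    matches = BANNER_RE.findall(log_text)
    if not matches:
        return None
    return tuple(int(part) for part in matches[-1].split("."))


def has_rhel52_bonding(log_text):
    """Return True if the logged driver version is at least REQUIRED_VERSION."""
    version = bonding_version(log_text)
    return version is not None and version >= REQUIRED_VERSION


# Banner line copied from comment #10.
sample = (
    "Sep  3 04:15:34 lessno-cluster1 kernel: "
    "Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007)"
)
print(bonding_version(sample))     # (3, 1, 2)
print(has_rhel52_bonding(sample))  # False
```

For the log in comment #10 this reports v3.1.2, which is older than 3.2.4 and so is consistent with a pre-5.2 kernel still being in use.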
Comment 12 Robert J. Palmer 2008-09-04 19:39:16 EDT
Andy,

Thanks for catching the version number. As it turns out, grub wasn't updated to boot the new kernel, which explains why nothing changed after the update. All is well with 2.6.18-92.el5.

Thanks,
Rob
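
The mix-up described in this comment (an updated kernel installed but an older one still booted) can be caught by comparing the running kernel release with the one you expect. This is only a sketch under assumptions: the helper name is hypothetical, and it assumes the expected string is written exactly as `uname -r` would print it on the host.

```python
import platform


def is_running_expected(expected_release, running_release=None):
    """Return True if the running kernel release equals expected_release.

    running_release defaults to the release of the kernel currently booted,
    which is the same string `uname -r` prints.
    """
    if running_release is None:
        running_release = platform.release()
    return running_release == expected_release


# Release string taken from this comment; adjust it for the kernel you installed.
expected = "2.6.18-92.el5"
if is_running_expected(expected):
    print("running the expected kernel:", expected)
else:
    print("expected", expected, "but running", platform.release())
```

If the check fails right after an update, the boot loader's default entry is a reasonable first thing to inspect.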
Comment 13 Andy Gospodarek 2008-09-04 21:26:03 EDT
Excellent -- glad to hear it's working.