Bug 451939 - bonding driver can leave rtnl_lock unbalanced
Summary: bonding driver can leave rtnl_lock unbalanced
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Jiri Pirko
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 451677
Depends On: 450219
Blocks:
Reported: 2008-06-18 09:18 UTC by RHEL Program Management
Modified: 2015-05-05 01:15 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-08-04 18:12:49 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2008:0612 | 0 | normal | SHIPPED_LIVE | Important: kernel security and bug fix update | 2008-08-06 14:46:27 UTC

Description RHEL Program Management 2008-06-18 09:18:25 UTC
This bug has been copied from bug #450219 and has been proposed
to be backported to 5.2 z-stream (EUS).

Comment 4 Jiri Pirko 2008-07-14 08:30:03 UTC
Fixed in kernel-2.6.18-92.1.7.el5.

Comment 7 David Mair 2008-07-30 20:41:13 UTC
*** Bug 451677 has been marked as a duplicate of this bug. ***

Comment 9 errata-xmlrpc 2008-08-04 18:12:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0612.html

Comment 10 Robert J. Palmer 2008-09-04 20:50:29 UTC
Still having issues with this after applying the RHSA-2008-0612 patch.

Here's an example of what we're seeing on our hosts with bnx2 NIC cards:

Sep  3 04:15:34 lessno-cluster1 kernel: Ethernet Channel Bonding Driver: v3.1.2 (January 20, 2007)
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (0 msec) is incompatible with the forwarding delay time of the switch
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: MII link monitoring set to 1000 ms
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: Adding slave eth0.
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth0: using MSI
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: enslaving eth0 as an active interface with a down link.
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: Adding slave eth1.
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth1: using MSI
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: enslaving eth1 as an active interface with a down link.
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex
Sep  3 04:15:34 lessno-cluster1 kernel: bnx2: eth1 NIC Link is Up, 1000 Mbps full duplex
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: link status definitely up for interface eth0.
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: link status definitely up for interface eth1.
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: making interface eth0 the new active one.
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff8021caa5>] fib_rules_event+0x3d/0xff
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862516b>] :bonding:alb_swap_mac_addr+0x95/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/ipv4/devinet.c (984)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff80241dee>] inetdev_event+0x48/0x282
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8003cc8d>] rt_run_flush+0x7f/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862516b>] :bonding:alb_swap_mac_addr+0x95/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/core/fib_rules.c (388)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff8021caa5>] fib_rules_event+0x3d/0xff
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862517d>] :bonding:alb_swap_mac_addr+0xa7/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: RTNL: assertion failed at net/ipv4/devinet.c (984)
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: Call Trace:
Sep  3 04:15:34 lessno-cluster1 kernel:  <IRQ>  [<ffffffff80241dee>] inetdev_event+0x48/0x282
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8003cc8d>] rt_run_flush+0x7f/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80064def>] notifier_call_chain+0x20/0x32
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8020fc95>] dev_set_mac_address+0x52/0x58
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88624ceb>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862517d>] :bonding:alb_swap_mac_addr+0xa7/0x163
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff886200a6>] :bonding:bond_change_active_slave+0x205/0x360
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8862052d>] :bonding:bond_select_active_slave+0xa4/0xd9
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621ffe>] :bonding:bond_mii_monitor+0x3bd/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff88621c41>] :bonding:bond_mii_monitor+0x0/0x403
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff800928ea>] run_timer_softirq+0x133/0x1b0
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80011cb4>] __do_softirq+0x5e/0xd5
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005c2fc>] call_softirq+0x1c/0x28
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8006a53a>] do_softirq+0x2c/0x85
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff8005bc8e>] apic_timer_interrupt+0x66/0x6c
Sep  3 04:15:34 lessno-cluster1 kernel:  <EOI>  [<ffffffff80054f26>] mwait_idle+0x36/0x4a
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff801831ed>] acpi_processor_idle+0x1a6/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80183047>] acpi_processor_idle+0x0/0x463
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80046f8d>] cpu_idle+0x95/0xb8
Sep  3 04:15:34 lessno-cluster1 kernel:  [<ffffffff80074501>] start_secondary+0x45a/0x469
Sep  3 04:15:34 lessno-cluster1 kernel: 
Sep  3 04:15:34 lessno-cluster1 kernel: bonding: bond0: first active interface up!

Comment 11 Andy Gospodarek 2008-09-04 21:28:27 UTC
What base kernel are you using? It looks like something from RHEL5.1: the log in comment #10 reports bonding driver v3.1.2.

You cannot take the patch from RHSA-2008-0612, apply it to a RHEL5.1 kernel, and expect the RTNL messages in comment #10 to disappear. Without the changes from the RHEL5.2 kernel (which uses bonding driver version 3.2.4), you will still get the messages shown in comment #10 whether or not you have the RHSA-2008-0612 patch.

Comment 12 Robert J. Palmer 2008-09-04 23:39:16 UTC
Andy,

Thanks for catching the version number. As it turns out, GRUB wasn't updated to boot the new kernel, which explains why nothing changed after the update. All is well with 2.6.18-92.el5.

Thanks,
Rob

Comment 13 Andy Gospodarek 2008-09-05 01:26:03 UTC
Excellent -- glad to hear it's working.

