Bug 241719 - bonding causes kernel issue.
Status: CLOSED DUPLICATE of bug 251902
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All Linux
Priority: medium
Severity: medium
Assigned To: Andy Gospodarek
QA Contact: Martin Jenner
Reported: 2007-05-29 15:38 EDT by Jeremiah Johnson
Modified: 2014-06-29 18:58 EDT (History)

Doc Type: Bug Fix
Last Closed: 2008-01-30 13:20:08 EST
Description Jeremiah Johnson 2007-05-29 15:38:05 EDT
Description of problem: Configuring network bonding causes kernel issues.


Version-Release number of selected component (if applicable):
Linux sfg-hou-nfs1 2.6.18-8.1.4.el5 #1 SMP Fri May 4 22:15:17 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux



How reproducible:
Always.


Steps to Reproduce:
[root@sfg-hou-nfs1 network-scripts]# cat ifcfg-bond0
DEVICE=bond0
IPADDR=10.10.253.65
NETMASK=255.255.255.0
NETWORK=10.10.253.0
BROADCAST=10.10.253.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS="mode=balance-alb miimon=100"
[root@sfg-hou-nfs1 network-scripts]# cat ifcfg-eth0
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
HWADDR=00:19:B9:C3:C9:E3
BOOTPROTO=none
[root@sfg-hou-nfs1 network-scripts]# cat ifcfg-eth1
# Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet
DEVICE=eth1
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
HWADDR=00:19:B9:C3:C9:E5
BOOTPROTO=none
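
For anyone comparing their own setup against the files above, here is a minimal sketch of a consistency check. It is not part of the initscripts; the function names (parse_ifcfg, parse_bonding_opts, check_slave) are made up for illustration, and it only verifies that a slave file names the bond as MASTER and sets SLAVE=yes. The sample text is copied from the ifcfg-bond0 and ifcfg-eth0 listings in this report.

```python
import shlex

# Sample text copied from the ifcfg files shown in this report.
BOND0 = '''DEVICE=bond0
IPADDR=10.10.253.65
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BONDING_OPTS="mode=balance-alb miimon=100"
'''

ETH0 = '''DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
HWADDR=00:19:B9:C3:C9:E3
BOOTPROTO=none
'''


def parse_ifcfg(text):
    """Parse KEY=value lines from ifcfg-style text into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes surrounding quotes roughly as a shell would.
        settings[key.strip()] = " ".join(shlex.split(value))
    return settings


def parse_bonding_opts(opts):
    """Split 'mode=balance-alb miimon=100' into a dict of option -> value."""
    return dict(item.split("=", 1) for item in opts.split())


def check_slave(slave, bond):
    """Return a list of problems found in a slave's settings (hypothetical checks)."""
    problems = []
    if slave.get("MASTER") != bond.get("DEVICE"):
        problems.append("MASTER does not match the bond DEVICE")
    if slave.get("SLAVE") != "yes":
        problems.append("SLAVE is not set to yes")
    return problems


bond = parse_ifcfg(BOND0)
eth0 = parse_ifcfg(ETH0)
print(parse_bonding_opts(bond["BONDING_OPTS"]))
print(check_slave(eth0, bond))
```

With the files from this report the check finds no problems, which is consistent with the configuration itself being fine and the traces coming from the kernel side.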

  
Actual results:
bonding: bond0: Removing slave eth0
bonding: bond0: Warning: the permanent HWaddr of eth0 - 00:19:B9:C3:C9:E3 - is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
bonding: bond0: releasing active interface eth0
bonding: bond0: Removing slave eth1
bonding: bond0: releasing active interface eth1
bonding: bond0: setting mode to balance-alb (6).
bonding: bond0: Setting MII monitoring interval to 100.
ADDRCONF(NETDEV_UP): bond0: link is not ready
bonding: bond0: Adding slave eth0.
bnx2: eth0: using MSI
ADDRCONF(NETDEV_UP): eth0: link is not ready
bonding: bond0: enslaving eth0 as an active interface with a down link.
bonding: bond0: Adding slave eth1.
bnx2: eth1: using MSI
ADDRCONF(NETDEV_UP): eth1: link is not ready
bonding: bond0: enslaving eth1 as an active interface with a down link.
bnx2: eth0 NIC Link is Up, 1000 Mbps full duplex
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: making interface eth0 the new active one.
RTNL: assertion failed at net/core/fib_rules.c (388)

Call Trace:
 <IRQ>  [<ffffffff80213c69>] fib_rules_event+0x3d/0xff
 [<ffffffff8006492c>] notifier_call_chain+0x20/0x32
 [<ffffffff80206e22>] dev_set_mac_address+0x52/0x58
 [<ffffffff8857d857>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
 [<ffffffff8857dcd7>] :bonding:alb_swap_mac_addr+0x95/0x163
 [<ffffffff88578f2f>] :bonding:bond_change_active_slave+0x1db/0x2f6
 [<ffffffff88579a48>] :bonding:bond_select_active_slave+0xa5/0xda
 [<ffffffff8857adb4>] :bonding:bond_mii_monitor+0x3a8/0x3ee
 [<ffffffff8857aa0c>] :bonding:bond_mii_monitor+0x0/0x3ee
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469

RTNL: assertion failed at net/ipv4/devinet.c (984)

Call Trace:
 <IRQ>  [<ffffffff80238e22>] inetdev_event+0x48/0x27d
 [<ffffffff8003ccc5>] rt_run_flush+0x7f/0xb8
 [<ffffffff8006492c>] notifier_call_chain+0x20/0x32
 [<ffffffff80206e22>] dev_set_mac_address+0x52/0x58
 [<ffffffff8857d857>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
 [<ffffffff8857dcd7>] :bonding:alb_swap_mac_addr+0x95/0x163
 [<ffffffff88578f2f>] :bonding:bond_change_active_slave+0x1db/0x2f6
 [<ffffffff88579a48>] :bonding:bond_select_active_slave+0xa5/0xda
 [<ffffffff8857adb4>] :bonding:bond_mii_monitor+0x3a8/0x3ee
 [<ffffffff8857aa0c>] :bonding:bond_mii_monitor+0x0/0x3ee
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469

RTNL: assertion failed at net/core/fib_rules.c (388)

Call Trace:
 <IRQ>  [<ffffffff80213c69>] fib_rules_event+0x3d/0xff
 [<ffffffff8006492c>] notifier_call_chain+0x20/0x32
 [<ffffffff80206e22>] dev_set_mac_address+0x52/0x58
 [<ffffffff8857d857>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
 [<ffffffff8857dce9>] :bonding:alb_swap_mac_addr+0xa7/0x163
 [<ffffffff88578f2f>] :bonding:bond_change_active_slave+0x1db/0x2f6
 [<ffffffff88579a48>] :bonding:bond_select_active_slave+0xa5/0xda
 [<ffffffff8857adb4>] :bonding:bond_mii_monitor+0x3a8/0x3ee
 [<ffffffff8857aa0c>] :bonding:bond_mii_monitor+0x0/0x3ee
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469

RTNL: assertion failed at net/ipv4/devinet.c (984)

Call Trace:
 <IRQ>  [<ffffffff80238e22>] inetdev_event+0x48/0x27d
 [<ffffffff8003ccc5>] rt_run_flush+0x7f/0xb8
 [<ffffffff8006492c>] notifier_call_chain+0x20/0x32
 [<ffffffff80206e22>] dev_set_mac_address+0x52/0x58
 [<ffffffff8857d857>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
 [<ffffffff8857dce9>] :bonding:alb_swap_mac_addr+0xa7/0x163
 [<ffffffff88578f2f>] :bonding:bond_change_active_slave+0x1db/0x2f6
 [<ffffffff88579a48>] :bonding:bond_select_active_slave+0xa5/0xda
 [<ffffffff8857adb4>] :bonding:bond_mii_monitor+0x3a8/0x3ee
 [<ffffffff8857aa0c>] :bonding:bond_mii_monitor+0x0/0x3ee
 [<ffffffff80092c4a>] run_timer_softirq+0x133/0x1b0
 [<ffffffff80011c19>] __do_softirq+0x5e/0xd5
 [<ffffffff8005c330>] call_softirq+0x1c/0x28
 [<ffffffff8006a312>] do_softirq+0x2c/0x85
 [<ffffffff80054f2e>] mwait_idle+0x0/0x4a
 [<ffffffff8005bcc2>] apic_timer_interrupt+0x66/0x6c
 <EOI>  [<ffffffff80054f64>] mwait_idle+0x36/0x4a
 [<ffffffff80046fb7>] cpu_idle+0x95/0xb8
 [<ffffffff80073bb7>] start_secondary+0x45a/0x469

bonding: bond0: first active interface up!
ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
eth0: no IPv6 routers present
bond0: no IPv6 routers present
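
To see at a glance which assertions fire and how often, the log above can be summarized with a short script. This is only a sketch: the regular expression is written against the "RTNL: assertion failed at FILE (LINE)" format as it appears in this report, the function name count_rtnl_assertions is hypothetical, and the sample is a trimmed copy of the output above with the call traces left out.

```python
import re
from collections import Counter

# Matches the message format seen in this report, e.g.
# "RTNL: assertion failed at net/core/fib_rules.c (388)"
ASSERT_RE = re.compile(r"RTNL: assertion failed at (\S+) \((\d+)\)")

# Trimmed copy of the log above (call traces omitted).
SAMPLE = """\
bonding: bond0: making interface eth0 the new active one.
RTNL: assertion failed at net/core/fib_rules.c (388)
RTNL: assertion failed at net/ipv4/devinet.c (984)
RTNL: assertion failed at net/core/fib_rules.c (388)
RTNL: assertion failed at net/ipv4/devinet.c (984)
bonding: bond0: first active interface up!
"""


def count_rtnl_assertions(log_text):
    """Count RTNL assertion messages, keyed by (source file, line number)."""
    counts = Counter()
    for match in ASSERT_RE.finditer(log_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts


counts = count_rtnl_assertions(SAMPLE)
for (path, lineno), n in sorted(counts.items()):
    print(path, lineno, n)
```

On real systems the same function could be fed the text of dmesg or /var/log/messages; where that text comes from is left to the caller.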

Additional info:
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.4.44-1 (August 10, 2006)
ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 16 (level, low) -> IRQ 169
eth0: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem f8000000, IRQ 169, node addr 0019b9c3c9e3
ACPI: PCI Interrupt 0000:09:00.0[A] -> GSI 16 (level, low) -> IRQ 169
eth1: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem f4000000, IRQ 169, node addr 0019b9c3c9e5
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4

[root@sfg-hou-nfs1 network-scripts]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:c3:c9:e3

Slave Interface: eth1
MII Status: down
Link Failure Count: 0
Permanent HW addr: 00:19:b9:c3:c9:e5
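
The /proc/net/bonding/bond0 output above can also be read programmatically, for example to spot slaves whose MII status is down. The following is a rough sketch that assumes the "Key: value" layout shown in this report (bonding driver v3.0.3); other driver versions may print additional fields. The function name parse_bonding_status is made up for illustration.

```python
# Copy of the /proc/net/bonding/bond0 output shown in this report.
SAMPLE = """\
Ethernet Channel Bonding Driver: v3.0.3 (March 23, 2006)

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Link Failure Count: 0
Permanent HW addr: 00:19:b9:c3:c9:e3

Slave Interface: eth1
MII Status: down
Link Failure Count: 0
Permanent HW addr: 00:19:b9:c3:c9:e5
"""


def parse_bonding_status(text):
    """Split the status text into a bond-level dict and a list of slave dicts."""
    bond = {}
    slaves = []
    current = bond  # fields before the first "Slave Interface" belong to the bond
    for raw in text.splitlines():
        line = raw.strip()
        if ": " not in line:
            continue
        key, value = line.split(": ", 1)
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        else:
            current[key] = value
    return bond, slaves


bond, slaves = parse_bonding_status(SAMPLE)
down = [s["name"] for s in slaves if s["MII Status"] == "down"]
print(bond["Currently Active Slave"], down)
```

For the output in this report, this shows eth0 as the active slave and eth1 as the only slave with its link down.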
Comment 1 Andy Gospodarek 2007-06-04 10:33:12 EDT
This is related to bug 210577 and fixes planned for that one should resolve this
issue.
Comment 2 Adrian Reber 2007-06-15 09:20:30 EDT
Same on Fedora 7 with e1000 on x86_64.
Comment 3 Andy Gospodarek 2007-06-15 11:07:51 EDT
(In reply to comment #2)
> Same on Fedora 7 with e1000 on x86_64.

Not surprising since the fix isn't upstream yet.
Comment 4 Andy Gospodarek 2007-06-15 11:09:38 EDT
Also, there are test kernels that contain a patch that should resolve this
issue.  You can get them here:

http://people.redhat.com/agospoda/#rhel5

Any testing you can do would be appreciated!
Comment 5 Jeremiah Johnson 2007-06-15 12:54:25 EDT
gospo, I don't see any test kernels available.  If one becomes available today I
will test it.

I updated to 2.6.18-8.1.6.el5 on a server that did not yet have bonding
configured but has the exact same hardware.  We're still seeing the error, so I
don't think this problem was resolved in the new kernel update.
Comment 6 Andy Gospodarek 2007-06-15 13:56:33 EDT
Jeremiah,

You can download the test kernels here:

http://people.redhat.com/agospoda/#rhel5

They are ones I've built for testing fixes and are not officially supported.
Comment 7 Jeremiah Johnson 2007-06-15 14:17:43 EDT
Hrm, I had gone to that link earlier and nothing was displaying.  Maybe a
browser issue.  Anyway, with 2.6.18-22.el5.gtest.18 on my server I don't see any
traces when I restart networking with bonding enabled.
Comment 8 Andy Gospodarek 2007-06-15 15:06:15 EDT
Glad to hear that someone else gets the same results that I get. :-)

Feel free to put that test kernel through any test cycles you like since I'd
like to make sure others agree with me that the bugs are worked out.
Comment 9 Jeremiah Johnson 2007-06-15 18:14:10 EDT
We will continue to run your kernel until our testing phase for this system has
finished.  If we run into any other problems we will let you know.
Comment 13 Ville Skyttä 2008-01-06 14:08:08 EST
What are the effects of this issue?

It still seems to be present in the latest EL5 kernels, but despite the
assertion failure messages, bonding-alb appears to work in some quick tests
(tested with two tg3 interfaces).
Comment 16 Kirill Korotaev 2008-01-29 13:21:22 EST
We hit similar problems with OVZ kernels (based on RHEL5.1):

The call trace is:

bond_mii_monitor
  write_lock(&bond->curr_slave_lock);
  bond_select_active_slave
    bond_change_active_slave
      bond_alb_handle_active_change
        alb_set_slave_mac_addr
           dev_set_mac_address

dev_set_mac_address will (sooner or later) call the device notifier chain, which
should be run exclusively under rtnl_lock, without any other locks held.

The mainstream kernel has this problem fixed by a complete locking rewrite.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=drivers/net/bonding/bond_main.c;h=49a198206e3de901a74f34fc40694bb056ad922c;hb=HEAD
See the patches from Jay Vosburgh, 2007-10-24.
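
The locking rule described in this comment can be illustrated with a small userspace toy model. This is not kernel code and not the upstream patch: the class ToyNetStack and its methods are invented for illustration, threading.Lock objects stand in for the RTNL and for bond->curr_slave_lock, and the check only tests whether the RTNL stand-in is held at all. It shows why a monitor path that takes only the bonding lock trips the assertion in every notifier, while a path that holds the RTNL does not.

```python
import threading


class ToyNetStack:
    """Toy model of the call path in this comment (all names are illustrative)."""

    def __init__(self):
        self.rtnl = threading.Lock()             # stands in for the kernel RTNL
        self.curr_slave_lock = threading.Lock()  # stands in for bond->curr_slave_lock
        self.warnings = []

    def assert_rtnl(self, where):
        # Record a warning when a notifier runs without the RTNL being held.
        if not self.rtnl.locked():
            self.warnings.append("RTNL: assertion failed at " + where)

    def notifier_call_chain(self):
        # Both handlers seen in the traces expect the RTNL to be held.
        self.assert_rtnl("fib_rules_event")
        self.assert_rtnl("inetdev_event")

    def dev_set_mac_address(self):
        self.notifier_call_chain()

    def mii_monitor_buggy(self):
        # Timer-driven path: only the bonding lock is taken, never the RTNL.
        with self.curr_slave_lock:
            self.dev_set_mac_address()

    def mii_monitor_with_rtnl(self):
        # Path that holds the RTNL, and no other lock, around the MAC change.
        with self.rtnl:
            self.dev_set_mac_address()


buggy = ToyNetStack()
buggy.mii_monitor_buggy()
print(buggy.warnings)

fixed = ToyNetStack()
fixed.mii_monitor_with_rtnl()
print(fixed.warnings)
```

The buggy path records one warning per notifier, which matches the pairs of fib_rules.c and devinet.c assertions in the original report; the second path records none.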

Comment 17 Andy Gospodarek 2008-01-30 13:19:00 EST
Kirill,

This is fixed already in the latest rhel5 dev kernels.  You can also check my
test kernels:

http://people.redhat.com/agospoda/#rhel5

for some bonding updates that I hope to get included in 5.2.  Any feedback you
can provide is helpful.
Comment 18 Andy Gospodarek 2008-01-30 13:20:08 EST

*** This bug has been marked as a duplicate of 251902 ***
