Bug 584872 - NIC bonding arp monitoring method doesn't work when a bond is added to a bridge
Summary: NIC bonding arp monitoring method doesn't work when a bond is added to a bridge
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: rc
Assignee: Neil Horman
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-04-22 15:47 UTC by Tom
Modified: 2018-11-14 15:06 UTC
CC: 13 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-05-27 18:25:43 UTC
Target Upstream Version:
Embargoed:



Description Tom 2010-04-22 15:47:20 UTC
Description of problem:


Red Hat Enterprise Linux 5.4 + Xen hypervisor. The NIC bonding ARP monitoring method does not work when the bond is added to a bridge.


How reproducible:

The problem can be easily reproduced.

Steps to Reproduce:

1. Create a bond named bond0 with the following settings:

#ifcfg-bond0
DEVICE=bond0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
BONDING_OPTS='mode=1 arp_interval=10000 arp_validate=all arp_ip_target=192.168.1.100 primary=eth0,eth1'
BRIDGE=bridge1
# Note that 192.168.1.100 is another host on the same Ethernet segment and is reachable via ARP ping.

#ifcfg-eth0
DEVICE=eth0
HWADDR=xxxxxxxxxx
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

#ifcfg-eth1
DEVICE=eth1
HWADDR=yyyyyyyyyyyy
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none

2. Create a bridge named bridge1:

#ifcfg-bridge1
DEVICE=bridge1
BOOTPROTO=static
DHCPCLASS=
ONBOOT=yes
TYPE=Bridge
DELAY=0
IPADDR=192.168.1.11
NETMASK=255.255.255.0
GATEWAY=192.168.1.99
NETWORK=192.168.1.0
BROADCAST=192.168.1.255

3. Run "service network restart"
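
A quick way to check the resulting state after step 3 (a sketch, assuming the device names from the configs above; not part of the original report):

  cat /proc/net/bonding/bond0    # slave status and ARP monitor settings
  brctl show                     # bond0 should appear as a port of bridge1
  ip -4 addr show dev bridge1    # the IPv4 address is configured on the bridge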
  
Actual results:

I am unable to ping 192.168.1.11 from another machine or to ping other machines from this machine, and the status of eth0 and eth1 reported in /proc/net/bonding/bond0 is "down", as shown below.

############################################################
Ethernet Channel Bonding Driver: v3.4.0 (October 7, 2008)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0
Currently Active Slave: None
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 10000
ARP IP target/s (n.n.n.n form): 192.168.1.100

Slave Interface: eth0
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:15:17:73:85:ea

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
Permanent HW addr: 00:1a:64:e7:18:a8

############################################################

Expected results:

I should be able to ping other machines from this machine and vice versa.

Additional info:

1) The traffic on eth0 is shown below:

# tethereal -i eth0

  0.000000 IntelCor_73:85:ea -> Broadcast    ARP Who has 192.168.1.100?  Tell 0.0.0.0
  0.000007 Ibm_83:3d:93 -> IntelCor_73:85:ea ARP Gratuitous ARP for 192.168.1.100 (Reply)
  9.999245 IntelCor_73:85:ea -> Broadcast    ARP Who has 192.168.1.100?  Tell 0.0.0.0
 19.999799 IntelCor_73:85:ea -> Broadcast    ARP Who has 192.168.1.100?  Tell 0.0.0.0
 19.999905 Ibm_83:3d:93 -> IntelCor_73:85:ea ARP Gratuitous ARP for 192.168.1.100 (Reply)
 30.000267 IntelCor_73:85:ea -> Broadcast    ARP Who has 192.168.1.100?  Tell 0.0.0.0
 30.000370 Ibm_83:3d:93 -> IntelCor_73:85:ea ARP Gratuitous ARP for 192.168.1.100 (Reply)
 40.001085 IntelCor_73:85:ea -> Broadcast    ARP Who has 192.168.1.100?  Tell 0.0.0.0

The Linux kernel does send ARP requests every 10 seconds as expected. However, there are two things to note: a) the source IP of the ARP request is 0.0.0.0; b) the ARP response is a gratuitous ARP response.
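
A minimal check of why the probe source is 0.0.0.0 (a sketch, assuming the setup above): with bond0 enslaved to bridge1, the IPv4 address sits on the bridge rather than on the bond, so the bonding ARP monitor has no local address to put into its probes.

  ip -4 addr show dev bond0      # expected: no inet line while the bond is bridged
  ip -4 addr show dev bridge1    # expected: inet 192.168.1.11/24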

2) After I removed bond0 from the bridge and configured an IP address (say 192.168.1.41) on the bond, I noticed that the status of eth0 and eth1 is reported correctly in /proc/net/bonding/bond0 and everything works perfectly. I also noticed that the ARP requests and responses differ from the previous test:

  0.284006 IntelCor_73:85:ea -> Broadcast    ARP Who has 192.168.1.100?  Tell 192.168.1.41
  0.284134 Ibm_83:3d:93 -> IntelCor_73:85:ea ARP 192.168.1.100 is at 00:14:5e:83:3d:93
 10.284632 IntelCor_73:85:ea -> Broadcast    ARP Who has 192.168.1.100?  Tell 192.168.1.41
 10.284746 Ibm_83:3d:93 -> IntelCor_73:85:ea ARP 192.168.1.100 is at 00:14:5e:83:3d:93
 20.285252 IntelCor_73:85:ea -> Broadcast    ARP Who has 192.168.1.100?  Tell 192.168.1.41
 20.285358 Ibm_83:3d:93 -> IntelCor_73:85:ea ARP 192.168.1.100 is at 00:14:5e:83:3d:93

Regards,
Hongming Xiao

Comment 1 Tom 2010-04-27 18:59:25 UTC
I missed the following two steps in how to reproduce the problem:

a) append "alias bond0 bonding" at the end of /etc/modprobe.conf
b) add "GATEWAYDEV=bridge1" at the end of /etc/sysconfig/network
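
For clarity, the two additions as they would appear in the files (a sketch of the lines described above):

  # /etc/modprobe.conf (append)
  alias bond0 bonding

  # /etc/sysconfig/network (append)
  GATEWAYDEV=bridge1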

Comment 2 Andy Gospodarek 2010-04-29 19:29:39 UTC
Patches were just posted to netdev that may resolve this:

http://permalink.gmane.org/gmane.linux.network/159403

Comment 3 Tom 2010-05-03 15:40:47 UTC
Andy,

Thanks for providing the patch in such a short time. I haven't had time to verify it yet. Does Red Hat have any plan to provide an updated bonding.ko or an updated kernel that includes this fix?

I can build this kernel module myself and apply it. However, I am wondering whether that is a proper way to apply this kind of kernel patch in a production environment.

Regards,
Hongming

Comment 4 Andy Gospodarek 2010-05-03 15:56:58 UTC
Hongming (Tom), right now we do not have immediate plans to include this (but I suspect we will if enough people demand it).

Since we *just* released RHEL5.5, RHEL5.6 would be the earliest a backported version of the patch in comment #2 would be included.

Comment 5 Tom 2010-05-05 21:27:23 UTC
Hi Andy,

I tried to apply your patch and saw the following error:

[hongming@dom0 linux-2.6.18.x86_64]$ patch -p1 < ~/rpmbuild/SPECS/arp.patch
patching file drivers/net/bonding/bond_main.c
Hunk #1 succeeded at 2000 (offset 60 lines).
patch: **** malformed patch at line 20: net_device *orig_dev)

where lines 19 and 20 of the patch file are:

#line 19:
-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct
#line 20:
net_device *orig_dev)

I am guessing the patch file was somehow reformatted by the HTML page, which caused line 19 and line 20 to be split into two lines.
 
To avoid this problem, can you attach the patch file directly here?

Thanks,
Hongming

Comment 6 Tom 2010-06-16 13:27:57 UTC
Andy, please refer to my previous message. I think the patch you provided simply cannot be applied to RHEL 5.4 directly. Can you confirm?

Regards, Hongming

Comment 7 Andy Gospodarek 2010-06-16 16:05:18 UTC
Tom, you are correct.  The patch cannot be directly applied to RHEL5.4.  Someone (most likely me) will have to do the work to add this feature to RHEL5.6.  I do not think this feature will be added to RHEL5.4 or RHEL5.5.

Comment 8 Tom 2010-06-16 16:24:54 UTC
Andy, thanks for the information.

What can I do to ensure that this feature will really be scheduled for 5.6?

I want to verify whether the patch you provided really solves the problem. What should I do? Download the latest Fedora source and apply your patch? If that is not the right approach, could you give me some instructions?

Regards, Hongming

Comment 13 Shi jin 2011-02-23 20:24:06 UTC
Hi there,

I am having similar problems on RHEL-6 with bridging+bonding, although in my case I am running KVM instead of Xen.
Basically the ARP table gets screwed up and I cannot ping the VM IP address from other machines. A query for the ARP of the VM IP actually returns the MAC address of eth0 on the physical host instead of the VM MAC address.

Comment 14 Andy Gospodarek 2011-02-23 21:51:06 UTC
(In reply to comment #13)
> Hi there,
> 
> I am having similar problems on RHEL-6 with bridging+bonding, although in my
> case I am running KVM instead of Xen.
> Basically the ARP table gets screwed up and I cannot ping the VM IP address from
> other machines. A query for the ARP of the VM IP actually returns the MAC address
> of eth0 on the physical host instead of the VM MAC address.

Active-backup with ARP monitoring on a bond placed in a bridge will simply not work.  I would suggest disabling ARP monitoring or using 802.3ad (mode 4) bonding.

Any mode where the switch might broadcast frames to an inactive link is one that could cause problems and will ruin the forwarding database in the kernel's bridge.
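
A minimal sketch of what the two suggestions could look like in ifcfg-bond0 (illustrative values, not taken from this report):

  # Alternative 1: keep active-backup but switch from ARP monitoring to MII link monitoring
  BONDING_OPTS="mode=1 miimon=100 primary=eth0"

  # Alternative 2: 802.3ad; the switch ports must be configured for LACP
  BONDING_OPTS="mode=4 miimon=100"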

Comment 15 Shi jin 2011-02-23 22:57:07 UTC
Thank you Andy.

I was using 
==> ifcfg-bond0 <==
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
BRIDGE=br0
BONDING_OPTS="mode=6 miimon=100"

This is not the active-backup mode but balance-alb (mode 6). Do I still need to disable ARP monitoring? I think the default is arp_interval=0 and it only applies to the active-backup mode, right?
Anyway, the real question is: can I ever put a bridge on top of a bond in mode 6, and if so, how should I do it? Thank you very much.

Comment 16 Andy Gospodarek 2011-02-24 14:57:28 UTC
(In reply to comment #15)
> Thank you Andy.
> 
> I was using 
> ==> ifcfg-bond0 <==
> DEVICE=bond0
> ONBOOT=yes
> BOOTPROTO=none
> USERCTL=no
> BRIDGE=br0
> BONDING_OPTS="mode=6 miimon=100"
> 
> This is not the active-backup mode but balance-alb (mode 6). Do I still need
> to disable ARP monitoring? I think the default is arp_interval=0 and it only
> applies to the active-backup mode, right?
> Anyway, the real question is: can I ever put a bridge on top of a bond in mode 6,
> and if so, how should I do it? Thank you very much.

Sorry, I assumed you were using active-backup with ARP monitoring, since this bug was trying to address those issues.  I have seen reports of mode 5 (balance-tlb) and mode 4 (802.3ad) working well in a bridge.  Mode 6 is not a good solution because of the ARP frames that it sends trying to direct and balance traffic.

Comment 17 Shi jin 2011-02-24 18:55:55 UTC
Thank you.
I can confirm that both mode 1 and mode 5 work with a bridge.
* mode 1: BONDING_OPTS="mode=1 primary=eth0 miimon=100"
* mode 5: BONDING_OPTS="mode=5 miimon=100"
mode 6 indeed does not work with bridging due to the ARP problem.

However, I do see about a 30-second delay at link failure before the connection resumes in mode 5. There is no obvious delay in mode 1. Is this the expected behavior?

Thanks a lot.

Comment 18 Andy Gospodarek 2011-02-24 20:30:47 UTC
(In reply to comment #17)
> Thank you.
> I can confirm that both mode 1 and 5 works with bridge.
> * mode 1: BONDING_OPTS="mode=1 primary=eth0 miimon=100"
> * mode 5: BONDING_OPTS="mode=5 miimon=100"
> mode 6 indeed does not work with bridging due to the ARP problem.
> 
> However, I do see about 30 seconds delay at link failure before the connection
> resumes in mode 5. There is no obvious delay in mode 1. Is this the expected
> behavior? 
> 
> Thanks a lot.

The 30 second delay sounds a lot like spanning tree.  I would guess you tested this with the same switch and host, so I suspect that is not the case.  My guess is that 30 seconds is the timeout for the forwarding database in the kernel's bridge, and because mode 5 will transmit on all interfaces, you are still getting a short period where the forwarding database is wrong.  Can you compare the output of:

brctl showmacs br0 

(or similar) when the system is working, after the failover when it does not have connectivity, and after 30 seconds when it works again? The output of:

brctl show

at any time would also be helpful.
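
A sketch of one way to test the forwarding-database theory (assumed bridge name br0, illustrative ageing value): shorten the bridge ageing time and check whether the outage window after failover shrinks accordingly.

  brctl showmacs br0     # capture before the failover, right after it, and ~30 seconds later
  brctl setageing br0 5  # temporarily expire idle forwarding entries after 5 seconds
  brctl show             # overall bridge/port state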

Comment 19 Tom 2011-02-25 13:17:33 UTC
My interpretation of your words is that the patch given in comment #2 either wasn't backported to 6.0 or doesn't work as expected after being backported to 6.0. Can you confirm?

(In reply to comment #14)
> (In reply to comment #13)
> > Hi there,
> > 
> > I am having similar problems on RHEL-6 with bridging+bonding, although in my
> > case I am running KVM instead of Xen.
> > Basically the ARP table gets screwed up and I cannot ping the VM IP address from
> > other machines. A query for the ARP of the VM IP actually returns the MAC address
> > of eth0 on the physical host instead of the VM MAC address.
> Active-backup with ARP monitoring on a bond placed in a bridge will simply not
> work.  I would suggest disabling ARP monitoring or using 802.3ad (mode 4)
> bonding.
> Any mode where the switch might broadcast frames to an inactive link is one
> that could cause problems and will ruin the forwarding database in the kernel's
> bridge.

Comment 21 Neil Horman 2011-04-28 17:51:45 UTC
What does your backport look like?  You uploaded an srpm to brew without a cvs or git reference, so I can't see it, and the upstream patch from comment 2 cannot be applied to RHEL5 without major modification.  Please attach the patch here.

I'm tempted to say we should just close this as a wontfix, given that the working upstream solution is way too invasive to take into RHEL5 at this late stage in the lifecycle, and anything else is more or less just a hack (coupled with the fact that bridging + bonding has non-fixable problems in other operational modes), but please attach the patch here; maybe it is contained and safe enough that we can take it.

Comment 22 Flavio Leitner 2011-04-28 17:59:39 UTC
(In reply to comment #21)
> What does your backport look like?  You uploaded an srpm to brew without a cvs
> or git reference, so I can't see it, and the upstream patch from comment 2
> cannot be applied to RHEL5 without major modification.  Please attach the
> patch here.

Hi Neil,
Indeed, I already asked wmealing (the backport author) to attach the patch here.
He is in the APAC timezone AFAIK, so it might take a while for him to attach it.

fbl

Comment 23 Neil Horman 2011-04-28 19:43:24 UTC
Ok, please don't clear the needinfo flag when updating a bz without the needed info.  Thank you.

Comment 26 Neil Horman 2011-04-29 18:33:38 UTC
Thank you, Wade. I'm hesitant to introduce that changeset this late in RHEL5's life cycle, especially since it introduces changes to the common receive path.  If the customer is the only one that has tested it, and only to confirm that the one problem is fixed, I'm really not comfortable with the change unless it gets a lot more testing.  What would be even better is if we could just convince the customer to use 802.3ad mode, so that they just won't have this problem.  Is that a possibility?

Comment 33 purpleidea 2012-07-09 21:05:24 UTC
I think I'm experiencing this bug on the latest 6.2. Can someone confirm whether this should be fixed? I don't have all the information to fully figure out whether this is what is occurring. If so, could someone bump this to 6.2 and high severity, as this is a big regression.

Thanks,
James

Comment 34 Maurits van de Lande 2012-07-18 09:01:08 UTC
On 6.3, bonding mode 1 with the following settings doesn't work:

BONDING_OPTS="mode=1 arp_interval=100 arp_validate=all arp_ip_target=172.16.117.10,172.16.117.11,172.16.117.20,172.16.117.21" 

In my case the bonding interface is not connected to a bridge.

Best regards,

Maurits

Comment 35 Andy Gospodarek 2012-07-20 15:17:48 UTC
(In reply to comment #34)
> On 6.3 bonding mode 1 with the following settings doesn't work
> 
> BONDING_OPTS="mode=1 arp_interval=100 arp_validate=all
> arp_ip_target=172.16.117.10,172.16.117.11,172.16.117.20,172.16.117.21" 
> 
> In my case the bonding interface is not connected to a bridge.
> 
> Best regards,
> 
> Maurits

Without any additional information there is no way anyone can really help out.  I'm also not sure this is the best place to ask for support, since you are not using bridging.  I would suggest opening a new bug to address this.

Just as a note, ARP monitoring does work on 6.3 but will be different from 6.2 as the hosts used for monitoring must be on the same subnet as the bond device.
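
For example (a sketch with illustrative addresses), on 6.3 the arp_ip_target hosts would need to be on the same subnet that is configured on the bond itself:

  # ifcfg-bond0 (illustrative)
  IPADDR=172.16.117.5
  NETMASK=255.255.255.0
  BONDING_OPTS="mode=1 arp_interval=1000 arp_validate=all arp_ip_target=172.16.117.10,172.16.117.11"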

Comment 36 Jorrit 2012-08-10 10:08:55 UTC
I've been able to reproduce active-backup bonding failing whenever a bonding interface is added to a bridge AND arp_validate is set to:
- active (1): the current active slave interface starts flapping up and down
- all (3)   : the current active slave interface goes down and stays down

Situation:

      HOST1                                 HOST2
+---------------+                     +---------------+
| eth1 \        +-----[ switch1 ]-----+        / eth2 | 
|       --bond0 |                     | bond0--       | 
| eth0 /        +-----[ switch2 ]-----+        \ eth0 | 
+---------------+                     +---------------+

Running 6.3 with kernel 2.6.32-279.2.1

How to reproduce:

1) Configure 2 hosts with bond0 interfaces, mode=active-backup(1), each other as arp_ip_target, arp_validate=0

2) This setup works. According to /proc/net/bonding/bond0 *), on host2 eth0 is down and eth2 is up. Ping works. Fine.

3) Set arp_validate=3. This setup works.

4) Set arp_validate=0, and (on host2) add bond0 to a bridge
[root@host2 ~]# echo 0 > /sys/class/net/bond0/bonding/arp_validate
[root@host2 ~]# ifconfig bond0 0.0.0.0
[root@host2 ~]# brctl addbr br0
[root@host2 ~]# brctl addif br0 bond0
[root@host2 ~]# ifconfig br0 10.254.239.200/24
This setup still works

5) Now set arp_validate=3 on bond0:
[root@host2 ~]# echo 3 > /sys/class/net/bond0/bonding/arp_validate

6) /var/log/messages reports:
Aug 10 11:02:34 brug01 kernel: bonding: bond0: setting arp_validate to all (3).
Aug 10 11:02:34 brug01 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Aug 10 11:02:34 brug01 kernel: device eth0 left promiscuous mode
Aug 10 11:02:34 brug01 kernel: bonding: bond0: now running without any active interface !
Aug 10 11:02:34 brug01 kernel: br0: port 1(bond0) entering disabled state

7) This setup has stopped working.
/proc/net/bonding/bond0 **) says both eth0 and eth2 are down.

8) With 2 UTP cables instead of switches this behaviour remains.

--

*) Output from /proc/net/bonding/bond0 with arp_validate=3 in active-backup mode WITHOUT bridge
[root@host2 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.254.240.14

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:11:0a:5f:95:a4
Slave queue ID: 0

Slave Interface: eth2
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 00:30:48:73:b0:24
Slave queue ID: 0

--

**) Output from /proc/net/bonding/bond0 with arp_validate=3 in active-backup mode WITH bridge:
[root@brug01 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: None
MII Status: down
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
ARP Polling Interval (ms): 1000
ARP IP target/s (n.n.n.n form): 10.254.240.14

Slave Interface: eth0
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:11:0a:5f:95:a4
Slave queue ID: 0

Slave Interface: eth2
MII Status: down
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 00:30:48:73:b0:24
Slave queue ID: 0

Comment 37 Maurits van de Lande 2012-08-24 10:12:53 UTC
> Just as a note, ARP monitoring does work on 6.3 but will be different from 6.2
> as the hosts used for monitoring must be on the same subnet as the bond device.
Thanks,

I just modified the ifcfg-bond0 file to use IP targets on the same subnet as the bonding interface. It looks like this works.

Comment 38 wu_chulin 2015-05-13 02:59:15 UTC
I want to know whether the problem has been solved or not; I am also encountering this problem now.

Comment 39 martin 2016-05-06 12:40:03 UTC
I think that I have got a similar problem on my RHEL 7.2.
I set up bonding + VLAN on a KVM host.
My bonding mode is active-backup; when I turn off one of the bonding slave devices, everything is OK, but only on the host.
On my KVM guest machines I have problems with connections.
I lose the connection to them.
What is interesting is that I can ping the gateway from my guest machine, but I cannot ping this guest machine from any other host.

