Bug 487763 - Adding bonding in balance-alb mode to bridge causes host network connectivity to be lost
Summary: Adding bonding in balance-alb mode to bridge causes host network connectivity...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Jiri Pirko
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 503762 526775 533192 560588
Reported: 2009-02-27 19:46 UTC by Marat
Modified: 2016-04-26 14:52 UTC
12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 503762
Environment:
Last Closed: 2010-03-30 07:43:59 UTC
Target Upstream Version:
Embargoed:


Attachments
look through all interfaces in bond (3.75 KB, patch)
2009-02-27 20:11 UTC, Marat
first draft of the patch - kabi breaking (3.16 KB, application/octet-stream)
2009-03-14 17:31 UTC, Jiri Pirko
workaround script (910 bytes, application/x-perl)
2009-03-30 11:42 UTC, Jiri Pirko


Links
System | ID | Private | Priority | Status | Summary | Last Updated
IBM Linux Technology Center | 56882 | 0 | None | None | None | 2019-05-03 18:38:18 UTC
Red Hat Product Errata | RHSA-2010:0178 | 0 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update | 2010-03-29 12:18:21 UTC

Description Marat 2009-02-27 19:46:19 UTC
Description of problem:
Suppose we have two network interfaces: the first is a bond in balance-alb mode (bond0), the second is a bridge interface (br0).
Adding the bond0 interface to the bridge causes the host's network connectivity to be lost.


Version-Release number of selected component (if applicable):
# uname -a
Linux host 2.6.18-128.1.1.el5 #1 SMP Mon Jan 26 13:59:00 EST 2009 i686 athlon i386 GNU/Linux

How reproducible:


Steps to Reproduce:
# cat /etc/modprobe.conf
...
alias bond0 bonding
options bond0 mode=balance-alb miimon=100
...
# modprobe bond0
# ifconfig  bond0 up
# ifenslave bond0 eth1
# ifenslave bond0 eth0
# brctl addif br0 bond0
# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.0030485bab35       no              bond0
Then assign an IP address to br0 (by any method).
# ping a.b.c.d
PING a.b.c.d (a.b.c.d) 56(84) bytes of data.
64 bytes from a.b.c.d: icmp_seq=1 ttl=64 time=5.53 ms
64 bytes from a.b.c.d: icmp_seq=2 ttl=64 time=0.223 ms
64 bytes from a.b.c.d: icmp_seq=3 ttl=64 time=0.222 ms
64 bytes from a.b.c.d: icmp_seq=4 ttl=64 time=0.237 ms
64 bytes from a.b.c.d: icmp_seq=5 ttl=64 time=0.240 ms
64 bytes from a.b.c.d: icmp_seq=6 ttl=64 time=0.223 ms
64 bytes from a.b.c.d: icmp_seq=49 ttl=64 time=0.223 ms
64 bytes from a.b.c.d: icmp_seq=50 ttl=64 time=0.227 ms
64 bytes from a.b.c.d: icmp_seq=51 ttl=64 time=0.219 ms
64 bytes from a.b.c.d: icmp_seq=52 ttl=64 time=0.260 ms
64 bytes from a.b.c.d: icmp_seq=53 ttl=64 time=0.235 ms
64 bytes from a.b.c.d: icmp_seq=54 ttl=64 time=0.240 ms
64 bytes from a.b.c.d: icmp_seq=89 ttl=64 time=0.222 ms
64 bytes from a.b.c.d: icmp_seq=90 ttl=64 time=0.232 ms
64 bytes from a.b.c.d: icmp_seq=91 ttl=64 time=0.237 ms
64 bytes from a.b.c.d: icmp_seq=92 ttl=64 time=0.229 ms
64 bytes from a.b.c.d: icmp_seq=93 ttl=64 time=0.233 ms
64 bytes from a.b.c.d: icmp_seq=94 ttl=64 time=0.224 ms

--- a.b.c.d ping statistics ---
99 packets transmitted, 18 received, 81% packet loss, time 97994ms
rtt min/avg/max/mdev = 0.219/0.525/5.534/1.215 ms


Actual results:
99 packets transmitted, 18 received, 81% packet loss

Expected results:
99 packets transmitted, 99 received, 0% packet loss

Additional info:

Comment 1 Marat 2009-02-27 20:04:05 UTC
I've investigated this issue and here are my results:

- Adding an interface to the bridge makes the bridge code add the interface's MAC address to the forwarding database (fdb) and mark that entry as "local".

- When handling an ingress packet, the bridge code looks up the destination MAC in the fdb and checks whether that entry is marked "local". If it is, the packet is treated as local (delivered to the host); if not, the packet is forwarded or flooded.

- When the bonding interface is in balance-alb mode, an ingress packet may have a destination MAC address equal to any of the MAC addresses belonging to the physical interfaces in the bond. But the bridge code marked only one of those MAC addresses as "local", so a packet addressed to any other slave's MAC is not considered local.

Comment 2 Marat 2009-02-27 20:11:55 UTC
Created attachment 333526 [details]
look through all interfaces in bond

This patch may not be a solution or even a workaround; it is rather an additional illustration of what I described above, to make my explanation easier to understand.

Comment 3 jsullivan 2009-03-12 13:59:43 UTC
I have the exact same problem. If the MAC address in use is not the primary interface's, connectivity breaks. In the example below, ping fails; checking the ARP cache shows that a secondary MAC is being used. After deleting the ARP entry and adding a static entry with the primary interface's MAC, ping works.

This is a big problem for us, as alb has proved a lifesaver in a fairly complicated environment. We need to deploy KVM, but this bridging/bonding issue stands in the way. Is there any workaround? Thanks - John

jsullivan@jaspav:~$ ping 172.30.10.22
PING 172.30.10.22 (172.30.10.22) 56(84) bytes of data.

--- 172.30.10.22 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3000ms

jsullivan@jaspav:~$ arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
172.30.10.64             ether   00:30:48:C6:91:2C   C                     eth0
172.30.10.22             ether   00:30:48:C6:A0:E9   C                     eth0
172.30.10.9              ether   00:1F:FE:49:3D:E0   C                     eth0
172.30.10.8              ether   00:1F:FE:49:69:A0   C                     eth0
192.168.223.1            ether   00:D0:CF:04:05:BE   C                     wlan0
172.30.10.1              ether   00:15:17:90:3D:7E   C                     eth0
jsullivan@jaspav:~$ sudo arp -d 172.30.10.22
jsullivan@jaspav:~$ sudo arp -s 172.30.10.22 00:30:48:C6:A0:E8
jsullivan@jaspav:~$ ping 172.30.10.22
PING 172.30.10.22 (172.30.10.22) 56(84) bytes of data.
64 bytes from 172.30.10.22: icmp_seq=1 ttl=64 time=0.133 ms
64 bytes from 172.30.10.22: icmp_seq=2 ttl=64 time=0.122 ms
64 bytes from 172.30.10.22: icmp_seq=3 ttl=64 time=0.128 ms

--- 172.30.10.22 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.122/0.127/0.133/0.013 ms

Comment 4 Jiri Pirko 2009-03-13 08:44:24 UTC
I've just tested it: RHEL 5.3 behaves the same as current upstream (2.6.29-rc6).

Comment 5 jsullivan 2009-03-13 14:06:59 UTC
My apologies for my ignorance but does that mean it is the same as 2.6.18 or that it has been fixed? Thanks - John

Comment 6 Jiri Pirko 2009-03-13 14:46:45 UTC
(In reply to comment #5)
> My apologies for my ignorance but does that mean it is the same as 2.6.18 or
> that it has been fixed? Thanks - John 
Yes John, it is the same. I'm currently working on a solution that I want to push upstream.

Comment 7 Jiri Pirko 2009-03-13 22:44:51 UTC
Draft of the patch sent for comments:
http://lkml.org/lkml/2009/3/13/372

I'm currently building a RHEL 5.4 kernel with its backport for testing.

Comment 8 Jiri Pirko 2009-03-14 17:31:21 UTC
Created attachment 335216 [details]
first draft of the patch - kabi breaking

This patch solves the issue, but it breaks the kABI (which is not a problem for now...). I'm now considering another approach that might be feasible; it would be simpler and would not break the kABI.

Comment 9 Andy Gospodarek 2009-03-27 20:34:40 UTC
One upstream comment that I found interesting was the idea that bonding should fix this, since bonding is the code doing the MAC rewriting. I tend to agree, but that might add too much per-frame work to skb_bond_should_drop.

You could probably work around this temporarily by using some ebtables prerouting/input rules to rewrite the MAC addresses from the slave devices to the MAC of the primary interface.

Comment 10 Jiri Pirko 2009-03-30 11:40:40 UTC
Yes Andy, you are right. I've tested it and it seems to work fine. It can be done with the following command:

ebtables -t nat -I PREROUTING -d MACOFSLAVE -j dnat --to-destination MACOFBONDDEVICE

Comment 11 Jiri Pirko 2009-03-30 11:42:45 UTC
Created attachment 337207 [details]
workaround script

This is a simple Perl script that provides a workaround for this issue, using ebtables dnat to rewrite MAC addresses.

Comment 12 Andy Gospodarek 2009-04-03 13:03:39 UTC
The script attached in comment #11 looks like a good one. One thing to note is that ebtables is unfortunately not included as a base package in RHEL5. It can easily be obtained from EPEL, however.

http://fedoraproject.org/wiki/EPEL

You can either add the EPEL yum repo to your system (I do this on ALL of my RHEL5 systems), or download the ebtables rpm directly:

http://download.fedora.redhat.com/pub/epel/5/i386/repoview/ebtables.html
http://download.fedora.redhat.com/pub/epel/5/x86_64/repoview/ebtables.html

I can't remember if there are any dependencies, but you can always use the 'yum localinstall' command to install the rpm and have the dependencies satisfied automatically.

Comment 14 Bobby Shepherd 2009-07-14 19:40:06 UTC
My customer is running into this issue, and I have temporarily had to split up their bond and pass multiple interfaces to their VMs instead.

This customer does not allow any packages that are not shipped with RHEL (Common Criteria and other package-approval forms would be needed), so the ebtables workaround is probably not an option.

Is there an actual kernel patch in the works for this, like the one I saw for 4.9?

The only other possible workaround is to have them use mode 0 (balance-rr) for their bond, if mode 6 (balance-alb) is the only mode that has this problem...

Comment 15 Bobby Shepherd 2009-07-14 19:42:46 UTC
Well I missed the patch posted in the comment right above mine so ignore that :p  I'm assuming it's too late for this to be considered for 5.4?

Comment 16 Juris Krumins 2009-07-15 14:21:36 UTC
Can anybody comment on this? As far as I can see, there were patches for 2.6.30.
Does that mean this problem is solved in 2.6.30?

Thank you.

Comment 17 Jiri Pirko 2009-07-15 15:04:12 UTC
(In reply to comment #16)
> Can anybody comment on that. As far as I can see, there were patches for
> 2.6.30.
> Does that means that this problem is probably solved in 2.6.30 ?

Yes, it is. Patches have also been posted for both RHEL5 and RHEL4.

> 
> Thank you.

Comment 18 Juris Krumins 2009-07-16 12:17:01 UTC
Thank you Jiri for comment.
Can you please also specify which RHEL5 kernel version contains these patches?

Thank you once again.

Comment 19 Juris Krumins 2009-07-17 13:43:19 UTC
Thanks. Already found 2.6.18-128.1.10.el5

Thank you.

Comment 20 Jiri Pirko 2009-07-20 10:03:06 UTC
(In reply to comment #19)
> Thanks. Already found 2.6.18-128.1.10.el5

I doubt that. This wasn't even proposed for z-stream. The patch is queued for RHEL 5.5, so it's not in any RHEL kernel at the moment.

Jirka
> 
> Thank you.

Comment 21 Juris Krumins 2009-09-04 13:57:25 UTC
Any news about your patch? Maybe you could send it to me by email and I'll test it on our systems. Or is https://bugzilla.redhat.com/attachment.cgi?id=335216 the final version of the patch?

Comment 22 RHEL Program Management 2009-09-25 17:36:37 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 23 Don Zickus 2009-10-06 19:36:36 UTC
in kernel-2.6.18-168.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 25 IBM Bug Proxy 2009-10-12 22:30:52 UTC
------- Comment From linuxram.com 2009-10-12 18:26 EDT-------
> You can download this test kernel from http://people.redhat.com/dzickus/el5

The above kernel fixes the issue.

Comment 29 IBM Bug Proxy 2009-11-16 19:51:21 UTC
------- Comment From linuxram.com 2009-11-16 14:43 EDT-------
Applied the following patch to the 2.6.18-164.el5 kernel; the problem disappears.

http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=5d4e039b2cb1ca4de9774344ea7b61ad7fa1b0a1

Can this patch be considered for the z-stream kernel, please?

Comment 31 Perry Myers 2009-11-17 17:35:18 UTC
(In reply to comment #29)
> ------- Comment From linuxram.com 2009-11-16 14:43 EDT-------
> Applied the following patch to 2.6.18-164.el5 kernel, The problem disappears.
> 
> http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=5d4e039b2cb1ca4de9774344ea7b61ad7fa1b0a1
> 
> Can this patch be considered for zstream kernel please?  

Hi...  In conversation with one of the kernel developers (agospoda) the following was mentioned:

-----------
"Jiri's patch will resolve the case where a bridge simply contains a
balance-alb bond device.  Since the bridge interface will only send and
receive traffic based on its own MAC address (which should be the same
as the bond), there will be no issues with the host's connectivity
through the bridged device.

Problems arise when one uses balance-alb in a bridge and expects it
to properly forward traffic from another host that is connected to
another port on the host that is included in the bridge.  That is really
what 506770 is about.  The 'receive load balancing' part of balance-alb
(which is really what differentiates it from balance-tlb) is the problem
and one of the main reasons we suggest using balance-tlb instead of
balance-alb when the host is using bonding.

As you can hopefully see the patch in bug 487763 does fix a problem that
customers complain about, but I do not feel it is wise to place a
balance-alb mode bonding interface inside a bridge as the code is right
now."
------------

So it would be good to know from IBM what is expected from this fix. I know IBM has tested Jiri's patch and that it seems to resolve the issue you currently have, but in the context of Andy's comments above, does it solve enough of your bonding issues to be useful?

Comment 32 IBM Bug Proxy 2009-11-17 19:30:47 UTC
------- Comment From linuxram.com 2009-11-17 14:20 EDT-------
Our use case does not require adding any other port to the bridge that the alb bond belongs to.

Hence I am OK with this patch, since that limitation should not affect us. Without this patch or some other solution, however, we will be handicapped.

Comment 33 Andy Gospodarek 2009-11-18 03:17:14 UTC
(In reply to comment #32)
> ------- Comment From linuxram.com 2009-11-17 14:20 EDT-------
> our use case does not require us to add one other port to the bridge to which
> the alb-bond is associated.
> 
> Hence I am ok with this patch since it should not effect us. But without this
> patch or any other solution, we will be handicapped.  

Can I ask why you are putting the bond in a bridge if there are no other ports in it?  I cannot come up with a compelling technical reason (spanning-tree usage doesn't seem that critical), so there must be something I'm missing.

Comment 34 IBM Bug Proxy 2009-11-18 16:21:23 UTC
------- Comment From linuxram.com 2009-11-18 11:10 EDT-------
The bond has two ports, each connected to a different switch, which gives redundancy if either port fails. The bond is in the bridge because the guest VMs use the bridge to reach the outside world.

Comment 35 Andy Gospodarek 2009-11-18 18:07:29 UTC
(In reply to comment #34)
> ------- Comment From linuxram.com 2009-11-18 11:10 EDT-------
> the bond has two ports. Each port is connected through a different switch. So
> this gives redundancy if any one port fails.  Now the bond is in the bridge
> because the guest VMs use the bridge to reach the outside world.  

The 'virt case' is exactly the problem that I was trying to address with bug 506770. I observed connectivity problems between guests and external hosts if any of the hosts on the network did not send traffic for longer than the forwarding database age-time on the switch, or as soon as the guest sent broadcast traffic. A simple 'arping -b <remote ip>' from the guest could break it. This is a problem with balance-alb and balance-rr.

In a multi-switch configuration this might not be as much of an issue, especially if, under normal circumstances, the switches are in different broadcast domains or traffic otherwise doesn't flow between the two switches and cause forwarding-database moves.

Comment 36 Jiri Pirko 2009-12-07 12:10:28 UTC
Proposing for 5.4.z because this solves the customer's problem with bonding + VMs.

Comment 37 IBM Bug Proxy 2010-02-01 08:11:15 UTC
------- Comment From linuxram.com 2010-02-01 03:08 EDT-------
Will this bug be fixed in 5.4.z or 5.5 ?? What is the status of this bug?

Comment 38 Jiri Pirko 2010-02-01 08:58:05 UTC
(In reply to comment #37)
> ------- Comment From linuxram.com 2010-02-01 03:08 EDT-------
> Will this bug be fixed in 5.4.z or 5.5 ?? What is the status of this bug?    

Yes, the fix is included in the 5.5 kernel. Whether it will also be in 5.4.z has not been determined yet.

Comment 43 errata-xmlrpc 2010-03-30 07:43:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

