Bug 841983 - VLAN configured on top of a bonded interface (active-backup) does not failover
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Neil Horman
QA Contact: Liang Zheng
Keywords: ZStream
Depends On:
Blocks: 842429
Reported: 2012-07-20 14:53 EDT by Neal Kim
Modified: 2016-09-23 11:15 EDT (History)
CC: 24 users

See Also:
Fixed In Version: kernel-2.6.32-294.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 01:42:12 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Network setup script (2.13 KB, application/octet-stream)
2012-07-20 14:56 EDT, Neal Kim
[PATCH] vlan: filter device events on bonds (5.17 KB, patch)
2012-07-20 16:03 EDT, Neil Horman

Description Neal Kim 2012-07-20 14:53:27 EDT
Description of problem:

When the bonding interface fails over (in active-backup mode), the VLANs on top of it do not fail over as well.

Version-Release number of selected component (if applicable):
kernel-2.6.32-279.2.1.el6

How reproducible:

Always.

Steps to Reproduce:

* Configure a bonded interface, in active-backup bonding mode, with two Ethernet interfaces.
* Configure a VLAN on top of the bonded interface. Check that we can communicate with other devices on that VLAN.
* On the switch, disable the port that *either* the active *or* the standby interface is connected to.
* Verify that traffic on the bonded interface still works, i.e. if we disabled the active interface, then the bond has failed over.

* Observe that we can no longer communicate on the VLAN.
* Observe that "cat /sys/class/net/bond1.3091/operstate" returns "lowerlayerdown".
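For reference, the steps above can be sketched with the sysfs bonding interface. This is a hedged sketch, not the reporter's exact configuration: the slave names eth0/eth1 and the IP address are assumptions (the bond1/3091 names match the operstate path above), and the attached network setup script remains the authoritative version.

```shell
# Sketch of the reproduction setup; interface names and the address are
# examples - see the attached setup script for the exact configuration.
modprobe bonding mode=active-backup miimon=100

# Create the bond and bring it up.
echo +bond1 > /sys/class/net/bonding_masters
ip link set bond1 up

# Enslave two Ethernet interfaces (slaves must be down while enslaving).
ip link set eth0 down; echo +eth0 > /sys/class/net/bond1/bonding/slaves
ip link set eth1 down; echo +eth1 > /sys/class/net/bond1/bonding/slaves

# Add VLAN 3091 on top of the bond, matching the bond1.3091 name above.
ip link add link bond1 name bond1.3091 type vlan id 3091
ip link set bond1.3091 up
ip addr add 192.168.30.1/24 dev bond1.3091

# After disabling one switch port, compare:
cat /sys/class/net/bond1/bonding/active_slave   # bond has failed over
cat /sys/class/net/bond1.3091/operstate         # buggy: "lowerlayerdown"
```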

Actual results:

VLAN does not fail-over as expected.

Expected results:

VLAN fail-over successful.

Additional info:
Comment 1 Neal Kim 2012-07-20 14:56:51 EDT
Created attachment 599440 [details]
Network setup script
Comment 4 Neil Horman 2012-07-20 16:03:34 EDT
Created attachment 599456 [details]
[PATCH] vlan: filter device events on bonds


Since bond masters and slaves only have separate vlan groups now, the
vlan_device_event handler has to be taught to ignore network events from slave
devices when they're truly attached to the bond master.  We do this by looking
up the network device of a given vlan id on both the slave and its master.  If
they match, then we're processing an event for a physical device that we don't
really care about (since the master's events are really what we're interested
in).

This patch adds that comparison, and allows us to filter out those slave events
that the vlan code should ignore.
---
 net/8021q/vlan.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 64 insertions(+), 0 deletions(-)
Comment 5 Neil Horman 2012-07-20 16:04:26 EDT
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4629537

Brew build link for you.  Please test and report whether or not this corrects the reported problem.
Comment 6 Neal Kim 2012-07-21 02:06:06 EDT
Good news!

Initial test results are looking good. Failing one interface results in the VLAN *not* going down.


Cheers,
Comment 7 Neal Kim 2012-07-21 02:35:45 EDT
I can confirm the same on my virtual setup as well. After disconnecting one of the virtual interfaces, the operstate remains "up":

[root@rhel63test ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56  
          inet addr:192.168.2.200  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3356/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:2424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:782 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:181326 (177.0 KiB)  TX bytes:262852 (256.6 KiB)

bond0.10  Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56  
          inet addr:192.168.2.175  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3356/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:168 (168.0 b)  TX bytes:746 (746.0 b)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2140 errors:0 dropped:0 overruns:0 frame:0
          TX packets:782 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:164302 (160.4 KiB)  TX bytes:262852 (256.6 KiB)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56  
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:284 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:17024 (16.6 KiB)  TX bytes:0 (0.0 b)

[root@rhel63test ~]# uname -r
2.6.32-287.el6.test.x86_64
[root@rhel63test ~]# cat /sys/class/net/bond0.10/operstate 
up

Just in case, I also disconnected *both* virtual interfaces that are part of bond0, and confirmed bond0.10 operstate to be "lowerlayerdown". I then brought one virtual interface back up, thereby reactivating bond0, and could see bond0.10 operstate to be "up" as well.


Cheers,
Comment 8 Neil Horman 2012-07-21 07:06:02 EDT
OK, that is good news.  When ByteMobile confirms the same, I'll post the patch.  I recommend that you, Neal, flag this as a z-stream candidate as well.
Comment 9 Neil Horman 2012-07-21 15:12:50 EDT
Neal, quick note: please make sure to test the non-bonded case, i.e. in addition to adding a vlan to a bonded interface, also test the case in which you add a vlan to a single physical interface.  Please make sure that, when the physical interface is taken down, the operstate of the vlan transitions to lowerlayerdown.  I want to be sure this doesn't create any new regressions.
Comment 10 Neal Kim 2012-07-21 15:20:54 EDT
No problem Neil, that should be easy enough to test.
Comment 11 Neal Kim 2012-07-21 15:53:44 EDT
So far so good.

I configured a VLAN interface (eth1.20), verified the link status and VLAN operstate (eth1 in up/down state).

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:60  
          inet addr:192.168.2.223  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3360/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2449 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:149504 (146.0 KiB)  TX bytes:1742 (1.7 KiB)

eth1.20   Link encap:Ethernet  HWaddr 00:0C:29:8B:33:60  
          inet addr:192.168.2.180  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3360/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:720 (720.0 b)

[root@rhel63test ~]# ethtool eth1 | grep -i detected
	Link detected: yes

[root@rhel63test ~]# ethtool eth1.20 | grep -i detected
	Link detected: yes

[root@rhel63test ~]# cat /sys/class/net/eth1.20/operstate 
up

+---------------------+
| Simulate Cable Pull |
+---------------------+

[root@rhel63test ~]# ethtool eth1 | grep -i detected
	Link detected: no

[root@rhel63test ~]# ethtool eth1.20 | grep -i detected
	Link detected: no

[root@rhel63test ~]# cat /sys/class/net/eth1.20/operstate 
lowerlayerdown

I then reconnected the interfaces and the eth1.20 operstate reported as "up" (as expected). Nothing out of the ordinary recorded in dmesg either.
Comment 12 Neil Horman 2012-07-21 19:21:26 EDT
excellent, thank you.  Unless you object, I'll post this for review tomorrow (yes, sunday), so we can get acks monday.  I suggest you nominate this for z-stream, so we can get them a z-stream kernel asap.
Comment 13 RHEL Product and Program Management 2012-07-22 09:40:03 EDT
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 15 Liang Zheng 2012-07-23 22:46:43 EDT
Hi Neil,
I have a question about the failover event. What is the difference between a cable pull and shutting down the interface on the switch to simulate the failover event?
Can I just shut down the interface on the switch to simulate the failover events?

Thank you.
Liang Zheng.
Comment 16 Neil Horman 2012-07-24 08:48:46 EDT
The real answer to that question often lies in the driver details.  For the purposes of this test I think the differences are largely irrelevant, but generally speaking, running ifdown will clear the IFF_UP flag from the interface before sending a carrier-off linkwatch event.  Just pulling the cable will only send the linkwatch event, without clearing the IFF_UP flag.  Listeners for the event may behave differently based on those differences.
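Assuming a standard sysfs layout, that difference can be observed directly. A diagnostic sketch, not part of the original test plan; eth1 is an example name and the flag values are illustrative:

```shell
# Administrative down (ifdown / ip link set down) clears IFF_UP (bit 0x1):
ip link set eth1 down
cat /sys/class/net/eth1/flags       # IFF_UP bit cleared, e.g. 0x1002
cat /sys/class/net/eth1/operstate   # "down"

# A cable pull leaves the interface administratively up; only the
# carrier drops, so the IFF_UP bit stays set in flags:
ip link set eth1 up
# (now pull the cable, or disable the switch port)
cat /sys/class/net/eth1/flags       # IFF_UP still set, e.g. 0x1003
cat /sys/class/net/eth1/carrier     # 0 while there is no link
```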
Comment 17 Marcelo Giles 2012-07-24 11:17:05 EDT
(In reply to comment #12)

I will test the patched kernel in this environment, which has 2 RHEL 6.3 KVM hosts using nic+bond+vlan+bridge, and let you know if it fixes the issues that we have observed.

As a side note, we also have a RHEV 3 environment with the same network setup and RHEV-M fails to create the bonds using the vlan interfaces on RHEV-H 6.3 hypervisors. It works fine with RHEV-H 6.2 hypervisors.
Comment 18 Kapetanakis Giannis 2012-07-24 12:07:24 EDT
I've just tested patch https://bugzilla.redhat.com/attachment.cgi?id=599456&action=diff on top of 2.6.32-279.2.1 and it works fine.

My setup is nics->bond->vlans->bridges and I had the same problem
after upgrading to kernel 2.6.32-279.2.1.

I've tested both ifup/ifdown as well as port disable on switch.

regards,

Giannis
Comment 20 Marcelo Giles 2012-07-26 08:50:30 EDT
(In reply to comment #18)
In the case I'm testing, the problem affects NICs bonded using mode 4 (link aggregation). Should I open a separate BZ? Or maybe one is already open?
Comment 22 Zhenjie Chen 2012-07-27 03:43:59 EDT
Hi,
I reproduced the bug in kernel 2.6.32-289.
I also tested kernels 2.6.32-270, 2.6.32-279.5.1, and 2.6.32-293; this bug does not exist there.
Comment 23 Kapetanakis Giannis 2012-08-01 05:07:53 EDT
Hi,

What's the status on this one? 
Is it fixed on any kernel publicly available?

thanx

Giannis
Comment 24 Suzanne Forsberg 2012-08-01 10:34:59 EDT
(In reply to comment #23)
> Hi,
> 
> What's the status on this one? 
> Is it fixed on any kernel publicly available?
> 
> thanx
> 
> Giannis

Hi,

Red Hat is working on a fix for this in an upcoming erratum for 6.3. We are targeting that release for mid-August (it is currently in test).

Regards,
- Sue
Comment 25 Jarod Wilson 2012-08-07 17:47:16 EDT
Patch(es) available on kernel-2.6.32-294.el6
Comment 28 Kapetanakis Giannis 2012-09-12 07:11:12 EDT
Problem seems to be solved in 2.6.32-279.5.2

I've seen that patch https://bugzilla.redhat.com/attachment.cgi?id=599456&action=diff is applied in the source.
Comment 29 John Ronciak 2012-09-13 20:25:18 EDT
From the testing done by our validation people, the above kernel does indeed fix the issue.  Sorry for the delay in getting this tested.
Comment 36 errata-xmlrpc 2013-02-21 01:42:12 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html
