Bug 841983 - VLAN configured on top of a bonded interface (active-backup) does not failover
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.3
Hardware: x86_64 Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Neil Horman
QA Contact: Liang Zheng
Keywords: ZStream
Depends On:
Blocks: 842429
Reported: 2012-07-20 14:53 EDT by Neal Kim
Modified: 2016-09-23 11:15 EDT (History)
CC: 24 users

See Also:
Fixed In Version: kernel-2.6.32-294.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 01:42:12 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Network setup script (2.13 KB, application/octet-stream)
2012-07-20 14:56 EDT, Neal Kim
[PATCH] vlan: filter device events on bonds (5.17 KB, patch)
2012-07-20 16:03 EDT, Neil Horman

Description Neal Kim 2012-07-20 14:53:27 EDT
Description of problem:

When the bonding interface fails over (in active-backup mode), the VLANs on top of it do not fail over as well.

Version-Release number of selected component (if applicable):
kernel-2.6.32-279.2.1.el6

How reproducible:

Always.

Steps to Reproduce:

* Configure a bonded interface, in active-backup bonding mode, with two Ethernet interfaces.
* Configure a VLAN on top of the bonded interface. Check that we can communicate with other devices on that VLAN.
* On the switch, disable the port that *either* the active *or* the standby interface is connected to.
* Verify that traffic on the bonded interface still works, i.e. if we disabled the active interface, then the bond has failed over.

* Observe that we can no longer communicate on the VLAN.
* Observe that "cat /sys/class/net/bond1.3091/operstate" returns "lowerlayerdown".
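For reference, the steps above can be sketched with the sysfs bonding interface. This is a hedged sketch, not the reporter's exact configuration: the slave names eth0/eth1 and the IP address are assumptions (the bond1/3091 names match the operstate path above), and the attached network setup script remains the authoritative version.

```shell
# Sketch of the reproduction setup; interface names and the address are
# examples - see the attached setup script for the exact configuration.
modprobe bonding mode=active-backup miimon=100

# Create the bond and bring it up.
echo +bond1 > /sys/class/net/bonding_masters
ip link set bond1 up

# Enslave two Ethernet interfaces (slaves must be down while enslaving).
ip link set eth0 down; echo +eth0 > /sys/class/net/bond1/bonding/slaves
ip link set eth1 down; echo +eth1 > /sys/class/net/bond1/bonding/slaves

# Add VLAN 3091 on top of the bond, matching the bond1.3091 name above.
ip link add link bond1 name bond1.3091 type vlan id 3091
ip link set bond1.3091 up
ip addr add 192.168.30.1/24 dev bond1.3091

# After disabling one switch port, compare:
cat /sys/class/net/bond1/bonding/active_slave   # bond has failed over
cat /sys/class/net/bond1.3091/operstate         # buggy: "lowerlayerdown"
```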

Actual results:

VLAN does not fail-over as expected.

Expected results:

VLAN fail-over successful.

Additional info:
Comment 1 Neal Kim 2012-07-20 14:56:51 EDT
Created attachment 599440 [details]
Network setup script
Comment 4 Neil Horman 2012-07-20 16:03:34 EDT
Created attachment 599456 [details]
[PATCH] vlan: filter device events on bonds


Since bond masters and slaves only have separate vlan groups now, the
vlan_device_event handler has to be taught to ignore network events from slave
devices when they're truly attached to the bond master.  We do this by looking
up the network device of a given vlan id on both the slave and its master.  If
they match, then we're processing an event for a physical device that we don't
really care about (since the master's events are really what we're interested
in).

This patch adds that comparison, and allows us to filter out those slave events
that the vlan code should ignore.
---
 net/8021q/vlan.c |   64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 64 insertions(+), 0 deletions(-)
Comment 5 Neil Horman 2012-07-20 16:04:26 EDT
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4629537

Brew build link for you.  Please test and report whether or not this corrects the reported problem.
Comment 6 Neal Kim 2012-07-21 02:06:06 EDT
Good news!

Initial test results are looking good. Failing one interface results in the VLAN *not* going down.


Cheers,
Comment 7 Neal Kim 2012-07-21 02:35:45 EDT
I can confirm the same on my virtual setup as well. After disconnecting one of the virtual interfaces, the operstate remains "up":

[root@rhel63test ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56  
          inet addr:192.168.2.200  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3356/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:2424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:782 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:181326 (177.0 KiB)  TX bytes:262852 (256.6 KiB)

bond0.10  Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56  
          inet addr:192.168.2.175  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3356/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:168 (168.0 b)  TX bytes:746 (746.0 b)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56  
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2140 errors:0 dropped:0 overruns:0 frame:0
          TX packets:782 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:164302 (160.4 KiB)  TX bytes:262852 (256.6 KiB)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56  
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:284 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:17024 (16.6 KiB)  TX bytes:0 (0.0 b)

[root@rhel63test ~]# uname -r
2.6.32-287.el6.test.x86_64
[root@rhel63test ~]# cat /sys/class/net/bond0.10/operstate 
up

Just in case, I also disconnected *both* virtual interfaces that are part of bond0, and confirmed bond0.10 operstate to be "lowerlayerdown". I then brought one virtual interface back up, thereby reactivating bond0, and could see bond0.10 operstate to be "up" as well.


Cheers,
Comment 8 Neil Horman 2012-07-21 07:06:02 EDT
OK, that is good news.  When ByteMobile confirms the same, I'll post the patch.  I recommend that you, Neal, flag this as a z-stream candidate as well.
Comment 9 Neil Horman 2012-07-21 15:12:50 EDT
Neal, quick note: please make sure to test the non-bonded case, i.e. in addition to adding a vlan to a bonded interface, also test the case in which you add a vlan to a single physical interface.  Please make sure that, when the physical interface is taken down, the operstate of the vlan transitions to lowerlayerdown.  I want to be sure this doesn't create any new regressions.
Comment 10 Neal Kim 2012-07-21 15:20:54 EDT
No problem Neil, that should be easy enough to test.
Comment 11 Neal Kim 2012-07-21 15:53:44 EDT
So far so good.

I configured a VLAN interface (eth1.20), verified the link status and VLAN operstate (eth1 in up/down state).

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:60  
          inet addr:192.168.2.223  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3360/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2449 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:149504 (146.0 KiB)  TX bytes:1742 (1.7 KiB)

eth1.20   Link encap:Ethernet  HWaddr 00:0C:29:8B:33:60  
          inet addr:192.168.2.180  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3360/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:720 (720.0 b)

[root@rhel63test ~]# ethtool eth1 | grep -i detected
	Link detected: yes

[root@rhel63test ~]# ethtool eth1.20 | grep -i detected
	Link detected: yes

[root@rhel63test ~]# cat /sys/class/net/eth1.20/operstate 
up

+---------------------+
| Simulate Cable Pull |
+---------------------+

[root@rhel63test ~]# ethtool eth1 | grep -i detected
	Link detected: no

[root@rhel63test ~]# ethtool eth1.20 | grep -i detected
	Link detected: no

[root@rhel63test ~]# cat /sys/class/net/eth1.20/operstate 
lowerlayerdown

I then reconnected the interfaces and the eth1.20 operstate reported as "up" (as expected). Nothing out of the ordinary recorded in dmesg either.
Comment 12 Neil Horman 2012-07-21 19:21:26 EDT
excellent, thank you.  Unless you object, I'll post this for review tomorrow (yes, sunday), so we can get acks monday.  I suggest you nominate this for z-stream, so we can get them a z-stream kernel asap.
Comment 13 RHEL Product and Program Management 2012-07-22 09:40:03 EDT
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 15 Liang Zheng 2012-07-23 22:46:43 EDT
Hi Neil,
I have a question about the failover event. What is the difference between a cable pull and shutting down the interface on the switch to simulate the failover event?
Can I just shut down the interface on the switch to simulate the failover events?

Thank you.
Liang Zheng.
Comment 16 Neil Horman 2012-07-24 08:48:46 EDT
The real answer to that question often lies in the driver details.  For the purposes of this test I think the differences are largely irrelevant, but generally speaking, running ifdown will clear the IFF_UP flag from the interface before sending a carrier-off linkwatch event.  Just pulling the cable will only send the linkwatch event, without clearing the IFF_UP flag.  Listeners for the event may behave differently based on those differences.
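Assuming a standard sysfs layout, that difference can be observed directly. A diagnostic sketch, not part of the original test plan; eth1 is an example name and the flag values are illustrative:

```shell
# Administrative down (ifdown / ip link set down) clears IFF_UP (bit 0x1):
ip link set eth1 down
cat /sys/class/net/eth1/flags       # IFF_UP bit cleared, e.g. 0x1002
cat /sys/class/net/eth1/operstate   # "down"

# A cable pull leaves the interface administratively up; only the
# carrier drops, so the IFF_UP bit stays set in flags:
ip link set eth1 up
# (now pull the cable, or disable the switch port)
cat /sys/class/net/eth1/flags       # IFF_UP still set, e.g. 0x1003
cat /sys/class/net/eth1/carrier     # 0 while there is no link
```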
Comment 17 Marcelo Giles 2012-07-24 11:17:05 EDT
(In reply to comment #12)

I will test the patched kernel in this environment, which has 2 RHEL 6.3 KVM hosts using nic+bond+vlan+bridge, and let you know if it fixes the issues that we have observed.

As a side note, we also have a RHEV 3 environment with the same network setup and RHEV-M fails to create the bonds using the vlan interfaces on RHEV-H 6.3 hypervisors. It works fine with RHEV-H 6.2 hypervisors.
Comment 18 Kapetanakis Giannis 2012-07-24 12:07:24 EDT
I've just tested patch https://bugzilla.redhat.com/attachment.cgi?id=599456&action=diff on top of 2.6.32-279.2.1 and it works fine.

My setup is nics->bond->vlans->bridges and I had the same problem
after upgrading to kernel 2.6.32-279.2.1.

I've tested both ifup/ifdown as well as port disable on switch.

regards,

Giannis
Comment 20 Marcelo Giles 2012-07-26 08:50:30 EDT
(In reply to comment #18)
In the case I'm testing, the problem affects NICs bonded using mode 4 (link aggregation). Should I open a separate BZ? Or maybe one is already open?
Comment 22 Zhenjie Chen 2012-07-27 03:43:59 EDT
Hi,
I reproduced the bug in kernel 2.6.32-289.
I also tested kernels 2.6.32-270, 2.6.32-279.5.1, and 2.6.32-293; this bug does not exist there.
Comment 23 Kapetanakis Giannis 2012-08-01 05:07:53 EDT
Hi,

What's the status on this one? 
Is it fixed on any kernel publicly available?

thanx

Giannis
Comment 24 Suzanne Forsberg 2012-08-01 10:34:59 EDT
(In reply to comment #23)
> Hi,
> 
> What's the status on this one? 
> Is it fixed on any kernel publicly available?
> 
> thanx
> 
> Giannis

Hi,

Red Hat is working on a fix for this in an upcoming erratum for 6.3. We are targeting that release for mid-August (it is currently in test).

Regards,
- Sue
Comment 25 Jarod Wilson 2012-08-07 17:47:16 EDT
Patch(es) available on kernel-2.6.32-294.el6
Comment 28 Kapetanakis Giannis 2012-09-12 07:11:12 EDT
Problem seems to be solved in 2.6.32-279.5.2

I've seen that patch https://bugzilla.redhat.com/attachment.cgi?id=599456&action=diff is applied in the source.
Comment 29 John Ronciak 2012-09-13 20:25:18 EDT
From the testing done by our validation people, the above kernel does indeed fix the issue.  Sorry for the delay in getting this tested.
Comment 36 errata-xmlrpc 2013-02-21 01:42:12 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html
