Summary: VLAN configured on top of a bonded interface (active-backup) does not fail over
Product: Red Hat Enterprise Linux 6
Reporter: Neal Kim <nkim>
Component: kernel
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Liang Zheng <lzheng>
Version: 6.3
CC: ajb, bilias, cww, david, dhoward, fhrbata, gdurandv, gouyang, jcpunk, john.ronciak, kzhang, leiwang, lzheng, mgiles, mishu, ngalvin, nhorman, redhat-bugzilla, rik.theys, sforsber, sputhenp, toracat, vcojot, zhchen
Fixed In Version: kernel-2.6.32-294.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Last Closed: 2013-02-21 06:42:12 UTC
Type: Bug
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Description Neal Kim 2012-07-20 18:53:27 UTC
Description of problem:
When the bonding interface (in active-backup mode) fails over, the VLANs on top of it do not fail over as well.

Version-Release number of selected component (if applicable):
kernel-2.6.32-279.2.1.el6

How reproducible:
Always.

Steps to Reproduce:
* Configure a bonded interface, in active-backup bonding mode, with 2 Ethernet devices.
* Configure a VLAN on top of the bonded interface. Check that we can communicate with other devices on that VLAN.
* On the switch, disable the port that *either* the active *or* the standby Ethernet device is connected to.
* Verify that traffic on the bonded interface still works, i.e. if we disabled the active device, the bond has failed over.
* Observe that we can no longer communicate on the VLAN.
* Observe that "cat /sys/class/net/bond1.3091/operstate" returns "lowerlayerdown".

Actual results:
The VLAN does not fail over as expected.

Expected results:
The VLAN fails over successfully.

Additional info:
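For reference, the reproduction above can be sketched with iproute2 commands. This is a minimal sketch, not the reporter's exact setup: the slave names eth0/eth1 are assumptions, while bond1 and VLAN ID 3091 are taken from the operstate path in the report. The commands require root on a disposable test machine.

```shell
# Create an active-backup bond from two slaves (slave names are assumptions)
modprobe bonding
ip link add bond1 type bond mode active-backup miimon 100
ip link set eth0 down && ip link set eth0 master bond1
ip link set eth1 down && ip link set eth1 master bond1
ip link set bond1 up

# Add VLAN 3091 on top of the bond (ID taken from the report)
ip link add link bond1 name bond1.3091 type vlan id 3091
ip link set bond1.3091 up

# After disabling the switch port of either slave, the bond itself keeps
# passing traffic, but on an affected kernel the VLAN's operstate is wrong:
cat /sys/class/net/bond1.3091/operstate   # buggy kernel: "lowerlayerdown"
```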
Comment 4 Neil Horman 2012-07-20 20:03:34 UTC
Created attachment 599456 [details]

[PATCH] vlan: filter device events on bonds

Since bond masters and slaves now have separate vlan groups, the vlan_device_event handler has to be taught to ignore network events from slave devices when they are attached to a bond master. We do this by looking up the network device of a given vid on both the slave and its master. If they match, then we're processing an event for a physical device that we don't really care about (since the master's events are really what we're interested in). This patch adds that comparison, and allows us to filter those slave events that the vlan code should ignore.

---
 net/8021q/vlan.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 64 insertions(+), 0 deletions(-)
Comment 5 Neil Horman 2012-07-20 20:04:26 UTC
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4629537

Brew build link for you. Please test and report as to whether or not this corrects the reported problem.
Comment 6 Neal Kim 2012-07-21 06:06:06 UTC
Good news! Initial test results are looking good. Failing one interface results in the VLAN *not* going down. Cheers,
Comment 7 Neal Kim 2012-07-21 06:35:45 UTC
I can confirm the same on my virtual setup as well. After disconnecting one of the virtual interfaces, the operstate still reports "up":

[root@rhel63test ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56
          inet addr:192.168.2.200  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3356/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:2424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:782 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:181326 (177.0 KiB)  TX bytes:262852 (256.6 KiB)

bond0.10  Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56
          inet addr:192.168.2.175  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3356/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:168 (168.0 b)  TX bytes:746 (746.0 b)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2140 errors:0 dropped:0 overruns:0 frame:0
          TX packets:782 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:164302 (160.4 KiB)  TX bytes:262852 (256.6 KiB)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:284 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:17024 (16.6 KiB)  TX bytes:0 (0.0 b)

[root@rhel63test ~]# uname -r
2.6.32-287.el6.test.x86_64
[root@rhel63test ~]# cat /sys/class/net/bond0.10/operstate
up

Just in case, I also disconnected *both* virtual interfaces that are part of bond0, and confirmed the bond0.10 operstate to be "lowerlayerdown". I then brought one virtual interface back up, thereby reactivating bond0, and could see the bond0.10 operstate return to "up" as well. Cheers,
Comment 8 Neil Horman 2012-07-21 11:06:02 UTC
OK, that is good news. When Bytemobile confirms the same, I'll post the patch. I recommend that you, Neal, flag this as a z-stream candidate as well.
Comment 9 Neil Horman 2012-07-21 19:12:50 UTC
Neal, quick note: please make sure to test the non-bonded case, i.e. in addition to adding a vlan to a bonded interface, also test the case in which you add a vlan to a single physical interface. Please make sure that, when the physical interface is taken down, the operstate of the vlan transitions to lowerlayerdown. I want to be sure this doesn't create any new regressions.
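The non-regression case Neil describes can be sketched as follows (a sketch only: the names eth1/eth1.20 match the later test report, but any spare NIC would do, and the commands need root):

```shell
# VLAN directly on a physical NIC, no bond involved (names are assumptions)
ip link add link eth1 name eth1.20 type vlan id 20
ip link set eth1.20 up

# With the cable pulled (or the switch port disabled), the patched kernel
# must STILL propagate the loss of link to the VLAN device:
cat /sys/class/net/eth1.20/operstate   # expected: "lowerlayerdown"

# After reconnecting, it should return to "up":
cat /sys/class/net/eth1.20/operstate   # expected: "up"
```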
Comment 10 Neal Kim 2012-07-21 19:20:54 UTC
No problem Neil, that should be easy enough to test.
Comment 11 Neal Kim 2012-07-21 19:53:44 UTC
So far so good. I configured a VLAN interface (eth1.20) and verified the link status and VLAN operstate with eth1 in both the up and down state.

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:60
          inet addr:192.168.2.223  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3360/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2449 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:149504 (146.0 KiB)  TX bytes:1742 (1.7 KiB)

eth1.20   Link encap:Ethernet  HWaddr 00:0C:29:8B:33:60
          inet addr:192.168.2.180  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3360/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:720 (720.0 b)

[root@rhel63test ~]# ethtool eth1 | grep -i detected
Link detected: yes
[root@rhel63test ~]# ethtool eth1.20 | grep -i detected
Link detected: yes
[root@rhel63test ~]# cat /sys/class/net/eth1.20/operstate
up

+---------------------+
| Simulate Cable Pull |
+---------------------+

[root@rhel63test ~]# ethtool eth1 | grep -i detected
Link detected: no
[root@rhel63test ~]# ethtool eth1.20 | grep -i detected
Link detected: no
[root@rhel63test ~]# cat /sys/class/net/eth1.20/operstate
lowerlayerdown

I then reconnected the interfaces and the eth1.20 operstate reported as "up" (as expected). Nothing out of the ordinary recorded in dmesg either.
Comment 12 Neil Horman 2012-07-21 23:21:26 UTC
Excellent, thank you. Unless you object, I'll post this for review tomorrow (yes, Sunday), so we can get ACKs Monday. I suggest you nominate this for z-stream, so we can get them a z-stream kernel ASAP.
Comment 13 RHEL Program Management 2012-07-22 13:40:03 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
Comment 15 Liang Zheng 2012-07-24 02:46:43 UTC
Hi Neil, I have a question about the failover event. What's the difference between a cable pull and shutting down the interface on the switch to simulate a failover event? Can I just shut down the interface on the switch to simulate the failover events? Thank you. Liang Zheng.
Comment 16 Neil Horman 2012-07-24 12:48:46 UTC
The real answer to that question often lies in the driver details. For the purposes of this test I think the differences are largely irrelevant, but generally speaking, running ifdown will clear the IFF_UP flag from the interface before sending a carrier-off linkwatch event. Just pulling the cable will only send the linkwatch event, without clearing the IFF_UP flag. Listeners for the event may behave differently based on those differences.
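The two signals Neil distinguishes, the administrative IFF_UP flag and the physical carrier, can both be observed from userspace. A small sketch (the interface name eth0 is an assumption; reading these files needs no special privileges, but changing link state needs root):

```shell
# "ip link show" prints the flag set: UP in the angle brackets is IFF_UP
# (administrative), while LOWER_UP reflects carrier on the physical layer.
ip link show eth0

# The carrier file gives the same physical-link bit directly:
cat /sys/class/net/eth0/carrier   # 1 = link detected, 0 = no carrier

# A cable pull clears only carrier/LOWER_UP; IFF_UP stays set.
# An administrative down clears IFF_UP as well:
ip link set eth0 down             # equivalent in effect to ifdown for this flag
```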
Comment 17 Marcelo Giles 2012-07-24 15:17:05 UTC
(In reply to comment #12) I will test the patched kernel in this environment, which has 2 RHEL 6.3 KVM hosts using nic+bond+vlan+bridge, and let you know if it fixes the issues that we have observed. As a side note, we also have a RHEV 3 environment with the same network setup, and RHEV-M fails to create the bonds using the vlan interfaces on RHEV-H 6.3 hypervisors. It works fine with RHEV-H 6.2 hypervisors.
Comment 18 Kapetanakis Giannis 2012-07-24 16:07:24 UTC
I've just tested the patch https://bugzilla.redhat.com/attachment.cgi?id=599456&action=diff on top of 2.6.32-279.2.1 and it works fine. My setup is nics->bond->vlans->bridges, and I had the same problem after applying kernel 2.6.32-279.2.1. I've tested both ifup/ifdown as well as port disable on the switch. regards, Giannis
Comment 20 Marcelo Giles 2012-07-26 12:50:30 UTC
(In reply to comment #18) In the case I'm testing, the problem affects NICs bonded using mode 4 (link aggregation). Should I open a separate BZ? Or is one already open?
Comment 22 Zhenjie Chen 2012-07-27 07:43:59 UTC
Hi, I reproduced the bug in kernel 2.6.32-289. I also tested kernels 2.6.32-270, 2.6.32-279.5.1, and 2.6.32-293; this bug does not exist in those.
Comment 23 Kapetanakis Giannis 2012-08-01 09:07:53 UTC
Hi, What's the status on this one? Is it fixed in any publicly available kernel? thanx Giannis
Comment 24 Suzanne Forsberg 2012-08-01 14:34:59 UTC
(In reply to comment #23)
> Hi,
>
> What's the status on this one?
> Is it fixed on any kernel publicly available?
>
> thanx
>
> Giannis

Hi, Red Hat is working on a fix for this in an upcoming erratum for 6.3. We are targeting that release for mid-August (it is currently in test). Regards, - Sue
Comment 25 Jarod Wilson 2012-08-07 21:47:16 UTC
Patch(es) available on kernel-2.6.32-294.el6
Comment 28 Kapetanakis Giannis 2012-09-12 11:11:12 UTC
The problem seems to be solved in 2.6.32-279.5.2. I've verified that the patch https://bugzilla.redhat.com/attachment.cgi?id=599456&action=diff is applied in the source.
Comment 29 John Ronciak 2012-09-14 00:25:18 UTC
From the testing done by our validation people, the above kernel does indeed fix the issue. Sorry for the delay in getting this tested.
Comment 36 errata-xmlrpc 2013-02-21 06:42:12 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0496.html