Bug 841983
| Summary: | VLAN configured on top of a bonded interface (active-backup) does not fail over | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Neal Kim <nkim> |
| Component: | kernel | Assignee: | Neil Horman <nhorman> |
| Status: | CLOSED ERRATA | QA Contact: | Liang Zheng <lzheng> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 6.3 | CC: | ajb, bilias, cww, david, dhoward, fhrbata, gdurandv, gouyang, jcpunk, john.ronciak, kzhang, leiwang, lzheng, mgiles, mishu, ngalvin, nhorman, redhat-bugzilla, rik.theys, sforsber, sputhenp, toracat, vcojot, zhchen |
| Target Milestone: | rc | Keywords: | ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-2.6.32-294.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2013-02-21 06:42:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 842429 | | |
| Attachments: | Network setup script; [PATCH] vlan: filter device events on bonds | | |
Description — Neal Kim, 2012-07-20 18:53:27 UTC
Created attachment 599440 [details]
Network setup script
Created attachment 599456 [details]
[PATCH] vlan: filter device events on bonds
Since bond masters and slaves now have separate vlan groups, the
vlan_device_event handler has to be taught to ignore network events from slave
devices when they're attached to a bond master. We do this by looking
up the network device of a given vlan id on both the slave and its master. If they
match, then we're processing an event for a physical device that we don't really
care about (since the master's events are really what we're interested in).
This patch adds that comparison, and allows us to filter those slave events that
the vlan code should ignore.
```
---
 net/8021q/vlan.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 64 insertions(+), 0 deletions(-)
```
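For context, the topology in this report (two NICs in an active-backup bond, with a vlan on top) can be expressed with RHEL 6 ifcfg files roughly as follows. This is a hedged sketch, not the attached setup script; the device names, vlan id (10), and addresses simply mirror the test output quoted later in this report:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0  (second slave, eth1, is analogous)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
BONDING_OPTS="mode=active-backup miimon=100"
IPADDR=192.168.2.200
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-bond0.10  (vlan 10 on the bond)
DEVICE=bond0.10
VLAN=yes
IPADDR=192.168.2.175
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none
```

With this setup the bug manifests as bond0.10 staying "up" (or wrongly going down) when a slave link changes state.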
http://brewweb.devel.redhat.com/brew/taskinfo?taskID=4629537

Brew build link for you. Please test and report as to whether or not this corrects the reported problem.

---

Good news! Initial test results are looking good. Failing one interface results in the VLAN *not* going down. Cheers,

---

I can confirm the same on my virtual setup as well. After disconnecting one of the virtual interfaces, the operstate remains "up":

```
[root@rhel63test ~]# ifconfig
bond0     Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56
          inet addr:192.168.2.200  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3356/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:2424 errors:0 dropped:0 overruns:0 frame:0
          TX packets:782 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:181326 (177.0 KiB)  TX bytes:262852 (256.6 KiB)

bond0.10  Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56
          inet addr:192.168.2.175  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3356/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:4 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:168 (168.0 b)  TX bytes:746 (746.0 b)

eth0      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:2140 errors:0 dropped:0 overruns:0 frame:0
          TX packets:782 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:164302 (160.4 KiB)  TX bytes:262852 (256.6 KiB)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:56
          UP BROADCAST SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:284 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:17024 (16.6 KiB)  TX bytes:0 (0.0 b)

[root@rhel63test ~]# uname -r
2.6.32-287.el6.test.x86_64
[root@rhel63test ~]# cat /sys/class/net/bond0.10/operstate
up
```

Just in case, I also disconnected *both* virtual interfaces that are part of bond0, and confirmed the bond0.10 operstate to be "lowerlayerdown". I then brought one virtual interface back up, thereby reactivating bond0, and could see the bond0.10 operstate go back to "up" as well. Cheers,

---

ok, that is good news. When bytemobile confirms the same, I'll post the patch. I recommend that you, Neal, flag this as a z-stream candidate as well.

---

Neal, quick note: please make sure to test the non-bonded case, i.e. in addition to adding a vlan to a bonded interface, also test the case in which you add a vlan to a single physical interface. Please make sure that, when the physical interface is taken down, the operstate of the vlan transitions to lowerlayerdown. I want to be sure this doesn't create any new regressions.

---

No problem Neil, that should be easy enough to test.

---

So far so good. I configured a VLAN interface (eth1.20) and verified the link status and VLAN operstate with eth1 in the up and down states.

```
eth1      Link encap:Ethernet  HWaddr 00:0C:29:8B:33:60
          inet addr:192.168.2.223  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3360/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2449 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:149504 (146.0 KiB)  TX bytes:1742 (1.7 KiB)

eth1.20   Link encap:Ethernet  HWaddr 00:0C:29:8B:33:60
          inet addr:192.168.2.180  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe8b:3360/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:720 (720.0 b)

[root@rhel63test ~]# ethtool eth1 | grep -i detected
Link detected: yes
[root@rhel63test ~]# ethtool eth1.20 | grep -i detected
Link detected: yes
[root@rhel63test ~]# cat /sys/class/net/eth1.20/operstate
up

+---------------------+
| Simulate Cable Pull |
+---------------------+
```
```
[root@rhel63test ~]# ethtool eth1 | grep -i detected
Link detected: no
[root@rhel63test ~]# ethtool eth1.20 | grep -i detected
Link detected: no
[root@rhel63test ~]# cat /sys/class/net/eth1.20/operstate
lowerlayerdown
```

I then reconnected the interfaces and the eth1.20 operstate reported as "up" (as expected). Nothing out of the ordinary recorded in dmesg either.

---

excellent, thank you. Unless you object, I'll post this for review tomorrow (yes, Sunday), so we can get acks Monday. I suggest you nominate this for z-stream, so we can get them a z-stream kernel asap.

---

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.

---

Hi Neil, I have a question about the failover event. What's the difference between a cable pull and shutting down the interface on the switch to simulate a failover event? Can I just shut down the interface on the switch to simulate the failover events? Thank you. Liang Zheng.

---

The real answer to that question often lies in the driver details. For the purposes of this test I think the differences are largely irrelevant, but generally speaking, running ifdown will clear the IFF_UP flag from the interface before sending a carrier-off linkwatch event. Just pulling the cable will only send the linkwatch event, without clearing the IFF_UP flag. Listeners for the event may behave differently based on those differences.

---

(In reply to comment #12) I will test the patched kernel in this environment, which has 2 RHEL 6.3 kvm hosts using nic+bond+vlan+bridge, and let you know if it fixes the issues that we have observed.
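The ifdown-versus-cable-pull distinction described above is visible in the flags field of `ip link show`: administratively downing an interface clears the UP (IFF_UP) flag, while a carrier loss leaves UP set and adds NO-CARRIER. A minimal sketch; the flag strings below are hard-coded samples standing in for real `ip link` output, since exercising real interfaces requires root:

```shell
# Sample flag fields as 'ip link show' would print them (assumed samples):
admin_down='<BROADCAST,MULTICAST>'               # after: ip link set dev eth1 down
cable_pull='<NO-CARRIER,BROADCAST,MULTICAST,UP>' # after a physical cable pull

# Crude check for the IFF_UP flag in such a flag string; the comma/bracket
# boundaries keep LOWER_UP from matching as UP.
has_iff_up() {
  case ",${1#<}" in
    *,UP,*|*,UP\>*) echo yes ;;
    *) echo no ;;
  esac
}

has_iff_up "$admin_down"   # prints: no  (ifdown cleared IFF_UP)
has_iff_up "$cable_pull"   # prints: yes (carrier gone, IFF_UP still set)
```

This is why, as Neil notes, listeners can react differently: ifdown delivers a state change plus the carrier-off event, while a cable pull delivers only the carrier-off linkwatch event.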
As a side note, we also have a RHEV 3 environment with the same network setup, and RHEV-M fails to create the bonds using the vlan interfaces on RHEV-H 6.3 hypervisors. It works fine with RHEV-H 6.2 hypervisors.

---

I've just tested patch https://bugzilla.redhat.com/attachment.cgi?id=599456&action=diff on top of 2.6.32-279.2.1 and it works fine. My setup is nics->bond->vlans->bridges, and I had the same problem after applying kernel 2.6.32-279.2.1. I've tested both ifup/ifdown as well as port disable on the switch.

regards, Giannis

---

(In reply to comment #18) In the case I'm testing, the problem affects NICs bonded using mode 4 (link aggregation). Should I open a separate BZ? Or maybe one is already open?

---

Hi, I reproduced the bug in kernel 2.6.32-289. I also tested kernels 2.6.32-270, 2.6.32-279.5.1, and 2.6.32-293, where the bug does not exist.

---

Hi,

What's the status on this one? Is it fixed in any publicly available kernel?

thanx

Giannis

---

(In reply to comment #23) Hi, Red Hat is working on a fix for this in an upcoming erratum for 6.3. We are targeting that release for mid-August (it is currently in test). Regards, - Sue

---

Patch(es) available on kernel-2.6.32-294.el6

---

Problem seems to be solved in 2.6.32-279.5.2. I've seen that patch https://bugzilla.redhat.com/attachment.cgi?id=599456&action=diff is applied in the source.

---

From the testing done by our validation people, the above kernel does indeed fix the issue. Sorry for the delay in getting this tested.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html
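As a recap, the verification procedure used throughout this report reduces to failing bond slaves and checking the vlan device's operstate. The commands in the comments below require root and the bond from this report's setup; the small helper only encodes the expected outcome so the logic can be exercised without real interfaces:

```shell
# On the real system (root required, device names from this report):
#   ip link set eth0 down                     # fail the active slave
#   cat /sys/class/net/bond0.10/operstate     # expect "up": bond failed over
#   ip link set eth1 down                     # fail the remaining slave
#   cat /sys/class/net/bond0.10/operstate     # expect "lowerlayerdown"

# expected_operstate <live-slave-count>: what a fixed kernel should report
# for the vlan on top of an active-backup bond.
expected_operstate() {
  if [ "$1" -gt 0 ]; then echo up; else echo lowerlayerdown; fi
}

expected_operstate 1   # prints: up
expected_operstate 0   # prints: lowerlayerdown
```

On the unpatched kernels discussed above, the observed operstate diverged from these expectations because slave device events leaked into the vlan code.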