Description of problem:

Given this network configuration:

- Physical device 1 without hardware VLAN tagging (NETIF_F_HW_VLAN_TX)
- Physical device 2
- Bonding device in active-backup mode, with devices 1 and 2 as slaves
- VLAN device configured on top of the bonding device
- IPv6 address configured on the VLAN (a link-local address is sufficient)

Switching the active slave triggers a gratuitous neighbour advertisement for the VLAN. If the active slave is now device 1, the skb for the advertisement is generated incorrectly. This may also result in an 'oops' in skb_under_panic().

Version-Release number of selected component (if applicable):

2.6.18-164.el5 (and later)

Bug was introduced by the patch linux-2.6-net-bonding-update-to-upstream-version-3-4-0.patch

How reproducible:

Always

Steps to Reproduce:

This requires an Ethernet device without VLAN tag insertion. In these commands, eth1 is assumed to be such a device and eth2 can be any other Ethernet device.

# modprobe bonding miimon=100 mode=1
# modprobe ipv6
# echo +eth2 > /sys/class/net/bond0/bonding/slaves
# echo +eth1 > /sys/class/net/bond0/bonding/slaves
# ip link set bond0 up
# vconfig add bond0 2
# ip link set bond0.2 up
# echo eth1 > /sys/class/net/bond0/bonding/active_slave

Actual results:

Neighbour advertisement is sent with a VLAN tag inserted within the IPv6 header, not within the Ethernet header. In some cases (I'm not sure when) an 'oops' can occur in skb_under_panic() because the skb does not have space to insert the VLAN tag.

Expected results:

A correct VLAN-tagged neighbour advertisement is sent and the system continues to run.

Additional info:

The bug was fixed upstream in commit f88a4a9b65a6f3422b81be995535d0e69df11bb8. It cannot be fixed in the same way in earlier kernel versions, but the attached patch should be suitable as a workaround.
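For readers comparing a packet capture against the expected result, the following is a minimal user-space sketch of where an 802.1Q tag belongs in an Ethernet frame: directly after the two 6-byte MAC addresses, ahead of the original EtherType. It is not the kernel code and not the attached patch. The function name insert_vlan_tag, the sample MAC addresses, and the zero-filled payload are illustrative assumptions.

```python
import struct

ETH_P_8021Q = 0x8100  # TPID that marks an 802.1Q tag
ETH_P_IPV6 = 0x86DD   # EtherType for IPv6


def insert_vlan_tag(frame: bytes, vid: int, pcp: int = 0) -> bytes:
    """Return a copy of `frame` with an 802.1Q tag placed after the MAC addresses.

    Hypothetical helper for illustration only; the kernel performs the
    equivalent step on an skb, which needs 4 bytes of headroom for the tag.
    """
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= pcp <= 7:
        raise ValueError("priority must fit in 3 bits")
    tci = (pcp << 13) | vid                      # priority, DEI=0, VLAN ID
    tag = struct.pack("!HH", ETH_P_8021Q, tci)   # 4-byte tag, network order
    # Bytes 0-11 are destination and source MAC; the tag goes at offset 12,
    # so the original EtherType ends up at offset 16.
    return frame[:12] + tag + frame[12:]


# Sample untagged frame: all-nodes multicast destination, made-up source MAC,
# IPv6 EtherType, and a zero-filled stand-in for the IPv6 header.
dst = bytes.fromhex("333300000001")
src = bytes.fromhex("020000000001")
untagged = dst + src + struct.pack("!H", ETH_P_IPV6) + bytes(40)
tagged = insert_vlan_tag(untagged, vid=2)
```

In a correct frame for VLAN 2, bytes 12-15 carry the tag and the IPv6 EtherType follows at offset 16. In the failure described above, the four tag bytes instead appear further into the packet, inside the IPv6 header.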
Created attachment 478321
bonding/vlan: Avoid mangled NAs on slaves without VLAN tag insertion
Thanks, Ben. I will get this queued up for the next update.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
I've been looking at Ben's patch more closely and I think it looks good. Thanks again, Ben.

The first hunk took me a bit longer to understand, but I realized that RHEL5 does not have the 'software emulation for vlan acceleration' patch[0] from 2.6.37, so it is necessary to drop all VLAN-tagged NAs when using hardware without support for TX VLAN acceleration.

The only change I would consider making to this patch is adding a warning that the hardware is not recommended for use with bonding and IPv6, since it does not support VLAN TX acceleration. It might be nice for users to know why the NAs are not coming out of the box.

0. commit 7b9c60903714bf0a19d746b228864bad3497284e
Author: Jesse Gross <jesse>
Date: Wed Oct 20 13:56:04 2010 +0000

    vlan: Enable software emulation for vlan accleration.
This also appears to affect RHEL 6 as of kernel version 2.6.32-122.el6 (the version in 6.1 beta).
(In reply to comment #6)
> This also appears to affect RHEL 6 as of kernel version 2.6.32-122.el6 (the
> version in 6.1 beta).

I suspect that is the case based on the vintage of the patches that appear to resolve this.
The patch in comment #12 has been reviewed and deemed to cause a feature regression, so this issue is deferred to 5.9: http://post-office.corp.redhat.com/archives/rhkernel-list/2011-September/msg00544.html
I can't see comment 12. If you're concerned about the fact that I proposed to disable unsolicited NAs for devices without VLAN tag acceleration, this is not a feature regression since the feature never worked.
(In reply to comment #15)
> I can't see comment 12. If you're concerned about the fact that I proposed to
> disable unsolicited NAs for devices without VLAN tag acceleration, this is not
> a feature regression since the feature never worked.

Although I agree with you that this is not a regression, there was pushback against your patch on our mailing list. DaveM's comment: "Handling the packet properly, in software, is the only proper way to resolve this." I'm not sure exactly how to do that nicely.
Hi Ben,

I'm really sorry to say this, but I'm closing this bug without getting your patch included :(. This is based on comment #16 and the fact that 5.11 is a release for small fixes and probably won't take any big patches, especially this kind of workaround :(. If you *really* need it, I can try to push it, but the chances are really low... Upgrading to RHEL6 (and fixing it there, if needed) would be the best approach for this...

Thanks for understanding and sorry again. :-/