Bug 1067802
| Summary: | ixgbe in SR-IOV mode does not respect unicast promiscuous mode in internal switch | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | David Gibson <dgibson> |
| Component: | kernel | Assignee: | Nikolay Aleksandrov <naleksan> |
| Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 6.5 | CC: | alex.williamson, dgibson, john.ronciak, mst, nhorman, peterm, rhod, shawn.kennedy, tbowling, toracat, vyasevic |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-07-08 16:00:54 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
David Gibson
2014-02-21 05:53:35 UTC
I suspect the VLAN spoof checking that exists in the hardware (and is enabled by default in RHEL6) is the issue here. Unfortunately there is not an easy way to disable this for individual VFs.
Do you have a local reproducer for this? If so, you could consider this patch as a test to see if this resolves the issue:
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 6c449e7..87a1e3a 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -3348,7 +3348,7 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
         /* Enable MAC Anti-Spoofing */
         hw->mac.ops.set_mac_anti_spoofing(hw,
                                           (adapter->antispoofing_enabled =
-                                           (adapter->num_vfs != 0)),
+                                           false),
                                           adapter->num_vfs);
 }
Andy,
* Yes, I have a local reproducer, although that's on borrowed hardware that I might not have for much longer.
* I don't think this is plausibly caused by the anti-spoofing:
- guest0 (with the VF) isn't trying to send or receive packets for any VLAN it's not assigned to, nor any MAC other than the one it's assigned.
- As noted, guest0 *can* ping guest1 if it's located on a different host. So, the NIC is willing to send the packets out the physical interface, and send the replies from an external source back to the VF. It just doesn't loop back the unicast packets from the VF to the PF.
- The unicast packets that aren't getting through *are* addressed to a MAC that isn't the host's normal MAC (it's the MAC of guest1 on the bridge). So they would usually be filtered, but the unicast promiscuous mode bit (which the bridge code enables) should allow them to come through.
- Again, since this works with guest1 on another host, the card correctly receives the guest1 destined packets on the PF if they come from externally, just not if they come from a local VF.
* I'll attempt to test your patch anyway, but as above, I don't think it will help.
There is a discussion on what appears to be the same problem at https://communities.intel.com/thread/38613

However, it doesn't quite make sense to me. It implies this is a limitation in the driver which has been fixed upstream; however, as described above, the real problem seems to be that the internal switch routing logic in the firmware doesn't check an individual VF's or PF's unicast promiscuous mode.

Reading that, I can think of two possible workarounds:
1) Put the NIC into VEPA mode, if you have a VEPA capable switch. The driver will then tell the NIC to make no attempt to forward packets between VFs; instead all packets will go to the external switch, which is expected to hairpin them back. AFAICT VEPA mode is available upstream, but not in RHEL 6.5.
2) Manually add MAC addresses for any interfaces on the Linux bridge to the MAC filter on the PF. AFAICT there isn't a userspace way of doing this, however, in either upstream or RHEL.

The parts of that thread about manually adding VF addresses to the bridge forwarding database make no sense to me - the problem isn't the Linux bridge forwarding to the wrong places, it's that the ixgbe internal switch doesn't present unicast packets from the VF to the PF unless they match the PF's MAC, even though it is in promiscuous mode.

My mistake, there is a way of using workaround (2) above in RHEL.
Assuming the ixgbe PF is ethX, then for each MAC address XX:XX:XX:XX:XX:XX used by a VM on the Linux bridge, run:
# ip link add macXXXXXXXXXXXX link ethX type macvlan
# ip link set macXXXXXXXXXXXX address XX:XX:XX:XX:XX:XX up
This forces the VM's MAC address onto the PF's unicast MAC filter, allowing packets from the VF to be received by the PF and thereby forwarded onto the bridged VM.
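For a host with several bridged VMs, the two commands above can be wrapped in a loop. The following is only a minimal sketch, not a tested tool: the helper names `macvlan_name` and `macvlan_cmds`, the PF name `eth2`, and the example MAC addresses are all hypothetical placeholders. It only prints the `ip link` commands, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Sketch: generate the macvlan workaround commands for a list of VM MACs.
# Helper names, the PF name and the MACs below are hypothetical.
# Nothing in this script touches the NIC; it only prints commands.

# Build the macvlan interface name: "mac" + the 12 hex digits of the MAC.
# The result is 15 characters, the longest interface name Linux accepts.
macvlan_name() {
    printf 'mac%s\n' "$(printf '%s' "$1" | tr -d ':')"
}

# Print the two ip(8) commands for PF $1 and VM MAC address $2.
macvlan_cmds() {
    pf=$1
    mac=$2
    name=$(macvlan_name "$mac")
    printf 'ip link add %s link %s type macvlan\n' "$name" "$pf"
    printf 'ip link set %s address %s up\n' "$name" "$mac"
}

# Example: PF eth2 and two VM MAC addresses (placeholders).
for mac in 52:54:00:12:34:56 52:54:00:ab:cd:ef; do
    macvlan_cmds eth2 "$mac"
done
```

Once the printed commands look right, piping the output to `sh` as root would apply them. Removing a macvlan interface again with `ip link delete` should drop the corresponding MAC from the PF's filter.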
Exactly what "firmware" are you talking about? The Intel 10 gig HW does not have firmware. So I can't follow what you are saying about this issue.

> The Intel 10 gig HW does not have firmware.

Really? I find it hard to believe all the card's complex features are implemented without any firmware at all. But I guess if the firmware isn't updatable, then there's nothing we can do.

In the discussion at https://communities.intel.com/thread/38613, do you have any idea what sort of driver-side fix was envisaged? As far as I can tell this is a hardware erratum, and the only way to fix it in the driver is with a very ugly workaround to detect when the interface is bridged, monitor MACs learned by the bridge and automatically add them to the hardware's MAC filter.

(In reply to David Gibson from comment #6)
> The parts of that thread about manually adding VF addresses to the bridge
> forwarding database make no sense to me - the problem isn't the Linux bridge
> forwarding to the wrong places, it's that the ixgbe internal switch doesn't
> present unicast packets from the VF to the PF unless they match the PF's
> MAC, even though it is in promiscuous mode.

I can shed some light on this for you. The above workaround has to do with how the /sbin/bridge command operates. It has 2 modes of operation:
1) Operate on a master device
2) Operate on the specified device itself.

The default mode of operation is 2 (device itself). So when you issue a command:
    bridge fdb add XX:XX:XX:XX:XX:XX dev eth0
you are actually adding the MAC address to the eth0 MAC filter table, similar to what macvlan does. This happens even if eth0 is a bridge port. This functionality is not in RHEL 6.
-vlad

Vlad, thanks for that clarification.

Andy,
As noted in passing in c#9, I think it's at least theoretically possible to make this an automated workaround in the driver, by monitoring the bridge forwarding db and adjusting the MAC filter accordingly.
That approach would certainly be ugly, but it's the only way I can see to work around this hardware bug. Does that method seem at all feasible to you?

(In reply to David Gibson from comment #11)
> Vlad, thanks for that clarification.
>
> Andy,
>
> As noted in passing in c#9, I think it's at least theoretically possible to
> make this an automated workaround in the driver, by monitoring the bridge
> forwarding db and adjusting the MAC filter accordingly.
>
> That approach would certainly be ugly, but it's the only way I can see to
> work around this hardware bug. Does that method seem at all feasible to you?

Hi David,
This approach has been considered upstream and rejected. The currently proposed solution involves libvirt monitoring the configuration of the guest and programming things appropriately. That's been deferred until 7.1.
-vlad

What is the status of this related to Vlad's comment 12? Is there another BZ we should track for this libvirt monitoring? Is it still on track for 7.1? Do we need this BZ updated to track 7.1?

After talking to Vlad, he pointed me at these two bugzillas:
https://bugzilla.redhat.com/show_bug.cgi?id=896669 (kernel)
https://bugzilla.redhat.com/show_bug.cgi?id=1099210 (user-space)
The kernel fixes for this have been rejected upstream, so we're left relying on the user-space fix. Thus I propose to close this bugzilla as WONTFIX and to continue monitoring the user-space fix. What do you think?

Is there another choice? I don't think there is.

Closing as WONTFIX and going forward with the user-space solution mentioned in comment #14.