Bug 1059532
| Summary: | ixgbe virtual function driver advertises vlan tagging offload when unavailable due to HV configuration | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | David Gibson <dgibson> |
| Component: | kernel | Assignee: | Nikolay Aleksandrov <naleksan> |
| Status: | CLOSED WONTFIX | QA Contact: | Network QE <network-qe> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 6.5 | CC: | alex.williamson, aquini, ccui, jogreene, john.ronciak, mst, naleksan, peterm, rhod, tbowling, vyasevic, ypei |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-08-12 13:50:50 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 994246 | ||
| Attachments: | |||
|
Description
David Gibson
2014-01-30 05:18:32 UTC
*** Bug 1059531 has been marked as a duplicate of this bug. *** Actually, having looked at the Intel 82599 datasheet (http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html) it appears that when a VF is configured to have a VLAN tag inserted, any attempt by the guest using the VF to set a tag - either by hardware or software - will be rejected. So, if the hypervisor has set a VLAN tag to be inserted, the ixgbevf driver should advertise NETIF_F_VLAN_CHALLENGED instead of the other VLAN features. It would be nice if the hardware had a setting to treat tags set by the guest as inner tags, and emit QinQ frames, but it seems it doesn't. Nik, can you take a look at this for the customer? I can confirm the issue with upstream kernel as well, the upstream ixgbevf driver also advertises hw vlan offloading. With <vlan> specified: [ 51.329686] 8021q: 802.1Q VLAN Support v1.8 [ 51.330516] 8021q: adding VLAN 0 to HW filter on device ens6 [ 51.347735] ixgbevf 0000:00:06.0: Last Request of type 04 to PF Nacked [ 51.348787] 8021q: adding VLAN 0 to HW filter on device eth1 [ 51.366503] ixgbevf 0000:00:06.0: Last Request of type 04 to PF Nacked ERROR: trying to add VLAN #100 to IF -:ens6:- error: Permission denied Without <vlan> specified it works. Nikolay, When you say upstream, do you mean RHEL 7, Fedora, or linux.org? Could you clarify what driver version of ixgbe and ixgbevf you are seeing for your output of cmt 5? (In reply to Terry Bowling from comment #6) > Nikolay, > > When you say upstream, do you mean RHEL 7, Fedora, or linux.org? > > Could you clarify what driver version of ixgbe and ixgbevf you are seeing > for your output of cmt 5? I tested with current upstream version (Dave Miller's net tree) which is the latest kernel with all fixes. It is actually normal, I could predict that would happen looking at current ixgbevf code as it advertises the same offloads. I'll talk to Andy and we'll probably have to contact Intel about this. I'll let you know as soon as there's some development on the issue. There is no customer specific information in this case, so I'm making it public. >Yes, unfortunately I read the same over the weekend. I think this should be an >option to be configured on the host part, but it would probably require libvirt >extension as well to support it after that so it can be enabled on per VM/VF >basis. Um..? I thought we just concluded from the datasheet that this isn't possible, regardless of what we do in the driver or libvirt. If the host is inserting tags, then it's ignoring tags from the guest. All we could do is lobby Intel to add this feature to the firmware. >I tested this, they don't stack but override, in order to stack a different >configuration of the host must be done where double tagging is enabled which >changes a lot of stuff. As I thought. For the double tagging are you talking about "Double VLAN mode" as described in §7.4.5 of the datasheet? My understanding from the description was that this wasn't useful for our purposes, because the outer VLAN it specifies is applied across the entire NIC, so there's no way we could restrict it to a single VM/VF. >We could, but I don't see what we win since the anti-spoofing is still there in >this BZ's case, so we won't be able to Tx any different tag than the configured. Sorry, that was more for the benefit of bug 1069028 than for this one. >IMO we can close this as NOTABUG. What do you think ? No, we certainly can't close as NOTABUG - check the original description of this bug. The firmware may not do what we really want, but we should still advertise our capabilities properly in the driver. When a host-applied VLAN is specified, we should be advertising NETIF_F_VLAN_CHALLENGED, instead of the vlan offload capabilities. Oh, btw, the customer would also like an actual statement from Intel confirming that guest/host QinQ really isn't possible. Andy, Nik, do you know who the right contacts would be to try and get that? (In reply to David Gibson from comment #14) > >Yes, unfortunately I read the same over the weekend. I think this should be an > >option to be configured on the host part, but it would probably require libvirt > >extension as well to support it after that so it can be enabled on per VM/VF >basis. > > Um..? I thought we just concluded from the datasheet that this isn't > possible, regardless of what we do in the driver or libvirt. If the host is > inserting tags, then it's ignoring tags from the guest. All we could do is > lobby Intel to add this feature to the firmware. > Yes, it is, but with double VLAN mode it won't ignore them just leave them as the "outer" tags, and apply the configured VLAN as the "inner" tag according to the datasheet but that would be global. > >I tested this, they don't stack but override, in order to stack a different > >configuration of the host must be done where double tagging is enabled which > >changes a lot of stuff. > > As I thought. For the double tagging are you talking about "Double VLAN > mode" as described in §7.4.5 of the datasheet? My understanding from the > description was that this wasn't useful for our purposes, because the outer > VLAN it specifies is applied across the entire NIC, so there's no way we > could restrict it to a single VM/VF. > Right, that's why I wrote it was global. > >We could, but I don't see what we win since the anti-spoofing is still there in > >this BZ's case, so we won't be able to Tx any different tag than the configured. > > Sorry, that was more for the benefit of bug 1069028 than for this one. > > >IMO we can close this as NOTABUG. What do you think ? > > No, we certainly can't close as NOTABUG - check the original description of > this bug. The firmware may not do what we really want, but we should still > advertise our capabilities properly in the driver. When a host-applied VLAN > is specified, we should be advertising NETIF_F_VLAN_CHALLENGED, instead of > the vlan offload capabilities. Well, to be fair there's vlan offloading since the VLAN is applied automatically and stripped automatically. I don't think NETIF_F_VLAN_CHALLENGED should be advertised, the NACK in the log is the warning saying that the vlan won't work which should give the user a clue. Yeah, double VLAN mode looks useless to me - it just doesn't match the configuration model that Linux wants. Nor does it match this customer's configuration - the BZs haven't covered it since it seemed extraneous, but they usually want multiple outer tags as well as multiple inner tags. > Well, to be fair there's vlan offloading since the VLAN is applied automatically > and stripped automatically. Not from the guest's perspective. When the tag is host imposed, it's invisible to the guest, so it really doesn't care whether there was hardware assistance helping it be that way. > I don't think NETIF_F_VLAN_CHALLENGED should be advertised, the NACK in the log > is the warning saying that the vlan won't work which should give the user a clue. From the guest's perspective, VLANing simply doesn't work, so absolutely NETIF_F_VLAN_CHALLENGED should be advertised - that's what the flag's for. (In reply to David Gibson from comment #17) > Yeah, double VLAN mode looks useless to me - it just doesn't match the > configuration model that Linux wants. Nor does it match this customer's > configuration - the BZs haven't covered it since it seemed extraneous, but > they usually want multiple outer tags as well as multiple inner tags. > > > Well, to be fair there's vlan offloading since the VLAN is applied automatically > > and stripped automatically. > > Not from the guest's perspective. When the tag is host imposed, it's > invisible to the guest, so it really doesn't care whether there was hardware > assistance helping it be that way. > Right. > > I don't think NETIF_F_VLAN_CHALLENGED should be advertised, the NACK in the log > > is the warning saying that the vlan won't work which should give the user a clue. > > From the guest's perspective, VLANing simply doesn't work, so absolutely > NETIF_F_VLAN_CHALLENGED should be advertised - that's what the flag's for. Okay, I see your point but there's no reliable way to tell if we're in a host VLAN in order to advertise that. For now short of adding a new mbox message, I don't see any other way to tell for certain how the host has configured the guest from the guest's POV. I think we have to bite the bullet and add the mbox message. Ugly, but the confusion without it is even uglier. (In reply to David Gibson from comment #19) > I think we have to bite the bullet and add the mbox message. Ugly, but the > confusion without it is even uglier. I agree, my plan is this: create mbox message - IXGBE_VF_GET_PORT_TYPE with possible values: IXGBE_VF_PORT_UNTAGGED <- no vlans configured IXGBE_VF_PORT_ACCESS <- hypervisor vlan IXGBE_VF_PORT_TRUNK <- trunk mode In the beginning I'll make PORT_TRUNK and PORT_UNTAGGED the same thing as currently the virtualization software out there can't make use of that and we must be backwards compatible. I'll make the patch and suggest it to Intel, see what they have to say about the issue. When this becomes completely utilized then a VF in UNTAGGED/ACCESS modes should advertise VLAN_CHALLENGED, but for a start it should do so for the ACCESS mode, that won't break anything and fix the issue of this BZ. Sounds ok. I'm not quite clear on the distinction you're making between UNTAGGED and TRUNK. Can you clarify? (In reply to David Gibson from comment #21) > Sounds ok. I'm not quite clear on the distinction you're making between > UNTAGGED and TRUNK. Can you clarify? UNTAGGED is supposed to be a port which doesn't have the <vlan type='trunk'> or the <vlan> tag entirely, it should also advertise VLAN_CHALLENGED later on when support gets added for it in the virtualization software. I think I'm getting ahead a bit making this similar to switch ports without any support from the virt software, for the purposes of this BZ I'll only leave the ACCESS and TRUNK types. Hmm.. ok. It doesn't make sense to me to define the inter-kernel communications in terms of the libvirt configuration. It should be based on the actual semantic differences between the modes and I'm not clear what those are. AFAICT, with no <vlan> specific in the libvirt config, we'll have soemthing that should act as a trunk port. Created attachment 872709 [details]
ixgbe/ixgbevf: add new reset parameter for vlan capability [test patch]
This is a test patch for RHEL6 which adds a new parameter to the MAC permaddr
PF -> VF message which denotes if the VF should advertise VLAN_CHALLENGED that is
if the VF is in a host VLAN/qos then it should advertise VLAN_CHALLENGED. I've
tested this locally on a 82599 card. Any comments and suggestions are very
welcome.
Thanks,
Nik
(In reply to Nikolay Aleksandrov from comment #24) > Created attachment 872709 [details] > ixgbe/ixgbevf: add new reset parameter for vlan capability [test patch] > > This is a test patch for RHEL6 which adds a new parameter to the MAC permaddr > PF -> VF message which denotes if the VF should advertise VLAN_CHALLENGED > that is > if the VF is in a host VLAN/qos then it should advertise VLAN_CHALLENGED. > I've > tested this locally on a 82599 card. Any comments and suggestions are very > welcome. > > Thanks, > Nik One note: the patch is only a test version, I think it'd be nice to bump the API version so to know if the drivers need to look for the new parameter. That is to cover the cases old ixgbe - new ixgbevf and vice versa. If/when the time comes for official posting I'll take care of this. Taking a look at upstream, this is resolved by checking the return value of ndo_vlan_rx_add_vid. Since we can't change the netdev structure, I'll make a new patch which adds a new ndo_vlan_rx_add_vid to the netdev_extended one which can override the old one, if present. I'll make the vlan code use it and check the return value if it's defined for the interface, otherwise it'll fallback to the old one. Ok. Do I undestand correctly that that will avoid extending the mailbox protocol? If so, that sounds like the way forward, although workarounds to preserve kABI will be pretty awkward. (In reply to David Gibson from comment #27) > Ok. > > Do I undestand correctly that that will avoid extending the mailbox > protocol? If so, that sounds like the way forward, although workarounds to > preserve kABI will be pretty awkward. Yes, that's the idea. And I'm not happy about the kABI workarounds at all, they all look like hacks at best, but I think this is the cleanest way to go. Also the new mbox message isn't going to be accepted upstream since that is solved there. Yeah. Sometimes ugly hacks are the best you can do. Let me know when you have something I can test. *** Bug 1074564 has been marked as a duplicate of this bug. *** Created attachment 875517 [details]
Add new rx_add_vid method with check of return value
There're 2 patches in the archive, the first one adds the new ndo_vlan_rx_add_vid_ret to netdev_extended which, if present, is used instead of the other one in the vlan code with the difference that the return value of this method is checked afterwards and if there was an error the vlan won't get created.
Patch 02 adds support for this new method to ixgbevf.
I ran local tests by putting a VM in a hypervisor vlan and it was no longer possible to create vlans on top of the attached ixgbevf interface since they get NACKed by the host and an error is returned by the new rx_add_vid method.
Please give these a try, I'll continue testing them.
Thanks,
Nik
Btw, have we heard from Intel confirming that the hardware/firmware behaves like we thinks it does. The customer would really like a statement from Intel stating that it's not possible to apply an outer tag from the host and an inner one from the guest. It is not possible to apply an outer tag from the host and the inner tag from the guest. So comment #33 is correct. This cannot be done with out HW. John, Thanks for the statement, good to have that cleared up. Any chance of requesting that feature in a future firmware? Even if it requires that the inner tag be software-applied without the usual vlan hw acceleration, it seems like it could be quite useful for virtualized environments. Since you're here, would you also care to comment on bug 1067802? Although we have a workaround, the underlying cause there I'd consider a bug in the firmware, albeit a documented one. Created attachment 879793 [details]
Add new rx_add_vid method with check of return value v2
This is the updated patch-set, some notes:
after these patches both methods (old and new) should be kept up-to-date and
supported since some devices (e.g. bonding, bridge) call in the old methods directly. If the new method is used it should be kept in mind that vlgrp might
not be available when it's called so it should not be dependent on it.
Andy, I'm unclear are you saying the technical problems extend to v2 of the patchset, or just the original version? The problem with just issuing a warning to the kernel log is that it's very non-obvious from a user perspective: vconfig appears to have succeded, and there's no indication that the kernel log will contain more information on why things aren't working. In comment #37 I'm not sure what the feature request is. Having the HW be able to insert multiple VLAN tags by both the quests and hosts? Not sure how that would be workable with the Linux stack today. Also, not sure how that would get managed in the HW as there would seem be lots of edge cases and network configurations that would break. Please give an exact description of the feature and how you would expect it to work. @Vlad, note that in this case the switch configuration is fine. The customer is already using QinQ in production via Linux bridge devices on top of the raw (host) ixgbe. The difficulty is combining that with SR-IOV. We understand from Intel that it's simply not possible in the hardware, so this bug is now just about it fail in a way that's less cryptic. @John, > In comment #37 I'm not sure what the feature request is. Having the HW be able to insert multiple VLAN tags by both the quests and hosts? Not sure how that would be workable with the Linux stack today. Also, not sure how that would get managed in the HW as there would seem be lots of edge cases and network configurations that would break. > Please give an exact description of the feature and how you would expect it to work. * The idea is that you can add one VLAN tag from host, and one from guest. * I'd envisage this as a new mode in the PFVMVIR, instructing guest supplied tags to be inner tags to the host supplied outer tag, instead of the guest tag being either discarded or overriding the host tag. * Two ways it could work 1) Allow the guest to use the normal tagging features, even when the host also uses them, with guest set tags being applied as inner tags to the host outer tags. 2) Allowing guest VLAN spoof detection to be configured in such a way that packets given VLAN tags in guest software (rather than using the VF side VLAN tagging acceleration) will be encapsulated as-is in the host's outer VLAN frame, rather than being dropped as a spoof attempt. Ideally this would also need a guest visible bit so that it knows it must use software tagging rather than the usual hardware accelerated paths. The situation we have now is that the customer has an existing QinQ setup. This works correctly, provided that: * Both layers of tag are set by the host (one with hardware accel, one in software only) OR * (once bug 106928 is fixed) both layers of tag are set in the guest (again, one using the hw accel, one in software) It seems perverse that there's no way to have the exact same tags set, but with one configured from the host, and one from the guest. No customer confidential information in this case, so switching to public at customer request. Created attachment 926073 [details]
Add new rx_add_vid method with check of return value v3
Attaching the final patch version.
Closing this as WONTFIX. It's been NACK'ed in internal review by a number of developers. However, proposed patches that do address the problem are posted here by Nik (well done Nik) if you wish to use them. They would be rejected by upstream since the issue is solved there already later kernels (and RHEL 7 as well) already. So there are 2 possible solutions. Patches for RHEL6 Upgrade to RHEL7 The development team as a whole regrets this, but we find no good solution at this time in RHEL 6. Should something arise later, I may revisit this or note anything that develops later that may be helpful. |