Bug 1341248
| Summary: | igb driver forbids resetting VF MAC address back to 00:00:00:00:00:00, which was its original value | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Laine Stump <laine> | ||||||
| Component: | kernel | Assignee: | Corinna Vinschen <vinschen> | ||||||
| kernel sub component: | NIC Drivers | QA Contact: | LiLiang <liali> | ||||||
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |||||||
| Severity: | unspecified | ||||||||
| Priority: | unspecified | CC: | bhu, bmcclain, danken, jshortt, laine, liali, linville, mburman, network-qe, sassmann, vinschen, yanghliu, ylavi | ||||||
| Version: | 7.2 | ||||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2018-08-22 16:52:51 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 1410076, 1415609 | ||||||||
| Attachments: |
|
||||||||
|
Description
Laine Stump
2016-05-31 14:26:14 UTC
Afaics, enic is not affected:
if (enic_is_dynamic(enic) || enic_is_sriov_vf(enic)) {
if (!is_valid_ether_addr(addr) && !is_zero_ether_addr(addr))
return -EADDRNOTAVAIL;
This looks like enic allows setting the MAC to all-zero in case of a VF.
As far as igb/igbvf is concerned, a patch looks like a two-liner, see
attached patch. Any chance you could give it a try? I'll submit it
upstream in a couple of days then.
Thanks,
Corinna
Created attachment 1163316 [details]
Allow resetting MAC of igb VF to all-zero
Sure, I can test that. Would you prefer that I test on a pure upstream kernel first (on a Fedora23 machine), or on a RHEL7.x kernel (3.10.whatever)? I'm working on starting an upstream build since that's the simplest for me. Upstream build is super. The patch definitely applies cleanly to upstream net-next, just a few lines off. Thanks a lot, Corinna Yes, that patch on an upstream kernel solves the issue for igb. (I'm a kernel outsider, so not knowing any better I just updated to the tip of master).
I also found a machine with enic (although it's running RHEL6.x) and verified that you're correct - I had remembered incorrectly. (The issue with enic is different - they don't implement setting of VF mac addresses via netlink RTM_SETLINK to the PF *at all*, so you have to set their MAC address using the netdev name of the VF directly. But that's unrelated to the current problem)
I got someone with access to a machine with an X540 to verify that the ixgbe driver forbids MAC address 00:00:00:00:00:00 as well.
Now that this works, I'm wondering what to do about the fact that, while the VF MAC is now reset, the MAC of the netdev associated with the VF remains at the value libvirt originally set. The kernel gives this message when setting the VF MAC:
igb 0000:02:00.0: setting MAC 00:00:00:00:00:00 on VF 0
igb 0000:02:00.0: Reload the VF driver to make this change effective.
If I unbind the VF from the host driver, then rebind it, a new random MAC address is generated. That's an awfully heavy hammer though (and trying to set mac via the netdev name of the VF (after having previously set it via SETLINK for the PFname+VF#) gives this message:
igb 0000:02:00.1: VF 0 attempted to override administratively set MAC address
Reload the VF driver to resume operations
(and anyway even if that did work, trying to set all 0's via the VF's netdev name still gets an error from the driver, and I think there may be even less likelyhood of upstream allowing a patch that permits setting any netdev mac to 0's)
Hi Laine, these messages are generated by the driver no matter what MAC you set on the VF. They have nothing to do with the all-zero MAC. "Reload the VF driver to make this change effective." is always printed if you set the address via the PF netdev. What *also* happens in this case is that a flag is set which indicates that the PF set the MAC of the VF. This is treated as authoritative. And that in turn explains why you get the second message, "VF 0 attempted to override administratively set MAC address". When you're trying to set the VF MAC via the VF netdev, the igb driver checks the aforementioned flag. If it's set, it prints this message... and refuses the request. And here's the deal: If you set the VF MAC immediately on the VF netdev name, you shouldn't see either of those messages. Does that help? Corinna Hi Laine, I had another look into this: If you fetch the VF's MAC via the PF's netdev, the MAC information is bogus. It's set to all-zero allright, but that's the MAC storage of the PF's driver (igb) for its VFs. It's *not* shared with the VF's driver (igbvf) storage for the device. And here's the problem: When the VF is initialized, igbvf randomizes the MAC, but it doesn't tell the PF's driver. That's why `ip link show' on the PF shows the all-zero MACs, while in fact the VFs have valid, randomized MACs to start with. Bottom line, do not set the VF MAC via RTM_SETLINK on the PF. It's not doing what you want it to do and, worse, resetting the MAC to all-zero this way is plain wrong. Set the MAC via RTM_SETLINK on the VF's netdev. Does that work for you? Corinna To clarify: The VF doesn't inform the PF about it's new MAC as long as the PF is not up. As soon as the PF goes into up state, or if the PF is up when instantaiting the VF, the VF propagates its MAC to the PF. You can observe this by adding and removing VFs and looking into the output from `ip link show <PF>' in up vs. down states of the PF. Since this behaviour is by design, I think we should close this BZ as NOTABUG if you're ok with that. Thanks, Corinna Sorry - in my attempt at brevity I neglected to say that I understand the source of the "administratively set blah blah" message (a flag set any time the MAC is set via the PF+VF# avenue) and the reason for the "reload the VF driver" message (and that it's unrelated to the MAC address being 0). I just wanted to describe enough that you'd understand the situation I'm talking about. The problem is that we can't set the MAC address using the VF netdev directly, because that doesn't work when you are assigning the device to a guest with vfio. The problem has multiple layers, but here are the most important parts: * you need the netdev visible (i.e. the host netdev driver bound to the VF) in order to set the MAC via the netdev name, which means that you need to do it prior to unbinding the host driver (igbeixgbe/whatever) from the VF (which must be done in order to assign the VF to a virtual guest with vfio) * Any MAC that is set in this manner while the host netdev driver is bound to the device will be lost when you unbind it - the guest ends up with "some other random MAC address". * Because the MAC address of the VF is reinitialized when the VF is unbound from the host driver, what libvirt has to do is this: - unbind the VF from the host driver - bind the VF to the vfio-pci driver - save the MAC address that was stored in the VF table (which happens to be 0, but *isn't* the MAC previously used by the VF's netdev) - set the MAC address of the VF via PFnetdev+VF# - (qemu uses vfio to assign the device to the guest. It magically has the MAC address that was set via PFnetdev+VF#) - (qemu releases the device) - restore the MAC address (via PFnetdev+VF# that was saved in step 3 - unbind the VF from vfio-pci and rebind to host net driver. Any change in the ordering and it won't work. For example, if you set the MAC address before you unbind from the host driver (via *either* method) you'll find that the MAC address that shows up in the guest is (again) a new random value. And of course, once you've set the MAC address via PFNetdev+VF# to do device assignment, that device has "the mark" (the administratively set flag) and can no longer be set via the VFnetdev. So if you need to set the mac for macvtap passthrough mode (which doesn't unbind the device from the host netdev driver, but just takes it over for exclusive use by an emulated net driver in qemu), you must use the PFnetdev+VF# method for any future setting of the MAC. In the end, we just need to be able to set the MAC address that will be used by the guest, and then reset it to its original value after the guest is finished with it. Created attachment 1164530 [details]
Allow to remove administratively set MAC on VFs
Ok, thanks for the explanation, I think I understand your scenario better now. However, what I don't understand is the order in terms of saving/restoring the VF's MAC. What sense does it make to save the 0 MAC after binding the vfio driver? It has no meaning to the host. IIUC, your order does not preserve the original MAC set in the host VFnetdev. Unbinding igbvf results in losing the original MAC, so when rebinding igbvf as last step you'll get a new random MAC for the VFnetdev. For the sake of discussion let's ignore the authoritativness of setting the MAC via PFnetdev+VF#. Given that, from my POV the order should be this: - save the MAC address of the VF via the VFnetdev - unbind the VF from the host driver - bind the VF to the vfio-pci driver - set the MAC address of the VF via PFnetdev+VF# - (qemu uses vfio to assign the device to the guest) - (qemu releases the device) - ??? - unbind the VF from vfio-pci and rebind to host net driver - restore the MAC address saved in step 1 via VFnetdev Doesn't that make more sense? We just have to resolve the three question marks in the third to last step. The authoritativness of setting the MAC via PFnetdev+VF# is in the way. I'm pretty sure that this makes sense in general, but I agree that in your scenario we need a way to revert this authority. And, yeah, I think setting the MAC to 0 via PFnetdev+VF# should implement exactly that. So in place of the above three question marks, libvirt could then simply do - set the MAC address of the VF to 0 via PFnetdev+VF# I attached a new patch, replacing my previous patch, to allow exactly that in igb. It works fine in my limited testing, but it would be nice if you could apply this to your local test kernel (after removing my previous patch) and try if this works for you. However, there's a catch: By implementing this as the golden path for libvirt, we're setting a precedent for a new generic method. This would have to be implemented in all drivers providing SRIOV VFs and as such, needs upstream approval by affected parties. Corinna It just occured to me... what if an administrative MAC was already set before unbinding the VF? Ultimately you'd have to keep track of both to restore the original situation correctly, along these lines: - save administrative MAC of the VF - if administrative MAC is 0, save VFnetdev MAC - unbind VF, rebind vfio-pci - [rinse] - unbind VF, rebind to host driver - if administrative MAC from step 1 was a valid MAC, restore it. Otherwise restore MAC from step 2 via VFnetdev. Corinna Can you help with this? It's causing a bug in RHV and the last update was a long time ago. (In reply to Yaniv Dary from comment #13) > Can you help with this? > It's causing a bug in RHV and the last update was a long time ago. Urgent enough for a 7.3.z request? Yaniv, what is RHV's exact scenario? Are you doing device assignment, or macvtap passthrough? (The lapse in activity is entirely my fault - I had gotten pulled away from this in the middle of the investigation because of other items pre 7.3, and put it at a relatively lower priority because I discovered that my own problematic scenario had more basic problems rendering it completely unworkable even if this was fixed - in the absence of bug customer bug reports for this lower problem (macvtap passthrough w/o any 801.11Qb[gh]) not working *at all*) I made the rash assumption that it wasn't being used anyway, so chose to push on other things that actually were problematic to customers) In the interim, I've discovered some other oddities - for example, although the MAC address set via PFnetdev+VF# doesn't take effect until the VF is unbound from its host driver and bound to vfio-pci, the *vlan tag* set in the same manner (and displayed alongside the MAC address in ip link show output) is set and takes effect immediately. Along with being able to set a MAC address of 00:00:00:00:00:00, I do agree with Corinna that we need to have a way to override the "administratively set" restriction - currently the only way to do that is to reload the *PF* driver, but that would require resetting all of the VFs! (which is obviously not a possibility). I believe at least *some* of the other SRIOV net drivers have been modified to allow 00:00:00:00:00:00. I don't know which have the "administratively set" bit though - I believe at least igb and ixgbe. Whatever is done, we need to keep in mind that behavior should be the same for *all* SRIOV drivers - libvirt doesn't want to go down the road of separate code paths depending on which driver a device is bound to. RHV passes a VF to the VM with something like the following:
<interface type='hostdev' managed='no'>
<driver name='vfio'/>
<source>
<address type='pci' domain='0x0000' bus='0x00' slot='0x07'
function='0x0'/>
</source>
<mac address='52:54:00:6d:90:02'/>
<vlan>
<tag id=100/>
</vlan>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05'
function='0x0'/>
<boot order='1'/>
</interface>
After the VM is stopped (or <interface> unplugged) we expect the VF to loose its mac and to be reusable for other VMs.
+1 to the request from kernel drivers, to present a consistent API to userland, regardless of specific kernel.
Okay, so you're using it for VFIO device assignment rather than macvtap passthrough, and you never re-bind it to the host driver. So in your case, simply allowing a setting of 00:00:00:00:00:00 via PFnetdev+VF# would solve the problem. (I hesitate to suggest it since it is a horrible ugly hack, but for now you could achieve a workable setup by initializing all VF MACs to 00:00:00:00:00:01 when the host boots - that way the MAC that is saved by libvirt will be a value the kernel will accept when it's being restored (and also won't conflict with any MAC address that's actually in use). Corrina - your idea to have adminstrative MAC set to 0 also clear the "administratively set" flag is interesting. We really do need *some* way to reset that flag. Of course it would have to work the same for all drivers. I don't know if mlx has such a flag, but ixgbe does. Maybe that should be split out into a separate issue, and we can just use this BZ to add a simple patch that allows setting the admin MAC to 0 (which solves Dan's / RHV's problem). There is a separate BZ for ixgbe and 0 MAC - Bug 1401837, and for Mellanox - Bug Bug 1302166. I'm still not quite sure that the ability to reset the administrative MAC to 00:00:00:00:00:00 is really required here. Setting the MAC administratively via the PF is always possible and does not require to unload and reload the PF. Why not just set the MAC to a random value via PF after usage? It serves the purpose *and* works in future scenarios for NICs which don't allow resetting to 00:00:00:00:00:00. Corinna This all started out as a failure being reported due to libvirt attempting to save the current state of a device before user, and restore it to the exact same state afterwards. Along the way, I discovered that libvirt was doing the wrong thing in the case of macvtap passthrough, and that even in the case of vfio device assignment (where only the admin MAC is significant), just setting the admin MAC back to it's original value doesn't leave the device in the original state anyway - 1) there is still the problem of the "administratively set" flag, and 2) if we were able to set it back to 0, that would result in a *different* random MAC address being set in the vf the next time its driver was initialized. And aside from that, as has been pointed out in several different BZs now, "fixing it in this driver doesn't fix it in the others" (and its apparent that not everybody thinks that it's something that needs to be fix, so that would be an uphill battle). Based on all of that, I'm working on patches for libvirt that both "do the right thing" for macvtap passthrough mode (using something very close to what Corinna suggested in Comment 12), and also check for failed attempts to set a MAC address of 0, and retry with "almost 0". Although I still think that it should be allowed to restore any setting to its initial value, the amount of code to work around that inability is relatively small, and won't cause problems for the other drivers that don't have that limitation. So feel free to close this BZ and point at Bug 1415609, which will track all the libvirt changes. All the libvirt patches discussed here and in Bug 1415609 have been pushed, including the patch to set to 02:00:00:00:00:00 when attempts at 00:00:00:00:00:00 fail (although that now rarely ever happens due to the change in save/restore algorithm). So definitely close this BZ with whatever disposition you think is most appropriate (my opinion is "WONTFIX", although yours may be "NOTABUG" :-) Hi Laine, from my point of view changing libvirt to handle this scenario gracefully was the right thing to do. However, this doesn't make it wrong to change the driver as well. I'll send the igb patch upstream the next couple of days to ask Intel how they think of this. Thanks, Corinna |