RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1341248 - igb driver forbids resetting VF MAC address back to 00:00:00:00:00:00, which was its original value
Summary: igb driver forbids resetting VF MAC address back to 00:00:00:00:00:00, which ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.2
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: Corinna Vinschen
QA Contact: LiLiang
URL:
Whiteboard:
Depends On:
Blocks: 1410076 1415609
TreeView+ depends on / blocked
 
Reported: 2016-05-31 14:26 UTC by Laine Stump
Modified: 2021-09-24 05:53 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-22 16:52:51 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Allow resetting MAC of igb VF to all-zero (1.05 KB, patch)
2016-05-31 17:17 UTC, Corinna Vinschen
no flags Details | Diff
Allow to remove administratively set MAC on VFs (3.51 KB, patch)
2016-06-03 14:05 UTC, Corinna Vinschen
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1302166 0 medium CLOSED MAC address of VF is not editable even when attached to host 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1401837 0 medium CLOSED MAC address of VF is not reset by libvirt since ixgbe driver does not accept 00:00:00:00:00 2021-09-24 05:53:43 UTC
Red Hat Bugzilla 1415609 0 high CLOSED libvirt fails to remove guest mac from VFs of certain drivers 2021-09-24 05:39:43 UTC

Internal Links: 1302166 1401837 1415609

Description Laine Stump 2016-05-31 14:26:14 UTC
Description of problem:

Before libvirt modifies the MAC address and vlan tag for an SRIOV VF for use by a virtual machine (either using vfio device assignment or macvtap passthru mode), it saves the current MAC address and vlan tag so that it can reset them to their original value when the guest is done. (libvirt can't leave the VF MAC set to the value used by the now-defunct guest (since it may be started again later using a different VF), but it certainly shouldn't just pick any random value, either. So it saves the state of everything prior to using the VF, and resets it to that.)

The igb driver initializes the MAC addresses of all VFs to 00:00:00:00:00:00, and reports that when asked (via an RTM_GETLINK netlink message, also visible in the list of VFs in the output of "ip link show"). But when libvirt attempts to restore the MAC address back to 00:00:00:00:00:00 (using an RTM_SETLINK netlink message) the kernel responds with "Invalid argument".

If igb thinks it is acceptable for the MAC address to be all 0's when it is initialized, it should have no problem setting it *back* to this value later. 

Forbidding a reset back to the original value leaves the VF MAC at the value set for the now-defunct virtual machine. Especially on a system with NetworkManager enabled, this has very bad consequences, since NetworkManager forces all interfacess to be IFF_UP *all the time* - if the same virtual machine is restarted using a different VF (or even on a different host), there will be multiple interfaces watching for traffic with the same MAC address.

(note that the enic driver, definitely has the same problem, and most likely the mlx and igbe driver to as well. enic also initializes all VF MACs to 0 (and this info is used by at least one piece of userland code to retrieve a list of currently unused VFs). At one time igbe initialized VF MACs to 0 but it has unfortunately gone back to the (in my opinion) horribly broken practice of initializing VF MACs to random values. I'm not sure about the details of mlx, as I don't have access to an MLX card

Comment 1 Corinna Vinschen 2016-05-31 17:16:07 UTC
Afaics, enic is not affected:

  if (enic_is_dynamic(enic) || enic_is_sriov_vf(enic)) {
    if (!is_valid_ether_addr(addr) && !is_zero_ether_addr(addr))
      return -EADDRNOTAVAIL;

This looks like enic allows setting the MAC to all-zero in case of a VF.

As far as igb/igbvf is concerned, a patch looks like a two-liner, see
attached patch.  Any chance you could give it a try?  I'll submit it
upstream in a couple of days then.


Thanks,
Corinna

Comment 2 Corinna Vinschen 2016-05-31 17:17:06 UTC
Created attachment 1163316 [details]
Allow resetting MAC of igb VF to all-zero

Comment 3 Laine Stump 2016-06-01 15:15:09 UTC
Sure, I can test that. Would you prefer that I test on a pure upstream kernel first (on a Fedora23 machine), or on a RHEL7.x kernel (3.10.whatever)?

I'm working on starting an upstream build since that's the simplest for me.

Comment 4 Corinna Vinschen 2016-06-01 15:29:49 UTC
Upstream build is super.  The patch definitely applies cleanly to upstream
net-next, just a few lines off.


Thanks a lot,
Corinna

Comment 5 Laine Stump 2016-06-01 17:59:30 UTC
Yes, that patch on an upstream kernel solves the issue for igb. (I'm a kernel outsider, so not knowing any better I just updated to the tip of master).

I also found a machine with enic (although it's running RHEL6.x) and verified that you're correct - I had remembered incorrectly. (The issue with enic is different - they don't implement setting of VF mac addresses via netlink RTM_SETLINK to the PF *at all*, so you have to set their MAC address using the netdev name of the VF directly. But that's unrelated to the current problem)

I got someone with access to a machine with an X540 to verify that the ixgbe driver forbids MAC address 00:00:00:00:00:00 as well.

Now that this works, I'm wondering what to do about the fact that, while the VF MAC is now reset, the MAC of the netdev associated with the VF remains at the value libvirt originally set. The kernel gives this message when setting the VF MAC:

    igb 0000:02:00.0: setting MAC 00:00:00:00:00:00 on VF 0
    igb 0000:02:00.0: Reload the VF driver to make this change effective.

If I unbind the VF from the host driver, then rebind it, a new random MAC address is generated. That's an awfully heavy hammer though (and trying to set mac via the netdev name of the VF (after having previously set it via SETLINK for the PFname+VF#) gives this message:

 igb 0000:02:00.1: VF 0 attempted to override administratively set MAC address
               Reload the VF driver to resume operations

(and anyway even if that did work, trying to set all 0's via the VF's netdev name still gets an error from the driver, and I think there may be even less likelyhood of upstream allowing a patch that permits setting any netdev mac to 0's)

Comment 6 Corinna Vinschen 2016-06-02 13:40:44 UTC
Hi Laine,

these messages are generated by the driver no matter what MAC you
set on the VF.  They have nothing to do with the all-zero MAC.

"Reload the VF driver to make this change effective." is always printed
if you set the address via the PF netdev.

What *also* happens in this case is that a flag is set which indicates
that the PF set the MAC of the VF. This is treated as authoritative.

And that in turn explains why you get the second message,
"VF 0 attempted to override administratively set MAC address".  When
you're trying to set the VF MAC via the VF netdev, the igb driver checks
the aforementioned flag.  If it's set, it prints this message... and
refuses the request.

And here's the deal: If you set the VF MAC immediately on the VF netdev
name, you shouldn't see either of those messages.

Does that help?


Corinna

Comment 7 Corinna Vinschen 2016-06-02 17:24:16 UTC
Hi Laine,

I had another look into this:

If you fetch the VF's MAC via the PF's netdev, the MAC information is bogus. 
It's set to all-zero allright, but that's the MAC storage of the PF's driver
(igb) for its VFs.  It's *not* shared with the VF's driver (igbvf) storage
for the device.

And here's the problem: When the VF is initialized, igbvf randomizes
the MAC, but it doesn't tell the PF's driver.  That's why `ip link show'
on the PF shows the all-zero MACs, while in fact the VFs have valid,
randomized MACs to start with.

Bottom line, do not set the VF MAC via RTM_SETLINK on the PF.  It's not
doing what you want it to do and, worse, resetting the MAC to all-zero
this way is plain wrong.  Set the MAC via RTM_SETLINK on the VF's netdev.

Does that work for you?


Corinna

Comment 8 Corinna Vinschen 2016-06-02 19:22:18 UTC
To clarify: The VF doesn't inform the PF about it's new MAC as long as the
PF is not up.  As soon as the PF goes into up state, or if the PF is up
when instantaiting the VF, the VF propagates its MAC to the PF.  You can
observe this by adding and removing VFs and looking into the output from
`ip link show <PF>' in up vs. down states of the PF.

Since this behaviour is by design, I think we should close this BZ as
NOTABUG if you're ok with that.


Thanks,
Corinna

Comment 9 Laine Stump 2016-06-02 19:28:15 UTC
Sorry - in my attempt at brevity I neglected to say that I understand the source of the "administratively set blah blah" message (a flag set any time the MAC is set via the PF+VF# avenue) and the reason for the "reload the VF driver" message (and that it's unrelated to the MAC address being 0). I just wanted to describe enough that you'd understand the situation I'm talking about.

The problem is that we can't set the MAC address using the VF netdev directly, because that doesn't work when you are assigning the device to a guest with vfio. The problem has multiple layers, but here are the most important parts:

* you need the netdev visible (i.e. the host netdev driver bound to the VF) in order to set the MAC via the netdev name, which means that you need to do it prior to unbinding the host driver (igbeixgbe/whatever) from the VF (which must be done in order to assign the VF to a virtual guest with vfio)

* Any MAC that is set in this manner while the host netdev driver is bound to the device will be lost when you unbind it - the guest ends up with "some other random MAC address".

* Because the MAC address of the VF is reinitialized when the VF is unbound from the host driver, what libvirt has to do is this:

 - unbind the VF from the host driver
 - bind the VF to the vfio-pci driver
 - save the MAC address that was stored in the VF table
   (which happens to be 0, but *isn't* the MAC previously
   used by the VF's netdev)
 - set the MAC address of the VF via PFnetdev+VF#
 - (qemu uses vfio to assign the device to the guest. It magically
   has the MAC address that was set via PFnetdev+VF#)
 - (qemu releases the device)
 - restore the MAC address (via PFnetdev+VF# that was saved in step 3
 - unbind the VF from vfio-pci and rebind to host net driver.

Any change in the ordering and it won't work. For example, if you set the MAC address before you unbind from the host driver (via *either* method) you'll find that the MAC address that shows up in the guest is (again) a new random value.

And of course, once you've set the MAC address via PFNetdev+VF# to do device assignment, that device has "the mark" (the administratively set flag) and can no longer be set via the VFnetdev. So if you need to set the mac for macvtap passthrough mode (which doesn't unbind the device from the host netdev driver, but just takes it over for exclusive use by an emulated net driver in qemu), you must use the PFnetdev+VF# method for any future setting of the MAC.

In the end, we just need to be able to set the MAC address that will be used by the guest, and then reset it to its original value after the guest is finished with it.

Comment 10 Corinna Vinschen 2016-06-03 14:05:06 UTC
Created attachment 1164530 [details]
Allow to remove administratively set MAC on VFs

Comment 11 Corinna Vinschen 2016-06-03 14:06:38 UTC
Ok, thanks for the explanation, I think I understand your scenario
better now.

However, what I don't understand is the order in terms of saving/restoring
the VF's MAC.  What sense does it make to save the 0 MAC after binding 
the vfio driver?  It has no meaning to the host.  IIUC, your order does
not preserve the original MAC set in the host VFnetdev. Unbinding igbvf
results in losing the original MAC, so when rebinding igbvf as last step
you'll get a new random MAC for the VFnetdev.

For the sake of discussion let's ignore the authoritativness of setting
the MAC via PFnetdev+VF#.  Given that, from my POV the order should be
this:

 - save the MAC address of the VF via the VFnetdev
 - unbind the VF from the host driver
 - bind the VF to the vfio-pci driver
 - set the MAC address of the VF via PFnetdev+VF#
 - (qemu uses vfio to assign the device to the guest)
 - (qemu releases the device)
 - ???
 - unbind the VF from vfio-pci and rebind to host net driver
 - restore the MAC address saved in step 1 via VFnetdev

Doesn't that make more sense?

We just have to resolve the three question marks in the third to last step.
The authoritativness of setting the MAC via PFnetdev+VF# is in the
way.  I'm pretty sure that this makes sense in general, but I agree
that in your scenario we need a way to revert this authority.  And, yeah,
I think setting the MAC to 0 via PFnetdev+VF# should implement exactly
that.  So in place of the above three question marks, libvirt could
then simply do

  - set the MAC address of the VF to 0 via PFnetdev+VF#

I attached a new patch, replacing my previous patch, to allow exactly
that in igb.  It works fine in my limited testing, but it would be nice
if you could apply this to your local test kernel (after removing my
previous patch) and try if this works for you.

However, there's a catch:  By implementing this as the golden path for
libvirt, we're setting a precedent for a new generic method.  This would
have to be implemented in all drivers providing SRIOV VFs and as such,
needs upstream approval by affected parties.


Corinna

Comment 12 Corinna Vinschen 2016-06-03 14:26:49 UTC
It just occured to me... what if an administrative MAC was already set
before unbinding the VF?  Ultimately you'd have to keep track of both to
restore the original situation correctly, along these lines:

 - save administrative MAC of the VF
 - if administrative MAC is 0, save VFnetdev MAC
 - unbind VF, rebind vfio-pci
 - [rinse]
 - unbind VF, rebind to host driver
 - if administrative MAC from step 1 was a valid MAC, restore it.
   Otherwise restore MAC from step 2 via VFnetdev.


Corinna

Comment 13 Yaniv Lavi 2017-01-18 09:50:00 UTC
Can you help with this?
It's causing a bug in RHV and the last update was a long time ago.

Comment 14 Bronce McClain 2017-01-18 10:50:29 UTC
(In reply to Yaniv Dary from comment #13)
> Can you help with this?
> It's causing a bug in RHV and the last update was a long time ago.

Urgent enough for a 7.3.z request?

Comment 15 Laine Stump 2017-01-18 19:10:15 UTC
Yaniv, what is RHV's exact scenario? Are you doing device assignment, or macvtap passthrough?

(The lapse in activity is entirely my fault - I had gotten pulled away from this in the middle of the investigation because of other items pre 7.3, and put it at a relatively lower priority because I discovered that my own problematic scenario had more basic problems rendering it completely unworkable even if this was fixed - in the absence of bug customer bug reports for this lower problem (macvtap passthrough w/o any 801.11Qb[gh]) not working *at all*) I made the rash assumption that it wasn't being used anyway, so chose to push on other things that actually were problematic to customers)

In the interim, I've discovered some other oddities - for example, although the MAC address set via PFnetdev+VF# doesn't take effect until the VF is unbound from its host driver and bound to vfio-pci, the *vlan tag* set in the same manner (and displayed alongside the MAC address in ip link show output) is set and takes effect immediately.

Along with being able to set a MAC address of 00:00:00:00:00:00, I do agree with Corinna that we need to have a way to override the "administratively set" restriction - currently the only way to do that is to reload the *PF* driver, but that would require resetting all of the VFs! (which is obviously not a possibility). I believe at least *some* of the other SRIOV net drivers have been modified to allow 00:00:00:00:00:00. I don't know which have the "administratively set" bit though - I believe at least igb and ixgbe. Whatever is done, we need to keep in mind that behavior should be the same for *all* SRIOV drivers - libvirt doesn't want to go down the road of separate code paths depending on which driver a device is bound to.

Comment 16 Dan Kenigsberg 2017-01-19 15:14:34 UTC
RHV passes a VF to the VM with something like the following:

        <interface type='hostdev' managed='no'>
          <driver name='vfio'/>
          <source>
           <address type='pci' domain='0x0000' bus='0x00' slot='0x07'
           function='0x0'/>
          </source>
          <mac address='52:54:00:6d:90:02'/>
          <vlan>
           <tag id=100/>
          </vlan>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05'
          function='0x0'/>
          <boot order='1'/>
         </interface>

After the VM is stopped (or <interface> unplugged) we expect the VF to loose its mac and to be reusable for other VMs.

+1 to the request from kernel drivers, to present a consistent API to userland, regardless of specific kernel.

Comment 17 Laine Stump 2017-01-19 17:00:33 UTC
Okay, so you're using it for VFIO device assignment rather than macvtap passthrough, and you never re-bind it to the host driver. So in your case, simply allowing a setting of 00:00:00:00:00:00 via PFnetdev+VF# would solve the problem.

(I hesitate to suggest it since it is a horrible ugly hack, but for now you could achieve a workable setup by initializing all VF MACs to 00:00:00:00:00:01 when the host boots - that way the MAC that is saved by libvirt will be a value the kernel will accept when it's being restored (and also won't conflict with any MAC address that's actually in use).

Comment 18 Laine Stump 2017-02-12 19:31:48 UTC
Corrina - your idea to have adminstrative MAC set to 0 also clear the "administratively set" flag is interesting. We really do need *some* way to reset that flag. Of course it would have to work the same for all drivers. I don't know if mlx has such a flag, but ixgbe does. Maybe that should be split out into a separate issue, and we can just use this BZ to add a simple patch that allows setting the admin MAC to 0 (which solves Dan's / RHV's problem).

There is a separate BZ for ixgbe and 0 MAC - Bug 1401837, and for Mellanox - Bug Bug 1302166.

Comment 19 Corinna Vinschen 2017-02-16 16:34:33 UTC
I'm still not quite sure that the ability to reset the administrative
MAC to 00:00:00:00:00:00 is really required here.  Setting the MAC
administratively via the PF is always possible and does not require
to unload and reload the PF.  Why not just set the MAC to a random
value via PF after usage?  It serves the purpose *and* works in future
scenarios for NICs which don't allow resetting to 00:00:00:00:00:00.

Corinna

Comment 20 Laine Stump 2017-02-27 19:50:38 UTC
This all started out as a failure being reported due to libvirt attempting to save the current state of a device before user, and restore it to the exact same state afterwards. Along the way, I discovered that libvirt was doing the wrong thing in the case of macvtap passthrough, and that even in the case of vfio device assignment (where only the admin MAC is significant), just setting the admin MAC back to it's original value doesn't leave the device in the original state anyway - 1) there is still the problem of the "administratively set" flag, and 2) if we were able to set it back to 0, that would result in a *different* random MAC address being set in the vf the next time its driver was initialized. And aside from that, as has been pointed out in several different BZs now, "fixing it in this driver doesn't fix it in the others" (and its apparent that not everybody thinks that it's something that needs to be fix, so that would be an uphill battle).

Based on all of that, I'm working on patches for libvirt that both "do the right thing" for macvtap passthrough mode (using something very close to what Corinna suggested in Comment 12), and also check for failed attempts to set a MAC address of 0, and retry with "almost 0". Although I still think that it should be allowed to restore any setting to its initial value, the amount of code to work around that inability is relatively small, and won't cause problems for the other drivers that don't have that limitation.

So feel free to close this BZ and point at Bug 1415609, which will track all the libvirt changes.

Comment 21 Laine Stump 2017-03-29 01:47:23 UTC
All the libvirt patches discussed here and in Bug 1415609 have been pushed, including the patch to set to 02:00:00:00:00:00 when attempts at 00:00:00:00:00:00 fail (although that now rarely ever happens due to the change in save/restore algorithm). So definitely close this BZ with whatever disposition you think is most appropriate (my opinion is "WONTFIX", although yours may be "NOTABUG" :-)

Comment 22 Corinna Vinschen 2017-03-30 07:37:33 UTC
Hi Laine,

from my point of view changing libvirt to handle this scenario gracefully
was the right thing to do.  However, this doesn't make it wrong to change
the driver as well.  I'll send the igb patch upstream the next couple of
days to ask Intel how they think of this.


Thanks,
Corinna

Comment 23 Corinna Vinschen 2017-04-04 15:27:55 UTC
http://www.spinics.net/lists/netdev/msg428294.html


Note You need to log in before you can comment on or make changes to this bug.