Bug 1926190 - [SR-IOV] [Failover] Cannot detach sriov VF with same mac as virtio
Summary: [SR-IOV] [Failover] Cannot detach sriov VF with same mac as virtio
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.3
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: 8.4
Assignee: Laine Stump
QA Contact: yalzhang@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1688177
TreeView+ depends on / blocked
 
Reported: 2021-02-08 12:39 UTC by Ales Musil
Modified: 2021-05-25 06:47 UTC (History)
12 users (show)

Fixed In Version: libvirt-7.0.0-4.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 06:47:28 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Ales Musil 2021-02-08 12:39:46 UTC
Description of problem:

Unplugging SR-IOV interface with same MAC as virtio interface via 
detach-device fails

Version-Release number of selected component (if applicable):
libvirt-6.6.0-13.module+el8.3.1+9548+0a8fede5.x86_64

How reproducible:
100%

Steps to Reproduce:
1. virsh attach-device --domain 1 --live virtio.xml
2. virsh attach-device --domain 1 --live sriov.xml
3. virsh detach-device --domain 1 --live sriov.xml

Actual results:
error: Failed to detach device from sriov.xml
error: operation failed: multiple devices matching MAC address 00:00:00:00:00:40 found

Expected results:
Successful detach

Additional info:

sriov.xml: 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<interface managed="no" type="hostdev">
  <mac address="00:00:00:00:00:40" />
  <source>
    <address bus="0xb3" domain="0x0000" function="0x3" slot="0x02" type="pci" />
  </source>
  <link state="up" />
  <driver name="vfio" />
  <alias name="ua-sriov-hostdev" />
  <teaming type="transient" persistent="ua-virtio-net"/>
</interface>

virtio.xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<interface type="bridge">
  <mac address="00:00:00:00:00:40"></mac>
  <model type="virtio" />
  <link state="up" />
  <source bridge="ovirtmgmt" />
  <mtu size="1500" />
  <filterref filter="vdsm-no-mac-spoofing" />
  <alias name="ua-virtio-net" />
   <teaming type="persistent"/>
</interface>

Comment 1 yalzhang@redhat.com 2021-02-09 01:53:07 UTC
This is expected, refer to https://bugzilla.redhat.com/show_bug.cgi?id=872028#c3
Libvirt does not check "type" when detach an interface, it checks mac address and pci address(if specified) during detach. In comment 0, the interface xml does not include the pci address, so it only check the mac address. As there are 2 interfaces with the same mac, the detach command failed with a reasonable error message. To detach the interface, we can use "detach-device-alias", or get the live xml with pci address from dumpxml first.

Comment 2 Laine Stump 2021-02-09 07:08:33 UTC
Everything yalzhang says in Commit 1 is correct - currently the only things that libvirt looks at to determine a unique match are MAC address and PCI address.

The problem is that in Alex's case, he doesn't have the PCI address. What he does have is the alias of the devices, and that is also unique. I think we can make this work by just expanding the device matching code for interfaces to also check for a matching alias (if it was provided). This shouldn't break any existing code, but would cause the example test to succeed (and I'm assuming would also work with whatever golang code Alex actually needs to make this work with.

Alex - Is it correct that you'll be sending the full XML as you've provided in the description when you want to detach an interface? I think I can make a simple patch to do that in the morning.

Comment 3 Laine Stump 2021-02-09 07:19:56 UTC
First: s/golang/python/ in my last comment - I was thinking about the wrong project :-)


Also, maybe not quite so trivial as I thought, but I'll still try to make it work - it turns out that the XML parser is immediately deleting any alias name that isn't prefixed with "ua-", so while the simple patch I envisioned would work in your case, it wouldn't work in the general case - I would need to remove that code from the parser and put it further up where it really should be.

Comment 4 Ales Musil 2021-02-09 07:22:58 UTC
(In reply to Laine Stump from comment #2)
> Everything yalzhang says in Commit 1 is correct - currently the only things
> that libvirt looks at to determine a unique match are MAC address and PCI
> address.
> 
> The problem is that in Alex's case, he doesn't have the PCI address. What he
> does have is the alias of the devices, and that is also unique. I think we
> can make this work by just expanding the device matching code for interfaces
> to also check for a matching alias (if it was provided). This shouldn't
> break any existing code, but would cause the example test to succeed (and
> I'm assuming would also work with whatever golang code Alex actually needs
> to make this work with.

It would be nice if the matching is extended to alias as this should be the same as detach-device-alias right? 
Changing existing code to detach-device-alias would require pretty huge code change on our side. 

> 
> Alex - Is it correct that you'll be sending the full XML as you've provided
> in the description when you want to detach an interface? I think I can make
> a simple patch to do that in the morning.

Yes, we are sending full XML in the detach so the alias should be always present. 

Thanks

Comment 5 Ales Musil 2021-02-09 07:24:04 UTC
(In reply to Laine Stump from comment #3)
> First: s/golang/python/ in my last comment - I was thinking about the wrong
> project :-)
> 
> 
> Also, maybe not quite so trivial as I thought, but I'll still try to make it
> work - it turns out that the XML parser is immediately deleting any alias
> name that isn't prefixed with "ua-", so while the simple patch I envisioned
> would work in your case, it wouldn't work in the general case - I would need
> to remove that code from the parser and put it further up where it really
> should be.

Just FYI, in our case the alias is always starting with "ua-".

Comment 6 Laine Stump 2021-02-10 21:39:31 UTC
Proposed fix posted upstream:

https://listman.redhat.com/archives/libvir-list/2021-February/msg00649.html

It turns out that the situation *was* as simple as I'd at first anticipated - the parser only clears out non "ua-" alias names when parsing "inactive" XML, so even a plain "<alias name='hostdev0'/> will be left intact by virDomainDetachDeviceFlags().

This does point out that, in the case of attempting to remove a device from the persistent (i.e. inactive) domain config, you will still have the same problem unless you give your device's explicit alias names (which of course must start with "ua-"); there's really no way around that though.

Comment 7 Laine Stump 2021-02-11 23:09:36 UTC
This is now pushed upstream:

commit 114e3b423210d316b3326816fd2c33335b1167fe
Author: Laine Stump <laine>
Date:   Wed Feb 10 14:52:25 2021 -0500

    qemu: match alias when looking for proper <interface> to detach.

Comment 11 yalzhang@redhat.com 2021-02-24 05:33:15 UTC
Hi laine,

I have tested libvirt-7.0.0-6.module+el8.4.0+10144+c3d3c217.x86_64 for below scenarios, there may be some regressions:

1. Cold update-device to add customer alias name, it succeeded in the past:
# virsh list --all
 Id   Name             State
---------------------------------
 -    avocado-vt-vm1   shut off
# virsh dumpxml avocado-vt-vm1 | grep /interface -B5
    <interface type='network'>
      <mac address='52:54:00:78:ed:b0'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

# cat interface_with_alias.xml
<interface type='network'>
      <mac address='52:54:00:78:ed:b0'/>
      <source network='default'/>
      <model type='virtio'/>
      <alias name='ua-ea491ffd-60c1-47b0-8a89-134abda8ef7f'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

# virsh update-device avocado-vt-vm1 interface_with_alias.xml
error: Failed to update device from interface_with_alias.xml
error: device not found: no device matching MAC address 52:54:00:78:ed:b0 found on 0000:01:00.0

2. live update the alias:
# virsh start avocado-vt-vm1
Domain 'avocado-vt-vm1' started

# virsh dumpxml avocado-vt-vm1 | grep /interface -B7
    <interface type='network'>
      <mac address='52:54:00:78:ed:b0'/>
      <source network='default' portid='f7ac64ce-0831-4293-b8f5-8cb4da55d0b2' bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

# cat interface.xml 
<interface type='network'>
      <mac address='52:54:00:78:ed:b0'/>
      <source network='default' portid='f7ac64ce-0831-4293-b8f5-8cb4da55d0b2' bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='ua-test'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>

# virsh update-device avocado-vt-vm1 interface.xml
error: Failed to update device from interface.xml
error: device not found: no device matching MAC address 52:54:00:78:ed:b0 found on 0000:01:00.0

It was not supported in the past, either, but the error message was:
"Operation not supported: cannot modify network device alias"
or "Operation forbidden: changing device alias is not allowed" which makes more sense.

Comment 12 yalzhang@redhat.com 2021-02-24 05:52:55 UTC
similar scenarios when hot unplug:

# virsh dumpxml avocado-vt-vm1 | grep /interface -B12
...
<interface type='network'>
      <mac address='52:54:00:ab:1c:27'/>
      <source network='default' portid='0b147661-cc1c-4e91-ba17-4fa48aad54d9' bridge='virbr0'/>
      <target dev='vnet4'/>
      <model type='virtio'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>
# cat net6.xml
<interface type='network'>
      <mac address='52:54:00:ab:1c:27'/>
      <source network='default' portid='0b147661-cc1c-4e91-ba17-4fa48aad54d9' bridge='virbr0'/>
      <target dev='vnet4'/>
      <model type='virtio'/>
      <alias name='net9'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </interface>
# virsh detach-device avocado-vt-vm1 net6.xml
error: Failed to detach device from net6.xml
error: device not found: no device matching MAC address 52:54:00:ab:1c:27 found on 0000:04:00.0

# cat net6_withoutpci.xml
<interface type='network'>
      <mac address='52:54:00:ab:1c:27'/>
      <source network='default' portid='0b147661-cc1c-4e91-ba17-4fa48aad54d9' bridge='virbr0'/>
      <target dev='vnet4'/>
      <model type='virtio'/>
      <alias name='net9'/>
    </interface>
# virsh detach-device avocado-vt-vm1 net6_withoutpci.xml
error: Failed to detach device from net6_withoutpci.xml
error: device not found: no device matching MAC address 52:54:00:ab:1c:27 found

The error message can be improved to include the alias name, or it will be confusing.

Comment 13 Laine Stump 2021-03-02 16:25:16 UTC
Yeah, I hadn't considered how the change to the device-matching function would change virDomainUpdateDeviceFlags. I've been trying to think of the best way to deal with this. I *think* what I need to do is have a "partial match" mode (it would use alias to differentiate in case of multiple matches, but otherwise not require it to match) to the low-level function that will be used for virDomainUpdateDeviceFlags, while virDomainDetachDeviceFlags will use the more strict mode.

If I can figure this out quickly, I'll just post a fix and set this BZ directly back to POST, but if I'm unable to get it figured out and posted today, we can decide whether to move this BZ back to ASSIGNED, or ti deckare it done, and then file a separate BZ for the regression.

Comment 14 yalzhang@redhat.com 2021-03-11 08:14:35 UTC
Hi Laine, is there any updates? Thank you!

Comment 15 Laine Stump 2021-03-11 15:15:50 UTC
Sorry, I've been on PTO. I have a patch but need to test it before I post upstream.

Comment 16 Laine Stump 2021-03-15 15:54:55 UTC
Moving back to assigned until the patch I mention above is posted.

Comment 18 Laine Stump 2021-03-17 17:25:29 UTC
I have a patch to fix the regression that Yalan found, but haven't been able to test it adequately yet, which is a prerequisite for posting upstream. (the desired new functionality should already be available in  a (libvirt-7.0.0-4.el8)

Comment 21 Laine Stump 2021-03-23 14:40:02 UTC
I came up with a patch to preserve the old behavior and posted it upstream:

  https://listman.redhat.com/archives/libvir-list/2021-March/msg01177.html

and danpb replied to it saying that he believes that it shouldn't have ever been allowed to modify a device's "unique identifier" (alias name) during device update, and that if it worked before this was an accident, not intentional behavior:

  https://listman.redhat.com/archives/libvir-list/2021-March/msg01183.html

I agree with him, and so therefore I think this BZ should go back to ON_QA, and the test of updating alias name should be removed from the test suite.

As for the confusing error message (where the error says that no match for the MAC address was found, although the problem was that no match for the alias name was found), that is a valid bug, but shouldn't stop this BZ from being verified.

Comment 23 yalzhang@redhat.com 2021-03-24 09:31:56 UTC
Test on libvirt-7.0.0-10.module+el8.4.0+10417+37f6984d.x86_64 with the steps in comment 0, the result is as expected. Set the bug to be verified, and filed Bug 1942367 to track the minor issue about the error message in comment 12.

# virsh start rh
Domain 'rh' started

1. Attach the 2 interfaces one by one, then detach the hostdev type interface, it succeeds by matching the alias.
# virsh attach-device rh virtio.xml 
Device attached successfully

# virsh attach-device rh hostdev.xml
Device attached successfully

# virsh detach-device rh hostdev.xml
Device detached successfully

# cat hostdev.xml
<interface type='hostdev' managed='yes'>
      <mac address='00:00:00:00:00:40'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x82' slot='0x10' function='0x3'/>
      </source>
      <teaming type='transient' persistent='ua-virtio-net'/>
      <alias name='hostdev0'/>
    </interface>

# cat virtio.xml
<interface type='bridge'>
      <mac address='00:00:00:00:00:40'/>
      <source network='host-bridge' portid='7e6ea971-7661-43e7-99aa-2340388c6890' bridge='br0'/>
      <target dev='vnet2'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-virtio-net'/>
    </interface>

Comment 25 errata-xmlrpc 2021-05-25 06:47:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098


Note You need to log in before you can comment on or make changes to this bug.