Bug 1693587 tracked the addition of support in libvirt for QEMU's "virtio Failover" feature which, when combined with a capable virtio driver in the guest, transparently pairs a virtio-net emulated NIC with an SRIOV VF assigned from the host (via VFIO) into a simple failover bond; qemu then automatically unplugs the VF prior to migration (which is important because a guest with a VFIO assigned device can't be migrated), then plugs in a similar new device on the destination. The initial support required that the VF be configured in libvirt with "<interface type='hostdev'>", which has the side effect of setting the proper MAC address for the VF via its PF on the host.

Meanwhile, when CNV looked at implementing VFIO assignment of VFs, they found that they were unable to use <interface type='hostdev'>, because they run libvirtd inside a container that doesn't have access to the PF - the libvirtd in the container can only see the VF, and so is unable to initialize its MAC address. So they added VF MAC configuration to their own code that runs outside the container.

Then they began implementing migration for guests with a VFIO-assigned VF. Since a guest with a VFIO-assigned device can't be migrated, this means that they must complicate their migration code by unplugging the VF, then migrating, and then plugging in a different VF on the destination host. On top of that, in order for guest network connectivity to be maintained during the migration, they must rely on the user to add an additional emulated NIC to the guest, and configure some sort of failover bond device in the guest OS's network config.

As an alternative, I suggested that I could enhance libvirt to make <teaming> usable within their limitations - that is the purpose of the patches described in this BZ.
It's quite simple - libvirt's parsing/formatting of the <teaming> element in <interface> is extracted into its own functions, a teaming element is added to <hostdev>, the requisite functions are called by the parser/formatter for <hostdev>, and the QEMU driver is pointed at the right place when building the vfio-pci device commandline.

As a result, as long as the guest OS has a recent version of the virtio-net guest driver (there are versions for both Windows and Linux), CNV will be able to just specify one <interface type='bridge'> (for the virtio "persistent" device) and one <hostdev> (for the VF) in the libvirt config, set the MAC address of the VF, then start up the guest; when it is time to migrate, they won't need to do any hotplugging (which has caused difficulties for them already due to libvirtd being run unprivileged and containerized, see Bug 1916346).

In the end CNV may end up not using <teaming> for their assigned VFs. If so, at least it won't be because of this one simple missing link.
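To illustrate the pairing described above: the <hostdev> carries <teaming type='transient' persistent='...'/>, and that attribute has to name the alias of the device carrying <teaming type='persistent'/>. Below is a minimal sketch of a sanity check for that link, using only Python's standard XML parser. The names check_teaming and SAMPLE are hypothetical helpers made up for this example (they are not part of libvirt), and the sample XML is trimmed from the configuration posted in this BZ.

```python
import xml.etree.ElementTree as ET

# Trimmed-down domain XML based on the configuration posted in this BZ.
SAMPLE = """<domain type='kvm'>
  <devices>
    <interface type='bridge'>
      <mac address='52:54:00:aa:1c:ef'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <teaming type='persistent'/>
      <alias name='ua-backup0'/>
    </interface>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
      </source>
      <teaming type='transient' persistent='ua-backup0'/>
    </hostdev>
  </devices>
</domain>"""


def check_teaming(domain_xml):
    """Return a list of problems with the <teaming> links (hypothetical helper)."""
    devices = ET.fromstring(domain_xml).find("devices")
    if devices is None:
        return ["no <devices> element found"]

    problems = []
    persistent_aliases = set()
    transient_targets = []
    for dev in devices:
        teaming = dev.find("teaming")
        if teaming is None:
            continue
        if teaming.get("type") == "persistent":
            alias = dev.find("alias")
            if alias is None or not alias.get("name"):
                problems.append("<%s> is persistent but has no <alias>" % dev.tag)
            else:
                persistent_aliases.add(alias.get("name"))
        elif teaming.get("type") == "transient":
            transient_targets.append((dev.tag, teaming.get("persistent")))

    # Every transient device must point at the alias of a persistent device.
    for tag, target in transient_targets:
        if target not in persistent_aliases:
            problems.append("<%s> points at unknown alias %r" % (tag, target))
    return problems


if __name__ == "__main__":
    print(check_teaming(SAMPLE))
```

This only checks the alias link inside the XML; it does not check that the VF's MAC address on the host matches the virtio device's MAC, which still has to be set separately.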
Patches pushed upstream to support this. They will be in libvirt-7.1.0:

commit 5d74e2f168d69541038896925b08f09807a1fa39
Author: Laine Stump <laine>
Date:   Wed Feb 10 20:08:29 2021 -0500

    conf: make teaming info an official type

commit 13be68094d47693fd5346d45612d05de425e2529
Author: Laine Stump <laine>
Date:   Wed Feb 10 21:09:58 2021 -0500

    conf: use virDomainNetTeamingInfoPtr instead of virDomainNetTeamingInfo

commit dea27109119da13e8ed0c564edd7796d98bb795c
Author: Laine Stump <laine>
Date:   Wed Feb 10 22:44:08 2021 -0500

    conf: separate Parse/Format functions for virDomainNetTeamingInfo

commit 5cea59b2b3cbe4218d8311da177de95403c10980
Author: Laine Stump <laine>
Date:   Wed Feb 10 22:59:31 2021 -0500

    schema: separate teaming element definition from interface element

commit db64acfbda59ad22b671580fda13968c60bb8c1a
Author: Laine Stump <laine>
Date:   Thu Feb 11 00:58:29 2021 -0500

    conf: parse/format <teaming> element in plain <hostdev>

commit 010ed0856bb06f439e6fdf44e4f529f53441c398
Author: Laine Stump <laine>
Date:   Thu Feb 11 02:05:15 2021 -0500

    qemu: plug <teaming> config from <hostdev> into qemu commandline

commit bebaafd6b4a54b35f0d6676ab9156ea1489cbf5e (HEAD -> master, upstream/master, active-detach-compare-alias)
Author: Laine Stump <laine>
Date:   Thu Feb 11 02:47:29 2021 -0500

    news: document support for <teaming> in <hostdev>
Hi laine,

Does it support live migration? If yes, we should ensure there is a VF with the same PCI address on the target host. I tried to migrate, and it failed with:

# virsh migrate rhel --live --verbose qemu+ssh://dell-xx.lab.eng.pek2.redhat.com/system
root.eng.pek2.redhat.com's password:
error: Operation not supported: cannot migrate a domain with <hostdev mode='subsystem' type='pci'>

The configuration is as below:

<interface type='bridge'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='host-bridge' portid='34e111b8-1b34-407e-8e25-b52b6e7d8c54' bridge='br0'/>
  <target dev='vnet0'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
  <alias name='ua-backup0'/>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</interface>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
  </source>
  <teaming type='transient' persistent='ua-backup0'/>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</hostdev>

And when I checked the doc, I found:

[PATCH 5/7] conf: parse/format <teaming> element in plain <hostdev>

+  <hostdev mode='subsystem' type='pci' managed='no'>
+    <source>
+      <address domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
+    </source>
+    <mac address='00:11:22:33:44:55:66'/>
+    <teaming type='transient' persistent='ua-backup0'/>
+  </interface>

The mac address cannot be set in a hostdev device, so I suggest removing the mac address line here, and s/</interface>/</hostdev>
(In reply to yalzhang from comment #6)
> Hi laine,
>
> Does it support live migration? If yes, we should ensure there is a VF with
> the same PCI address on the target host.

Yes, it should. The complicated part is that, since you don't have the abstraction of VF pools provided by <interface type='network'>, you have to either (as you suggest) have identical hardware on the destination host, or you have to put hooks into the migration to modify the XML during the migration to change the PCI address of the VF.

> I tried to migrate, and it failed with:
> # virsh migrate rhel --live --verbose
> qemu+ssh://dell-xx.lab.eng.pek2.redhat.com/system
> root.eng.pek2.redhat.com's password:
> error: Operation not supported: cannot migrate a domain with <hostdev
> mode='subsystem' type='pci'>

Ooh. Oops! I need to fix that! (I hadn't tested migration because my secondary migration host is currently out of commission.) There is one check I forgot to add. I will make that patch, get it pushed upstream and backported, then move this BZ back to POST.
>
> The configuration is as below:
> <interface type='bridge'>
>   <mac address='52:54:00:aa:1c:ef'/>
>   <source network='host-bridge' portid='34e111b8-1b34-407e-8e25-b52b6e7d8c54' bridge='br0'/>
>   <target dev='vnet0'/>
>   <model type='virtio'/>
>   <teaming type='persistent'/>
>   <alias name='ua-backup0'/>
>   <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> </interface>
> <hostdev mode='subsystem' type='pci' managed='yes'>
>   <driver name='vfio'/>
>   <source>
>     <address domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
>   </source>
>   <teaming type='transient' persistent='ua-backup0'/>
>   <alias name='hostdev0'/>
>   <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
> </hostdev>
>
> And when I checked the doc, I found:
> [PATCH 5/7] conf: parse/format <teaming> element in plain <hostdev>
>
> +  <hostdev mode='subsystem' type='pci' managed='no'>
> +    <source>
> +      <address domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
> +    </source>
> +    <mac address='00:11:22:33:44:55:66'/>
> +    <teaming type='transient' persistent='ua-backup0'/>
> +  </interface>
>
> The mac address cannot be set in a hostdev device, so I suggest removing
> the mac address line here, and s/</interface>/</hostdev>

Sigh. Correct on both points. Thanks for the proof-reading. I cut-pasted the documentation example at the last minute and was too hasty in committing.
Corrections for both problems listed in Comment 7 have been posted upstream: https://listman.redhat.com/archives/libvir-list/2021-February/msg01133.html
Fixes for bugs found by Yalzhang in Comment 7 pushed upstream, will be in 7.1.0:

commit 98e67d4d8ca933b5fa2ec4fbc35dfe7cd8b1547b
Author: Laine Stump <laine>
Date:   Tue Feb 23 17:21:56 2021 -0500

    qemu: allow migration of generic <hostdev> with <teaming>

commit a0cef16787930c810263f1edd057e038cb6406e3 (HEAD -> master, upstream/master)
Author: Laine Stump <laine>
Date:   Tue Feb 23 17:30:51 2021 -0500

    docs: fix bad cut/paste in <teaming> example
Tested on libvirt-7.0.0-7.module+el8.4.0+10195+258bbfb7.x86_64; the result is as expected. In addition, if the hostdev device on the target host has a different PCI address from the one on the source host, we can migrate with the "--xml" option and a modified xml attached. Like the mac address, vlan and virtualport cannot be set in a hostdev device.

The scenarios are as below:
1. Start vm with hostdev device with teaming setting, and check the network functionality;
2. Start vm without any interface, then hotplug the bridge interface and hostdev device, check the network functionality, then unplug the hostdev device;
3. Migrate vm with hostdev device with teaming setting, succeed.

Details: the hostdev device is one of the VFs

S1: start vm
1. Set the mac address of the vf, then start the vm with the interfaces as below:
# ip link set enp130s0f1 vf 0 mac 52:54:00:96:a4:f1

Interface setting:
<interface type='bridge'>
  <mac address='52:54:00:96:a4:f1'/>
  <source network='host-bridge' portid='ac87b686-47bc-4d21-ad6c-2572b0f15776' bridge='br0'/>
  <target dev='vnet0'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
  <alias name='ua-backup0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</interface>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <driver name='vfio'/>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
  </source>
  <teaming type='transient' persistent='ua-backup0'/>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</hostdev>

# virsh start rh
Domain 'rh' started

Log in to the vm to check the network functions:
[root@bootp-73-33-113 ~]# ifconfig -a
enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.73.33.157  netmask 255.255.254.0  broadcast 10.73.33.255
        inet6 fe80::bdc1:4a2b:3428:6dde  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:96:a4:f1  txqueuelen 1000  (Ethernet)
        RX packets 89  bytes 18707 (18.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 140  bytes 17468 (17.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.73.33.113  netmask 255.255.254.0  broadcast 10.73.33.255
        inet6 2620:52:0:4920:383:2ff8:1058:6fb2  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::6c6c:f483:f38b:ed80  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:96:a4:f1  txqueuelen 1000  (Ethernet)
        RX packets 245  bytes 32325 (31.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 171  bytes 23330 (22.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp4s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::e89b:2415:cc11:54bc  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:96:a4:f1  txqueuelen 1000  (Ethernet)
        RX packets 156  bytes 13618 (13.2 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 31  bytes 5862 (5.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@bootp-73-33-113 ~]# ping www.baidu.com -c 2
PING www.a.shifen.com (220.181.38.150) 56(84) bytes of data.
64 bytes from 220.181.38.150 (220.181.38.150): icmp_seq=1 ttl=46 time=3.33 ms
64 bytes from 220.181.38.150 (220.181.38.150): icmp_seq=2 ttl=46 time=3.39 ms
....
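Since a plain <hostdev> cannot carry a <mac>, the VF's MAC has to be set on the host to the same value as the persistent virtio interface, as done with "ip link set" in S1. A small sketch of deriving that command from the interface XML follows. The function name vf_mac_command and the INTERFACE_XML constant are hypothetical, and the PF name and VF index are inputs the caller must already know; nothing here discovers them.

```python
import xml.etree.ElementTree as ET

# Interface definition taken from the hotplug scenario in this BZ.
INTERFACE_XML = """<interface type='network'>
  <mac address='52:54:00:96:a4:f1'/>
  <source network='host-bridge'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
  <alias name='ua-backup0'/>
</interface>"""


def vf_mac_command(interface_xml, pf_name, vf_index):
    """Build the 'ip link' argv that gives a VF the persistent NIC's MAC.

    Hypothetical helper: pf_name and vf_index are supplied by the caller.
    """
    iface = ET.fromstring(interface_xml)
    teaming = iface.find("teaming")
    if teaming is None or teaming.get("type") != "persistent":
        raise ValueError("interface is not a <teaming type='persistent'> device")
    mac = iface.find("mac")
    if mac is None or not mac.get("address"):
        raise ValueError("interface has no explicit <mac address=...>")
    return ["ip", "link", "set", pf_name,
            "vf", str(vf_index), "mac", mac.get("address")]


if __name__ == "__main__":
    # Prints the same command that was run by hand in S1.
    print(" ".join(vf_mac_command(INTERFACE_XML, "enp130s0f1", 0)))
```

The returned list can be handed to subprocess.run() on the host (root is required); the sketch deliberately only builds the command so it can be inspected first.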
S2: hotunplug/hotplug
Start a vm without any interfaces as below:
# virsh start rh
Domain 'rh' started

Then do hotplug:
# cat bridge_interface.xml
<interface type='network'>
  <mac address='52:54:00:96:a4:f1'/>
  <source network='host-bridge'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
  <alias name='ua-backup0'/>
</interface>

Set the mac of the vf, then prepare the xml:
# ip link set enp130s0f1 vf 0 mac 52:54:00:96:a4:f1
# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
  </source>
  <teaming type='transient' persistent='ua-backup0'/>
</hostdev>

# virsh attach-device rh bridge_interface.xml
Device attached successfully

# virsh attach-device rh hostdev.xml
Device attached successfully

Log in to the vm to check the functionality:
# ping www.baidu.com -c 2
PING www.a.shifen.com (220.181.38.150) 56(84) bytes of data.
64 bytes from 220.181.38.150 (220.181.38.150): icmp_seq=1 ttl=46 time=3.35 ms
64 bytes from 220.181.38.150 (220.181.38.150): icmp_seq=2 ttl=46 time=3.35 ms
...

hotunplug:
Keep the ping running on the guest, and unplug the hostdev device by:
# cat hostdev.xml
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
  </source>
  <teaming type='transient' persistent='ua-backup0'/>
</hostdev>

# virsh detach-device rh hostdev.xml
Device detached successfully

Check that the ping still works on the vm.

S3: migration
1. Prepare the network, enable vfs on the target host, and set the vf mac address; if the vf’s PCI address is different from that of the src hostdev device, modify the pci address in the xml of the src host:
# virsh dumpxml rh > rh_migrate.xml
# vim rh_migrate.xml (edit the pci address of the vf in the hostdev device element)
# cat rh_migrate.xml | grep /hostdev -B6
<source>
  <address domain='0x0000' bus='0x04' slot='0x10' function='0x0'/>
</source>
…
# virsh migrate rh --live --verbose --p2p qemu+ssh://dell-per730-37.lab.eng.pek2.redhat.com/system --xml rh_migrate.xml
Migration: [100 %]

Migration succeeds, and the network functionality on the vm works well. Migrate back to the src host and check the network functionality; it works well.
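The manual "vim rh_migrate.xml" step in S3 could also be scripted. The following is only a sketch under some assumptions: retarget_hostdev, parse_pci and SAMPLE_DOMAIN are hypothetical names invented here, PCI addresses are passed in the usual "dddd:bb:ss.f" form, and the input is the output of "virsh dumpxml". It rewrites only the host-side <source><address> of the matching <hostdev> and leaves the guest-side <address type='pci'> alone.

```python
import xml.etree.ElementTree as ET

# Trimmed-down domain XML based on the hostdev definition used in S1.
SAMPLE_DOMAIN = """<domain type='kvm'>
  <name>rh</name>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x82' slot='0x10' function='0x1'/>
      </source>
      <teaming type='transient' persistent='ua-backup0'/>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </hostdev>
  </devices>
</domain>"""

# Hex digits used when writing each attribute back (matches the XML above).
WIDTHS = {"domain": 4, "bus": 2, "slot": 2, "function": 1}


def parse_pci(addr):
    """Split 'dddd:bb:ss.f' into integer fields."""
    domain, bus, rest = addr.split(":")
    slot, function = rest.split(".")
    return {"domain": int(domain, 16), "bus": int(bus, 16),
            "slot": int(slot, 16), "function": int(function, 16)}


def retarget_hostdev(domain_xml, old_addr, new_addr):
    """Return domain XML with the hostdev source moved from old_addr to new_addr."""
    root = ET.fromstring(domain_xml)
    old, new = parse_pci(old_addr), parse_pci(new_addr)
    changed = 0
    # Only the host-side address under <source> is considered.
    for el in root.findall("./devices/hostdev/source/address"):
        current = {k: int(el.get(k, "0"), 16) for k in WIDTHS}
        if current == old:
            for key, width in WIDTHS.items():
                el.set(key, "0x%0*x" % (width, new[key]))
            changed += 1
    if changed == 0:
        raise ValueError("no <hostdev> with source address %s" % old_addr)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(retarget_hostdev(SAMPLE_DOMAIN, "0000:82:10.1", "0000:04:10.0"))
```

The result would be saved to a file such as rh_migrate.xml and passed to "virsh migrate ... --xml rh_migrate.xml" as in the step above. Note that ElementTree re-serializes the document (attribute quoting changes and XML comments are dropped), so it is worth diffing the output against the original dumpxml before using it.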
Also tested with a PF, but without a network functionality check on the vm because of hardware environment limitations.
More scenarios were tested as below; the result is as expected except for migration with postcopy. After migration with postcopy, the hostdev device no longer exists on the vm. I think it is the same issue as Bug 1817965, so I will add a comment there and track the issue there.

1. When there is no matching pci device on the target host:
# virsh migrate rh --live --verbose qemu+ssh://dell-per730-58.lab.eng.pek2.redhat.com/system --p2p
error: Device 0000:82:10.0 not found: could not access /sys/bus/pci/devices/0000:82:10.0/config: No such file or directory

2. Cancel the migration, then check the vm’s status:
# virsh migrate rh --live --verbose qemu+ssh://dell-per730-58.lab.eng.pek2.redhat.com/system --p2p --xml rh_migrate.xml
Migration: [ 65 %]^Cerror: operation aborted: migration out job: canceled by client

Check on the vm: the hostdev device is reattached and the network works well.

3. Migrate with postcopy:
On the first terminal:
# virsh migrate rh --live --verbose qemu+ssh://dell-per730-58.lab.eng.pek2.redhat.com/system --p2p --xml rh_migrate.xml --bandwidth 10 --postcopy

On the 2nd terminal:
# virsh event --all --loop

While the migration is running, execute this command on the 3rd terminal:
# virsh migrate-postcopy rh

Check that the migration is successful on the first terminal, and that there is a postcopy event on the 2nd terminal:
# virsh event --all --loop
event 'migration-iteration' for domain 'rh': iteration: '1'
event 'lifecycle' for domain 'rh': Suspended Migrated
event 'lifecycle' for domain 'rh': Suspended Post-copy
event 'migration-iteration' for domain 'rh': iteration: '2'
event 'lifecycle' for domain 'rh': Stopped Migrated
event 'job-completed' for domain 'rh':
…

Check on the dst host: the hostdev interface no longer exists on the vm:
[root@bootp-73-33-220 ~]# lspci | grep Eth
01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
[root@bootp-73-33-220 ~]# ifconfig -a
enp1s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.73.33.220  netmask 255.255.254.0  broadcast 10.73.33.255
        inet6 2620:52:0:4920:8a2c:4e3b:d37b:1c86  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::d0e6:cf2e:3c5d:399b  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:aa:1c:ef  txqueuelen 1000  (Ethernet)
        RX packets 207  bytes 18457 (18.0 KiB)
        RX errors 0  dropped 92  overruns 0  frame 0
        TX packets 148  bytes 16424 (16.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp1s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 52:54:00:aa:1c:ef  txqueuelen 1000  (Ethernet)
        RX packets 297  bytes 24920 (24.3 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 144  bytes 17582 (17.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
…

Check on the dst host:
# readlink -s /sys/bus/pci/devices/0000:05:10.1/driver
../../../../bus/pci/drivers/vfio-pci
Tested managedsave and save; neither ever finished, just as in bug 1815426. Will track the issue there.

terminal 1:
[root@dell-per730-36 ~]# virsh save rh rh.save
^Cerror: Failed to save domain 'rh' to rh.save
error: operation aborted: domain save job: canceled by client

terminal 2: on the vm, nothing changes while the save/managedsave command runs, until the save/managedsave is canceled. Once the save/managedsave is canceled, the hostdev interface is unregistered from the vm's os, just as in bug 1815426:
[root@bootp-73-33-220 ~]# ping www.baidu.com
PING www.a.shifen.com (220.181.38.149) 56(84) bytes of data.
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=1 ttl=46 time=3.68 ms
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=2 ttl=46 time=3.65 ms
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=4 ttl=46 time=3.64 ms
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=5 ttl=46 time=3.62 ms
[ 41.034476] pcieport 0000:00:02.3: Slot(0-3): Attention button pressed
[ 41.035900] pcieport 0000:00:02.3: Slot(0-3): Powering off due to button press
[ 41.037421] pcieport 0000:00:02.3: Slot(0-3): Card not present
[ 41.051917] virtio_net virtio0 enp1s0: failover primary slave:enp4s0 unregistered
From bootp-73-33-220.lab.eng.pek2.redhat.com (10.73.33.220) icmp_seq=19 Destination Host Unreachable
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098