Description of problem:
When migrating a VM that has an SR-IOV + failover device pair attached, the migration freezes on the source host and never proceeds. It happens more often if we unplug and re-plug the SR-IOV + failover pair before the migration.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Start a VM with sriov + failover vm nics
2. Unplug sriov + failover
3. Plug sriov + failover
4. Migrate the VM to a different host

Actual results:
The migration is stuck on the source host.

Expected results:
The migration should proceed.

Additional info:
Apr 06 13:48:10 host0 libvirtd[1821]: Domain id=1 name='CentOS' uuid=1170b4ab-1cbf-47b8-825a-aa4e04a0c6a6 is tainted: custom-ga-command
Apr 06 13:51:31 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:51:37 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:51:55 host0 libvirtd[1821]: Cannot start job (query, none, none) for domain CentOS; current job is (async nested, none, migration out) owned by (1853 remoteDispatchDomainMigratePerform3Params, 0 <nul>
Apr 06 13:51:55 host0 libvirtd[1821]: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
Apr 06 13:52:38 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:52:39 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:53:44 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:56:49 host0 libvirtd[1821]: Failed to terminate process 7755 with SIGTERM: Device or resource busy
Apr 06 13:57:19 host0 libvirtd[1821]: Cannot start job (destroy, none, none) for domain CentOS; current job is (async nested, none, migration out) owned by (1853 remoteDispatchDomainMigratePerform3Params, 0 <n>
Apr 06 13:57:19 host0 libvirtd[1821]: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
Apr 06 13:57:46 host0 libvirtd[1821]: Failed to terminate process 7755 with SIGTERM: Device or resource busy
Forgot to include the versions:
libvirt-7.0.0-11.module+el8.4.0+10505+3a8d753f.x86_64
qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64
Laine, can you please have a look, thanks.
Laurent - could this be solved by one of the bugs you fixed? I'm unsure which builds of qemu do/don't have those fixes...
(In reply to Laine Stump from comment #4)
> Laurent - could this be solved by one of the bugs you fixed? I'm unsure
> which builds of qemu do/don't have those fixes...

The fixes are in qemu-kvm-5.2.0-10.module+el8.4.0+10217+cbdd2152, so the build tested here has the fixes.
(In reply to Ales Musil from comment #1)
> Forgot to include the versions:
> libvirt-7.0.0-11.module+el8.4.0+10505+3a8d753f.x86_64
> qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64

Could you provide the kernel logs of the source guest and host?
Do you wait for the end of the failover negotiation before starting the migration?
According the timestamp of the logs of comment #0 what I see in the guest kernel logs is: 1- on boot, failover is correctly set: Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: Slot(0-4) Powering on due to button press Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Card present Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Link Up Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: [8086:154c] type 00 class 0x020000 Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit pref] Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: reg 0x1c: [mem 0x00000000-0x00003fff 64bit pref] Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: enabling Extended Tags Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: BAR 0: assigned [mem 0xfe200000-0xfe20ffff 64bit pref] Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: BAR 3: assigned [mem 0xfe210000-0xfe213fff 64bit pref] Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: PCI bridge to [bus 05] Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: bridge window [io 0xd000-0xdfff] Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: bridge window [mem 0xf9e00000-0xf9ffffff] Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: bridge window [mem 0xfe200000-0xfe3fffff 64bit pref] Apr 6 13:33:16 localhost systemd-udevd[449]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. Apr 6 13:33:16 localhost kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0 Apr 6 13:33:16 localhost kernel: iavf: Intel(R) Ethernet Adaptive Virtual Function Network Driver - version 3.2.3-k Apr 6 13:33:16 localhost kernel: Copyright (c) 2013 - 2018 Intel Corporation. Apr 6 13:33:16 localhost kernel: iavf 0000:05:00.0: enabling device (0000 -> 0002) ... Apr 6 13:33:16 localhost kernel: iavf 0000:05:00.0: MAC address: 56:6f:ca:a1:00:00 Apr 6 13:33:16 localhost kernel: iavf 0000:05:00.0: GRO is enabled Apr 6 13:33:16 localhost kernel: iavf 0000:05:00.0 enp5s0: renamed from eth0 ... 
Apr 6 13:33:17 localhost kernel: virtio_net virtio0 eth0: failover master:eth0 registered Apr 6 13:33:17 localhost kernel: virtio_net virtio0 eth0: failover primary slave:enp5s0 registered Apr 6 13:33:17 localhost kernel: virtio_net virtio0 eth0: failover standby slave:eth1 registered Apr 6 13:33:17 localhost kernel: virtio_net virtio0 enp1s0nsby: renamed from eth1 Apr 6 13:33:17 localhost kernel: virtio_net virtio0 enp1s0: renamed from eth0 2- then you unplug the vfio device: Apr 6 13:33:53 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 6 13:33:53 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Powering off due to button press Apr 6 13:33:58 localhost kernel: virtio_net virtio0 enp1s0: failover primary slave:enp5s0 unregistered 3- and you plug back the card: Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: Slot(0-4) Powering on due to button press Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Card present Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Link Up Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: [8086:154c] type 00 class 0x020000 Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit pref] Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: reg 0x1c: [mem 0x00000000-0x00003fff 64bit pref] Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: enabling Extended Tags Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: BAR 0: assigned [mem 0xfe200000-0xfe20ffff 64bit pref] Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: BAR 3: assigned [mem 0xfe210000-0xfe213fff 64bit pref] Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: PCI bridge to [bus 05] Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: bridge window [io 0xd000-0xdfff] Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: bridge window [mem 0xf9e00000-0xf9ffffff] Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: bridge window [mem 0xfe200000-0xfe3fffff 64bit pref] Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0: enabling device (0000 -> 0002) Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0: Multiqueue Disabled: Queue pair count = 1 Apr 6 13:34:18 localhost kernel: virtio_net virtio0 enp1s0: failover primary slave:eth0 registered Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0: MAC address: 56:6f:ca:a1:00:00 Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0: GRO is enabled Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0 enp5s0: renamed from eth0 Apr 6 13:34:18 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready Apr 6 13:34:18 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0 enp5s0: NIC Link is Up Speed is 1000 Mbps Full Duplex Apr 6 13:34:18 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0: link becomes ready And after that no more logs and the system reboots at 13:51:14. If the migration happens between 13:34:18 and 13:51:14, it seems the vfio card has not been automatically removed before migration and that could explain why it freezes. I think manually unplugging the card disables the automatic unplug on migration. 1- To check the freeze comes because of this, you can try to unplug the vfio card before doing the migration and plugged it back on destination when the migration is over. 
2- To try to re-enable automatic unplug on migration, when you plug the cards back you must use the failover parameters:
a- first plug the virtio-net card with the parameter "failover=on"
b- then plug the vfio card with the parameter "failover_pair_id" set to the virtio-net card id
c- check in the guest kernel logs that you have the failover registered messages, like:
virtio_net virtio0 eth0: failover master:eth0 registered
virtio_net virtio0 eth0: failover primary slave:enp5s0 registered
virtio_net virtio0 eth0: failover standby slave:eth1 registered
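For reference, a minimal monitor sequence matching steps a and b might look like this (a rough sketch only; the device/bus ids, MAC address and host PCI address are placeholders, and it assumes the netdev "hostnet1" was already created with netdev_add):

(qemu) device_add virtio-net-pci,id=net1,netdev=hostnet1,mac=52:54:00:6f:55:cc,bus=root1,failover=on
(qemu) device_add vfio-pci,host=0000:04:10.1,id=hostdev1,bus=root2,failover_pair_id=net1

Step c can then be checked in the guest with "dmesg | grep -i failover".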
I think there has been some misunderstanding; my bad, I provided the full log and not only the relevant part. The stuck migration happens on Apr 8, and there, before the migration, I can clearly see:

Apr 8 12:34:57 localhost kernel: virtio_net virtio0 enp1s0: failover primary slave:enp5s0 unregistered

which suggests that the vfio device was unplugged.
1- on boot, failover is correctly set: Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4) Powering on due to button press Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Card present Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Link Up Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: [8086:154c] type 00 class 0x020000 Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit pref] Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: reg 0x1c: [mem 0x00000000-0x00003fff 64bit pref] Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: enabling Extended Tags Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: BAR 0: assigned [mem 0xfe200000-0xfe20ffff 64bit pref] Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: BAR 3: assigned [mem 0xfe210000-0xfe213fff 64bit pref] Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: PCI bridge to [bus 05] Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [io 0xd000-0xdfff] Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [mem 0xf9e00000-0xf9ffffff] Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [mem 0xfe200000-0xfe3fffff 64bit pref] ... Apr 8 12:32:58 vm-30-199 kernel: iavf 0000:05:00.0: Multiqueue Disabled: Queue pair count = 1 Apr 8 12:32:58 vm-30-199 kernel: iavf 0000:05:00.0: MAC address: 56:6f:ca:a1:00:00 Apr 8 12:32:58 vm-30-199 kernel: iavf 0000:05:00.0: GRO is enabled Apr 8 12:32:58 vm-30-199 kernel: iavf 0000:05:00.0 enp5s0: renamed from eth0 ... Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 eth0: failover master:eth0 registered Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 eth0: failover primary slave:enp5s0 registered Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 eth0: failover standby slave:eth1 registered Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 enp1s0nsby: renamed from eth1 Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 enp1s0: renamed from eth0 2- then the vfio device is unplugged: Apr 8 12:33:09 vm-30-199 kernel: iavf 0000:05:00.0 enp5s0: NIC Link is Up Speed is 1000 Mbps Full Duplex Apr 8 12:33:09 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0: link becomes ready Apr 8 12:33:27 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 8 12:33:27 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Powering off due to button press Apr 8 12:33:32 vm-30-199 kernel: virtio_net virtio0 enp1s0: failover primary slave:enp5s0 unregistered 3- then the virtio-net device is unplugged: Apr 8 12:33:36 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Attention button pressed Apr 8 12:33:36 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Powering off due to button press Apr 8 12:33:41 vm-30-199 kernel: virtio_net virtio0 enp1s0: failover standby slave:enp1s0nsby unregistered Apr 8 12:33:42 vm-30-199 kernel: virtio_net virtio0 enp1s0: failover master:enp1s0 unregistered 4- the vfio card is plugged back; Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4) Powering on due to button press Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Card present Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Link Up Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: [8086:154c] type 00 class 0x020000 Apr 8 12:34:15 
vm-30-199 kernel: pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: reg 0x1c: [mem 0x00000000-0x00003fff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: enabling Extended Tags Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: BAR 0: assigned [mem 0xfe200000-0xfe20ffff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: BAR 3: assigned [mem 0xfe210000-0xfe213fff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: PCI bridge to [bus 05] Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [io 0xd000-0xdfff] Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [mem 0xf9e00000-0xf9ffffff] Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [mem 0xfe200000-0xfe3fffff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0: enabling device (0000 -> 0002) Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0: Multiqueue Disabled: Queue pair count = 1 Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0: MAC address: 56:6f:ca:a1:00:00 Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0: GRO is enabled Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0 enp5s0: renamed from eth0 Apr 8 12:34:15 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready Apr 8 12:34:15 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0 enp5s0: NIC Link is Up Speed is 1000 Mbps Full Duplex 5- the virtio-net card is plugged back: Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Attention button pressed Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0) Powering on due to button press Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Card present Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Link Up Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: [1af4:1041] type 00 class 0x020000 Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x00000fff] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: bridge window [io 0x1000-0x0fff] to [bus 01] add_size 1000 Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: BAR 13: no space for [io size 0x1000] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: BAR 13: failed to assign [io size 0x1000] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: BAR 13: no space for [io size 0x1000] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: BAR 13: failed to assign [io size 0x1000] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: BAR 6: assigned [mem 0xfa600000-0xfa63ffff pref] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: BAR 4: assigned [mem 0xfea00000-0xfea03fff 64bit pref] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: BAR 1: assigned [mem 0xfa640000-0xfa640fff] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: PCI bridge to [bus 01] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: bridge window [mem 0xfa600000-0xfa7fffff] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: bridge window [mem 0xfea00000-0xfebfffff 64bit pref] Apr 8 12:34:20 vm-30-199 kernel: virtio-pci 0000:01:00.0: enabling device (0000 -> 0002) 6- and failover is enabled: Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 eth0: failover master:eth0 
registered Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 eth0: failover primary slave:enp5s0 registered Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 eth0: failover standby slave:eth1 registered Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 enp1s0: renamed from eth0 Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 enp1s0nsby: renamed from eth1 7- the migration starts, the vfio card is unplugged: Apr 8 12:34:21 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0nsby: link becomes ready Apr 8 12:34:21 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready Apr 8 12:34:52 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 8 12:34:52 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Powering off due to button press Apr 8 12:34:57 vm-30-199 kernel: virtio_net virtio0 enp1s0: failover primary slave:enp5s0 unregistered So it seems to work correctly.
I can reproduce the issue on:
# rpm -q libvirt-libs qemu-kvm
libvirt-libs-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64
qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64

1. Start the vm with the failover setting;
2. hot unplug and then hot plug the hostdev interface;
3. do the migration; it fails with:
# virsh migrate rh qemu+ssh://dell-per730-36.lab.eng.pek2.redhat.com/system --live --verbose
error: Device 0000:04:10.1 not found: could not access /sys/bus/pci/devices/0000:04:10.1/config: No such file or directory

But I have checked this file before the hot unplug and after the hot plug, and there are no changes:
# ll -Z /sys/bus/pci/devices/0000:04:10.1/config
-rw-r--r--. 1 root root system_u:object_r:sysfs_t:s0 4096 Apr 2 04:39 /sys/bus/pci/devices/0000:04:10.1/config

The hostdev interface xml is like this:
<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:d8:50:d4'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
  </source>
  <teaming type='transient' persistent='ua-test'/>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</interface>
(In reply to yalzhang from comment #16)
> but I have checked this file before hot unplug and after hotplug, there is
> no changes.
> # ll -Z /sys/bus/pci/devices/0000:04:10.1/config
> -rw-r--r--. 1 root root system_u:object_r:sysfs_t:s0 4096 Apr 2 04:39

I should have checked the file on the target host, not on the source host. It does not exist on the target host, so the migration failed.

The migration fails because the "migratable" xml changes after the hot-unplug and hot-plug:

1) start the vm with the failover setting, then check the migratable xml:
# virsh dumpxml rh --migratable
...
<interface type='network'>
  <mac address='52:54:00:d8:50:d4'/>
  <source network='hostdevnet'/>
  <teaming type='transient' persistent='ua-test'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</interface>

2) do the hot-unplug and hot-plug, and check the migratable xml again:
# virsh detach-device rh net.xml
Device detached successfully
# virsh attach-device rh net.xml
Device attached successfully
# virsh dumpxml rh --migratable
...
<interface type='hostdev' managed='yes'>    ====> it changes to a specific pci address
  <mac address='52:54:00:d8:50:d4'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
  </source>
  <teaming type='transient' persistent='ua-test'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</interface>

libvirtd log on the target host:
2021-04-13 10:01:35.404+0000: 72901: error : virPCIDeviceNew:1482 : Device 0000:04:10.1 not found: could not access /sys/bus/pci/devices/0000:04:10.1/config: No such file or directory
2021-04-13 10:01:35.404+0000: 72901: error : virHostdevReAttachPCIDevices:1083 : Failed to allocate PCI device list: Device 0000:04:10.1 not found: could not access /sys/bus/pci/devices/0000:04:10.1/config: No such file or directory
(In reply to yalzhang from comment #17) > I should check the file on target host, not on the source host. It is not > exists on target host, so the migration failed. Right. If the exact same device (at the same PCI address on the host) isn't used on source and destination, you need to modify the XML with a hook during migration. So this is a separate unrelated problem (and is expected behavior from libvirt's PoV).
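For what it's worth, one way to do that without a hook (a sketch, not verified here; the file name and target URI are placeholders) is to pass destination-specific XML directly to virsh migrate:

# dump the migratable XML and edit the hostdev <source> PCI address
# so it points to a VF that actually exists on the target host
virsh dumpxml rh --migratable > rh-dest.xml
virsh migrate rh qemu+ssh://target.example.com/system --live --verbose --xml rh-dest.xml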
Here is the log message from the source host posted by Laurent split into multiple lines so it's easier to read. It looks like libvirt is stuck waiting for QEMU, and QEMU is waiting for ???.

Jirka - do you have an idea from this error what libvirt is waiting for? During a meeting just now Igor says he may have an idea about what QEMU may possibly be waiting for as well...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/common/concurrent.py", line 260, in run
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/logutils.py", line 447, in wrapper
    return f(*a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 731, in run
    self.monitor_migration()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 757, in monitor_migration
    job_stats = self._vm.job_stats()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5814, in job_stats
    return self._dom.jobStats()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 109, in f
    raise toe
vdsm.virt.virdomain.TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)

Apr 8 13:35:49 caracal07 vdsm[67255]: WARN executor state: count=5 workers={<Worker name=qgapoller/2 waiting task#=11 at 0x7ff8085bbe48>, <Worker name=qgapoller/0 waiting task#=12 at 0x7ff8085bbc88>, <Worker name=qgapoller/1 waiting task#=11 at 0x7ff8085bbcf8>, <Worker name=qgapoller/3 running <Task discardable <Operation action=<bound method QemuGuestAgentPoller._poller of <vdsm.virt.qemuguestagent.QemuGuestAgentPoller object at 0x7ff82c047048>> at 0x7ff8085bbc50> timeout=30, duration=30.00 at 0x7ff7e84b4a58> discarded task#=10 at 0x7ff8085bbf28>, <Worker name=qgapoller/4 waiting task#=0 at 0x7ff7e849d1d0>}
I can see the corresponding debug logs from libvirt are attached to this bug, but sadly they are quite sparse. They contain a lot of useless stuff (such as virObjectRef/Unref), but there's almost nothing from the QEMU driver - only some events and the messages we write to the QEMU monitor (but not the replies to them).

In the logs I can see libvirt started talking to the QEMU monitor:

2021-04-08 10:35:07.072+0000: 66531: debug : qemuDomainObjEnterMonitorInternal:5809 : Entering monitor (mon=0x7f224c03e0b0 vm=0x55ee4030dea0 name=CentOS)

and initiated migration by sending the "migrate" QMP command:

2021-04-08 10:35:07.078+0000: 67768: info : qemuMonitorIOWrite:437 : QEMU_MONITOR_IO_WRITE: mon=0x7f224c03e0b0 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-441"}

In response, QEMU apparently sent an initial MIGRATION event, which libvirt processed in:

2021-04-08 10:35:07.080+0000: 67768: debug : qemuMonitorEmitMigrationStatus:1458 : mon=0x7f224c03e0b0, status=setup

But that's it. The mon=0x7f224c03e0b0 monitor object is not mentioned anywhere further in the logs, and most importantly there is no qemuDomainObjExitMonitorInternal message that would indicate we got a reply from QEMU to the "migrate" command and processed it. The ExitMonitor call is logged elsewhere, so it's not missing because of bad log filters.

In other words, most likely QEMU got stuck processing the "migrate" command and never replied back except for the initial MIGRATION/setup event.
I'm reassigning to QEMU so they can try to figure out why they're apparently stuck processing the migrate command.
I thought I had an idea and tried to verify it, but it looks like this works with current QEMU master and the latest RHEL 8.5, i.e. migration progresses from the wait_unplug state to active (I don't have an SR-IOV card, but any other nic will do for testing purposes).

Here is an example with some tracing added:

qemu-system-x86_64 -enable-kvm -m 2g -M q35 -device pcie-root-port,slot=4,id=root1 -device pcie-root-port,slot=5,id=root2 \
 -device virtio-net-pci,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root1 \
 -device e1000e,id=foo,mac=52:54:00:6f:55:cc,failover_pair_id=net1,bus=root2 \
 rhel8.5.latest.qcow2 -monitor stdio

(qemu) device_del foo
pcie_unplug_device: dev->qdev.pending_deleted_event = true
(qemu) device_add e1000e,id=foo,mac=52:54:00:6f:55:cc,failover_pair_id=net1,bus=root2
(qemu) migrate "exec:gzip -c > STATEFILE.gz"
failover_unplug_primary: pci_dev->partially_hotplugged = true   <- automatic unplug initiated by the standby nic
pcie_unplug_device: dev->qdev.pending_deleted_event = true
primary_unplug_pending: 1
[...]   <- unplug takes several seconds to complete
primary_unplug_pending: 1
pcie_unplug_device: dev->qdev.pending_deleted_event = false
primary_unplug_pending: 0   <- primary is gone, and that's when migration moves to the 'active' state
(In reply to Ales Musil from comment #26) > (In reply to Laine Stump from comment #25) > > (In reply to Ales Musil from comment #24) > > > > > > We are explicitly unplugging the sriov interface before migration. The > > > automatic unplug does not work for RHV because it is leaving detached VFs on the host. > > > > Are you saying that after the migration is finished, the VF on the source > > host is still bound to vfio-pci? That sounds like a bug. Has that been > > investigated at all? A BZ maybe? > > No because I assume that is desired behavior when the VF is managed='no' > isn't it? Okay, yes that is correct behavior when managed='no'. But if you do want the VFs to be re-bound to their NIC driver on the host, then either 1) set managed='yes' in libvirt's XML, or 2) re-bind it to the NIC driver yourself when the migration is complete and the QEMU process on the source host is terminated. So I'm still confused about the reasoning behind manually unplugging the VF. What exactly does it provide that you wouldn't get if you allowed QEMU to implicitly unplug the VF? (As far as I can see, in the end the result would be the same). > If not I can file a BZ for it. > > > > > (an aside - for a long time I've recommended that people set managed='no' > > and pre-bind all their VFs to vfio-pci at host boot time whenever possible. > > It greatly reduces the number of moving parts (and thus potential for > > encountering strange bugs caused by races between the kernel and user > > processes)). > > > > > With explicit unplug we can reattach it. > > > > So are you then only using failover for the automatic guest-side team setup? > > Yes, we planned to use it also for the unplug but the detached VF stood in > the way. If you're doing managed='yes', then libvirt will rebind to the host NIC driver when the source QEMU exits. If you're doing managed='no', then it's always your responsibility to rebind to the host NIC driver when you're done with the VF (or just to simply *never* rebind it to the host NIC - I mean, are you ever actually using any VF as a network device directly on the host? If not, then why are you bothering to re-bind it to the host NIC driver at all?). And if you're doing managed='no' and really do need the NIC bound to the host driver for some reason when it's not used by a guest, then your code should be re-binding to the host NIC driver at the point the QEMU process exits, regardless of whether or not you're using <teaming>/failover.
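For the managed='no' case, the rebind described above is plain sysfs driver binding. A rough sketch (the VF PCI address and the iavf driver name are just examples from this report, and it assumes no driver_override is set on the device):

VF=0000:04:10.1                                     # example VF PCI address
echo "$VF" > /sys/bus/pci/drivers/vfio-pci/unbind   # detach from vfio-pci once the source QEMU has exited
echo "$VF" > /sys/bus/pci/drivers/iavf/bind         # re-bind to the host VF NIC driver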
(In reply to Laine Stump from comment #28)
> (In reply to Ales Musil from comment #26)
> > (In reply to Laine Stump from comment #25)
> > > (In reply to Ales Musil from comment #24)
> > > >
> > > > We are explicitly unplugging the sriov interface before migration. The
> > > > automatic unplug does not work for RHV because it is leaving detached VFs on the host.
> > >
> > > Are you saying that after the migration is finished, the VF on the source
> > > host is still bound to vfio-pci? That sounds like a bug. Has that been
> > > investigated at all? A BZ maybe?
> >
> > No because I assume that is desired behavior when the VF is managed='no'
> > isn't it?
>
> Okay, yes that is correct behavior when managed='no'. But if you do want the
> VFs to be re-bound to their NIC driver on the host, then either 1) set
> managed='yes' in libvirt's XML, or 2) re-bind it to the NIC driver yourself
> when the migration is complete and the QEMU process on the source host is
> terminated.
>
> So I'm still confused about the reasoning behind manually unplugging the VF.
> What exactly does it provide that you wouldn't get if you allowed QEMU to
> implicitly unplug the VF? (As far as I can see, in the end the result would
> be the same).

Currently we don't have any way to rebind it after migration. It might be possible to do that, but it was way easier for us to unplug the VF before migration and plug it back afterwards. If this is the reason why the migration gets stuck, we can change it. But it seems strange, because the migration works with this flow after a fresh boot; it gets stuck only if I unplug/plug both teaming devices before the migration, not only the VF.

> > If not I can file a BZ for it.
> >
> > > (an aside - for a long time I've recommended that people set managed='no'
> > > and pre-bind all their VFs to vfio-pci at host boot time whenever possible.
> > > It greatly reduces the number of moving parts (and thus potential for
> > > encountering strange bugs caused by races between the kernel and user
> > > processes)).
> > >
> > > > With explicit unplug we can reattach it.
> > >
> > > So are you then only using failover for the automatic guest-side team setup?
> >
> > Yes, we planned to use it also for the unplug but the detached VF stood in
> > the way.
>
> If you're doing managed='yes', then libvirt will rebind to the host NIC
> driver when the source QEMU exits. If you're doing managed='no', then it's
> always your responsibility to rebind to the host NIC driver when you're done
> with the VF (or just to simply *never* rebind it to the host NIC - I mean,
> are you ever actually using any VF as a network device directly on the host?
> If not, then why are you bothering to re-bind it to the host NIC driver at
> all?). And if you're doing managed='no' and really do need the NIC bound to
> the host driver for some reason when it's not used by a guest, then your
> code should be re-binding to the host NIC driver at the point the QEMU
> process exits, regardless of whether or not you're using <teaming>/failover.

I don't remember the initial reasoning for it being managed='no', but we do allow users to attach a host network to a VF. I am not sure if there is ever a useful flow which would require that; anyway, it is allowed. The second reason why we need to rebind is that we monitor how many VFs are free on the host, and a free VF is considered to be a VF bound to the host NIC driver.
(In reply to Igor Mammedov from comment #33) > (In reply to Michael Burman from comment #32) > > HI all, > > > > This bug seems to be much more severe unfortunate. Please help, as it's is > > totally blocking RHV from making the failover feature. > > The symptoms seems to be the same as Ales described and same steps of > > reproduction. > > > > The qemu process is been terminated and dead when trying to perform > > migration after failover nic unplun and plug. > > > > 1. Start VM with failover nic > > 2. unplug the nic > > 3. plug it back > > 4. Try to migrate > > Result, qemu process is terminated immediately and dead. VM is shutdown. > > This is 100% reproduced and I can't determine if this is the same bug or > > another one, but the steps and symptoms are the same, but result is much > > more severe, making the VM unusable. > > > > > > VM Vm2 is down with error. Exit message: Lost connection with qemu process." > > > > Apr 17 11:59:13 caracal07.lab.eng.tlv2.redhat.com kernel: qemu-kvm[15186]: > > segfault at 0 ip 0000000000000000 sp 00007ffca581b248 error 14 in > > qemu-kvm[55daaec72000+b13000] > this looks like different issue, > Can you install debuginfo for qemu and attach to qemu process with gdb > before starting migration > and once it crashes you should be able to capture stack trace. Here is the stack trace that I was able to catch: Thread 1 "qemu-kvm" received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () (gdb) bt #0 0x0000000000000000 in ?? () #1 0x0000560bf72d4f04 in notifier_list_notify (list=list@entry=0x560bf7b3f4c8 <migration_state_notifiers>, data=data@entry=0x560bf993efd0) at ../util/notify.c:39 #2 0x0000560bf70208e2 in migrate_fd_connect (s=s@entry=0x560bf993efd0, error_in=<optimized out>) at ../migration/migration.c:3636 #3 0x0000560bf6fb2eaa in migration_channel_connect (s=s@entry=0x560bf993efd0, ioc=ioc@entry=0x560bf9dc9810, hostname=hostname@entry=0x0, error=<optimized out>, error@entry=0x0) at ../migration/channel.c:92 #4 0x0000560bf6f7262e in fd_start_outgoing_migration (s=0x560bf993efd0, fdname=<optimized out>, errp=<optimized out>) at ../migration/fd.c:42 #5 0x0000560bf701f056 in qmp_migrate (uri=0x560bf9d6ade0 "fd:migrate", has_blk=<optimized out>, blk=<optimized out>, has_inc=<optimized out>, inc=<optimized out>, has_detach=<optimized out>, detach=true, has_resume=false, resume=false, errp=0x7ffc3006c718) at ../migration/migration.c:2177 #6 0x0000560bf72b4a3e in qmp_marshal_migrate (args=<optimized out>, ret=<optimized out>, errp=0x7f14890bdec0) at qapi/qapi-commands-migration.c:533 #7 0x0000560bf72f87fd in do_qmp_dispatch_bh (opaque=0x7f14890bded0) at ../qapi/qmp-dispatch.c:110 #8 0x0000560bf72c9a8d in aio_bh_call (bh=0x7f13e4006080) at ../util/async.c:164 #9 aio_bh_poll (ctx=ctx@entry=0x560bf98dd340) at ../util/async.c:164 #10 0x0000560bf72cf772 in aio_dispatch (ctx=0x560bf98dd340) at ../util/aio-posix.c:381 #11 0x0000560bf72c9972 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306 #12 0x00007f1487f1877d in g_main_context_dispatch () from target:/lib64/libglib-2.0.so.0 #13 0x0000560bf72ca9f0 in glib_pollfds_poll () at ../util/main-loop.c:221 #14 os_host_main_loop_wait (timeout=<optimized out>) at ../util/main-loop.c:244 #15 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:520 #16 0x0000560bf71b2251 in qemu_main_loop () at ../softmmu/vl.c:1679 #17 0x0000560bf6f33942 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
../softmmu/main.c:50
> Steps to Reproduce: > 1. Start a VM with sriov + failover vm nics > 2. Unplug sriov + failover > 3. Plug sriov + failover > 4. Migrate VM to a different host I did a quick test on the qemu part but I *can not* reproduce this problem. Test env: host: 4.18.0-304.el8.x86_64 qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64 guest: 4.18.0-304.el8.x86_64 Test steps: (1) Create a bridge based on the PF and setup bridge (2) create VFs and setup the mac address of the VF 2.1 on the source host: echo 1 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs echo 0000:06:10.0 > /sys/bus/pci/devices/0000\:06\:10.0/driver/unbind echo "8086 10ed" > /sys/bus/pci/drivers/vfio-pci/new_id echo "8086 10ed" > /sys/bus/pci/drivers/vfio-pci/remove_id ip link set enp6s0f0 vf 0 mac 22:2b:62:bb:a9:82 2.2 on the target host: echo 1 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs echo 0000:06:01.0 > /sys/bus/pci/devices/0000\:06\:01.0/driver/unbind echo "14e4 16af" > /sys/bus/pci/drivers/vfio-pci/new_id echo "14e4 16af" > /sys/bus/pci/drivers/vfio-pci/remove_id ip link set enp6s0f0 vf 0 mac 22:2b:62:bb:a9:82 ip link show enp6s0f0 (3) On the source host, start a vm with a failover vf and a failover virtio net device ... -netdev tap,id=hostnet0,vhost=on \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \ -device vfio-pci,host=0000:06:10.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \ (4) check the failover device info in the source vm # ifconfig enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.115 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::1bc6:32d5:1754:f9ba prefixlen 64 scopeid 0x20<link> inet6 2620:52:0:4920:e951:744e:6e7b:ffec prefixlen 64 scopeid 0x0<global> ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 591 bytes 68355 (66.7 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 292 bytes 45104 (44.0 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 335 bytes 27728 (27.0 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 43 bytes 8534 (8.3 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.115 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::bf95:c1f7:6157:fc0e prefixlen 64 scopeid 0x20<link> ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 256 bytes 40627 (39.6 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 250 bytes 36808 (35.9 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 # dmesg | grep -i failover [ 3.341635] virtio_net virtio1 eth0: failover master:eth0 registered [ 3.343309] virtio_net virtio1 eth0: failover standby slave:eth1 registered [ 6.749614] virtio_net virtio1 enp3s0: failover primary slave:eth0 registered (5) On the taget host, start a vm which is in listening mode ... 
-netdev tap,id=hostnet0,vhost=on \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \ -device vfio-pci,host=0000:06:01.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \ -incoming defer \ (6) hot-unplug the failover vf from the source vm qmp: {"execute":"device_del","arguments":{"id":"hostdev0"}} output: {"return": {}} {"timestamp": {"seconds": 1619096278, "microseconds": 710954}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}} (7) hot-plug the failover vf into the source vm qmp: {"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0000:06:10.0","id":"hostdev0","bus":"root.4","failover_pair_id":"net0"}} output: {"return": {}} (8) start to migrate the vm with failover vf and failover virtio net device (8.1) On the source host, enable related migration capability qmp: {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}} (8.2) On the target host , setup migration uri and enable related migration capability qmp: {"execute": "migrate-incoming","arguments": {"uri": "tcp:[::]:5800"}} {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"late-block-activate","state":true}]}} (8.3) migrate the vm from the source host to target host qmp: {"execute": "migrate","arguments":{"uri": "tcp:$ip_address:5800"}} {"execute":"migrate-continue","arguments":{"state":"pre-switchover"}} related timestamp: {"timestamp": {"seconds": 1619097116, "microseconds": 985001}, "event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"}} {"timestamp": {"seconds": 1619097127, "microseconds": 901467}, "event": "STOP"} (8.4) check the migration timestamp on the target host {"timestamp": {"seconds": 1619097177, "microseconds": 643704}, "event": "FAILOVER_NEGOTIATED", "data": {"device-id": "net0"}} {"timestamp": {"seconds": 1619097181, "microseconds": 252631}, "event": "RESUME"} (9) check the failover device info in the target vm (9.1) related dmesg when migrating the vm # dmesg [ 1242.786895] virtio_net virtio1 enp3s0: failover primary slave:eth0 unregistered [ 1299.369457] virtio_net virtio1 enp3s0: failover primary slave:eth0 registered (9.2) # ifconfig enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.115 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::1bc6:32d5:1754:f9ba prefixlen 64 scopeid 0x20<link> inet6 2620:52:0:4920:e951:744e:6e7b:ffec prefixlen 64 scopeid 0x0<global> ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 1598 bytes 153296 (149.7 KiB) RX errors 0 dropped 293 overruns 0 frame 0 TX packets 494 bytes 79654 (77.7 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 1625 bytes 125248 (122.3 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 206 bytes 36523 (35.6 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.115 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::e016:fd9d:c88f:2ac prefixlen 64 scopeid 0x20<link> ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 434 bytes 52535 (51.3 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 329 bytes 52895 (51.6 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 device memory 0xfc800000-fc807fff
Michael says the steps to reproduce are these:

1. Start VM with failover nic
2. unplug the nic
3. plug it back
4. Try to migrate

I assume these steps are from the point of view of RHV, *not* QEMU, correct? If so, then (as far as I understand from Ales' comments) from the point of view of QEMU step (4) is actually:

4.1 - unplug VF again
4.2 - unbind VF from the vfio-pci driver
4.3 - bind VF to the host VF NIC driver
4.4 - start migration

(I'm not exactly clear whether or not 4.2 and 4.3 happen here, but that was kind of implied by Ales' reasoning for RHV unplugging the VF prior to starting migration.)

So the attempt to reproduce in Comment 37 is incomplete - at the very least, the VF should be unplugged from the guest prior to starting the migration.
(In reply to Ales Musil from comment #36) > (In reply to Igor Mammedov from comment #33) > > (In reply to Michael Burman from comment #32) > > > HI all, > > > > > > This bug seems to be much more severe unfortunate. Please help, as it's is > > > totally blocking RHV from making the failover feature. > > > The symptoms seems to be the same as Ales described and same steps of > > > reproduction. > > > > > > The qemu process is been terminated and dead when trying to perform > > > migration after failover nic unplun and plug. > > > > > > 1. Start VM with failover nic > > > 2. unplug the nic > > > 3. plug it back > > > 4. Try to migrate > > > Result, qemu process is terminated immediately and dead. VM is shutdown. > > > This is 100% reproduced and I can't determine if this is the same bug or > > > another one, but the steps and symptoms are the same, but result is much > > > more severe, making the VM unusable. > > > > > > > > > VM Vm2 is down with error. Exit message: Lost connection with qemu process." > > > > > > Apr 17 11:59:13 caracal07.lab.eng.tlv2.redhat.com kernel: qemu-kvm[15186]: > > > segfault at 0 ip 0000000000000000 sp 00007ffca581b248 error 14 in > > > qemu-kvm[55daaec72000+b13000] > > this looks like different issue, > > Can you install debuginfo for qemu and attach to qemu process with gdb > > before starting migration > > and once it crashes you should be able to capture stack trace. > > Here is the stack trace that I was able to catch: I still can't reproduce it locally, it would be better if you could provide access to the host where it reproduces and exact steps to reproduce (preferably via CLI). Also it would be better to open a new BZ for this crash, it doesn't look related to this BZ. > Thread 1 "qemu-kvm" received signal SIGSEGV, Segmentation fault. > 0x0000000000000000 in ?? () > (gdb) bt > #0 0x0000000000000000 in ?? 
() > #1 0x0000560bf72d4f04 in notifier_list_notify > (list=list@entry=0x560bf7b3f4c8 <migration_state_notifiers>, > data=data@entry=0x560bf993efd0) at ../util/notify.c:39 it looks like uninitialized callback in one of notifiers, (probably I'll be able to find which one, once I have access to to reproducer) > #2 0x0000560bf70208e2 in migrate_fd_connect (s=s@entry=0x560bf993efd0, > error_in=<optimized out>) at ../migration/migration.c:3636 > #3 0x0000560bf6fb2eaa in migration_channel_connect > (s=s@entry=0x560bf993efd0, ioc=ioc@entry=0x560bf9dc9810, > hostname=hostname@entry=0x0, error=<optimized out>, error@entry=0x0) at > ../migration/channel.c:92 > #4 0x0000560bf6f7262e in fd_start_outgoing_migration (s=0x560bf993efd0, > fdname=<optimized out>, errp=<optimized out>) at ../migration/fd.c:42 > #5 0x0000560bf701f056 in qmp_migrate (uri=0x560bf9d6ade0 "fd:migrate", > has_blk=<optimized out>, blk=<optimized out>, has_inc=<optimized out>, > inc=<optimized out>, has_detach=<optimized out>, detach=true, > has_resume=false, resume=false, > errp=0x7ffc3006c718) at ../migration/migration.c:2177 > #6 0x0000560bf72b4a3e in qmp_marshal_migrate (args=<optimized out>, > ret=<optimized out>, errp=0x7f14890bdec0) at > qapi/qapi-commands-migration.c:533 > #7 0x0000560bf72f87fd in do_qmp_dispatch_bh (opaque=0x7f14890bded0) at > ../qapi/qmp-dispatch.c:110 > #8 0x0000560bf72c9a8d in aio_bh_call (bh=0x7f13e4006080) at > ../util/async.c:164 > #9 aio_bh_poll (ctx=ctx@entry=0x560bf98dd340) at ../util/async.c:164 > #10 0x0000560bf72cf772 in aio_dispatch (ctx=0x560bf98dd340) at > ../util/aio-posix.c:381 > #11 0x0000560bf72c9972 in aio_ctx_dispatch (source=<optimized out>, > callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306 > #12 0x00007f1487f1877d in g_main_context_dispatch () from > target:/lib64/libglib-2.0.so.0 > #13 0x0000560bf72ca9f0 in glib_pollfds_poll () at ../util/main-loop.c:221 > #14 os_host_main_loop_wait (timeout=<optimized out>) at > ../util/main-loop.c:244 > #15 main_loop_wait (nonblocking=nonblocking@entry=0) at > ../util/main-loop.c:520 > #16 0x0000560bf71b2251 in qemu_main_loop () at ../softmmu/vl.c:1679 > #17 0x0000560bf6f33942 in main (argc=<optimized out>, argv=<optimized out>, > envp=<optimized out>) at ../softmmu/main.c:50
Hi Ales,

If possible, could you please provide access to your machine and a more detailed reproducer? That would be very helpful for me in locating the difference between your test steps and mine.

Thanks a lot in advance for your help.
(In reply to Yanghang Liu from comment #40)
> Hi Ales,
>
> If possible, could you please provide the access to your machine and more
> detailed reproducer?
>
> That will be very helpful for me to locate the difference between your test
> steps and mine.
>
> Thanks a lot for your help in advance.

Those are the exact steps from the RHV point of view. I am not sure if I am able to provide detailed steps from the QEMU point of view, but let me try.

RHV step 1. (Start a VM with sriov + failover vm nics) should roughly translate to:
1) Create a bridge for the failover network
2) Add a VF on the PF and unbind it
3) Start the VM with the VF and the failover virtio network attached

RHV step 2. (Unplug sriov + failover):
4) Unplug the VF
5) Rebind the VF back to the network driver
6) Unplug the virtio failover network

RHV step 3. (Plug sriov + failover):
7) Unbind the VF
8) Plug the VF
9) Plug the virtio failover network

RHV step 4. (Migrate VM to a different host):
10) Unplug the VF
11) Rebind the VF back to the network driver
12) Start the migration

But I am really not sure if the list is complete; hopefully I did not miss anything.

As for the machine, sure, we can provide you with access, but again the reproducer is from the RHV point of view, not only QEMU.
Hi Ales,

Thanks a lot for your explanation. I still have a question that I want to double-check with you in order to prevent any misunderstanding on my side.

> RHV step 4. (Migrate VM to a different host)
> 10) Unplug VF
> 11) Rebind VF back to network driver
> 12) Start migration

Do step 10 and step 11 mean that you *manually* hot-unplug the failover VF before migrating the vm?

In fact, on the QEMU side, I can just run the following qmp command to migrate the vm:

{"execute": "migrate","arguments":{"uri": "tcp:$ip_address:$port"}}

This qmp command *automatically* hot-unplugs the failover VF from the source vm and *automatically* hot-plugs another VF into the target vm, which means I can migrate a vm with a failover VF + failover virtio net device *without manually* hot-unplugging/hot-plugging the VF.
(In reply to Yanghang Liu from comment #43) > Hi Ales, > > Thanks a lot for your explanation. > > I still have a question that I want to double confirm with you in order to > prevent my misunderstanding. > > > RHV step 4. (Migrate VM to a different host) > > 10) Uplug VF > > 11) Rebind VF back to network driver > > Start migration > > Does step 10 and step 11 mean that you *manually* hot-unplug the failover VF > before migrating the vm ? > > > In fact, on the QEMU part, I could just run the following qmp to migrate the > vm: > > {"execute": "migrate","arguments":{"uri": "tcp:$ip_address:$port"}} > > This qmp command can *automatically* hot-unplug the failover VF from the > source vm and *automatically* hot-plug another VF into the target vm, > which means I can migrate the failover VF + failover virtio net vm *without > manually* hot-unplug/hot-plug VF. Yes, we are doing manual unplug as the automatic one does not work for us because it is not rebinding the driver back. Thanks
(In reply to Ales Musil from comment #29) > (In reply to Laine Stump from comment #28) > > (In reply to Ales Musil from comment #26) > > > (In reply to Laine Stump from comment #25) > > > > (In reply to Ales Musil from comment #24) [...] > > > If not I can file a BZ for it. > > > > > > > > > > > (an aside - for a long time I've recommended that people set managed='no' > > > > and pre-bind all their VFs to vfio-pci at host boot time whenever possible. > > > > It greatly reduces the number of moving parts (and thus potential for > > > > encountering strange bugs caused by races between the kernel and user > > > > processes)). > > > > > > > > > With explicit unplug we can reattach it. > > > > > > > > So are you then only using failover for the automatic guest-side team setup? > > > > > > Yes, we planned to use it also for the unplug but the detached VF stood in > > > the way. > > > > If you're doing managed='yes', then libvirt will rebind to the host NIC > > driver when the source QEMU exits. If you're doing managed='no', then it's > > always your responsibility to rebind to the host NIC driver when you're done > > with the VF (or just to simply *never* rebind it to the host NIC - I mean, > > are you ever actually using any VF as a network device directly on the host? > > If not, then why are you bothering to re-bind it to the host NIC driver at > > all?). And if you're doing managed='no' and really do need the NIC bound to > > the host driver for some reason when it's not used by a guest, then your > > code should be re-binding to the host NIC driver at the point the QEMU > > process exits, regardless of whether or not you're using <teaming>/failover. > > I don't remember the initial reasoning for it being managed='no', but we do > allow > users to attach host network to VF. I am not sure if there is ever useful > flow which > would require that anyway it is allowed. Second reason why we need to rebind > is that > we monitor how many VFs are free on the host, as free VF is considered VF > binded to host > nic. it looks like could've used managed=yes, so libvirt would've done rebinding for you. (i.e. you will have it on host driver as expected) (maybe find another way to count free VFs without rebinding (As Laine mentioned, less moving parts the better))
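As an illustration of counting free VFs without touching the driver binding (only a sketch; the PF netdev name is a placeholder), the driver each VF is bound to can be read straight from sysfs:

PF=enp6s0f0                                  # example PF netdev name
for vf in /sys/class/net/"$PF"/device/virtfn*; do
    addr=$(basename "$(readlink -f "$vf")")          # VF PCI address
    if [ -e "$vf/driver" ]; then
        drv=$(basename "$(readlink "$vf/driver")")   # e.g. iavf, ixgbevf or vfio-pci
    else
        drv=none
    fi
    echo "$addr -> driver: $drv"   # vfio-pci (or none) => in use/reserved for a guest; a host NIC driver => free
done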
(In reply to Ales Musil from comment #41) > (In reply to Yanghang Liu from comment #40) > > Hi Ales, > > > > If possible, could you please provide the access to your machine and more > > detailed reproducer? > > > > That will be very helpful for me to locate the difference between your test > > steps and mine. > > > > Thanks a lot for your help in advance. > > Those are exact steps from RHV point of view. I am not sure if I am able to > provide detailed steps from QEMU point of view > but let me try. I finally setup SRIOV host, and tried to reproduce with steps you described (+ sequence of del/add events from libvirt log) including live migration. But I'm not able to reproduce hang (wait_unplug) issue. (speculation: another cause of wait_unplug might be guest, if it refuses to do unplug or fails to complete unplug, then QEMU will be stuck in wait_unplug) So if hang still reproduces, we probably will need access to a host where it reproduces + whatever knobs you use to trigger bug from management side. However I was able to trigger comment 36 crash [1] and one another [2] while trying to reproduce hang. Now let look at workflow below (from QEMU point of view) > RHV step 1. (Start a VM with sriov + failover vm nics) should roughly > translate to: > 1) Create bridge for failover network > 2) Add VF on PF and unbind it > 3) Start VM with VF and failover virtio network attached expected > RHV step 2. (Unplug sriov + failover): > 4) Unplug VF more or less shouldn't explode but > 5) Rebind VF back to network driver from early fail-over discussions I recall that there was a plan to hold on VF resources in QEMU until migration is complete (i.e. only hide device from guest but keep it alive in QEMU so we could plug it back in case of failure). Whether it was implemented I don't know, but this step is potentially wrong one (if not now then it might be wrong in future) > 6) Unplug virtio failover network that's totally unexpected in failover workflow, whole point of which is to keep guest network working while non-migratable primary is unplugged. if at this point you try to migrate you will trigger comment 36 crash (unexpected unplug is not excuse for crash so it is something for us to fix in QEMU) but you shouldn't do standby nic unplug in failover usecase > RHV step 3. (Plug sriov + failover) > 7) Unbind VF > 8) Plug VF > 9) Plug virtio failover network > RHV step 4. (Migrate VM to a different host) > 10) Uplug VF > 11) Rebind VF back to network driver > 12) Start migration what puzzles me is why do you remove all nics and then immediately plug them back :/ Expected failover workflow (which works) is 1. start with all nics plugged in 2. start migration 3. guest failover driver unplugs primary 4. migration completes, destination qemu closed. 5. do whatever VF cleanup necessary (unbind/rebind or let libvirt do it for you) alternatively one can manually unplug primary before migration but should keep VF bound until migration completes and source QEMU is terminated, and only then do VF unbind/rebind. > But I am really not sure if the list is complete hopefully I did not miss > anything. > > As for the machine, sure we can provide you with access, but again the > reproducer > is from RHV point of view not oinly QEMU. 1) https://bugzilla.redhat.com/show_bug.cgi?id=1953045 2) https://bugzilla.redhat.com/show_bug.cgi?id=1953062
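To make the alternative flow described just above concrete (a sketch only; the domain name is taken from this bug, while hostdev.xml and the target URI are placeholders), unplug only the primary before migration and leave the VF bound to vfio-pci until the source QEMU has exited:

virsh detach-device CentOS hostdev.xml --live        # unplug only the hostdev (VF) interface
virsh migrate CentOS qemu+ssh://target.example.com/system --live --verbose
# only after the migration has finished and the source QEMU process has exited:
#  - rebind the VF to the host NIC driver (or let managed='yes' do it)
virsh -c qemu+ssh://target.example.com/system attach-device CentOS hostdev.xml --live   # optionally re-plug a VF on the target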
(In reply to Laine Stump from comment #38)
> Michael says the steps to reproduce are these:
>
> 1. Start VM with failover nic
> 2. unplug the nic
> 3. plug it back
> 4. Try to migrate
>
> I assume these steps are from the point of view of RHV, *not* QEMU correct?
> If so, then (as far as I understand from Ales' comments) from the point of
> view of QEMU step (4) is actually:
>
> 4.1 - unplug VF again
> 4.2 - unbind VF from vfio-pci driver
> 4.3 - bind VF to host VF NIC driver
> 4.4 - start migration
>
> (I'm not exactly clear whether or not 4.2 and 4.3 happen here, but that was
> kind of implied by Ales' reasoning for RHV unplugging the VF prior to
> starting migration).
>
> So the attempt to reproduce in Comment 37 is incomplete - at the very least,
> the VF should be unplugged from the guest prior to starting the migration.

This is correct. In order to reproduce both issues, the frozen migration and the qemu process termination, you must first unplug the sr-iov failover nic, plug it back, and only then migrate.

Reproduced every time with:
host - 4.18.0-304.el8.x86_64, qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64
guest - 4.18.0-240.5.el8.x86_64
FYI, as Igor suggested, I have created a new bug to track the qemu process termination on migration, see bz 1953283. Thank you.
(In reply to Igor Mammedov from comment #46)
> (In reply to Ales Musil from comment #41)
> > (In reply to Yanghang Liu from comment #40)
> > > Hi Ales,
> > >
> > > If possible, could you please provide the access to your machine and a more
> > > detailed reproducer?
> > >
> > > That will be very helpful for me to locate the difference between your test
> > > steps and mine.
> > >
> > > Thanks a lot for your help in advance.
> >
> > Those are the exact steps from the RHV point of view. I am not sure if I am
> > able to provide detailed steps from the QEMU point of view, but let me try.
>
> I finally set up an SR-IOV host and tried to reproduce with the steps you
> described (+ the sequence of del/add events from the libvirt log), including
> live migration. But I'm not able to reproduce the hang (wait_unplug) issue.
> (Speculation: another cause of wait_unplug might be the guest; if it refuses to
> do the unplug or fails to complete it, then QEMU will be stuck in wait_unplug.)
>
> So if the hang still reproduces, we will probably need access to a host where it
> reproduces + whatever knobs you use to trigger the bug from the management side.
>
> However, I was able to trigger the comment 36 crash [1] and another one [2]
> while trying to reproduce the hang.
>
> Now let's look at the workflow below (from the QEMU point of view):
>
> > RHV step 1. (Start a VM with sriov + failover vm nics) should roughly
> > translate to:
> > 1) Create bridge for failover network
> > 2) Add VF on PF and unbind it
> > 3) Start VM with VF and failover virtio network attached
>
> expected
>
> > RHV step 2. (Unplug sriov + failover):
> > 4) Unplug VF
>
> more or less fine, shouldn't explode, but
>
> > 5) Rebind VF back to network driver
>
> from the early failover discussions I recall that there was a plan to hold on to
> the VF resources in QEMU until migration is complete (i.e. only hide the device
> from the guest but keep it alive in QEMU so we could plug it back in case of
> failure). Whether it was implemented I don't know, but this step is potentially
> the wrong one (if not now, then it might be wrong in the future)

In this step we are not talking about migration, just about a simple unplug of an
interface. I am kind of confused why QEMU should keep it if the user requested to
unplug it and it is no longer attached to the VM?

> > 6) Unplug virtio failover network
>
> that's totally unexpected in the failover workflow, the whole point of which is
> to keep guest networking working while the non-migratable primary is unplugged.

Why is it not expected? When the user wants to unplug sriov + failover from the VM,
should we forbid it and leave the failover hanging there?
Maybe it is just a misunderstanding, but the point of this part of the flow is to
check that the user can in fact unplug sriov + failover. It has nothing to do with
migration yet.

> if at this point you try to migrate you will trigger the comment 36 crash
> (an unexpected unplug is no excuse for a crash, so it is something for us to fix
> in QEMU)
>
> but you shouldn't unplug the standby nic in the failover use case
>
> > RHV step 3. (Plug sriov + failover)
> > 7) Unbind VF
> > 8) Plug VF
> > 9) Plug virtio failover network
> > RHV step 4. (Migrate VM to a different host)
> > 10) Unplug VF
> > 11) Rebind VF back to network driver
> > 12) Start migration
>
> what puzzles me is why you remove all nics and then immediately plug them back :/

To test that the user can actually do that. It is important for the user to be able
to unplug an interface whenever they need a change in guest networking.

> The expected failover workflow (which works) is:
> 1. start with all nics plugged in
> 2. start migration
> 3. guest failover driver unplugs primary
> 4. migration completes, destination qemu closed.
> 5. do whatever VF cleanup is necessary (unbind/rebind, or let libvirt do it for
> you)

If that is the only supported path, we can possibly adjust our code to do that.
But in that case it should be documented, as the information around this feels
puzzling.

> alternatively one can manually unplug the primary before migration, but should
> keep the VF bound until migration completes and the source QEMU is terminated,
> and only then do the VF unbind/rebind.
>
> > But I am really not sure if the list is complete; hopefully I did not miss
> > anything.
> >
> > As for the machine, sure, we can provide you with access, but again the
> > reproducer is from the RHV point of view, not only QEMU.
>
> 1) https://bugzilla.redhat.com/show_bug.cgi?id=1953045
> 2) https://bugzilla.redhat.com/show_bug.cgi?id=1953062
(In reply to Ales Musil from comment #49)
> (In reply to Igor Mammedov from comment #46)
> > (In reply to Ales Musil from comment #41)
> > > (In reply to Yanghang Liu from comment #40)
> > > > Hi Ales,
> > > >
> > > > If possible, could you please provide the access to your machine and a
> > > > more detailed reproducer?
> > > >
> > > > That will be very helpful for me to locate the difference between your
> > > > test steps and mine.
> > > >
> > > > Thanks a lot for your help in advance.
> > >
> > > Those are the exact steps from the RHV point of view. I am not sure if I am
> > > able to provide detailed steps from the QEMU point of view, but let me try.
> >
> > I finally set up an SR-IOV host and tried to reproduce with the steps you
> > described (+ the sequence of del/add events from the libvirt log), including
> > live migration. But I'm not able to reproduce the hang (wait_unplug) issue.
> > (Speculation: another cause of wait_unplug might be the guest; if it refuses
> > to do the unplug or fails to complete it, then QEMU will be stuck in
> > wait_unplug.)
> >
> > So if the hang still reproduces, we will probably need access to a host where
> > it reproduces + whatever knobs you use to trigger the bug from the management
> > side.
> >
> > However, I was able to trigger the comment 36 crash [1] and another one [2]
> > while trying to reproduce the hang.
> >
> > Now let's look at the workflow below (from the QEMU point of view):
> >
> > > RHV step 1. (Start a VM with sriov + failover vm nics) should roughly
> > > translate to:
> > > 1) Create bridge for failover network
> > > 2) Add VF on PF and unbind it
> > > 3) Start VM with VF and failover virtio network attached
> >
> > expected
> >
> > > RHV step 2. (Unplug sriov + failover):
> > > 4) Unplug VF
> > more or less fine, shouldn't explode, but
> >
> > > 5) Rebind VF back to network driver
> > from the early failover discussions I recall that there was a plan to hold on
> > to the VF resources in QEMU until migration is complete (i.e. only hide the
> > device from the guest but keep it alive in QEMU so we could plug it back in
> > case of failure). Whether it was implemented I don't know, but this step is
> > potentially the wrong one (if not now, then it might be wrong in the future)
>
> In this step we are not talking about migration, just about a simple unplug of
> an interface. I am kind of confused why QEMU should keep it if the user
> requested to unplug it and it is no longer attached to the VM?

If you do not intend to keep failover working, it should be fine to unplug both.
Hence [1] should be fixed on the QEMU side so it won't crash. We also should fix
the wait_for_unplug hang if we are able to reproduce it (but it might be a bit
difficult if it's on the guest side of the equation).

> > > 6) Unplug virtio failover network
> > that's totally unexpected in the failover workflow, the whole point of which
> > is to keep guest networking working while the non-migratable primary is
> > unplugged.
>
> Why is it not expected? When the user wants to unplug sriov + failover from the
> VM, should we forbid it and leave the failover hanging there?

if the user wants to unplug the failover pair, it should work.

> Maybe it is just a misunderstanding, but the point of this part of the flow is
> to check that the user can in fact unplug sriov + failover. It has nothing to do
> with migration yet.
>
> > if at this point you try to migrate you will trigger the comment 36 crash
> > (an unexpected unplug is no excuse for a crash, so it is something for us to
> > fix in QEMU)
> >
> > but you shouldn't unplug the standby nic in the failover use case
> >
> > > RHV step 3. (Plug sriov + failover)
> > > 7) Unbind VF
> > > 8) Plug VF
> > > 9) Plug virtio failover network
> > > RHV step 4. (Migrate VM to a different host)
> > > 10) Unplug VF
> > > 11) Rebind VF back to network driver
> > > 12) Start migration
> >
> > what puzzles me is why you remove all nics and then immediately plug them
> > back :/
>
> To test that the user can actually do that. It is important for the user to be
> able to unplug an interface whenever they need a change in guest networking.

It's perfectly fine for testing purposes, or when the user doesn't care about
keeping guest network connectivity uninterrupted. If the latter, then the user
should configure failover to begin with.

> > The expected failover workflow (which works) is:
> > 1. start with all nics plugged in
> > 2. start migration
> > 3. guest failover driver unplugs primary
> > 4. migration completes, destination qemu closed.
> > 5. do whatever VF cleanup is necessary (unbind/rebind, or let libvirt do it
> > for you)
>
> If that is the only supported path, we can possibly adjust our code to do that.

I think that's the expected/tested workflow which should be used in cases where
failover (uninterrupted guest networking) is needed.

> But in that case it should be documented, as the information around this feels
> puzzling.

agreed, the documentation could be improved.

> > alternatively one can manually unplug the primary before migration, but
> > should keep the VF bound until migration completes and the source QEMU is
> > terminated, and only then do the VF unbind/rebind.
> >
> > > But I am really not sure if the list is complete; hopefully I did not miss
> > > anything.
> > >
> > > As for the machine, sure, we can provide you with access, but again the
> > > reproducer is from the RHV point of view, not only QEMU.
> >
> > 1) https://bugzilla.redhat.com/show_bug.cgi?id=1953045
> > 2) https://bugzilla.redhat.com/show_bug.cgi?id=1953062
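To make that alternative ordering explicit, a rough sketch at the QMP level (the device id "net2" and the URI are placeholders in the style of the QMP examples in this bug, not commands from an actual run):

qmp: {"execute":"device_del","arguments":{"id":"net2"}}    <- manually unplug only the primary, then wait for DEVICE_DELETED
qmp: {"execute":"migrate","arguments":{"uri":"tcp:$ip_address:5800"}}    <- the VF stays bound to vfio-pci during this

and only after the migration has completed and the source qemu has been terminated should the VF be unbound from vfio-pci and rebound to the host driver.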
It seems that I can use the test steps mentioned by Ales in comment 41 to reproduce this problem on the qemu part:

Test env:
host:  4.18.0-304.el8.x86_64
       qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64
guest: 4.18.0-240.5.el8.x86_64

Test steps:

> RHV step 1. (Start a VM with sriov + failover vm nics)
> 1) Create bridge for failover network
> 2) Add VF on PF and unbind it
The VF can be created and bound to vfio-pci successfully.

> 3) Start VM with VF and failover virtio network attached
The qemu cmd line of the vm:
...
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root.3 \
-device vfio-pci,host=0000:06:01.0,id=net2,bus=root.4,addr=0x0,failover_pair_id=net1 \

> RHV step 2. (Unplug sriov + failover):
> 4) Unplug VF
The failover vf can be hot-unplugged from the vm successfully
qmp: {"execute":"device_del","arguments":{"id":"net2"}}
output:
{"return": {}}
{"timestamp": {"seconds": 1619531311, "microseconds": 199847}, "event": "DEVICE_DELETED", "data": {"device": "net2", "path": "/machine/peripheral/net2"}}

> 5) Rebind VF back to network driver
The VF can be rebound to its original driver successfully

> 6) Unplug virtio failover network
6.1 qmp: {"execute":"device_del","arguments":{"id":"net1"}}
output:
{"return": {}}
{"timestamp": {"seconds": 1619531395, "microseconds": 291584}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net1/virtio-backend"}}
{"timestamp": {"seconds": 1619531395, "microseconds": 342474}, "event": "DEVICE_DELETED", "data": {"device": "net1", "path": "/machine/peripheral/net1"}}
6.2 qmp: {"execute":"netdev_del","arguments":{"id":"hostnet0"}}
output:
{"return": {}}

> RHV step 3. (Plug sriov + failover)
> 7) Unbind VF
The VF can be bound to vfio-pci successfully.

> 8) Plug VF
qmp: {"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0000:06:01.0","id":"net2","bus":"root.4","addr":"0x0","failover_pair_id":"net1"}}
output:
{"return": {}}

> 9) Plug virtio failover network
9.1 qmp: {"execute":"netdev_add","arguments":{"type":"tap","id":"hostnet0","vhost":true}}
output:
{"return": {}}
9.2 qmp: {"execute":"device_add","arguments":{"driver":"virtio-net-pci","failover":"on","netdev":"hostnet0","id":"net1","mac":"52:54:00:6f:55:cc","bus":"root.3","addr":"0x0"}}
output:
{"return": {}}
Some related timestamp info:
{"timestamp": {"seconds": 1619531557, "microseconds": 45758}, "event": "FAILOVER_NEGOTIATED", "data": {"device-id": "net1"}}
{"timestamp": {"seconds": 1619531557, "microseconds": 294362}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "net1", "path": "/machine/peripheral/net1/virtio-backend"}}

> RHV step 4. (Migrate VM to a different host)
> 10) Unplug VF
qmp: {"execute":"device_del","arguments":{"id":"net2"}}
output:
{"return": {}}
{"timestamp": {"seconds": 1619532153, "microseconds": 922999}, "event": "DEVICE_DELETED", "data": {"device": "net2", "path": "/machine/peripheral/net2"}}

> 11) Rebind VF back to network driver
The VF can be bound to its original driver successfully

> 12) Start migration
12.1 On the target host, start a vm which is in listening mode
...
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root.3 \
-incoming defer \

12.2 On the target host, set up the migration uri and enable the related migration capability
qmp: {"execute": "migrate-incoming","arguments": {"uri": "tcp:[::]:5800"}}
output:
{"timestamp": {"seconds": 1619532758, "microseconds": 836868}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
qmp: {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"late-block-activate","state":true}]}}
output:
{"return": {}}

12.3 On the source host, enable the related migration capability
qmp: {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}
output:
{"return": {}}

12.4 Migrate the vm from the source host to the target host
qmp: {"execute": "migrate","arguments":{"uri": "tcp:$ip_address:5800"}}    <--- After running this qmp, the vm is stuck on the source host
output:
{"return": {}}
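One way to confirm which phase the source qemu is blocked in (comment 46 speculates it is the wait-unplug phase) would be to run query-migrate on the source qmp monitor while the vm is stuck; this is only a suggested check, not output captured from this run:

qmp: {"execute":"query-migrate"}

If that speculation is right, the returned "status" field should stay at "wait-unplug" instead of progressing to "active"/"completed".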
Trying to reproduce this problem on the libvirt part:

Test env:
host:  4.18.0-304.el8.x86_64
       qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64
       libvirt-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64
guest: 4.18.0-240.5.el8.x86_64

Test steps:

> RHV step 1. (Start a VM with sriov + failover vm nics)
> 1) Create bridge for failover network
1.1 create a bridge named br0 based on the PF
1.2 create the vm networks
# virsh net-dumpxml failover-bridge
<network connections='1'>
  <name>failover-bridge</name>
  <uuid>abfa7c99-8345-497a-920f-39a1e6aeff9c</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>

# virsh net-dumpxml failover-vf
<network>
  <name>failover-vf</name>
  <uuid>f0837c10-c4ac-4bd8-886d-6b0990131452</uuid>
  <forward mode='hostdev' managed='yes'>
    <address type='pci' domain='0x0000' bus='0x06' slot='0x01' function='0x0'/>
  </forward>
</network>

> 2) Add VF on PF and unbind it
The VF can be created successfully.

> 3) Start VM with VF and failover virtio network attached
The failover device xml is as follows:
...
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-bridge'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
</interface>
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-vf'/>
  <teaming type='transient' persistent='net0'/>
</interface>
...

The failover device qemu cmd line is as follows:
...
-netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40
-device virtio-net-pci,failover=on,netdev=hostnet0,id=net0,mac=52:54:00:aa:1c:ef,bus=pci.1,addr=0x0
-device vfio-pci,host=0000:06:01.0,id=hostdev0,bus=pci.4,addr=0x0,failover_pair_id=net0

> RHV step 2. (Unplug sriov + failover):
> 4) Unplug VF
> 5) Rebind VF back to network driver
# virsh detach-device-alias $domain hostdev0
Device detach request sent successfully

related qmp info:
{"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-385"}
{"return": {}, "id": "libvirt-385"}
{"timestamp": {"seconds": 1619582492, "microseconds": 654176}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}}

> 6) Unplug virtio failover network
# virsh detach-device-alias $domain net0
Device detach request sent successfully

related qmp info:
{"execute":"device_del","arguments":{"id":"net0"},"id":"libvirt-387"}
{"timestamp": {"seconds": 1619582622, "microseconds": 786358}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net0/virtio-backend"}}
{"timestamp": {"seconds": 1619582622, "microseconds": 837246}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}
{"execute":"netdev_del","arguments":{"id":"hostnet0"},"id":"libvirt-389"}
{"return": {}, "id": "libvirt-389"}

> RHV step 3. (Plug sriov + failover)
> 7) Unbind VF
> 8) Plug VF
# cat failover_vf.xml
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-vf'/>
  <teaming type='transient' persistent='net0'/>
</interface>

# virsh attach-device $domain failover_vf.xml
Device attached successfully

related qmp info:
{"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0000:06:01.0","id":"hostdev0","bus":"pci.1","addr":"0x0","failover_pair_id":"net0"},"id":"libvirt-390"}
{"return": {}, "id": "libvirt-390"}

> 9) Plug virtio failover network
# cat failover_virtio_net_device.xml
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-bridge'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
</interface>

# virsh attach-device $domain failover_virtio_net_device.xml
Device attached successfully

related qmp info:
{"execute":"netdev_add","arguments":{"type":"tap","fd":"fd-net00","id":"hostnet0","vhost":true,"vhostfd":"vhostfd-net00"},"id":"libvirt-394"}
{"return": {}, "id": "libvirt-393"}
{"execute":"device_add","arguments":{"driver":"virtio-net-pci","failover":"on","netdev":"hostnet0","id":"net0","mac":"52:54:00:aa:1c:ef","bus":"pci.4","addr":"0x0"},"id":"libvirt-395"}
{"return": {}, "id": "libvirt-394"}

Some related timestamp info:
{"timestamp": {"seconds": 1619582675, "microseconds": 407390}, "event": "FAILOVER_NEGOTIATED", "data": {"device-id": "net0"}}
{"timestamp": {"seconds": 1619582675, "microseconds": 653024}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "net0", "path": "/machine/peripheral/net0/virtio-backend"}}

> RHV step 4. (Migrate VM to a different host)
> 10) Unplug VF
> 11) Rebind VF back to network driver
# virsh detach-device-alias $domain hostdev0
Device detach request sent successfully

related qmp info:
{"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-397"}
{"timestamp": {"seconds": 1619582755, "microseconds": 821294}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}}

> 12) Start migration
12.1
# virsh migrate --live --verbose $domain qemu+ssh://10.73.73.75/system    <--- The vm is stuck after running this cmd

The related qmp info that I can observe on the source host when reproducing this problem:
{"execute":"query-block","id":"libvirt-399"}
{"execute":"query-migrate-parameters","id":"libvirt-400"}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":true},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false},{"capability":"dirty-bitmaps","state":false}]},"id":"libvirt-401"}
{"return": {}, "id": "libvirt-401"}
{"execute":"migrate-set-parameters","arguments":{"tls-creds":"","tls-hostname":"","max-bandwidth":9223372036853727232},"id":"libvirt-402"}
{"return": {}, "id": "libvirt-402"}
{"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-403"} (fd=34)
{"return": {}, "id": "libvirt-403"}
{"timestamp": {"seconds": 1619582833, "microseconds": 62115}, "event": "MIGRATION", "data": {"status": "setup"}}
{"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-404"}    <---- the last qmp I could observe on the source host

12.2 check the domain status on the target host
# virsh domstate $domain --reason
paused (migrating)
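If it helps to narrow the hang down further, the migration job could also be inspected on the source host while the domain is stuck; these are only suggested checks, not commands from the captured reproduction:

# virsh domjobinfo $domain
# virsh qemu-monitor-command $domain --pretty '{"execute":"query-migrate"}'

Note that if qemu's monitor is wedged, these may simply time out with the "cannot acquire state change lock" errors shown in comment 0, which would itself point at qemu rather than libvirt.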
Simplified reproduction steps for comment 52 / comment 53:
(1) start a vm *only with a failover virtio net device*
(2) hot-unplug the failover virtio net device
(3) hot-plug the failover virtio net device back
(4) migrate the vm from the source host to the target host

I can use the above steps to reproduce this bug on both the qemu and libvirt parts.

After using the build that Laurent provided in comment 51, the migration completes successfully.
# rpm -q qemu-kvm
qemu-kvm-5.2.0-15.el8.BZ1953045.x86_64
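For reference, the simplified reproducer expressed at the QMP level, reusing the ids, MAC, and bus values from the qemu reproduction in comment 52 (they are assumptions for any other domain):

step (2): qmp: {"execute":"device_del","arguments":{"id":"net1"}}    (wait for DEVICE_DELETED)
          qmp: {"execute":"netdev_del","arguments":{"id":"hostnet0"}}
step (3): qmp: {"execute":"netdev_add","arguments":{"type":"tap","id":"hostnet0","vhost":true}}
          qmp: {"execute":"device_add","arguments":{"driver":"virtio-net-pci","failover":"on","netdev":"hostnet0","id":"net1","mac":"52:54:00:6f:55:cc","bus":"root.3","addr":"0x0"}}
step (4): qmp: {"execute":"migrate","arguments":{"uri":"tcp:$ip_address:5800"}}    (on the affected builds the source hangs here)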
*** This bug has been marked as a duplicate of bug 1953045 ***