Description of problem:
When migrating a VM that has an SR-IOV + failover device pair attached, the migration freezes on the source host and never proceeds. It happens more often if we unplug and re-plug the SR-IOV + failover pair before the migration.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Start a VM with sriov + failover vm nics
2. Unplug sriov + failover
3. Plug sriov + failover
4. Migrate the VM to a different host

Actual results:
The migration is stuck on the source host.

Expected results:
The migration should proceed.

Additional info:
Apr 06 13:48:10 host0 libvirtd[1821]: Domain id=1 name='CentOS' uuid=1170b4ab-1cbf-47b8-825a-aa4e04a0c6a6 is tainted: custom-ga-command
Apr 06 13:51:31 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:51:37 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:51:55 host0 libvirtd[1821]: Cannot start job (query, none, none) for domain CentOS; current job is (async nested, none, migration out) owned by (1853 remoteDispatchDomainMigratePerform3Params, 0 <nul>
Apr 06 13:51:55 host0 libvirtd[1821]: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
Apr 06 13:52:38 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:52:39 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:53:44 host0 libvirtd[1821]: Guest agent is not responding: Guest agent not available for now
Apr 06 13:56:49 host0 libvirtd[1821]: Failed to terminate process 7755 with SIGTERM: Device or resource busy
Apr 06 13:57:19 host0 libvirtd[1821]: Cannot start job (destroy, none, none) for domain CentOS; current job is (async nested, none, migration out) owned by (1853 remoteDispatchDomainMigratePerform3Params, 0 <n>
Apr 06 13:57:19 host0 libvirtd[1821]: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
Apr 06 13:57:46 host0 libvirtd[1821]: Failed to terminate process 7755 with SIGTERM: Device or resource busy
Forgot to include the versions:
libvirt-7.0.0-11.module+el8.4.0+10505+3a8d753f.x86_64
qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64
Laine, can you please have a look, thanks.
Laurent - could this be solved by one of the bugs you fixed? I'm unsure which builds of qemu do/don't have those fixes...
(In reply to Laine Stump from comment #4)
> Laurent - could this be solved by one of the bugs you fixed? I'm unsure
> which builds of qemu do/don't have those fixes...

The fixes are in qemu-kvm-5.2.0-10.module+el8.4.0+10217+cbdd2152, so the build tested here has the fixes.
(In reply to Ales Musil from comment #1)
> Forgot to include the versions:
> libvirt-7.0.0-11.module+el8.4.0+10505+3a8d753f.x86_64
> qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64

Could you provide the kernel logs of the source guest and host?
Do you wait for the end of the failover negotiation before starting the migration?
According the timestamp of the logs of comment #0 what I see in the guest kernel logs is: 1- on boot, failover is correctly set: Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: Slot(0-4) Powering on due to button press Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Card present Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Link Up Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: [8086:154c] type 00 class 0x020000 Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit pref] Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: reg 0x1c: [mem 0x00000000-0x00003fff 64bit pref] Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: enabling Extended Tags Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: BAR 0: assigned [mem 0xfe200000-0xfe20ffff 64bit pref] Apr 6 13:33:16 localhost kernel: pci 0000:05:00.0: BAR 3: assigned [mem 0xfe210000-0xfe213fff 64bit pref] Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: PCI bridge to [bus 05] Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: bridge window [io 0xd000-0xdfff] Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: bridge window [mem 0xf9e00000-0xf9ffffff] Apr 6 13:33:16 localhost kernel: pcieport 0000:00:02.4: bridge window [mem 0xfe200000-0xfe3fffff 64bit pref] Apr 6 13:33:16 localhost systemd-udevd[449]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable. Apr 6 13:33:16 localhost kernel: scsi 0:0:0:0: Attached scsi generic sg0 type 0 Apr 6 13:33:16 localhost kernel: iavf: Intel(R) Ethernet Adaptive Virtual Function Network Driver - version 3.2.3-k Apr 6 13:33:16 localhost kernel: Copyright (c) 2013 - 2018 Intel Corporation. Apr 6 13:33:16 localhost kernel: iavf 0000:05:00.0: enabling device (0000 -> 0002) ... Apr 6 13:33:16 localhost kernel: iavf 0000:05:00.0: MAC address: 56:6f:ca:a1:00:00 Apr 6 13:33:16 localhost kernel: iavf 0000:05:00.0: GRO is enabled Apr 6 13:33:16 localhost kernel: iavf 0000:05:00.0 enp5s0: renamed from eth0 ... 
Apr 6 13:33:17 localhost kernel: virtio_net virtio0 eth0: failover master:eth0 registered Apr 6 13:33:17 localhost kernel: virtio_net virtio0 eth0: failover primary slave:enp5s0 registered Apr 6 13:33:17 localhost kernel: virtio_net virtio0 eth0: failover standby slave:eth1 registered Apr 6 13:33:17 localhost kernel: virtio_net virtio0 enp1s0nsby: renamed from eth1 Apr 6 13:33:17 localhost kernel: virtio_net virtio0 enp1s0: renamed from eth0 2- then you unplug the vfio device: Apr 6 13:33:53 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 6 13:33:53 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Powering off due to button press Apr 6 13:33:58 localhost kernel: virtio_net virtio0 enp1s0: failover primary slave:enp5s0 unregistered 3- and you plug back the card: Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: Slot(0-4) Powering on due to button press Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Card present Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: Slot(0-4): Link Up Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: [8086:154c] type 00 class 0x020000 Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit pref] Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: reg 0x1c: [mem 0x00000000-0x00003fff 64bit pref] Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: enabling Extended Tags Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: BAR 0: assigned [mem 0xfe200000-0xfe20ffff 64bit pref] Apr 6 13:34:18 localhost kernel: pci 0000:05:00.0: BAR 3: assigned [mem 0xfe210000-0xfe213fff 64bit pref] Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: PCI bridge to [bus 05] Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: bridge window [io 0xd000-0xdfff] Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: bridge window [mem 0xf9e00000-0xf9ffffff] Apr 6 13:34:18 localhost kernel: pcieport 0000:00:02.4: bridge window [mem 0xfe200000-0xfe3fffff 64bit pref] Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0: enabling device (0000 -> 0002) Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0: Multiqueue Disabled: Queue pair count = 1 Apr 6 13:34:18 localhost kernel: virtio_net virtio0 enp1s0: failover primary slave:eth0 registered Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0: MAC address: 56:6f:ca:a1:00:00 Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0: GRO is enabled Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0 enp5s0: renamed from eth0 Apr 6 13:34:18 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready Apr 6 13:34:18 localhost kernel: IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready Apr 6 13:34:18 localhost kernel: iavf 0000:05:00.0 enp5s0: NIC Link is Up Speed is 1000 Mbps Full Duplex Apr 6 13:34:18 localhost kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0: link becomes ready And after that no more logs and the system reboots at 13:51:14. If the migration happens between 13:34:18 and 13:51:14, it seems the vfio card has not been automatically removed before migration and that could explain why it freezes. I think manually unplugging the card disables the automatic unplug on migration. 1- To check the freeze comes because of this, you can try to unplug the vfio card before doing the migration and plugged it back on destination when the migration is over. 
2- To try to re-enable automatic unplug on migration, when you plug the cards back you must use the failover parameters:
a- first plug the virtio-net card with the parameter "failover=on"
b- then plug the vfio card with the parameter "failover_pair_id" set to the virtio-net card id
c- check in the guest kernel logs that you have the failover registered messages, like:
virtio_net virtio0 eth0: failover master:eth0 registered
virtio_net virtio0 eth0: failover primary slave:enp5s0 registered
virtio_net virtio0 eth0: failover standby slave:eth1 registered
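For reference, a minimal monitor sequence matching steps a and b might look like this (a rough sketch only; the device/bus ids, MAC address and host PCI address are placeholders, and it assumes the netdev "hostnet1" was already created with netdev_add):

(qemu) device_add virtio-net-pci,id=net1,netdev=hostnet1,mac=52:54:00:6f:55:cc,bus=root1,failover=on
(qemu) device_add vfio-pci,host=0000:04:10.1,id=hostdev1,bus=root2,failover_pair_id=net1

Step c can then be checked in the guest with "dmesg | grep -i failover".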
I think there has been some misunderstanding; my bad, I provided the full log and not only the relevant part. The stuck migration happens on Apr 8, and there, before the migration, I can clearly see:

Apr 8 12:34:57 localhost kernel: virtio_net virtio0 enp1s0: failover primary slave:enp5s0 unregistered

which suggests that the vfio device was unplugged.
1- on boot, failover is correctly set: Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4) Powering on due to button press Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Card present Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Link Up Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: [8086:154c] type 00 class 0x020000 Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit pref] Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: reg 0x1c: [mem 0x00000000-0x00003fff 64bit pref] Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: enabling Extended Tags Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: BAR 0: assigned [mem 0xfe200000-0xfe20ffff 64bit pref] Apr 8 12:32:58 vm-30-199 kernel: pci 0000:05:00.0: BAR 3: assigned [mem 0xfe210000-0xfe213fff 64bit pref] Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: PCI bridge to [bus 05] Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [io 0xd000-0xdfff] Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [mem 0xf9e00000-0xf9ffffff] Apr 8 12:32:58 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [mem 0xfe200000-0xfe3fffff 64bit pref] ... Apr 8 12:32:58 vm-30-199 kernel: iavf 0000:05:00.0: Multiqueue Disabled: Queue pair count = 1 Apr 8 12:32:58 vm-30-199 kernel: iavf 0000:05:00.0: MAC address: 56:6f:ca:a1:00:00 Apr 8 12:32:58 vm-30-199 kernel: iavf 0000:05:00.0: GRO is enabled Apr 8 12:32:58 vm-30-199 kernel: iavf 0000:05:00.0 enp5s0: renamed from eth0 ... Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 eth0: failover master:eth0 registered Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 eth0: failover primary slave:enp5s0 registered Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 eth0: failover standby slave:eth1 registered Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 enp1s0nsby: renamed from eth1 Apr 8 12:32:59 vm-30-199 kernel: virtio_net virtio0 enp1s0: renamed from eth0 2- then the vfio device is unplugged: Apr 8 12:33:09 vm-30-199 kernel: iavf 0000:05:00.0 enp5s0: NIC Link is Up Speed is 1000 Mbps Full Duplex Apr 8 12:33:09 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp5s0: link becomes ready Apr 8 12:33:27 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 8 12:33:27 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Powering off due to button press Apr 8 12:33:32 vm-30-199 kernel: virtio_net virtio0 enp1s0: failover primary slave:enp5s0 unregistered 3- then the virtio-net device is unplugged: Apr 8 12:33:36 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Attention button pressed Apr 8 12:33:36 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Powering off due to button press Apr 8 12:33:41 vm-30-199 kernel: virtio_net virtio0 enp1s0: failover standby slave:enp1s0nsby unregistered Apr 8 12:33:42 vm-30-199 kernel: virtio_net virtio0 enp1s0: failover master:enp1s0 unregistered 4- the vfio card is plugged back; Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4) Powering on due to button press Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Card present Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Link Up Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: [8086:154c] type 00 class 0x020000 Apr 8 12:34:15 
vm-30-199 kernel: pci 0000:05:00.0: reg 0x10: [mem 0x00000000-0x0000ffff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: reg 0x1c: [mem 0x00000000-0x00003fff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: enabling Extended Tags Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: BAR 0: assigned [mem 0xfe200000-0xfe20ffff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: pci 0000:05:00.0: BAR 3: assigned [mem 0xfe210000-0xfe213fff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: PCI bridge to [bus 05] Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [io 0xd000-0xdfff] Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [mem 0xf9e00000-0xf9ffffff] Apr 8 12:34:15 vm-30-199 kernel: pcieport 0000:00:02.4: bridge window [mem 0xfe200000-0xfe3fffff 64bit pref] Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0: enabling device (0000 -> 0002) Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0: Multiqueue Disabled: Queue pair count = 1 Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0: MAC address: 56:6f:ca:a1:00:00 Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0: GRO is enabled Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0 enp5s0: renamed from eth0 Apr 8 12:34:15 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready Apr 8 12:34:15 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_UP): enp5s0: link is not ready Apr 8 12:34:15 vm-30-199 kernel: iavf 0000:05:00.0 enp5s0: NIC Link is Up Speed is 1000 Mbps Full Duplex 5- the virtio-net card is plugged back: Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Attention button pressed Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0) Powering on due to button press Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Card present Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: Slot(0): Link Up Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: [1af4:1041] type 00 class 0x020000 Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x00000fff] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: bridge window [io 0x1000-0x0fff] to [bus 01] add_size 1000 Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: BAR 13: no space for [io size 0x1000] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: BAR 13: failed to assign [io size 0x1000] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: BAR 13: no space for [io size 0x1000] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: BAR 13: failed to assign [io size 0x1000] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: BAR 6: assigned [mem 0xfa600000-0xfa63ffff pref] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: BAR 4: assigned [mem 0xfea00000-0xfea03fff 64bit pref] Apr 8 12:34:20 vm-30-199 kernel: pci 0000:01:00.0: BAR 1: assigned [mem 0xfa640000-0xfa640fff] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: PCI bridge to [bus 01] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: bridge window [mem 0xfa600000-0xfa7fffff] Apr 8 12:34:20 vm-30-199 kernel: pcieport 0000:00:02.0: bridge window [mem 0xfea00000-0xfebfffff 64bit pref] Apr 8 12:34:20 vm-30-199 kernel: virtio-pci 0000:01:00.0: enabling device (0000 -> 0002) 6- and failover is enabled: Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 eth0: failover master:eth0 
registered Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 eth0: failover primary slave:enp5s0 registered Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 eth0: failover standby slave:eth1 registered Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 enp1s0: renamed from eth0 Apr 8 12:34:20 vm-30-199 kernel: virtio_net virtio0 enp1s0nsby: renamed from eth1 7- the migration starts, the vfio card is unplugged: Apr 8 12:34:21 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0nsby: link becomes ready Apr 8 12:34:21 vm-30-199 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready Apr 8 12:34:52 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Attention button pressed Apr 8 12:34:52 vm-30-199 kernel: pcieport 0000:00:02.4: Slot(0-4): Powering off due to button press Apr 8 12:34:57 vm-30-199 kernel: virtio_net virtio0 enp1s0: failover primary slave:enp5s0 unregistered So it seems to work correctly.
I can reproduce the issue on:
# rpm -q libvirt-libs qemu-kvm
libvirt-libs-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64
qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64

1. Start the vm with the failover setting;
2. hot unplug and then hot plug the hostdev interface;
3. do the migration; it fails with:
# virsh migrate rh qemu+ssh://dell-per730-36.lab.eng.pek2.redhat.com/system --live --verbose
error: Device 0000:04:10.1 not found: could not access /sys/bus/pci/devices/0000:04:10.1/config: No such file or directory

But I have checked this file before the hot unplug and after the hot plug, and there are no changes:
# ll -Z /sys/bus/pci/devices/0000:04:10.1/config
-rw-r--r--. 1 root root system_u:object_r:sysfs_t:s0 4096 Apr 2 04:39 /sys/bus/pci/devices/0000:04:10.1/config

The hostdev interface xml is like this:
<interface type='hostdev' managed='yes'>
  <mac address='52:54:00:d8:50:d4'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
  </source>
  <teaming type='transient' persistent='ua-test'/>
  <alias name='hostdev0'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</interface>
(In reply to yalzhang from comment #16)
> but I have checked this file before hot unplug and after hotplug, there is
> no changes.
> # ll -Z /sys/bus/pci/devices/0000:04:10.1/config
> -rw-r--r--. 1 root root system_u:object_r:sysfs_t:s0 4096 Apr 2 04:39

I should have checked the file on the target host, not on the source host. It does not exist on the target host, so the migration failed.

The migration fails because the "migratable" xml changes after the hot-unplug and hot-plug:

1) start the vm with the failover setting, then check the migratable xml:
# virsh dumpxml rh --migratable
...
<interface type='network'>
  <mac address='52:54:00:d8:50:d4'/>
  <source network='hostdevnet'/>
  <teaming type='transient' persistent='ua-test'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</interface>

2) do the hot-unplug and hot-plug, and check the migratable xml again:
# virsh detach-device rh net.xml
Device detached successfully
# virsh attach-device rh net.xml
Device attached successfully
# virsh dumpxml rh --migratable
...
<interface type='hostdev' managed='yes'>    ====> it changes to a specific pci address
  <mac address='52:54:00:d8:50:d4'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
  </source>
  <teaming type='transient' persistent='ua-test'/>
  <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</interface>

libvirtd log on the target host:
2021-04-13 10:01:35.404+0000: 72901: error : virPCIDeviceNew:1482 : Device 0000:04:10.1 not found: could not access /sys/bus/pci/devices/0000:04:10.1/config: No such file or directory
2021-04-13 10:01:35.404+0000: 72901: error : virHostdevReAttachPCIDevices:1083 : Failed to allocate PCI device list: Device 0000:04:10.1 not found: could not access /sys/bus/pci/devices/0000:04:10.1/config: No such file or directory
(In reply to yalzhang from comment #17) > I should check the file on target host, not on the source host. It is not > exists on target host, so the migration failed. Right. If the exact same device (at the same PCI address on the host) isn't used on source and destination, you need to modify the XML with a hook during migration. So this is a separate unrelated problem (and is expected behavior from libvirt's PoV).
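For what it's worth, one way to do that without a hook (a sketch, not verified here; the file name and target URI are placeholders) is to pass destination-specific XML directly to virsh migrate:

# dump the migratable XML and edit the hostdev <source> PCI address
# so it points to a VF that actually exists on the target host
virsh dumpxml rh --migratable > rh-dest.xml
virsh migrate rh qemu+ssh://target.example.com/system --live --verbose --xml rh-dest.xml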
Here is the log message from the source host posted by Laurent split into multiple lines so it's easier to read. It looks like libvirt is stuck waiting for QEMU, and QEMU is waiting for ???.

Jirka - do you have an idea from this error what libvirt is waiting for? During a meeting just now Igor says he may have an idea about what QEMU may possibly be waiting for as well...

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/common/concurrent.py", line 260, in run
    ret = func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/logutils.py", line 447, in wrapper
    return f(*a, **kw)
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 731, in run
    self.monitor_migration()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/migration.py", line 757, in monitor_migration
    job_stats = self._vm.job_stats()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 5814, in job_stats
    return self._dom.jobStats()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 109, in f
    raise toe
vdsm.virt.virdomain.TimeoutError: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)

Apr 8 13:35:49 caracal07 vdsm[67255]: WARN executor state: count=5 workers={<Worker name=qgapoller/2 waiting task#=11 at 0x7ff8085bbe48>, <Worker name=qgapoller/0 waiting task#=12 at 0x7ff8085bbc88>, <Worker name=qgapoller/1 waiting task#=11 at 0x7ff8085bbcf8>, <Worker name=qgapoller/3 running <Task discardable <Operation action=<bound method QemuGuestAgentPoller._poller of <vdsm.virt.qemuguestagent.QemuGuestAgentPoller object at 0x7ff82c047048>> at 0x7ff8085bbc50> timeout=30, duration=30.00 at 0x7ff7e84b4a58> discarded task#=10 at 0x7ff8085bbf28>, <Worker name=qgapoller/4 waiting task#=0 at 0x7ff7e849d1d0>}
I can see the corresponding debug logs from libvirt are attached to this bug, but sadly they are quite sparse. They contain a lot of useless stuff (such as virObjectRef/Unref), but there's almost nothing from the QEMU driver - only some events and the messages we write to the QEMU monitor (but not the replies to them).

In the logs I can see libvirt started talking to the QEMU monitor:

2021-04-08 10:35:07.072+0000: 66531: debug : qemuDomainObjEnterMonitorInternal:5809 : Entering monitor (mon=0x7f224c03e0b0 vm=0x55ee4030dea0 name=CentOS)

and initiated migration by sending the "migrate" QMP command:

2021-04-08 10:35:07.078+0000: 67768: info : qemuMonitorIOWrite:437 : QEMU_MONITOR_IO_WRITE: mon=0x7f224c03e0b0 buf={"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-441"}

In response, QEMU apparently sent an initial MIGRATION event, which libvirt processed in:

2021-04-08 10:35:07.080+0000: 67768: debug : qemuMonitorEmitMigrationStatus:1458 : mon=0x7f224c03e0b0, status=setup

But that's it. The mon=0x7f224c03e0b0 monitor object is not mentioned anywhere further in the logs, and most importantly there is no qemuDomainObjExitMonitorInternal message that would indicate we got a reply from QEMU to the "migrate" command and processed it. The ExitMonitor call is logged elsewhere, so it's not missing because of bad log filters.

In other words, most likely QEMU got stuck processing the "migrate" command and never replied back except for the initial MIGRATION/setup event.
I'm reassigning to QEMU so they can try to figure out why they're apparently stuck processing the migrate command.
I thought I had an idea and tried to verify it, but it looks like this works with current QEMU master and the latest RHEL 8.5, i.e. migration progresses from the wait_unplug state to active (I don't have an SR-IOV card, but any other nic will do for testing purposes).

Here is an example with some tracing added:

qemu-system-x86_64 -enable-kvm -m 2g -M q35 -device pcie-root-port,slot=4,id=root1 -device pcie-root-port,slot=5,id=root2 \
 -device virtio-net-pci,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root1 \
 -device e1000e,id=foo,mac=52:54:00:6f:55:cc,failover_pair_id=net1,bus=root2 \
 rhel8.5.latest.qcow2 -monitor stdio

(qemu) device_del foo
pcie_unplug_device: dev->qdev.pending_deleted_event = true
(qemu) device_add e1000e,id=foo,mac=52:54:00:6f:55:cc,failover_pair_id=net1,bus=root2
(qemu) migrate "exec:gzip -c > STATEFILE.gz"
failover_unplug_primary: pci_dev->partially_hotplugged = true   <- automatic unplug initiated by the standby nic
pcie_unplug_device: dev->qdev.pending_deleted_event = true
primary_unplug_pending: 1
[...]   <- unplug takes several seconds to complete
primary_unplug_pending: 1
pcie_unplug_device: dev->qdev.pending_deleted_event = false
primary_unplug_pending: 0   <- primary is gone, and that's when migration moves to the 'active' state
(In reply to Ales Musil from comment #26) > (In reply to Laine Stump from comment #25) > > (In reply to Ales Musil from comment #24) > > > > > > We are explicitly unplugging the sriov interface before migration. The > > > automatic unplug does not work for RHV because it is leaving detached VFs on the host. > > > > Are you saying that after the migration is finished, the VF on the source > > host is still bound to vfio-pci? That sounds like a bug. Has that been > > investigated at all? A BZ maybe? > > No because I assume that is desired behavior when the VF is managed='no' > isn't it? Okay, yes that is correct behavior when managed='no'. But if you do want the VFs to be re-bound to their NIC driver on the host, then either 1) set managed='yes' in libvirt's XML, or 2) re-bind it to the NIC driver yourself when the migration is complete and the QEMU process on the source host is terminated. So I'm still confused about the reasoning behind manually unplugging the VF. What exactly does it provide that you wouldn't get if you allowed QEMU to implicitly unplug the VF? (As far as I can see, in the end the result would be the same). > If not I can file a BZ for it. > > > > > (an aside - for a long time I've recommended that people set managed='no' > > and pre-bind all their VFs to vfio-pci at host boot time whenever possible. > > It greatly reduces the number of moving parts (and thus potential for > > encountering strange bugs caused by races between the kernel and user > > processes)). > > > > > With explicit unplug we can reattach it. > > > > So are you then only using failover for the automatic guest-side team setup? > > Yes, we planned to use it also for the unplug but the detached VF stood in > the way. If you're doing managed='yes', then libvirt will rebind to the host NIC driver when the source QEMU exits. If you're doing managed='no', then it's always your responsibility to rebind to the host NIC driver when you're done with the VF (or just to simply *never* rebind it to the host NIC - I mean, are you ever actually using any VF as a network device directly on the host? If not, then why are you bothering to re-bind it to the host NIC driver at all?). And if you're doing managed='no' and really do need the NIC bound to the host driver for some reason when it's not used by a guest, then your code should be re-binding to the host NIC driver at the point the QEMU process exits, regardless of whether or not you're using <teaming>/failover.
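For the managed='no' case, the rebind described above is plain sysfs driver binding. A rough sketch (the VF PCI address and the iavf driver name are just examples from this report, and it assumes no driver_override is set on the device):

VF=0000:04:10.1                                     # example VF PCI address
echo "$VF" > /sys/bus/pci/drivers/vfio-pci/unbind   # detach from vfio-pci once the source QEMU has exited
echo "$VF" > /sys/bus/pci/drivers/iavf/bind         # re-bind to the host VF NIC driver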
(In reply to Laine Stump from comment #28)
> (In reply to Ales Musil from comment #26)
> > (In reply to Laine Stump from comment #25)
> > > (In reply to Ales Musil from comment #24)
> > > >
> > > > We are explicitly unplugging the sriov interface before migration. The
> > > > automatic unplug does not work for RHV because it is leaving detached VFs on the host.
> > >
> > > Are you saying that after the migration is finished, the VF on the source
> > > host is still bound to vfio-pci? That sounds like a bug. Has that been
> > > investigated at all? A BZ maybe?
> >
> > No because I assume that is desired behavior when the VF is managed='no'
> > isn't it?
>
> Okay, yes that is correct behavior when managed='no'. But if you do want the
> VFs to be re-bound to their NIC driver on the host, then either 1) set
> managed='yes' in libvirt's XML, or 2) re-bind it to the NIC driver yourself
> when the migration is complete and the QEMU process on the source host is
> terminated.
>
> So I'm still confused about the reasoning behind manually unplugging the VF.
> What exactly does it provide that you wouldn't get if you allowed QEMU to
> implicitly unplug the VF? (As far as I can see, in the end the result would
> be the same).

Currently we don't have any way to rebind it after migration. It might be possible to do that, but it was way easier for us to unplug the VF before migration and plug it back afterwards. If this is the reason why the migration gets stuck, we can change it. But it seems strange, because the migration works with this flow after a fresh boot; it gets stuck only if I unplug/plug both teaming devices before the migration, not only the VF.

> > If not I can file a BZ for it.
> >
> > > (an aside - for a long time I've recommended that people set managed='no'
> > > and pre-bind all their VFs to vfio-pci at host boot time whenever possible.
> > > It greatly reduces the number of moving parts (and thus potential for
> > > encountering strange bugs caused by races between the kernel and user
> > > processes)).
> > >
> > > > With explicit unplug we can reattach it.
> > >
> > > So are you then only using failover for the automatic guest-side team setup?
> >
> > Yes, we planned to use it also for the unplug but the detached VF stood in
> > the way.
>
> If you're doing managed='yes', then libvirt will rebind to the host NIC
> driver when the source QEMU exits. If you're doing managed='no', then it's
> always your responsibility to rebind to the host NIC driver when you're done
> with the VF (or just to simply *never* rebind it to the host NIC - I mean,
> are you ever actually using any VF as a network device directly on the host?
> If not, then why are you bothering to re-bind it to the host NIC driver at
> all?). And if you're doing managed='no' and really do need the NIC bound to
> the host driver for some reason when it's not used by a guest, then your
> code should be re-binding to the host NIC driver at the point the QEMU
> process exits, regardless of whether or not you're using <teaming>/failover.

I don't remember the initial reasoning for it being managed='no', but we do allow users to attach a host network to a VF. I am not sure if there is ever a useful flow which would require that; anyway, it is allowed. The second reason why we need to rebind is that we monitor how many VFs are free on the host, and a free VF is considered to be a VF bound to the host NIC driver.
(In reply to Igor Mammedov from comment #33) > (In reply to Michael Burman from comment #32) > > HI all, > > > > This bug seems to be much more severe unfortunate. Please help, as it's is > > totally blocking RHV from making the failover feature. > > The symptoms seems to be the same as Ales described and same steps of > > reproduction. > > > > The qemu process is been terminated and dead when trying to perform > > migration after failover nic unplun and plug. > > > > 1. Start VM with failover nic > > 2. unplug the nic > > 3. plug it back > > 4. Try to migrate > > Result, qemu process is terminated immediately and dead. VM is shutdown. > > This is 100% reproduced and I can't determine if this is the same bug or > > another one, but the steps and symptoms are the same, but result is much > > more severe, making the VM unusable. > > > > > > VM Vm2 is down with error. Exit message: Lost connection with qemu process." > > > > Apr 17 11:59:13 caracal07.lab.eng.tlv2.redhat.com kernel: qemu-kvm[15186]: > > segfault at 0 ip 0000000000000000 sp 00007ffca581b248 error 14 in > > qemu-kvm[55daaec72000+b13000] > this looks like different issue, > Can you install debuginfo for qemu and attach to qemu process with gdb > before starting migration > and once it crashes you should be able to capture stack trace. Here is the stack trace that I was able to catch: Thread 1 "qemu-kvm" received signal SIGSEGV, Segmentation fault. 0x0000000000000000 in ?? () (gdb) bt #0 0x0000000000000000 in ?? () #1 0x0000560bf72d4f04 in notifier_list_notify (list=list@entry=0x560bf7b3f4c8 <migration_state_notifiers>, data=data@entry=0x560bf993efd0) at ../util/notify.c:39 #2 0x0000560bf70208e2 in migrate_fd_connect (s=s@entry=0x560bf993efd0, error_in=<optimized out>) at ../migration/migration.c:3636 #3 0x0000560bf6fb2eaa in migration_channel_connect (s=s@entry=0x560bf993efd0, ioc=ioc@entry=0x560bf9dc9810, hostname=hostname@entry=0x0, error=<optimized out>, error@entry=0x0) at ../migration/channel.c:92 #4 0x0000560bf6f7262e in fd_start_outgoing_migration (s=0x560bf993efd0, fdname=<optimized out>, errp=<optimized out>) at ../migration/fd.c:42 #5 0x0000560bf701f056 in qmp_migrate (uri=0x560bf9d6ade0 "fd:migrate", has_blk=<optimized out>, blk=<optimized out>, has_inc=<optimized out>, inc=<optimized out>, has_detach=<optimized out>, detach=true, has_resume=false, resume=false, errp=0x7ffc3006c718) at ../migration/migration.c:2177 #6 0x0000560bf72b4a3e in qmp_marshal_migrate (args=<optimized out>, ret=<optimized out>, errp=0x7f14890bdec0) at qapi/qapi-commands-migration.c:533 #7 0x0000560bf72f87fd in do_qmp_dispatch_bh (opaque=0x7f14890bded0) at ../qapi/qmp-dispatch.c:110 #8 0x0000560bf72c9a8d in aio_bh_call (bh=0x7f13e4006080) at ../util/async.c:164 #9 aio_bh_poll (ctx=ctx@entry=0x560bf98dd340) at ../util/async.c:164 #10 0x0000560bf72cf772 in aio_dispatch (ctx=0x560bf98dd340) at ../util/aio-posix.c:381 #11 0x0000560bf72c9972 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306 #12 0x00007f1487f1877d in g_main_context_dispatch () from target:/lib64/libglib-2.0.so.0 #13 0x0000560bf72ca9f0 in glib_pollfds_poll () at ../util/main-loop.c:221 #14 os_host_main_loop_wait (timeout=<optimized out>) at ../util/main-loop.c:244 #15 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:520 #16 0x0000560bf71b2251 in qemu_main_loop () at ../softmmu/vl.c:1679 #17 0x0000560bf6f33942 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at 
../softmmu/main.c:50
> Steps to Reproduce: > 1. Start a VM with sriov + failover vm nics > 2. Unplug sriov + failover > 3. Plug sriov + failover > 4. Migrate VM to a different host I did a quick test on the qemu part but I *can not* reproduce this problem. Test env: host: 4.18.0-304.el8.x86_64 qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64 guest: 4.18.0-304.el8.x86_64 Test steps: (1) Create a bridge based on the PF and setup bridge (2) create VFs and setup the mac address of the VF 2.1 on the source host: echo 1 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs echo 0000:06:10.0 > /sys/bus/pci/devices/0000\:06\:10.0/driver/unbind echo "8086 10ed" > /sys/bus/pci/drivers/vfio-pci/new_id echo "8086 10ed" > /sys/bus/pci/drivers/vfio-pci/remove_id ip link set enp6s0f0 vf 0 mac 22:2b:62:bb:a9:82 2.2 on the target host: echo 1 > /sys/bus/pci/devices/0000\:06\:00.0/sriov_numvfs echo 0000:06:01.0 > /sys/bus/pci/devices/0000\:06\:01.0/driver/unbind echo "14e4 16af" > /sys/bus/pci/drivers/vfio-pci/new_id echo "14e4 16af" > /sys/bus/pci/drivers/vfio-pci/remove_id ip link set enp6s0f0 vf 0 mac 22:2b:62:bb:a9:82 ip link show enp6s0f0 (3) On the source host, start a vm with a failover vf and a failover virtio net device ... -netdev tap,id=hostnet0,vhost=on \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \ -device vfio-pci,host=0000:06:10.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \ (4) check the failover device info in the source vm # ifconfig enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.115 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::1bc6:32d5:1754:f9ba prefixlen 64 scopeid 0x20<link> inet6 2620:52:0:4920:e951:744e:6e7b:ffec prefixlen 64 scopeid 0x0<global> ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 591 bytes 68355 (66.7 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 292 bytes 45104 (44.0 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 335 bytes 27728 (27.0 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 43 bytes 8534 (8.3 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.115 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::bf95:c1f7:6157:fc0e prefixlen 64 scopeid 0x20<link> ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 256 bytes 40627 (39.6 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 250 bytes 36808 (35.9 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 # dmesg | grep -i failover [ 3.341635] virtio_net virtio1 eth0: failover master:eth0 registered [ 3.343309] virtio_net virtio1 eth0: failover standby slave:eth1 registered [ 6.749614] virtio_net virtio1 enp3s0: failover primary slave:eth0 registered (5) On the taget host, start a vm which is in listening mode ... 
-netdev tap,id=hostnet0,vhost=on \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \ -device vfio-pci,host=0000:06:01.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \ -incoming defer \ (6) hot-unplug the failover vf from the source vm qmp: {"execute":"device_del","arguments":{"id":"hostdev0"}} output: {"return": {}} {"timestamp": {"seconds": 1619096278, "microseconds": 710954}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}} (7) hot-plug the failover vf into the source vm qmp: {"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0000:06:10.0","id":"hostdev0","bus":"root.4","failover_pair_id":"net0"}} output: {"return": {}} (8) start to migrate the vm with failover vf and failover virtio net device (8.1) On the source host, enable related migration capability qmp: {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}} (8.2) On the target host , setup migration uri and enable related migration capability qmp: {"execute": "migrate-incoming","arguments": {"uri": "tcp:[::]:5800"}} {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"late-block-activate","state":true}]}} (8.3) migrate the vm from the source host to target host qmp: {"execute": "migrate","arguments":{"uri": "tcp:$ip_address:5800"}} {"execute":"migrate-continue","arguments":{"state":"pre-switchover"}} related timestamp: {"timestamp": {"seconds": 1619097116, "microseconds": 985001}, "event": "UNPLUG_PRIMARY", "data": {"device-id": "hostdev0"}} {"timestamp": {"seconds": 1619097127, "microseconds": 901467}, "event": "STOP"} (8.4) check the migration timestamp on the target host {"timestamp": {"seconds": 1619097177, "microseconds": 643704}, "event": "FAILOVER_NEGOTIATED", "data": {"device-id": "net0"}} {"timestamp": {"seconds": 1619097181, "microseconds": 252631}, "event": "RESUME"} (9) check the failover device info in the target vm (9.1) related dmesg when migrating the vm # dmesg [ 1242.786895] virtio_net virtio1 enp3s0: failover primary slave:eth0 unregistered [ 1299.369457] virtio_net virtio1 enp3s0: failover primary slave:eth0 registered (9.2) # ifconfig enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.115 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::1bc6:32d5:1754:f9ba prefixlen 64 scopeid 0x20<link> inet6 2620:52:0:4920:e951:744e:6e7b:ffec prefixlen 64 scopeid 0x0<global> ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 1598 bytes 153296 (149.7 KiB) RX errors 0 dropped 293 overruns 0 frame 0 TX packets 494 bytes 79654 (77.7 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 1625 bytes 125248 (122.3 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 206 bytes 36523 (35.6 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 10.73.33.115 netmask 255.255.254.0 broadcast 10.73.33.255 inet6 fe80::e016:fd9d:c88f:2ac prefixlen 64 scopeid 0x20<link> ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet) RX packets 434 bytes 52535 (51.3 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 329 bytes 52895 (51.6 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 device memory 0xfc800000-fc807fff
Michael says the steps to reproduce are these:

1. Start VM with failover nic
2. unplug the nic
3. plug it back
4. Try to migrate

I assume these steps are from the point of view of RHV, *not* QEMU, correct? If so, then (as far as I understand from Ales' comments) from the point of view of QEMU step (4) is actually:

4.1 - unplug VF again
4.2 - unbind VF from the vfio-pci driver
4.3 - bind VF to the host VF NIC driver
4.4 - start migration

(I'm not exactly clear whether or not 4.2 and 4.3 happen here, but that was kind of implied by Ales' reasoning for RHV unplugging the VF prior to starting migration.)

So the attempt to reproduce in Comment 37 is incomplete - at the very least, the VF should be unplugged from the guest prior to starting the migration.
(In reply to Ales Musil from comment #36) > (In reply to Igor Mammedov from comment #33) > > (In reply to Michael Burman from comment #32) > > > HI all, > > > > > > This bug seems to be much more severe unfortunate. Please help, as it's is > > > totally blocking RHV from making the failover feature. > > > The symptoms seems to be the same as Ales described and same steps of > > > reproduction. > > > > > > The qemu process is been terminated and dead when trying to perform > > > migration after failover nic unplun and plug. > > > > > > 1. Start VM with failover nic > > > 2. unplug the nic > > > 3. plug it back > > > 4. Try to migrate > > > Result, qemu process is terminated immediately and dead. VM is shutdown. > > > This is 100% reproduced and I can't determine if this is the same bug or > > > another one, but the steps and symptoms are the same, but result is much > > > more severe, making the VM unusable. > > > > > > > > > VM Vm2 is down with error. Exit message: Lost connection with qemu process." > > > > > > Apr 17 11:59:13 caracal07.lab.eng.tlv2.redhat.com kernel: qemu-kvm[15186]: > > > segfault at 0 ip 0000000000000000 sp 00007ffca581b248 error 14 in > > > qemu-kvm[55daaec72000+b13000] > > this looks like different issue, > > Can you install debuginfo for qemu and attach to qemu process with gdb > > before starting migration > > and once it crashes you should be able to capture stack trace. > > Here is the stack trace that I was able to catch: I still can't reproduce it locally, it would be better if you could provide access to the host where it reproduces and exact steps to reproduce (preferably via CLI). Also it would be better to open a new BZ for this crash, it doesn't look related to this BZ. > Thread 1 "qemu-kvm" received signal SIGSEGV, Segmentation fault. > 0x0000000000000000 in ?? () > (gdb) bt > #0 0x0000000000000000 in ?? 
() > #1 0x0000560bf72d4f04 in notifier_list_notify > (list=list@entry=0x560bf7b3f4c8 <migration_state_notifiers>, > data=data@entry=0x560bf993efd0) at ../util/notify.c:39 it looks like uninitialized callback in one of notifiers, (probably I'll be able to find which one, once I have access to to reproducer) > #2 0x0000560bf70208e2 in migrate_fd_connect (s=s@entry=0x560bf993efd0, > error_in=<optimized out>) at ../migration/migration.c:3636 > #3 0x0000560bf6fb2eaa in migration_channel_connect > (s=s@entry=0x560bf993efd0, ioc=ioc@entry=0x560bf9dc9810, > hostname=hostname@entry=0x0, error=<optimized out>, error@entry=0x0) at > ../migration/channel.c:92 > #4 0x0000560bf6f7262e in fd_start_outgoing_migration (s=0x560bf993efd0, > fdname=<optimized out>, errp=<optimized out>) at ../migration/fd.c:42 > #5 0x0000560bf701f056 in qmp_migrate (uri=0x560bf9d6ade0 "fd:migrate", > has_blk=<optimized out>, blk=<optimized out>, has_inc=<optimized out>, > inc=<optimized out>, has_detach=<optimized out>, detach=true, > has_resume=false, resume=false, > errp=0x7ffc3006c718) at ../migration/migration.c:2177 > #6 0x0000560bf72b4a3e in qmp_marshal_migrate (args=<optimized out>, > ret=<optimized out>, errp=0x7f14890bdec0) at > qapi/qapi-commands-migration.c:533 > #7 0x0000560bf72f87fd in do_qmp_dispatch_bh (opaque=0x7f14890bded0) at > ../qapi/qmp-dispatch.c:110 > #8 0x0000560bf72c9a8d in aio_bh_call (bh=0x7f13e4006080) at > ../util/async.c:164 > #9 aio_bh_poll (ctx=ctx@entry=0x560bf98dd340) at ../util/async.c:164 > #10 0x0000560bf72cf772 in aio_dispatch (ctx=0x560bf98dd340) at > ../util/aio-posix.c:381 > #11 0x0000560bf72c9972 in aio_ctx_dispatch (source=<optimized out>, > callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306 > #12 0x00007f1487f1877d in g_main_context_dispatch () from > target:/lib64/libglib-2.0.so.0 > #13 0x0000560bf72ca9f0 in glib_pollfds_poll () at ../util/main-loop.c:221 > #14 os_host_main_loop_wait (timeout=<optimized out>) at > ../util/main-loop.c:244 > #15 main_loop_wait (nonblocking=nonblocking@entry=0) at > ../util/main-loop.c:520 > #16 0x0000560bf71b2251 in qemu_main_loop () at ../softmmu/vl.c:1679 > #17 0x0000560bf6f33942 in main (argc=<optimized out>, argv=<optimized out>, > envp=<optimized out>) at ../softmmu/main.c:50
Hi Ales,

If possible, could you please provide access to your machine and a more detailed reproducer? That would be very helpful for me in locating the difference between your test steps and mine.

Thanks a lot in advance for your help.
(In reply to Yanghang Liu from comment #40)
> Hi Ales,
>
> If possible, could you please provide the access to your machine and more
> detailed reproducer?
>
> That will be very helpful for me to locate the difference between your test
> steps and mine.
>
> Thanks a lot for your help in advance.

Those are the exact steps from the RHV point of view. I am not sure if I am able to provide detailed steps from the QEMU point of view, but let me try.

RHV step 1. (Start a VM with sriov + failover vm nics) should roughly translate to:
1) Create a bridge for the failover network
2) Add a VF on the PF and unbind it
3) Start the VM with the VF and the failover virtio network attached

RHV step 2. (Unplug sriov + failover):
4) Unplug the VF
5) Rebind the VF back to the network driver
6) Unplug the virtio failover network

RHV step 3. (Plug sriov + failover):
7) Unbind the VF
8) Plug the VF
9) Plug the virtio failover network

RHV step 4. (Migrate VM to a different host):
10) Unplug the VF
11) Rebind the VF back to the network driver
12) Start the migration

But I am really not sure if the list is complete; hopefully I did not miss anything.

As for the machine, sure, we can provide you with access, but again the reproducer is from the RHV point of view, not only QEMU.
Hi Ales,

Thanks a lot for your explanation. I still have a question that I want to double-check with you in order to prevent any misunderstanding on my side.

> RHV step 4. (Migrate VM to a different host)
> 10) Unplug VF
> 11) Rebind VF back to network driver
> 12) Start migration

Do step 10 and step 11 mean that you *manually* hot-unplug the failover VF before migrating the vm?

In fact, on the QEMU side, I can just run the following qmp command to migrate the vm:

{"execute": "migrate","arguments":{"uri": "tcp:$ip_address:$port"}}

This qmp command *automatically* hot-unplugs the failover VF from the source vm and *automatically* hot-plugs another VF into the target vm, which means I can migrate a vm with a failover VF + failover virtio net device *without manually* hot-unplugging/hot-plugging the VF.
(In reply to Yanghang Liu from comment #43) > Hi Ales, > > Thanks a lot for your explanation. > > I still have a question that I want to double confirm with you in order to > prevent my misunderstanding. > > > RHV step 4. (Migrate VM to a different host) > > 10) Uplug VF > > 11) Rebind VF back to network driver > > Start migration > > Does step 10 and step 11 mean that you *manually* hot-unplug the failover VF > before migrating the vm ? > > > In fact, on the QEMU part, I could just run the following qmp to migrate the > vm: > > {"execute": "migrate","arguments":{"uri": "tcp:$ip_address:$port"}} > > This qmp command can *automatically* hot-unplug the failover VF from the > source vm and *automatically* hot-plug another VF into the target vm, > which means I can migrate the failover VF + failover virtio net vm *without > manually* hot-unplug/hot-plug VF. Yes, we are doing manual unplug as the automatic one does not work for us because it is not rebinding the driver back. Thanks
(In reply to Ales Musil from comment #29) > (In reply to Laine Stump from comment #28) > > (In reply to Ales Musil from comment #26) > > > (In reply to Laine Stump from comment #25) > > > > (In reply to Ales Musil from comment #24) [...] > > > If not I can file a BZ for it. > > > > > > > > > > > (an aside - for a long time I've recommended that people set managed='no' > > > > and pre-bind all their VFs to vfio-pci at host boot time whenever possible. > > > > It greatly reduces the number of moving parts (and thus potential for > > > > encountering strange bugs caused by races between the kernel and user > > > > processes)). > > > > > > > > > With explicit unplug we can reattach it. > > > > > > > > So are you then only using failover for the automatic guest-side team setup? > > > > > > Yes, we planned to use it also for the unplug but the detached VF stood in > > > the way. > > > > If you're doing managed='yes', then libvirt will rebind to the host NIC > > driver when the source QEMU exits. If you're doing managed='no', then it's > > always your responsibility to rebind to the host NIC driver when you're done > > with the VF (or just to simply *never* rebind it to the host NIC - I mean, > > are you ever actually using any VF as a network device directly on the host? > > If not, then why are you bothering to re-bind it to the host NIC driver at > > all?). And if you're doing managed='no' and really do need the NIC bound to > > the host driver for some reason when it's not used by a guest, then your > > code should be re-binding to the host NIC driver at the point the QEMU > > process exits, regardless of whether or not you're using <teaming>/failover. > > I don't remember the initial reasoning for it being managed='no', but we do > allow > users to attach host network to VF. I am not sure if there is ever useful > flow which > would require that anyway it is allowed. Second reason why we need to rebind > is that > we monitor how many VFs are free on the host, as free VF is considered VF > binded to host > nic. it looks like could've used managed=yes, so libvirt would've done rebinding for you. (i.e. you will have it on host driver as expected) (maybe find another way to count free VFs without rebinding (As Laine mentioned, less moving parts the better))
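As an illustration of counting free VFs without touching the driver binding (only a sketch; the PF netdev name is a placeholder), the driver each VF is bound to can be read straight from sysfs:

PF=enp6s0f0                                  # example PF netdev name
for vf in /sys/class/net/"$PF"/device/virtfn*; do
    addr=$(basename "$(readlink -f "$vf")")          # VF PCI address
    if [ -e "$vf/driver" ]; then
        drv=$(basename "$(readlink "$vf/driver")")   # e.g. iavf, ixgbevf or vfio-pci
    else
        drv=none
    fi
    echo "$addr -> driver: $drv"   # vfio-pci (or none) => in use/reserved for a guest; a host NIC driver => free
done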
(In reply to Ales Musil from comment #41) > (In reply to Yanghang Liu from comment #40) > > Hi Ales, > > > > If possible, could you please provide the access to your machine and more > > detailed reproducer? > > > > That will be very helpful for me to locate the difference between your test > > steps and mine. > > > > Thanks a lot for your help in advance. > > Those are exact steps from RHV point of view. I am not sure if I am able to > provide detailed steps from QEMU point of view > but let me try. I finally setup SRIOV host, and tried to reproduce with steps you described (+ sequence of del/add events from libvirt log) including live migration. But I'm not able to reproduce hang (wait_unplug) issue. (speculation: another cause of wait_unplug might be guest, if it refuses to do unplug or fails to complete unplug, then QEMU will be stuck in wait_unplug) So if hang still reproduces, we probably will need access to a host where it reproduces + whatever knobs you use to trigger bug from management side. However I was able to trigger comment 36 crash [1] and one another [2] while trying to reproduce hang. Now let look at workflow below (from QEMU point of view) > RHV step 1. (Start a VM with sriov + failover vm nics) should roughly > translate to: > 1) Create bridge for failover network > 2) Add VF on PF and unbind it > 3) Start VM with VF and failover virtio network attached expected > RHV step 2. (Unplug sriov + failover): > 4) Unplug VF more or less shouldn't explode but > 5) Rebind VF back to network driver from early fail-over discussions I recall that there was a plan to hold on VF resources in QEMU until migration is complete (i.e. only hide device from guest but keep it alive in QEMU so we could plug it back in case of failure). Whether it was implemented I don't know, but this step is potentially wrong one (if not now then it might be wrong in future) > 6) Unplug virtio failover network that's totally unexpected in failover workflow, whole point of which is to keep guest network working while non-migratable primary is unplugged. if at this point you try to migrate you will trigger comment 36 crash (unexpected unplug is not excuse for crash so it is something for us to fix in QEMU) but you shouldn't do standby nic unplug in failover usecase > RHV step 3. (Plug sriov + failover) > 7) Unbind VF > 8) Plug VF > 9) Plug virtio failover network > RHV step 4. (Migrate VM to a different host) > 10) Uplug VF > 11) Rebind VF back to network driver > 12) Start migration what puzzles me is why do you remove all nics and then immediately plug them back :/ Expected failover workflow (which works) is 1. start with all nics plugged in 2. start migration 3. guest failover driver unplugs primary 4. migration completes, destination qemu closed. 5. do whatever VF cleanup necessary (unbind/rebind or let libvirt do it for you) alternatively one can manually unplug primary before migration but should keep VF bound until migration completes and source QEMU is terminated, and only then do VF unbind/rebind. > But I am really not sure if the list is complete hopefully I did not miss > anything. > > As for the machine, sure we can provide you with access, but again the > reproducer > is from RHV point of view not oinly QEMU. 1) https://bugzilla.redhat.com/show_bug.cgi?id=1953045 2) https://bugzilla.redhat.com/show_bug.cgi?id=1953062
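To make the alternative flow described just above concrete (a sketch only; the domain name is taken from this bug, while hostdev.xml and the target URI are placeholders), unplug only the primary before migration and leave the VF bound to vfio-pci until the source QEMU has exited:

virsh detach-device CentOS hostdev.xml --live        # unplug only the hostdev (VF) interface
virsh migrate CentOS qemu+ssh://target.example.com/system --live --verbose
# only after the migration has finished and the source QEMU process has exited:
#  - rebind the VF to the host NIC driver (or let managed='yes' do it)
virsh -c qemu+ssh://target.example.com/system attach-device CentOS hostdev.xml --live   # optionally re-plug a VF on the target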
(In reply to Laine Stump from comment #38)
> Michael says the steps to reproduce are these:
>
> 1. Start VM with failover nic
> 2. unplug the nic
> 3. plug it back
> 4. Try to migrate
>
> I assume these steps are from the point of view of RHV, *not* QEMU correct?
> If so, then (as far as I understand from Ales' comments) from the point of
> view of QEMU step (4) is actually:
>
> 4.1 - unplug VF again
> 4.2 - unbind VF from vfio-pci driver
> 4.3 - bind VF to host VF NIC driver
> 4.4 - start migration
>
> (I'm not exactly clear whether or not 4.2 and 4.3 happen here, but that was
> kind of implied by Ales' reasoning for RHV unplugging the VF prior to
> starting migration).
>
> So the attempt to reproduce in Comment 37 is incomplete - at the very least,
> the VF should be unplugged from the guest prior to starting the migration.

This is correct. In order to reproduce both issues, the frozen migration and the qemu process termination, you must first unplug the sr-iov failover nic, plug it back, and only then migrate.

Reproduced every time with:
host - 4.18.0-304.el8.x86_64, qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64
guest - 4.18.0-240.5.el8.x86_64
FYI, as Igor suggested, I have created a new bug to track the qemu process termination on migration, see bz 1953283. Thank you.
(In reply to Igor Mammedov from comment #46)
> (In reply to Ales Musil from comment #41)
> > (In reply to Yanghang Liu from comment #40)
> > > Hi Ales,
> > >
> > > If possible, could you please provide the access to your machine and a more
> > > detailed reproducer?
> > >
> > > That will be very helpful for me to locate the difference between your test
> > > steps and mine.
> > >
> > > Thanks a lot for your help in advance.
> >
> > Those are the exact steps from the RHV point of view. I am not sure if I am
> > able to provide detailed steps from the QEMU point of view, but let me try.
>
> I finally set up an SR-IOV host and tried to reproduce with the steps you
> described (+ the sequence of del/add events from the libvirt log), including
> live migration. But I'm not able to reproduce the hang (wait_unplug) issue.
> (Speculation: another cause of wait_unplug might be the guest; if it refuses to
> do the unplug or fails to complete it, then QEMU will be stuck in wait_unplug.)
>
> So if the hang still reproduces, we will probably need access to a host where it
> reproduces + whatever knobs you use to trigger the bug from the management side.
>
> However, I was able to trigger the comment 36 crash [1] and another one [2]
> while trying to reproduce the hang.
>
> Now let's look at the workflow below (from the QEMU point of view):
>
> > RHV step 1. (Start a VM with sriov + failover vm nics) should roughly
> > translate to:
> > 1) Create bridge for failover network
> > 2) Add VF on PF and unbind it
> > 3) Start VM with VF and failover virtio network attached
>
> expected
>
> > RHV step 2. (Unplug sriov + failover):
> > 4) Unplug VF
>
> more or less fine, shouldn't explode, but
>
> > 5) Rebind VF back to network driver
>
> from the early failover discussions I recall that there was a plan to hold on to
> the VF resources in QEMU until migration is complete (i.e. only hide the device
> from the guest but keep it alive in QEMU so we could plug it back in case of
> failure). Whether it was implemented I don't know, but this step is potentially
> the wrong one (if not now, then it might be wrong in the future)

In this step we are not talking about migration, just about a simple unplug of an
interface. I am kind of confused why QEMU should keep it if the user requested to
unplug it and it is no longer attached to the VM?

> > 6) Unplug virtio failover network
>
> that's totally unexpected in the failover workflow, the whole point of which is
> to keep guest networking working while the non-migratable primary is unplugged.

Why is it not expected? When the user wants to unplug sriov + failover from the VM,
should we forbid it and leave the failover hanging there?
Maybe it is just a misunderstanding, but the point of this part of the flow is to
check that the user can in fact unplug sriov + failover. It has nothing to do with
migration yet.

> if at this point you try to migrate you will trigger the comment 36 crash
> (an unexpected unplug is no excuse for a crash, so it is something for us to fix
> in QEMU)
>
> but you shouldn't unplug the standby nic in the failover use case
>
> > RHV step 3. (Plug sriov + failover)
> > 7) Unbind VF
> > 8) Plug VF
> > 9) Plug virtio failover network
> > RHV step 4. (Migrate VM to a different host)
> > 10) Unplug VF
> > 11) Rebind VF back to network driver
> > 12) Start migration
>
> what puzzles me is why you remove all nics and then immediately plug them back :/

To test that the user can actually do that. It is important for the user to be able
to unplug an interface whenever they need a change in guest networking.

> The expected failover workflow (which works) is:
> 1. start with all nics plugged in
> 2. start migration
> 3. guest failover driver unplugs primary
> 4. migration completes, destination qemu closed.
> 5. do whatever VF cleanup is necessary (unbind/rebind, or let libvirt do it for
> you)

If that is the only supported path, we can possibly adjust our code to do that.
But in that case it should be documented, as the information around this feels
puzzling.

> alternatively one can manually unplug the primary before migration, but should
> keep the VF bound until migration completes and the source QEMU is terminated,
> and only then do the VF unbind/rebind.
>
> > But I am really not sure if the list is complete; hopefully I did not miss
> > anything.
> >
> > As for the machine, sure, we can provide you with access, but again the
> > reproducer is from the RHV point of view, not only QEMU.
>
> 1) https://bugzilla.redhat.com/show_bug.cgi?id=1953045
> 2) https://bugzilla.redhat.com/show_bug.cgi?id=1953062
(In reply to Ales Musil from comment #49)
> (In reply to Igor Mammedov from comment #46)
> > (In reply to Ales Musil from comment #41)
> > > (In reply to Yanghang Liu from comment #40)
> > > > Hi Ales,
> > > >
> > > > If possible, could you please provide the access to your machine and a
> > > > more detailed reproducer?
> > > >
> > > > That will be very helpful for me to locate the difference between your
> > > > test steps and mine.
> > > >
> > > > Thanks a lot for your help in advance.
> > >
> > > Those are the exact steps from the RHV point of view. I am not sure if I am
> > > able to provide detailed steps from the QEMU point of view, but let me try.
> >
> > I finally set up an SR-IOV host and tried to reproduce with the steps you
> > described (+ the sequence of del/add events from the libvirt log), including
> > live migration. But I'm not able to reproduce the hang (wait_unplug) issue.
> > (Speculation: another cause of wait_unplug might be the guest; if it refuses
> > to do the unplug or fails to complete it, then QEMU will be stuck in
> > wait_unplug.)
> >
> > So if the hang still reproduces, we will probably need access to a host where
> > it reproduces + whatever knobs you use to trigger the bug from the management
> > side.
> >
> > However, I was able to trigger the comment 36 crash [1] and another one [2]
> > while trying to reproduce the hang.
> >
> > Now let's look at the workflow below (from the QEMU point of view):
> >
> > > RHV step 1. (Start a VM with sriov + failover vm nics) should roughly
> > > translate to:
> > > 1) Create bridge for failover network
> > > 2) Add VF on PF and unbind it
> > > 3) Start VM with VF and failover virtio network attached
> >
> > expected
> >
> > > RHV step 2. (Unplug sriov + failover):
> > > 4) Unplug VF
> > more or less fine, shouldn't explode, but
> >
> > > 5) Rebind VF back to network driver
> > from the early failover discussions I recall that there was a plan to hold on
> > to the VF resources in QEMU until migration is complete (i.e. only hide the
> > device from the guest but keep it alive in QEMU so we could plug it back in
> > case of failure). Whether it was implemented I don't know, but this step is
> > potentially the wrong one (if not now, then it might be wrong in the future)
>
> In this step we are not talking about migration, just about a simple unplug of
> an interface. I am kind of confused why QEMU should keep it if the user
> requested to unplug it and it is no longer attached to the VM?

If you do not intend to keep failover working, it should be fine to unplug both.
Hence [1] should be fixed on the QEMU side so it won't crash. We also should fix
the wait_for_unplug hang if we are able to reproduce it (but it might be a bit
difficult if it's on the guest side of the equation).

> > > 6) Unplug virtio failover network
> > that's totally unexpected in the failover workflow, the whole point of which
> > is to keep guest networking working while the non-migratable primary is
> > unplugged.
>
> Why is it not expected? When the user wants to unplug sriov + failover from the
> VM, should we forbid it and leave the failover hanging there?

if the user wants to unplug the failover pair, it should work.

> Maybe it is just a misunderstanding, but the point of this part of the flow is
> to check that the user can in fact unplug sriov + failover. It has nothing to do
> with migration yet.
>
> > if at this point you try to migrate you will trigger the comment 36 crash
> > (an unexpected unplug is no excuse for a crash, so it is something for us to
> > fix in QEMU)
> >
> > but you shouldn't unplug the standby nic in the failover use case
> >
> > > RHV step 3. (Plug sriov + failover)
> > > 7) Unbind VF
> > > 8) Plug VF
> > > 9) Plug virtio failover network
> > > RHV step 4. (Migrate VM to a different host)
> > > 10) Unplug VF
> > > 11) Rebind VF back to network driver
> > > 12) Start migration
> >
> > what puzzles me is why you remove all nics and then immediately plug them
> > back :/
>
> To test that the user can actually do that. It is important for the user to be
> able to unplug an interface whenever they need a change in guest networking.

It's perfectly fine for testing purposes, or when the user doesn't care about
keeping guest network connectivity uninterrupted. If the latter, then the user
should configure failover to begin with.

> > The expected failover workflow (which works) is:
> > 1. start with all nics plugged in
> > 2. start migration
> > 3. guest failover driver unplugs primary
> > 4. migration completes, destination qemu closed.
> > 5. do whatever VF cleanup is necessary (unbind/rebind, or let libvirt do it
> > for you)
>
> If that is the only supported path, we can possibly adjust our code to do that.

I think that's the expected/tested workflow which should be used in cases where
failover (uninterrupted guest networking) is needed.

> But in that case it should be documented, as the information around this feels
> puzzling.

agreed, the documentation could be improved.

> > alternatively one can manually unplug the primary before migration, but
> > should keep the VF bound until migration completes and the source QEMU is
> > terminated, and only then do the VF unbind/rebind.
> >
> > > But I am really not sure if the list is complete; hopefully I did not miss
> > > anything.
> > >
> > > As for the machine, sure, we can provide you with access, but again the
> > > reproducer is from the RHV point of view, not only QEMU.
> >
> > 1) https://bugzilla.redhat.com/show_bug.cgi?id=1953045
> > 2) https://bugzilla.redhat.com/show_bug.cgi?id=1953062
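To make that alternative ordering explicit, a rough sketch at the QMP level (the device id "net2" and the URI are placeholders in the style of the QMP examples in this bug, not commands from an actual run):

qmp: {"execute":"device_del","arguments":{"id":"net2"}}    <- manually unplug only the primary, then wait for DEVICE_DELETED
qmp: {"execute":"migrate","arguments":{"uri":"tcp:$ip_address:5800"}}    <- the VF stays bound to vfio-pci during this

and only after the migration has completed and the source qemu has been terminated should the VF be unbound from vfio-pci and rebound to the host driver.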
It seems that I can use the test steps mentioned by Ales in comment 41 to reproduce this problem on the qemu part:

Test env:
host:  4.18.0-304.el8.x86_64
       qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64
guest: 4.18.0-240.5.el8.x86_64

Test steps:

> RHV step 1. (Start a VM with sriov + failover vm nics)
> 1) Create bridge for failover network
> 2) Add VF on PF and unbind it
The VF can be created and bound to vfio-pci successfully.

> 3) Start VM with VF and failover virtio network attached
The qemu cmd line of the vm:
...
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root.3 \
-device vfio-pci,host=0000:06:01.0,id=net2,bus=root.4,addr=0x0,failover_pair_id=net1 \

> RHV step 2. (Unplug sriov + failover):
> 4) Unplug VF
The failover vf can be hot-unplugged from the vm successfully
qmp: {"execute":"device_del","arguments":{"id":"net2"}}
output:
{"return": {}}
{"timestamp": {"seconds": 1619531311, "microseconds": 199847}, "event": "DEVICE_DELETED", "data": {"device": "net2", "path": "/machine/peripheral/net2"}}

> 5) Rebind VF back to network driver
The VF can be rebound to its original driver successfully

> 6) Unplug virtio failover network
6.1 qmp: {"execute":"device_del","arguments":{"id":"net1"}}
output:
{"return": {}}
{"timestamp": {"seconds": 1619531395, "microseconds": 291584}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net1/virtio-backend"}}
{"timestamp": {"seconds": 1619531395, "microseconds": 342474}, "event": "DEVICE_DELETED", "data": {"device": "net1", "path": "/machine/peripheral/net1"}}
6.2 qmp: {"execute":"netdev_del","arguments":{"id":"hostnet0"}}
output:
{"return": {}}

> RHV step 3. (Plug sriov + failover)
> 7) Unbind VF
The VF can be bound to vfio-pci successfully.

> 8) Plug VF
qmp: {"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0000:06:01.0","id":"net2","bus":"root.4","addr":"0x0","failover_pair_id":"net1"}}
output:
{"return": {}}

> 9) Plug virtio failover network
9.1 qmp: {"execute":"netdev_add","arguments":{"type":"tap","id":"hostnet0","vhost":true}}
output:
{"return": {}}
9.2 qmp: {"execute":"device_add","arguments":{"driver":"virtio-net-pci","failover":"on","netdev":"hostnet0","id":"net1","mac":"52:54:00:6f:55:cc","bus":"root.3","addr":"0x0"}}
output:
{"return": {}}
Some related timestamp info:
{"timestamp": {"seconds": 1619531557, "microseconds": 45758}, "event": "FAILOVER_NEGOTIATED", "data": {"device-id": "net1"}}
{"timestamp": {"seconds": 1619531557, "microseconds": 294362}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "net1", "path": "/machine/peripheral/net1/virtio-backend"}}

> RHV step 4. (Migrate VM to a different host)
> 10) Unplug VF
qmp: {"execute":"device_del","arguments":{"id":"net2"}}
output:
{"return": {}}
{"timestamp": {"seconds": 1619532153, "microseconds": 922999}, "event": "DEVICE_DELETED", "data": {"device": "net2", "path": "/machine/peripheral/net2"}}

> 11) Rebind VF back to network driver
The VF can be bound to its original driver successfully

> 12) Start migration
12.1 On the target host, start a vm which is in listening mode
...
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net1,mac=52:54:00:6f:55:cc,failover=on,bus=root.3 \
-incoming defer \

12.2 On the target host, set up the migration uri and enable the related migration capability
qmp: {"execute": "migrate-incoming","arguments": {"uri": "tcp:[::]:5800"}}
output:
{"timestamp": {"seconds": 1619532758, "microseconds": 836868}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
qmp: {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"late-block-activate","state":true}]}}
output:
{"return": {}}

12.3 On the source host, enable the related migration capability
qmp: {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"pause-before-switchover","state":true}]}}
output:
{"return": {}}

12.4 Migrate the vm from the source host to the target host
qmp: {"execute": "migrate","arguments":{"uri": "tcp:$ip_address:5800"}}    <--- After running this qmp, the vm is stuck on the source host
output:
{"return": {}}
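One way to confirm which phase the source qemu is blocked in (comment 46 speculates it is the wait-unplug phase) would be to run query-migrate on the source qmp monitor while the vm is stuck; this is only a suggested check, not output captured from this run:

qmp: {"execute":"query-migrate"}

If that speculation is right, the returned "status" field should stay at "wait-unplug" instead of progressing to "active"/"completed".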
Trying to reproduce this problem on the libvirt part:

Test env:
host:  4.18.0-304.el8.x86_64
       qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0.x86_64
       libvirt-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64
guest: 4.18.0-240.5.el8.x86_64

Test steps:

> RHV step 1. (Start a VM with sriov + failover vm nics)
> 1) Create bridge for failover network
1.1 create a bridge named br0 based on the PF
1.2 create the vm networks
# virsh net-dumpxml failover-bridge
<network connections='1'>
  <name>failover-bridge</name>
  <uuid>abfa7c99-8345-497a-920f-39a1e6aeff9c</uuid>
  <forward mode='bridge'/>
  <bridge name='br0'/>
</network>

# virsh net-dumpxml failover-vf
<network>
  <name>failover-vf</name>
  <uuid>f0837c10-c4ac-4bd8-886d-6b0990131452</uuid>
  <forward mode='hostdev' managed='yes'>
    <address type='pci' domain='0x0000' bus='0x06' slot='0x01' function='0x0'/>
  </forward>
</network>

> 2) Add VF on PF and unbind it
The VF can be created successfully.

> 3) Start VM with VF and failover virtio network attached
The failover device xml is as follows:
...
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-bridge'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
</interface>
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-vf'/>
  <teaming type='transient' persistent='net0'/>
</interface>
...

The failover device qemu cmd line is as follows:
...
-netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40
-device virtio-net-pci,failover=on,netdev=hostnet0,id=net0,mac=52:54:00:aa:1c:ef,bus=pci.1,addr=0x0
-device vfio-pci,host=0000:06:01.0,id=hostdev0,bus=pci.4,addr=0x0,failover_pair_id=net0

> RHV step 2. (Unplug sriov + failover):
> 4) Unplug VF
> 5) Rebind VF back to network driver
# virsh detach-device-alias $domain hostdev0
Device detach request sent successfully

related qmp info:
{"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-385"}
{"return": {}, "id": "libvirt-385"}
{"timestamp": {"seconds": 1619582492, "microseconds": 654176}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}}

> 6) Unplug virtio failover network
# virsh detach-device-alias $domain net0
Device detach request sent successfully

related qmp info:
{"execute":"device_del","arguments":{"id":"net0"},"id":"libvirt-387"}
{"timestamp": {"seconds": 1619582622, "microseconds": 786358}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/net0/virtio-backend"}}
{"timestamp": {"seconds": 1619582622, "microseconds": 837246}, "event": "DEVICE_DELETED", "data": {"device": "net0", "path": "/machine/peripheral/net0"}}
{"execute":"netdev_del","arguments":{"id":"hostnet0"},"id":"libvirt-389"}
{"return": {}, "id": "libvirt-389"}

> RHV step 3. (Plug sriov + failover)
> 7) Unbind VF
> 8) Plug VF
# cat failover_vf.xml
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-vf'/>
  <teaming type='transient' persistent='net0'/>
</interface>

# virsh attach-device $domain failover_vf.xml
Device attached successfully

related qmp info:
{"execute":"device_add","arguments":{"driver":"vfio-pci","host":"0000:06:01.0","id":"hostdev0","bus":"pci.1","addr":"0x0","failover_pair_id":"net0"},"id":"libvirt-390"}
{"return": {}, "id": "libvirt-390"}

> 9) Plug virtio failover network
# cat failover_virtio_net_device.xml
<interface type='network'>
  <mac address='52:54:00:aa:1c:ef'/>
  <source network='failover-bridge'/>
  <model type='virtio'/>
  <teaming type='persistent'/>
</interface>

# virsh attach-device $domain failover_virtio_net_device.xml
Device attached successfully

related qmp info:
{"execute":"netdev_add","arguments":{"type":"tap","fd":"fd-net00","id":"hostnet0","vhost":true,"vhostfd":"vhostfd-net00"},"id":"libvirt-394"}
{"return": {}, "id": "libvirt-393"}
{"execute":"device_add","arguments":{"driver":"virtio-net-pci","failover":"on","netdev":"hostnet0","id":"net0","mac":"52:54:00:aa:1c:ef","bus":"pci.4","addr":"0x0"},"id":"libvirt-395"}
{"return": {}, "id": "libvirt-394"}

Some related timestamp info:
{"timestamp": {"seconds": 1619582675, "microseconds": 407390}, "event": "FAILOVER_NEGOTIATED", "data": {"device-id": "net0"}}
{"timestamp": {"seconds": 1619582675, "microseconds": 653024}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "net0", "path": "/machine/peripheral/net0/virtio-backend"}}

> RHV step 4. (Migrate VM to a different host)
> 10) Unplug VF
> 11) Rebind VF back to network driver
# virsh detach-device-alias $domain hostdev0
Device detach request sent successfully

related qmp info:
{"execute":"device_del","arguments":{"id":"hostdev0"},"id":"libvirt-397"}
{"timestamp": {"seconds": 1619582755, "microseconds": 821294}, "event": "DEVICE_DELETED", "data": {"device": "hostdev0", "path": "/machine/peripheral/hostdev0"}}

> 12) Start migration
12.1
# virsh migrate --live --verbose $domain qemu+ssh://10.73.73.75/system    <--- The vm is stuck after running this cmd

The related qmp info that I can observe on the source host when reproducing this problem:
{"execute":"query-block","id":"libvirt-399"}
{"execute":"query-migrate-parameters","id":"libvirt-400"}
{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"xbzrle","state":false},{"capability":"auto-converge","state":false},{"capability":"rdma-pin-all","state":false},{"capability":"postcopy-ram","state":false},{"capability":"compress","state":false},{"capability":"pause-before-switchover","state":true},{"capability":"late-block-activate","state":false},{"capability":"multifd","state":false},{"capability":"dirty-bitmaps","state":false}]},"id":"libvirt-401"}
{"return": {}, "id": "libvirt-401"}
{"execute":"migrate-set-parameters","arguments":{"tls-creds":"","tls-hostname":"","max-bandwidth":9223372036853727232},"id":"libvirt-402"}
{"return": {}, "id": "libvirt-402"}
{"execute":"getfd","arguments":{"fdname":"migrate"},"id":"libvirt-403"} (fd=34)
{"return": {}, "id": "libvirt-403"}
{"timestamp": {"seconds": 1619582833, "microseconds": 62115}, "event": "MIGRATION", "data": {"status": "setup"}}
{"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"},"id":"libvirt-404"}    <---- the last qmp I could observe on the source host

12.2 check the domain status on the target host
# virsh domstate $domain --reason
paused (migrating)
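If it helps to narrow the hang down further, the migration job could also be inspected on the source host while the domain is stuck; these are only suggested checks, not commands from the captured reproduction:

# virsh domjobinfo $domain
# virsh qemu-monitor-command $domain --pretty '{"execute":"query-migrate"}'

Note that if qemu's monitor is wedged, these may simply time out with the "cannot acquire state change lock" errors shown in comment 0, which would itself point at qemu rather than libvirt.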
Simplified reproduction steps for comment 52 / comment 53:
(1) start a vm *only with a failover virtio net device*
(2) hot-unplug the failover virtio net device
(3) hot-plug the failover virtio net device back
(4) migrate the vm from the source host to the target host

I can use the above steps to reproduce this bug on both the qemu and libvirt parts.

After using the build that Laurent provided in comment 51, the migration completes successfully.
# rpm -q qemu-kvm
qemu-kvm-5.2.0-15.el8.BZ1953045.x86_64
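For reference, the simplified reproducer expressed at the QMP level, reusing the ids, MAC, and bus values from the qemu reproduction in comment 52 (they are assumptions for any other domain):

step (2): qmp: {"execute":"device_del","arguments":{"id":"net1"}}    (wait for DEVICE_DELETED)
          qmp: {"execute":"netdev_del","arguments":{"id":"hostnet0"}}
step (3): qmp: {"execute":"netdev_add","arguments":{"type":"tap","id":"hostnet0","vhost":true}}
          qmp: {"execute":"device_add","arguments":{"driver":"virtio-net-pci","failover":"on","netdev":"hostnet0","id":"net1","mac":"52:54:00:6f:55:cc","bus":"root.3","addr":"0x0"}}
step (4): qmp: {"execute":"migrate","arguments":{"uri":"tcp:$ip_address:5800"}}    (on the affected builds the source hangs here)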
*** This bug has been marked as a duplicate of bug 1953045 ***