The issue still exists with the following package versions:
[root@dell-per440-22 vfio]# rpm -q qemu-kvm
qemu-kvm-6.1.0-8.el9.x86_64
[root@dell-per440-22 vfio]# uname -r
5.14.0-30.el9.x86_64
Move the BZ to POST as the fix is in the rebase to 6.2.0 ()
commit 9323f892b39d133eb69b301484bf7b2f3f49737d
Author: Laurent Vivier <lvivier>
Date:   Thu Nov 18 14:32:23 2021 +0100

    failover: fix unplug pending detection

    Failover needs to detect the end of the PCI unplug to start migration
    after the VFIO card has been unplugged.

    To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
    pcie_unplug_device().

    But since
        17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
    we have switched to ACPI unplug and these functions are not called anymore
    and the flag not set. So failover migration is not able to detect if card
    is really unplugged and acts as it's done as soon as it's started. So it
    doesn't wait the end of the unplug to start the migration. We don't see any
    problem when we test that because ACPI unplug is faster than PCIe native
    hotplug and when the migration really starts the unplug operation is
    already done.

    See c000a9bd06ea ("pci: mark device having guest unplug request pending")
        a99c4da9fc2a ("pci: mark devices partially unplugged")

    Signed-off-by: Laurent Vivier <lvivier>
    Reviewed-by: Ani Sinha <ani>
    Message-Id: <20211118133225.324937-4-lvivier>
    Reviewed-by: Michael S. Tsirkin <mst>
    Signed-off-by: Michael S. Tsirkin <mst>
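
As a rough illustration of the detection problem this commit message describes, here is a toy Python model. It is not QEMU code: `Device`, `request_unplug`, `guest_completes_unplug`, and `migration_may_start` are hypothetical names invented for this sketch. It only shows why a migration that waits on a "pending unplug" flag starts immediately when the unplug path never sets that flag.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Toy stand-in for a hot-pluggable device (hypothetical, not a QEMU type)."""
    name: str
    plugged: bool = True
    pending_unplug: bool = False  # models a "guest unplug request pending" flag


def request_unplug(dev: Device, sets_flag: bool) -> None:
    """Ask the guest to release the device; the guest finishes later.

    sets_flag=True models an unplug path that marks the request as pending.
    sets_flag=False models a path that forgets to mark it, which is the
    situation the commit message attributes to the switch to ACPI unplug.
    """
    if sets_flag:
        dev.pending_unplug = True


def guest_completes_unplug(dev: Device) -> None:
    """The guest has released the device: it is gone and nothing is pending."""
    dev.plugged = False
    dev.pending_unplug = False


def migration_may_start(dev: Device) -> bool:
    """Failover-style check: wait only while an unplug is flagged as pending."""
    return not dev.pending_unplug


# Path that never sets the flag: migration is allowed while the card is still plugged.
broken = Device("hostdev0")
request_unplug(broken, sets_flag=False)
started_too_early = migration_may_start(broken) and broken.plugged

# Path that sets the flag: migration waits until the guest completes the unplug.
fixed = Device("hostdev0")
request_unplug(fixed, sets_flag=True)
waits_for_guest = not migration_may_start(fixed)
guest_completes_unplug(fixed)
starts_after_unplug = migration_may_start(fixed) and not fixed.plugged
```

In this model the first case proceeds with the device still present, which matches the commit message's statement that failover "acts as it's done as soon as it's started".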
(In reply to Yanhui Ma from comment #18)
> Finally verify the bug with qemu-kvm-6.2.0-8.el9.x86_64, it works well.
Could you move the BZ to VERIFIED?
Thanks
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:2307
Description of problem:
The failover VF is hot-unplugged if the migration is cancelled while the migration status is "active".

Version-Release number of selected component (if applicable):
qemu-kvm-6.1.0-3.el9.x86_64
kernel-5.14.0-3.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a bridge named br0 based on the PF
   nmcli connection add type bridge ifname br0 con-name br0 stp off autoconnect yes
   nmcli connection add type bridge-slave ifname "$MAIN_CONN" con-name "$MAIN_CONN" master br0 autoconnect yes
   systemctl restart NetworkManager

2. Create a VF from the same PF
   # echo 1 > /sys/bus/pci/devices/0000\:d8\:00.0/sriov_numvfs

3. Set up the VM networks
   # virsh net-dumpxml failover-bridge
   <network>
     <name>failover-bridge</name>
     <uuid>bc8a813f-415d-404f-9996-ba22e27bfea6</uuid>
     <forward mode='bridge'/>
     <bridge name='br0'/>
   </network>

   # virsh net-dumpxml failover-vf --inactive
   <network>
     <name>failover-vf</name>
     <uuid>8e09aebc-83af-4eda-b72f-e6061c3456a5</uuid>
     <forward mode='hostdev' managed='yes'>
       <pf dev='ens8f0'/>
     </forward>
   </network>

4. Start a VM with a failover VF and a virtio net device
   The domain xml:
   <interface type='bridge'>
     <mac address='52:54:11:aa:1c:ef'/>
     <source bridge='br0'/>
     <target dev='vnet0'/>
     <model type='virtio'/>
     <teaming type='persistent'/>
     <alias name='ua-test'/>
     <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
   </interface>
   <interface type='hostdev' managed='yes'>
     <mac address='52:54:11:aa:1c:ef'/>
     <driver name='vfio'/>
     <source>
       <address type='pci' domain='0x0000' bus='0xd8' slot='0x10' function='0x0'/>
     </source>
     <teaming type='transient' persistent='ua-test'/>
     <alias name='hostdev0'/>
     <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
   </interface>

   The qemu cmd line:
   -device virtio-net-pci,failover=on,netdev=hostua-test,id=ua-test,mac=52:54:11:aa:1c:ef,bus=pci.4,addr=0x0
   -device vfio-pci,host=0000:d8:10.0,id=hostdev0,bus=pci.5,addr=0x0,failover_pair_id=ua-test

5. Check the failover device info in the VM
   Both the failover virtio net device and the failover VF exist in the VM
   # dmesg | grep -i failover
   [ 3.118262] virtio_net virtio2 eth0: failover master:eth0 registered
   [ 3.125486] virtio_net virtio2 eth0: failover standby slave:eth1 registered
   [ 7.018876] virtio_net virtio2 enp4s0: failover primary slave:eth0 registered

6. Migrate the VM
   # virsh migrate --live --verbose $domain qemu+ssh://$target_ip_address/system

7. Cancel the migration when the migration status is "active"
   The related command:
   # virsh migrate --live --verbose $domain qemu+ssh://$target_ip_address/system
   ^Cerror: operation aborted: migration out job: canceled by client    <--- enter Ctrl + C to cancel the migration

8. Check the failover device info in the VM again
   # dmesg
   [ 801.443185] pcieport 0000:00:02.4: Slot(0-4): Attention button pressed
   [ 801.450971] pcieport 0000:00:02.4: Slot(0-4): Powering off due to button press
   [ 804.216142] pcieport 0000:00:02.4: Slot(0-4): Attention button pressed
   [ 804.224091] pcieport 0000:00:02.4: Slot(0-4): Button cancel
   [ 804.228877] pcieport 0000:00:02.4: Slot(0-4): Action canceled due to button press
   [ 804.234715] pcieport 0000:00:02.4: Slot(0-4): Card not present
   [ 804.269636] virtio_net virtio2 enp4s0: failover primary slave:enp5s0 unregistered    <-- the failover VF has been unregistered

   # ifconfig    <-- only the failover virtio net device exists in the VM at that time
   enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
           inet 10.73.33.236 netmask 255.255.254.0 broadcast 10.73.33.255
           inet6 fe80::5197:45ad:a07f:643c prefixlen 64 scopeid 0x20<link>
           inet6 2620:52:0:4920:a52d:dc6a:d47f:247c prefixlen 64 scopeid 0x0<global>
           inet6 2001::c5f2:5cb0:5f90:bb17 prefixlen 64 scopeid 0x0<global>
           ether 52:54:11:aa:1c:ef txqueuelen 1000 (Ethernet)
           RX packets 4582 bytes 340560 (332.5 KiB)
           RX errors 0 dropped 943 overruns 0 frame 0
           TX packets 914 bytes 104119 (101.6 KiB)
           TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

   enp4s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
           inet6 fe80::88fe:59f7:c9fc:db72 prefixlen 64 scopeid 0x20<link>
           ether 52:54:11:aa:1c:ef txqueuelen 1000 (Ethernet)
           RX packets 3575 bytes 244274 (238.5 KiB)
           RX errors 0 dropped 0 overruns 0 frame 0
           TX packets 466 bytes 54008 (52.7 KiB)
           TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Actual results:
The failover VF does not exist in the VM

Expected results:
The failover VF still exists in the VM

Additional info:
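
The check in step 8 can be scripted. This is a minimal sketch, assuming the guest kernel log keeps the same "failover <role>:<dev> registered/unregistered" wording as the dmesg excerpts in this report. `failover_state` and `SAMPLE` are hypothetical names, and the sample lines are copied from the dmesg output in steps 5 and 8.

```python
import re

# Matches the virtio_net failover messages quoted in this report.
FAILOVER_RE = re.compile(
    r"failover (?P<role>master|standby slave|primary slave):"
    r"(?P<dev>\S+) (?P<event>registered|unregistered)"
)


def failover_state(dmesg_text: str) -> dict:
    """Replay failover events in log order and return {role: is_registered}."""
    state = {}
    for line in dmesg_text.splitlines():
        match = FAILOVER_RE.search(line)
        if match:
            # A later event for the same role overrides an earlier one.
            state[match.group("role")] = match.group("event") == "registered"
    return state


# Lines taken from the report: boot-time registration, then the event
# logged after the cancelled migration.
SAMPLE = """\
[ 3.118262] virtio_net virtio2 eth0: failover master:eth0 registered
[ 3.125486] virtio_net virtio2 eth0: failover standby slave:eth1 registered
[ 7.018876] virtio_net virtio2 enp4s0: failover primary slave:eth0 registered
[ 804.269636] virtio_net virtio2 enp4s0: failover primary slave:enp5s0 unregistered
"""

state = failover_state(SAMPLE)
# The bug reproduces when the primary (VF) role ends up unregistered.
vf_missing = state.get("primary slave") is False
```

With the sample above, `vf_missing` is `True`, which corresponds to the "Actual results" in this report. On a guest with the fix, the primary is expected to be registered again once the cancelled migration has been rolled back.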