Bug 1817965
| Summary: | Live post-copy migration of a VM with a failover VF device fails. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Yanghang Liu <yanghliu> |
| Component: | qemu-kvm | Assignee: | Laurent Vivier <lvivier> |
| qemu-kvm sub component: | Live Migration | QA Contact: | Yanhui Ma <yama> |
| Status: | CLOSED MIGRATED | Docs Contact: | Jiri Herrmann <jherrman> |
| Severity: | medium | ||
| Priority: | medium | CC: | aadam, ailan, chayang, jherrman, jinzhao, juzhang, lvivier, pezhang, quintela, virt-maint, wquan, yalzhang, yanghliu |
| Version: | 9.0 | Keywords: | MigratedToJIRA, Reopened, Triaged |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Known Issue |
| Doc Text: | Live post-copy migration of VMs with failover VFs fails. Currently, attempting to post-copy migrate a running virtual machine (VM) fails if the VM uses a device with the virtual function (VF) failover capability enabled. To work around the problem, use the standard migration type rather than post-copy migration. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-09-22 16:14:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
*** Bug 1817986 has been marked as a duplicate of this bug. ***

Postcopy has a different path for memory assignment. I am looking into why it differs from normal precopy (at that point the two should behave the same), but clearly they do not. Moving to the next version; we are out of time for this one.

Migrating a VM with a hostdev device plus a teaming setup hits the same issue; refer to bug 1927984#c13.

This bug can still be reproduced in the following test environment:
host:
qemu-kvm-6.0.0-21.module+el8.5.0+11555+e0ab0d09.x86_64
4.18.0-316.el8.x86_64
libvirt-7.4.0-1.module+el8.5.0+11218+83343022.x86_64
guest:
4.18.0-314.el8.x86_64

Move RHEL-AV bugs to RHEL9. If it is necessary to resolve this in RHEL8, clone it to the current RHEL8 release.

@lvivier, could you handle this one? Thanks, Juan.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Could you retest with qemu-kvm-6.2.0-6.el9? Thanks.

Summarizing the current status of "Migrate with post-copy" below. I am not sure whether there is an existing bug for step 3.
Env:
Sriov card: 82599
Guest kernel: 5.14.0-55.el9.x86_64
Source and target host package:
# rpm -q libvirt qemu-kvm
libvirt-8.0.0-3.el9.x86_64
qemu-kvm-6.2.0-7.el9.x86_64
Migrate with post-copy:
1. Migration succeeds:
# virsh migrate rhel9 qemu+ssh://${target_host}/system --live --verbose --p2p --postcopy --timeout 5 --timeout-postcopy --bandwidth 4 --postcopy-bandwidth 4
Migration: [100 %]
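When scripting this scenario, the `virsh migrate` invocation from step 1 can be assembled as an argument list first; a minimal sketch that only builds the command (the domain and host names are placeholders, nothing is executed):

```python
def virsh_postcopy_cmd(domain: str, target_host: str, bandwidth_mib: int = 4):
    """Build the virsh post-copy migration command line used in this report.

    Returns a list suitable for subprocess.run(); the flags mirror the
    reproducer: p2p live migration that switches to post-copy after a
    5-second timeout, with pre- and post-copy bandwidth capped at 4 MiB/s.
    """
    return [
        "virsh", "migrate", domain,
        f"qemu+ssh://{target_host}/system",
        "--live", "--verbose", "--p2p",
        "--postcopy", "--timeout", "5", "--timeout-postcopy",
        "--bandwidth", str(bandwidth_mib),
        "--postcopy-bandwidth", str(bandwidth_mib),
    ]

print(" ".join(virsh_postcopy_cmd("rhel9", "target.example.com")))
```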
2. During migration, after the hostdev device is unregistered, ping does not work (tracked in bug 1789206);
3. After migration, the guest shows only 2 interfaces; the hostdev interface does not exist. The VM XML shows both the bridge and the hostdev interfaces. The VM's network function is broken.
# dmesg | grep register
……
[ 3.052376] virtio_net virtio0 eth0: failover master:eth0 registered
[ 3.055692] virtio_net virtio0 eth0: failover standby slave:eth1 registered
[ 8.834592] virtio_net virtio0 enp1s0: failover primary slave:eth0 registered
[ 32.035167] virtio_net virtio0 enp1s0: failover primary slave:enp3s0 unregistered
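A quick way to confirm the state shown by the dmesg lines above is to scan for the failover primary-slave events in order; a minimal sketch using the log lines copied from this report (the helper name is made up):

```python
import re

# Kernel log lines from this report: the virtio_net failover master
# registers a standby (virtio-net) and a primary (VF) slave, and the
# primary is later unregistered and never comes back after migration.
DMESG = """\
[    3.052376] virtio_net virtio0 eth0: failover master:eth0 registered
[    3.055692] virtio_net virtio0 eth0: failover standby slave:eth1 registered
[    8.834592] virtio_net virtio0 enp1s0: failover primary slave:eth0 registered
[   32.035167] virtio_net virtio0 enp1s0: failover primary slave:enp3s0 unregistered
"""

def primary_present(log: str) -> bool:
    """Return True if the last 'failover primary slave' event is a register.

    The interface may be renamed between events (eth0 -> enp3s0 here),
    so only the event type is tracked, not the interface name.
    """
    last = None
    pat = re.compile(r"failover primary slave:\S+ (registered|unregistered)")
    for m in pat.finditer(log):
        last = m.group(1)
    return last == "registered"

print(primary_present(DMESG))  # False: the VF is gone after migration
```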
Hi Laurent, please help check the result in the comment above, item 3: after migration with postcopy, there are only 2 interfaces on the VM. If we migrate *without* postcopy, there is no such issue and there are 3 interfaces.

(In reply to yalzhang from comment #19)
> Hi Laurent, please help check the result in the comment above, item 3: after migration with postcopy, there are only 2 interfaces on the VM. If we migrate *without* postcopy, there is no such issue and there are 3 interfaces.

I think there is a real issue. I am sorry I did not have the time to work on it, but it is always on my todo list.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it and begin with "RHEL-" followed by an integer. You can also find the issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.
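For background on the failover mechanism discussed in this report: with failover=on, the guest pairs the VF (primary) with the virtio-net device (standby) under one master interface, and traffic falls back to virtio-net while the VF is hot-unplugged for migration. A conceptual sketch of that selection logic (an illustration only, not the kernel implementation):

```python
def active_slave(primary_present: bool, standby_present: bool):
    """Pick the datapath a failover master interface would use.

    The VF (primary) is preferred for performance; the virtio-net
    standby carries traffic while the VF is hot-unplugged, e.g.
    during live migration.
    """
    if primary_present:
        return "primary (VF)"
    if standby_present:
        return "standby (virtio-net)"
    return None  # no slave registered: networking is broken

# Before migration the VF carries traffic; during the unplug window
# the guest falls back to the virtio-net standby.
print(active_slave(True, True))
print(active_slave(False, True))
```

The bug reported here corresponds to the VF (primary) never being re-registered on the target after post-copy migration, so the guest stays on the standby path or loses networking entirely.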
Description of problem:
Live post-copy migration of a VM with a failover VF device fails.

Version-Release number of selected component (if applicable):
guest: 4.18.0-192.el8.x86_64
host: 4.18.0-192.el8.x86_64
qemu-kvm-4.2.0-16.module+el8.2.0+6092+4f2391c1.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the source host, create a NetXtreme BCM57810 VF and set the MAC address of the VF:
# ip link set enp131s0f0 vf 0 mac 22:2b:62:bb:a9:82
2. Start a source guest with the NetXtreme BCM57810 VF, with failover enabled:
/usr/libexec/qemu-kvm -name rhel8-2 -M q35 -enable-kvm \
-monitor stdio \
-nodefaults \
-m 4G \
-boot menu=on \
-cpu Haswell-noTSX-IBRS \
-device pcie-root-port,id=root.1,chassis=1,addr=0x2.0,multifunction=on \
-device pcie-root-port,id=root.2,chassis=2,addr=0x2.1 \
-device pcie-root-port,id=root.3,chassis=3,addr=0x2.2 \
-device pcie-root-port,id=root.4,chassis=4,addr=0x2.3 \
-device pcie-root-port,id=root.5,chassis=5,addr=0x2.4 \
-device pcie-root-port,id=root.6,chassis=6,addr=0x2.5 \
-device pcie-root-port,id=root.7,chassis=7,addr=0x2.6 \
-device pcie-root-port,id=root.8,chassis=8,addr=0x2.7 \
-smp 2,sockets=1,cores=2,threads=2,maxcpus=4 \
-qmp tcp:0:5555,server,nowait \
-blockdev node-name=back_image,driver=file,cache.direct=on,cache.no-flush=off,filename=/nfsmount/migra_test/192.qcow2,aio=threads \
-blockdev node-name=drive-virtio-disk0,driver=qcow2,cache.direct=on,cache.no-flush=off,file=back_image \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=disk0,bus=root.1 \
-device VGA,id=video1,bus=root.2 \
-vnc :0 \
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \
-device vfio-pci,host=0000:83:01.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \
3. On the source host, check the network info in the guest:
# ifconfig
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 10.73.33.214 netmask 255.255.254.0 broadcast 10.73.33.255
    inet6 2620:52:0:4920:202b:62ff:febb:a982 prefixlen 64 scopeid 0x0<global>
    inet6 fe80::202b:62ff:febb:a982 prefixlen 64 scopeid 0x20<link>
    ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet)
    RX packets 5087 bytes 377754 (368.9 KiB)
    RX errors 0 dropped 5 overruns 0 frame 0
    TX packets 101 bytes 11887 (11.6 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet)
    RX packets 4950 bytes 359401 (350.9 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 2 bytes 180 (180.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet)
    RX packets 137 bytes 18353 (17.9 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 99 bytes 11707 (11.4 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    device memory 0xfc800000-fc807fff
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    inet6 ::1 prefixlen 128 scopeid 0x10<host>
    loop txqueuelen 1000 (Local Loopback)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
4. On the target host, create an 82599ES VF and set the MAC address of the VF:
# ip link set enp6s0f0 vf 0 mac 22:2b:62:bb:a9:82
5. Start a target guest in listening mode, to wait for migration from the source guest:
...
-netdev tap,id=hostnet0,vhost=on \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \
-device vfio-pci,host=0000:06:10.0,id=hostdev0,bus=root.4,failover_pair_id=net0 \
-incoming tcp:0:5800 \
6. On both the source host and the target host, set postcopy mode on:
(qemu) migrate_set_capability postcopy-ram on
7. Migrate the guest from the source host to the target host:
(qemu) migrate -d tcp:10.73.73.61:5800
8.
Before the migration is completed, change into postcopy mode:
(qemu) migrate_start_postcopy
9. The migration completes within several seconds after starting the post-copy.
10. Check the migration info on the source host:
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 13638 milliseconds
downtime: 6 milliseconds
setup: 7059 milliseconds
transferred ram: 643240 kbytes
throughput: 801.15 mbps
remaining ram: 0 kbytes
total ram: 4211528 kbytes
duplicate: 894347 pages
skipped: 0 pages
normal: 158535 pages
normal bytes: 634140 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 29470
postcopy request count: 446
11. Check the migration info on the target host:
(qemu) qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: VFIO_MAP_DMA: -14
qemu-kvm: warning: vfio 0000:06:10.0: failed to setup container for group 63: memory listener initialization failed: Region vga.vram: vfio_dma_map(0x55943303cee0, 0xa0000, 0x10000, 0x7f05f0a00000) = -14 (Bad address)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 0 milliseconds
12. On the target host, check the network info in the guest:
# ifconfig
enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    inet 10.73.33.214 netmask 255.255.254.0 broadcast 10.73.33.255
    inet6 2620:52:0:4920:202b:62ff:febb:a982 prefixlen 64 scopeid 0x0<global>
    inet6 fe80::202b:62ff:febb:a982 prefixlen 64 scopeid 0x20<link>
    ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet)
    RX packets 5087 bytes 377754 (368.9 KiB)
    RX errors 0 dropped 5 overruns 0 frame 0
    TX packets 101 bytes 11887 (11.6 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
    ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet)
    RX packets 4950 bytes 359401 (350.9 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 2 bytes 180 (180.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    inet6 ::1 prefixlen 128 scopeid 0x10<host>
    loop txqueuelen 1000 (Local Loopback)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Actual results:
Live post-copy migration of the VM with a failover VF device fails.

Expected results:
Live post-copy migration of the VM with a failover VF device completes, and the failover VF works well on both the source host and the target host.
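The flat `key: value` output of HMP `info migrate`, as captured in steps 10 and 11 above, can be turned into a dictionary for automated checks (e.g. spotting the suspicious "total time: 0 milliseconds" on the target). A minimal sketch with field names taken from this report; the parser itself is hypothetical, not part of any QEMU tooling:

```python
# Abbreviated source-side 'info migrate' output from this report.
SAMPLE = """\
Migration status: completed
total time: 13638 milliseconds
downtime: 6 milliseconds
setup: 7059 milliseconds
transferred ram: 643240 kbytes
postcopy request count: 446
"""

def parse_info_migrate(text: str) -> dict:
    """Parse 'key: value' lines from HMP 'info migrate' into a dict.

    Integer values keep only the leading number, so
    'total time: 13638 milliseconds' becomes {'total time': 13638};
    non-numeric values (e.g. 'completed') are kept as strings.
    """
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        first = value.split()[0] if value else ""
        stats[key.strip()] = int(first) if first.isdigit() else value
    return stats

stats = parse_info_migrate(SAMPLE)
print(stats["Migration status"], stats["postcopy request count"])
```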