Bug 2106265
Summary: | Destination QEMU got stuck processing query-status when parallel+native_tls+zerocopy migration failed | |
---|---|---|---
Product: | Red Hat Enterprise Linux 9 | Reporter: | Fangge Jin <fjin>
Component: | qemu-kvm | Assignee: | Virtualization Maintenance <virt-maint>
qemu-kvm sub component: | Live Migration | QA Contact: | Li Xiaohui <xiaohli>
Status: | CLOSED DUPLICATE | Docs Contact: |
Severity: | unspecified | |
Priority: | unspecified | CC: | coli, jdenemar, jinzhao, juzhang, lmen, pkrempa, virt-maint, xiaohli, yafu
Version: | 9.1 | |
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-08-29 11:29:00 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |

This issue can't be reproduced with latest versions:
libvirt-8.5.0-5.el9.x86_64
qemu-kvm-7.0.0-10.el9.x86_64

(In reply to Fangge Jin from comment #2)
> This issue can't be reproduced with latest versions:
> libvirt-8.5.0-5.el9.x86_64
> qemu-kvm-7.0.0-10.el9.x86_64

# virsh migrate uefi qemu+tls://***/system --live --postcopy --bandwidth 10 --auto-converge --p2p --zerocopy --parallel --tls
error: operation failed: job 'migration out' failed: Requested Zero Copy feature is not available: Invalid argument

(In reply to Fangge Jin from comment #2)
> This issue can't be reproduced with latest versions:
> libvirt-8.5.0-5.el9.x86_64
> qemu-kvm-7.0.0-10.el9.x86_64

Sorry, I can still reproduce this issue with the above versions. It is just that the reproduction rate is not 100%.

Unfortunately, I wasn't able to reproduce this issue; it always works as expected (i.e., I get the "Requested Zero Copy feature is not available: Invalid argument" error message). Anyway, the provided logs show that virtqemud is waiting in the Finish phase for the query-status QMP command to return. In other words, it looks like QEMU itself gets stuck processing the command, and thus libvirt will never stop waiting for the reply. I'm moving this to QEMU for further investigation. But could you please capture the stack of the destination QEMU process if you can reproduce?

I also can't reproduce it with:
libvirt-8.5.0-5.el9.x86_64
qemu-kvm-7.0.0-11.el9.x86_64

After I downgraded qemu-kvm to 7.0.0-10.el9.x86_64, the issue reproduced again, so it seems that it has been fixed. The bug reproduces when migrating with zerocopy + multifd + postcopy enabled; without postcopy enabled, it does not reproduce. I would mark this bug as a duplicate of Bug 2107466, since it fixes this bug.

*** This bug has been marked as a duplicate of bug 2107466 ***

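The request above to capture the stack of the stuck process can be scripted. The following is only a rough sketch, not something taken from this report: the helper names `run_with_timeout` and `capture_stacks` are made up for illustration, the process name `qemu-kvm` and the 30-second timeout are assumptions, and it presumes `pidof` and `gstack` are installed on the host.

```python
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, output).

    finished is False when the command did not return within timeout_s
    seconds, which is how a hang such as the one described here would
    show up. subprocess.run kills the child when the timeout expires.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
        return True, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return False, ""

def capture_stacks(process_name):
    """Return {pid: gstack output} for every process named process_name.

    Assumes the pidof and gstack tools are available on the host.
    """
    pids = subprocess.run(["pidof", process_name], capture_output=True,
                          text=True).stdout.split()
    return {pid: subprocess.run(["gstack", pid], capture_output=True,
                                text=True).stdout
            for pid in pids}

# Hypothetical usage on the destination host (names are assumptions):
#   finished, _ = run_with_timeout(["virsh", "domjobinfo", "uefi"], 30)
#   if not finished:
#       stacks = capture_stacks("qemu-kvm")
```

Note that the timeout only indicates that a command did not return in time; whether QEMU or virtqemud is the stuck party still has to be read from the captured stacks.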
Created attachment 1896286
virtqemud log and gstack output

Description of problem:
Do live migration with parallel+native_tls+zerocopy enabled, migration failed as expected. But virtqemud got stuck in Finish phase.

Version-Release number of selected component (if applicable):
libvirt-8.5.0-1.el9.x86_64
qemu-kvm-7.0.0-8.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a vm
2. Do migration with parallel+native_tls+zerocopy enabled
# virsh migrate uefi qemu+tcp://******/system --live --postcopy --bandwidth 10 --auto-converge --p2p --zerocopy --parallel --tls
3. Check migration job info:
# virsh domjobinfo uefi
Job type: Failed
Operation: Outgoing migration

Actual results:
As in step 3, migration failed as expected, but the virsh command didn't return because virtqemud got stuck in Finish phase (shown by gstack `pidof virtqemud`)

Expected results:
Migration failed, and virtqemud didn't get stuck

Additional info:
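
For anyone automating the check in step 3, the `virsh domjobinfo` output quoted above can be turned into a dictionary. This is a minimal sketch under stated assumptions: `parse_domjobinfo` is a hypothetical helper name, and the sample text contains only the two fields shown in this report (real output may contain more lines).

```python
def parse_domjobinfo(text):
    """Parse 'Key: value' lines, as printed by virsh domjobinfo, into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as blank lines
            info[key.strip()] = value.strip()
    return info

# Sample copied from the output in step 3 of this report.
sample = "Job type: Failed\nOperation: Outgoing migration\n"
info = parse_domjobinfo(sample)
migration_failed = info.get("Job type") == "Failed"
```

A failed job type here only confirms the expected migration failure; it does not indicate whether the migrate command itself has returned.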