Bug 2114852
Summary: | Src qemu crashed when do migration with zerocopy+native_tls enabled | |
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | Fangge Jin <fjin>
Component: | qemu-kvm | Assignee: | Leonardo Bras <leobras>
qemu-kvm sub component: | Live Migration | QA Contact: | Li Xiaohui <xiaohli>
Status: | CLOSED DUPLICATE | Docs Contact: |
Severity: | unspecified | |
Priority: | unspecified | CC: | chayang, coli, jdenemar, jinzhao, juzhang, virt-maint, xiaohli
Version: | 8.7 | |
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-09-01 06:34:17 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Additional info:
1. Can't reproduce on RHEL9.1
libvirt-client-8.5.0-4.el9.x86_64
qemu-kvm-7.0.0-9.el9.x86_64
2. Can't reproduce when doing p2p migration (with virsh migrate option --p2p)

Could you please try to reproduce this with the latest 8.7 brew, to check if it's still reproducing?
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47002821

There is a high chance the config fixes will also fix this one.
I have been trying to reproduce it myself, but no luck yet.

As a result of my tests:

(In reply to Fangge Jin from comment #0)
> Version-Release number of selected component (if applicable):
> libvirt-8.0.0-10.module+el8.7.0+16047+746a126c.x86_64
> qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64

For rhel8.7 in the above versions, I ran:

> 2. Do non-p2p migration with zerocopy+native_tls enabled
> # virsh migrate vm1 qemu+tls://***/system --live --zerocopy --parallel
> --bandwidth 4 --tls

The output was:
error: operation failed: job 'migration out' failed: Requested Zero Copy feature is not available: Invalid argument

I could retry the migration as many times as I wanted, and qemu did not crash.

For rhel9.1 in versions:
libvirt-8.5.0-5.el9.src.rpm
qemu-kvm-7.0.0-9.el9.src.rpm

I ran:
virsh migrate vm1 qemu+tls://***/system --live --zerocopy --parallel --bandwidth 4 --tls

The output was:
error: operation failed: migration out job: Requested Zero Copy feature is not available: Invalid argument

I could also retry the migration as many times as I wanted, and qemu did not crash.

@fjin: I could not reproduce the issue. If you can still reproduce it, please try reproducing with the qemu brew provided in Comment#2.

(In reply to Leonardo Bras from comment #2)
> Could you please try to reproduce this with the latest 8.7 brew, to check if
> it's still reproducing?
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47002821
>
> There is a high chance the config fixes will also fix this one.
> I have been trying to reproduce it myself, but no luck yet.

I can't reproduce the issue with this build

(In reply to Fangge Jin from comment #4)
> (In reply to Leonardo Bras from comment #2)
> > Could you please try to reproduce this with the latest 8.7 brew, to check if
> > it's still reproducing?
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47002821
> >
> > There is a high chance the config fixes will also fix this one.
> > I have been trying to reproduce it myself, but no luck yet.
> I can't reproduce the issue with this build

Great! Then this means this MR fixes the issue:
https://gitlab.com/redhat/rhel/src/qemu-kvm/qemu-kvm/-/merge_requests/201

The above MR is associated with BZ#2110203, so fixing it also solves this BZ.

I reproduced this bug on libvirt-client-8.0.0-10.module+el8.7.0+16047+746a126c.x86_64 && qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64 when migrating through libvirt in non-p2p mode. I can't reproduce it with the same qmp commands from the qemu side, and I can't reproduce it through libvirt in p2p mode.

Src qemu crashes after getting migration failed with zerocopy + tls enabled.

The reason why we didn't reproduce this bug on qemu-kvm-6.2.0-19.module+el8.7.0+16358+eef3c6a2 is that qemu-kvm-6.2.0-19 refuses to start the migration: it reports an error when tls-creds is set through migrate-set-parameters before migration.

I suspect this error is more of a libvirt problem. Jiri, can you explain the differences between p2p and non-p2p (I don't see any differences from the qemu side between these two modes)? And do you know why src qemu crashes only in non-p2p mode?

QEMU crash cannot ever be a libvirt issue. You are right, there's no difference between p2p and non-p2p migrations in libvirt's interaction with QEMU. In p2p mode a client (virsh) tells the source libvirtd to migrate a domain, and this source libvirtd calls the APIs for the individual migration phases either locally or through a direct connection to the destination libvirtd.
On the other hand, in non-p2p mode virsh calls these APIs itself. That is, it calls the Begin API on the source libvirtd, waits for the result, then calls Prepare on the destination, followed by Perform on the source, and so on. So the timing may differ, but the actions performed are the same in both modes.

Thanks for the explanation about p2p and non-p2p migration.

I would close this bug as a duplicate of Bug 2110203 since its fix avoids the failing migration.

*** This bug has been marked as a duplicate of bug 2110203 ***

(In reply to Li Xiaohui from comment #6)
> [...]
> Src qemu crashes after getting migration failed with zerocopy + tls enabled.

Src qemu crash is a big problem, so I want to better understand this:
Does it reproduce in latest (RHEL8.7) qemu version?

(In reply to Leonardo Bras from comment #9)
> (In reply to Li Xiaohui from comment #6)
> > [...]
> > Src qemu crashes after getting migration failed with zerocopy + tls enabled.
>
> Src qemu crash is a big problem, so I want to better understand this:
> Does it reproduce in latest (RHEL8.7) qemu version?

No, we don't hit this bug on the latest RHEL 8.7 qemu.

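As a rough illustration of the p2p versus non-p2p explanation given in this thread, the sketch below models only who drives each migration phase. Begin, Prepare and Perform are the phase names mentioned in the thread; Finish and Confirm, the function name `migration_calls`, and the driver labels are assumptions added to make the example complete. It is a toy model, not libvirt code.

```python
# Toy model: who calls which migration phase in p2p and non-p2p mode.
# Begin, Prepare and Perform are named in the thread; Finish and Confirm
# are assumed here to round out the sequence ("and so on").
PHASES = [
    ("Begin", "source"),
    ("Prepare", "destination"),
    ("Perform", "source"),
    ("Finish", "destination"),
    ("Confirm", "source"),
]


def migration_calls(mode):
    """Return (driver, phase, target) tuples for one migration.

    In p2p mode the source libvirtd drives every phase; in non-p2p mode
    the client (virsh) drives them. The phases themselves do not change.
    """
    if mode == "p2p":
        driver = "source libvirtd"
    elif mode == "non-p2p":
        driver = "virsh"
    else:
        raise ValueError(f"unknown migration mode: {mode!r}")
    return [(driver, phase, target) for phase, target in PHASES]


if __name__ == "__main__":
    for mode in ("p2p", "non-p2p"):
        print(mode)
        for driver, phase, target in migration_calls(mode):
            print(f"  {driver} -> {phase} on {target}")
```

The sequence of phases and targets is identical in both modes and only the caller differs, which matches the point that the timing may vary while the actions stay the same.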
Created attachment 1903266
qemu core dump, libvirt and qemu log

Description of problem:
When doing a non-p2p migration with zerocopy+native_tls enabled, src qemu crashed.

Version-Release number of selected component (if applicable):
libvirt-8.0.0-10.module+el8.7.0+16047+746a126c.x86_64
qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a vm
2. Do non-p2p migration with zerocopy+native_tls enabled
# virsh migrate vm1 qemu+tls://***/system --live --zerocopy --parallel --bandwidth 4 --tls
3. Check src vm status: qemu has crashed.

qemu log:
2022-08-03 08:06:41.051+0000: initiating migration
2022-08-03T08:06:41.195333Z qemu-kvm: Requested Zero Copy feature is not available: Invalid argument
qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
2022-08-03 08:06:43.151+0000: shutting down, reason=crashed

Actual results:
Src qemu crashed

Expected results:
Migration fails, but src qemu should not crash
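To make the assertion in the qemu log easier to read, here is a minimal Python sketch of a registry that refuses to unregister an instance while callbacks are still attached to it. The class `YankRegistry`, the function `failed_migration` and the `cleanup_channels` flag are hypothetical names for illustration; this is not QEMU's yank code, and the thread does not establish that a leftover callback on the error path is what actually happened here.

```python
class YankRegistry:
    """Toy registry of per-instance callbacks (hypothetical, not QEMU code)."""

    def __init__(self):
        self._instances = {}

    def register_instance(self, name):
        if name in self._instances:
            raise ValueError(f"instance {name!r} already registered")
        self._instances[name] = []

    def register_function(self, name, func):
        self._instances[name].append(func)

    def unregister_function(self, name, func):
        self._instances[name].remove(func)

    def unregister_instance(self, name):
        # Same kind of check as the assertion in the log: the callback
        # list must be empty before the instance can go away.
        if self._instances[name]:
            raise AssertionError("QLIST_EMPTY(&entry->yankfns)")
        del self._instances[name]


def failed_migration(registry, cleanup_channels):
    """Simulate a migration that fails after a channel callback was added."""

    def shutdown_channel():
        pass  # placeholder for a callback tied to a migration channel

    registry.register_instance("migration")
    registry.register_function("migration", shutdown_channel)
    # Error path: an assumed cleanup step that may or may not run.
    if cleanup_channels:
        registry.unregister_function("migration", shutdown_channel)
    registry.unregister_instance("migration")


if __name__ == "__main__":
    failed_migration(YankRegistry(), cleanup_channels=True)
    try:
        failed_migration(YankRegistry(), cleanup_channels=False)
    except AssertionError as exc:
        print("assertion failed:", exc)
```

With the cleanup step the failed migration ends quietly, which is the expected result above; without it the check fires, which in a real process would abort it.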