Bug 2114852

Summary: Src qemu crashed when doing migration with zerocopy+native_tls enabled
Product: Red Hat Enterprise Linux 8
Reporter: Fangge Jin <fjin>
Component: qemu-kvm
Assignee: Leonardo Bras <leobras>
qemu-kvm sub component: Live Migration
QA Contact: Li Xiaohui <xiaohli>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: chayang, coli, jdenemar, jinzhao, juzhang, virt-maint, xiaohli
Version: 8.7
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-09-01 06:34:17 UTC
Type: Bug
Attachments:
qemu core dump, libvirt and qemu log (flags: none)

Description Fangge Jin 2022-08-03 11:52:59 UTC
Created attachment 1903266
qemu core dump, libvirt and qemu log

Description of problem:
When doing a non-p2p migration with zerocopy+native_tls enabled, the src qemu crashes.


Version-Release number of selected component (if applicable):
libvirt-8.0.0-10.module+el8.7.0+16047+746a126c.x86_64
qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64


How reproducible:
100% 

Steps to Reproduce:
1. Start a VM

2. Do non-p2p migration with zerocopy+native_tls enabled
# virsh migrate vm1 qemu+tls://***/system --live --zerocopy --parallel --bandwidth 4 --tls 

3. Check the src VM status: qemu has crashed. qemu log:
2022-08-03 08:06:41.051+0000: initiating migration
2022-08-03T08:06:41.195333Z qemu-kvm: Requested Zero Copy feature is not available: Invalid argument
qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
2022-08-03 08:06:43.151+0000: shutting down, reason=crashed
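
For context, the assertion in the log guards an invariant of the yank registry: an instance can only be unregistered once no yank functions remain registered on it. The Python sketch below is a simplified model of that invariant, not QEMU's C code. All class, method, and function names are hypothetical, and this report does not establish which function was still registered at the time of the crash.

```python
class YankInvariantError(AssertionError):
    """Raised when an instance is torn down with functions still attached."""


class YankRegistry:
    """Toy model of a yank registry (hypothetical names, not QEMU's API)."""

    def __init__(self):
        # Maps an instance name to the list of functions registered on it.
        self._instances = {}

    def register_instance(self, name):
        if name in self._instances:
            raise ValueError(f"instance {name!r} already registered")
        self._instances[name] = []

    def register_function(self, name, func):
        self._instances[name].append(func)

    def unregister_function(self, name, func):
        self._instances[name].remove(func)

    def unregister_instance(self, name):
        # Models the check behind the failed assertion: the function list
        # must be empty before the instance is removed.
        if self._instances[name]:
            raise YankInvariantError(f"{name!r} still has yank functions")
        del self._instances[name]

    def has_instance(self, name):
        return name in self._instances


def shutdown_channel():
    """Placeholder for a function that would forcibly close a channel."""
```

In this model, an error path that calls unregister_instance without first calling unregister_function for every registered function trips the check, which is the general shape of failure the assertion message describes.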


Actual results:
Src qemu crashes.

Expected results:
Migration fails, but src qemu should not crash.

Additional info:

Comment 1 Fangge Jin 2022-08-03 11:56:33 UTC
Additional info:
1. Can't reproduce on RHEL 9.1 with:
libvirt-client-8.5.0-4.el9.x86_64
qemu-kvm-7.0.0-9.el9.x86_64

2. Can't reproduce when doing p2p migration (with the virsh migrate option --p2p)

Comment 2 Leonardo Bras 2022-08-10 08:56:03 UTC
Could you please try to reproduce this with the latest 8.7 brew, to check if it's still reproducing?
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47002821

There is a high chance the config fixes will also fix this one.
I have been trying to reproduce it myself, but no luck yet.

Comment 3 Leonardo Bras 2022-08-10 09:55:38 UTC
As a result of my tests:

(In reply to Fangge Jin from comment #0)
> Version-Release number of selected component (if applicable):
> libvirt-8.0.0-10.module+el8.7.0+16047+746a126c.x86_64
> qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64
 
On RHEL 8.7 with the versions above, I ran:

> 2. Do non-p2p migration with zerocopy+native_tls enabled
> # virsh migrate vm1 qemu+tls://***/system --live --zerocopy --parallel
> --bandwidth 4 --tls 

The output was:
error: operation failed: job 'migration out' failed: Requested Zero Copy feature is not available: Invalid argument
I could retry the migration as many times as I wanted, and qemu did not crash.

On RHEL 9.1 with versions:
libvirt-8.5.0-5.el9.src.rpm
qemu-kvm-7.0.0-9.el9.src.rpm

I ran:
virsh migrate vm1 qemu+tls://***/system --live --zerocopy --parallel --bandwidth 4 --tls

The output was:
error: operation failed: migration out job: Requested Zero Copy feature is not available: Invalid argument
I could also retry the migration as many times as I wanted, and qemu did not crash.


@fjin: I could not reproduce the issue. If you can still reproduce it, please try reproducing with the qemu brew build provided in comment #2.

Comment 4 Fangge Jin 2022-08-10 10:14:28 UTC
(In reply to Leonardo Bras from comment #2)
> Could you please try to reproduce this with the latest 8.7 brew, to check if
> it's still reproducing?
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47002821
> 
> There is a high chance the config fixes will also fix this one.
> I have been trying to reproduce it myself, but no luck yet.
I can't reproduce the issue with this build

Comment 5 Leonardo Bras 2022-08-10 21:44:23 UTC
(In reply to Fangge Jin from comment #4)
> (In reply to Leonardo Bras from comment #2)
> > Could you please try to reproduce this with the latest 8.7 brew, to check if
> > it's still reproducing?
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47002821
> > 
> > There is a high chance the config fixes will also fix this one.
> > I have been trying to reproduce it myself, but no luck yet.
> I can't reproduce the issue with this build

Great! Then this means the following MR fixes the issue:
https://gitlab.com/redhat/rhel/src/qemu-kvm/qemu-kvm/-/merge_requests/201

The above MR is associated with BZ#2110203, so fixing it also solves this BZ.

Comment 6 Li Xiaohui 2022-08-30 14:09:02 UTC
I reproduced this bug on libvirt-client-8.0.0-10.module+el8.7.0+16047+746a126c.x86_64 and qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64 when migrating through libvirt in non-p2p mode. I can't reproduce it with the same qmp commands issued from the qemu side, and I can't reproduce it through libvirt in p2p mode.
Src qemu crashes after the migration fails with zerocopy + tls enabled.


We didn't reproduce this bug on qemu-kvm-6.2.0-19.module+el8.7.0+16358+eef3c6a2 because qemu-kvm-6.2.0-19 never starts the migration: it reports an error when tls-creds is set through migrate-set-parameters before the migration.
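
The behavior described above, where the unsupported combination is rejected at parameter-setting time instead of failing after the migration has started, can be sketched roughly as follows. This is a minimal Python model under stated assumptions, not QEMU's implementation: the class, attribute names, and error text are hypothetical, and only the zerocopy + TLS combination from this report is modeled.

```python
class MigrationConfigError(Exception):
    """Raised when a requested parameter combination is rejected."""


class MigrationConfig:
    """Toy model of migration settings (hypothetical, not QEMU's API)."""

    def __init__(self, zero_copy_send=False):
        self.zero_copy_send = zero_copy_send
        self.tls_creds = ""

    def set_parameters(self, tls_creds=None):
        # Validate before storing anything, so a rejected request leaves
        # the configuration unchanged and no migration is ever started.
        if tls_creds and self.zero_copy_send:
            raise MigrationConfigError(
                "zero-copy send cannot be combined with TLS (illustrative text)"
            )
        if tls_creds is not None:
            self.tls_creds = tls_creds
```

With the check at this point, the caller gets an error before the migration starts, so the cleanup path that crashed in the older build is never reached.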


I suspect this error is more of a libvirt problem. Jiri, can you explain the differences between p2p and non-p2p (I don't see any differences from the qemu side between these two modes)? And do you know why src qemu crashes only in non-p2p mode?

Comment 7 Jiri Denemark 2022-08-31 12:03:56 UTC
A QEMU crash can never be a libvirt issue.

You are right, there's no difference between p2p and non-p2p migrations in
libvirt's interaction with QEMU. In p2p mode a client (virsh) tells the source
libvirtd to migrate a domain and this source libvirtd calls the APIs for
individual migration phases either locally or by a direct connection to the
destination libvirtd. On the other hand, in non-p2p mode virsh calls these APIs
itself. That is, it calls the Begin API on the source libvirtd, waits for the
result, then it calls Prepare on the destination followed by Perform on the
source and so on. So the timing may differ, but the actions performed are the
same in both modes.
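
The difference described above can be illustrated with a small Python model. It is only a sketch: the function names are hypothetical, and only the three phases named in this comment (Begin, Prepare, Perform) are included, with later phases omitted.

```python
# Phases named in the comment above and the host each one runs on.
# Later phases ("and so on") are intentionally left out.
PHASES = [
    ("Begin", "source"),
    ("Prepare", "destination"),
    ("Perform", "source"),
]


def run_migration(p2p):
    """Return a trace of (driver, phase, host) tuples for one migration.

    In p2p mode the source libvirtd drives every phase; in non-p2p mode
    the client (virsh) calls each phase API itself.
    """
    driver = "source libvirtd" if p2p else "virsh"
    return [(driver, phase, host) for phase, host in PHASES]


def actions_seen_by_hosts(trace):
    """Drop the driver, keeping only which phase ran on which host."""
    return [(phase, host) for _driver, phase, host in trace]
```

Comparing actions_seen_by_hosts(run_migration(True)) with actions_seen_by_hosts(run_migration(False)) gives identical lists. Only the driver differs, which is why the two modes may differ in timing but not in the actions performed.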

Comment 8 Li Xiaohui 2022-09-01 06:34:17 UTC
Thanks for the explanation about p2p and non-p2p migration.


I will close this bug as a duplicate of Bug 2110203, since the fix for that bug prevents this failing migration from starting.

*** This bug has been marked as a duplicate of bug 2110203 ***

Comment 9 Leonardo Bras 2022-09-02 13:58:48 UTC
(In reply to Li Xiaohui from comment #6)
> [...]
> Src qemu crashes after getting migration failed with zerocopy + tls enabled.

A src qemu crash is a big problem, so I want to better understand this:
Does it reproduce with the latest (RHEL 8.7) qemu version?

Comment 10 Li Xiaohui 2022-09-05 04:07:47 UTC
(In reply to Leonardo Bras from comment #9)
> (In reply to Li Xiaohui from comment #6)
> > [...]
> > Src qemu crashes after getting migration failed with zerocopy + tls enabled.
> 
> Src qemu crash is a big problem, so I want to better understand this:
> Does it reproduce in latest (RHEL8.7) qemu version?

No, we don't hit this bug on the latest RHEL 8.7 qemu.