Bug 2114852 - Src qemu crashed when do migration with zerocopy+native_tls enabled
Summary: Src qemu crashed when do migration with zerocopy+native_tls enabled
Keywords:
Status: CLOSED DUPLICATE of bug 2110203
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Leonardo Bras
QA Contact: Li Xiaohui
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-08-03 11:52 UTC by Fangge Jin
Modified: 2022-09-05 04:07 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-01 06:34:17 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
qemu core dump, libvirt and qemu log (416.69 KB, application/x-bzip)
2022-08-03 11:52 UTC, Fangge Jin


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-130023 | 0 | None | None | None | 2022-08-03 11:56:09 UTC

Description Fangge Jin 2022-08-03 11:52:59 UTC
Created attachment 1903266
qemu core dump, libvirt and qemu log

Description of problem:
When doing a non-p2p migration with zerocopy and native TLS enabled, the source qemu crashes.


Version-Release number of selected component (if applicable):
libvirt-8.0.0-10.module+el8.7.0+16047+746a126c.x86_64
qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64


How reproducible:
100% 

Steps to Reproduce:
1. Start a VM.

2. Run a non-p2p migration with zerocopy and native TLS enabled:
# virsh migrate vm1 qemu+tls://***/system --live --zerocopy --parallel --bandwidth 4 --tls

3. Check the source VM status: qemu has crashed. The qemu log shows:
2022-08-03 08:06:41.051+0000: initiating migration
2022-08-03T08:06:41.195333Z qemu-kvm: Requested Zero Copy feature is not available: Invalid argument
qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
2022-08-03 08:06:43.151+0000: shutting down, reason=crashed
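
The assertion in the log above comes from yank_unregister_instance(), and its text says that the list of yank functions for an instance must be empty when the instance is unregistered. The following Python sketch is only a toy model of that invariant. It is not QEMU code and not a diagnosis of this bug: YankRegistry, run_migration and the cleanup_on_error flag are hypothetical names introduced for illustration.

```python
class YankRegistry:
    """Toy model: each instance owns a list of registered callbacks."""

    def __init__(self):
        self._instances = {}

    def register_instance(self, name):
        self._instances[name] = []

    def register_function(self, name, fn):
        self._instances[name].append(fn)

    def unregister_function(self, name, fn):
        self._instances[name].remove(fn)

    def unregister_instance(self, name):
        # Models the check from the log: the callback list must be empty.
        if self._instances[name]:
            raise AssertionError("QLIST_EMPTY(&entry->yankfns)")
        del self._instances[name]


def run_migration(registry, cleanup_on_error):
    """Hypothetical migration attempt whose setup fails after a callback
    has been registered."""
    registry.register_instance("migration")

    def shutdown_channel():
        pass

    registry.register_function("migration", shutdown_channel)

    # Assume setup fails at this point (for example, a requested feature
    # is not available). An error path that skips the cleanup below
    # leaves the callback registered.
    if cleanup_on_error:
        registry.unregister_function("migration", shutdown_channel)

    registry.unregister_instance("migration")
```

In this model, run_migration(YankRegistry(), cleanup_on_error=True) returns normally, while cleanup_on_error=False raises AssertionError. Whether the crash reported here follows this pattern is not established by the log alone.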


Actual results:
The source qemu crashed.

Expected results:
The migration fails, but the source qemu should not crash.

Additional info:

Comment 1 Fangge Jin 2022-08-03 11:56:33 UTC
Additional info:
1. Can't reproduce on RHEL 9.1:
libvirt-client-8.5.0-4.el9.x86_64
qemu-kvm-7.0.0-9.el9.x86_64

2. Can't reproduce with p2p migration (virsh migrate option --p2p)

Comment 2 Leonardo Bras 2022-08-10 08:56:03 UTC
Could you please try to reproduce this with the latest 8.7 brew, to check if it's still reproducing?
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47002821

There is a high chance the config fixes will also fix this one.
I have been trying to reproduce it myself, but no luck yet.

Comment 3 Leonardo Bras 2022-08-10 09:55:38 UTC
As a result of my tests:

(In reply to Fangge Jin from comment #0)
> Version-Release number of selected component (if applicable):
> libvirt-8.0.0-10.module+el8.7.0+16047+746a126c.x86_64
> qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64
 
For RHEL 8.7 with the versions above, I ran:

> 2. Do non-p2p migration with zerocopy+native_tls enabled
> # virsh migrate vm1 qemu+tls://***/system --live --zerocopy --parallel
> --bandwidth 4 --tls 

The output was:
error: operation failed: job 'migration out' failed: Requested Zero Copy feature is not available: Invalid argument
I could retry the migration as many times as I wanted, and qemu did not crash.

For RHEL 9.1 with versions:
libvirt-8.5.0-5.el9.src.rpm
qemu-kvm-7.0.0-9.el9.src.rpm

I ran:
virsh migrate vm1 qemu+tls://***/system --live --zerocopy --parallel --bandwidth 4 --tls

The output was:
error: operation failed: migration out job: Requested Zero Copy feature is not available: Invalid argument
I could also retry the migration as many times as I wanted, and qemu did not crash.


@fjin : I could not reproduce the issue. If you can still reproduce it, please try reproducing with the qemu brew provided in Comment#2.

Comment 4 Fangge Jin 2022-08-10 10:14:28 UTC
(In reply to Leonardo Bras from comment #2)
> Could you please try to reproduce this with the latest 8.7 brew, to check if
> it's still reproducing?
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47002821
> 
> There is a high chance the config fixes will also fix this one.
> I have been trying to reproduce it myself, but no luck yet.
I can't reproduce the issue with this build

Comment 5 Leonardo Bras 2022-08-10 21:44:23 UTC
(In reply to Fangge Jin from comment #4)
> (In reply to Leonardo Bras from comment #2)
> > Could you please try to reproduce this with the latest 8.7 brew, to check if
> > it's still reproducing?
> > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47002821
> > 
> > There is a high chance the config fixes will also fix this one.
> > I have been trying to reproduce it myself, but no luck yet.
> I can't reproduce the issue with this build

Great! Then this means this MR fixes the issue:
https://gitlab.com/redhat/rhel/src/qemu-kvm/qemu-kvm/-/merge_requests/201

The above MR is associated with BZ#2110203, so fixing it also solves this BZ.

Comment 6 Li Xiaohui 2022-08-30 14:09:02 UTC
I reproduced this bug on libvirt-client-8.0.0-10.module+el8.7.0+16047+746a126c.x86_64 and qemu-kvm-6.2.0-18.module+el8.7.0+15999+d24f860e.x86_64 when migrating through libvirt in non-p2p mode. I can't reproduce it with the same QMP commands sent directly to qemu, and I can't reproduce it through libvirt in p2p mode.
The source qemu crashes after the migration fails with zerocopy + TLS enabled.


The reason we didn't reproduce this bug on qemu-kvm-6.2.0-19.module+el8.7.0+16358+eef3c6a2 is that qemu-kvm-6.2.0-19 never starts the migration: it returns an error when tls-creds is set through migrate-set-parameters before the migration.
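
To illustrate the behaviour described in the paragraph above, here is a minimal Python sketch of rejecting a bad parameter combination when the parameters are set, so that the migration is never started. It is a model under assumptions, not the actual qemu check: the Source class, check_parameters and the error text are hypothetical, and only the parameter names zero-copy-send and tls-creds are meant to correspond to real migration parameters.

```python
class MigrationParameterError(Exception):
    """Raised when a requested parameter combination is rejected."""


def check_parameters(params):
    # Assumed rule for this sketch: zero copy and TLS cannot be combined.
    if params.get("zero-copy-send") and params.get("tls-creds"):
        raise MigrationParameterError(
            "zero-copy-send cannot be combined with tls-creds"  # hypothetical text
        )


class Source:
    """Hypothetical stand-in for the source side of a migration."""

    def __init__(self):
        self.params = {}
        self.migration_started = False

    def set_parameters(self, new_params):
        # Validate the merged result first; only commit it if it is valid.
        candidate = {**self.params, **new_params}
        check_parameters(candidate)
        self.params = candidate

    def migrate(self):
        self.migration_started = True
```

With this shape, a caller that sets zero-copy-send and then tls-creds gets the error from set_parameters() and never reaches migrate(), so any failure path inside the migration itself is not exercised.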


I suspect this error is more of a libvirt problem. Jiri, can you explain the differences between p2p and non-p2p (I don't see any difference from the qemu side between these two modes)? And do you know why the source qemu crashes only in non-p2p mode?

Comment 7 Jiri Denemark 2022-08-31 12:03:56 UTC
A QEMU crash cannot ever be a libvirt issue.

You are right, there's no difference between p2p and non-p2p migrations in
libvirt's interaction with QEMU. In p2p mode a client (virsh) tells the source
libvirtd to migrate a domain and this source libvirtd calls the APIs for
individual migration phases either locally or by a direct connection to the
destination libvirtd. On the other hand in non-p2p mode virsh calls these APIs
itself. That is, it calls Begin API on the source libvirtd, waits for the
result, then it calls Prepare on the destination followed by Perform on the
source and so on. So the timing may differ, but the actions performed are the
same in both modes.
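
The explanation above can be sketched as a small Python model. It only records who calls which phase on which daemon. The function names and the single "Migrate" request in p2p mode are hypothetical, and only the three phases named above (Begin, Prepare, Perform) are modelled; later phases are left out.

```python
def run_phases(caller, log):
    """Record the phases in the order described above."""
    log.append((caller, "source", "Begin"))
    log.append((caller, "destination", "Prepare"))
    log.append((caller, "source", "Perform"))


def migrate_non_p2p():
    # The client (virsh) calls each phase API itself.
    log = []
    run_phases("virsh", log)
    return log


def migrate_p2p():
    # The client makes one request to the source libvirtd, which then
    # drives the phases. "Migrate" is a placeholder name for that request.
    log = [("virsh", "source", "Migrate")]
    run_phases("source libvirtd", log)
    return log


def phases(log):
    """Strip the caller and the initial client request from a log."""
    return [(target, phase) for _, target, phase in log if phase != "Migrate"]
```

In this model phases(migrate_p2p()) and phases(migrate_non_p2p()) are equal; only the caller column differs, which is where a timing difference between the two modes could come from.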

Comment 8 Li Xiaohui 2022-09-01 06:34:17 UTC
Thanks for the explanation about p2p and non-p2p migration.


I will close this bug as a duplicate of Bug 2110203, since the fix for that bug prevents the failing migration from starting.

*** This bug has been marked as a duplicate of bug 2110203 ***

Comment 9 Leonardo Bras 2022-09-02 13:58:48 UTC
(In reply to Li Xiaohui from comment #6)
> [...]
> Src qemu crashes after getting migration failed with zerocopy + tls enabled.

A source qemu crash is a big problem, so I want to understand this better:
Does it reproduce with the latest (RHEL 8.7) qemu version?

Comment 10 Li Xiaohui 2022-09-05 04:07:47 UTC
(In reply to Leonardo Bras from comment #9)
> (In reply to Li Xiaohui from comment #6)
> > [...]
> > Src qemu crashes after getting migration failed with zerocopy + tls enabled.
> 
> Src qemu crash is a big problem, so I want to better understand this:
> Does it reproduce in latest (RHEL8.7) qemu version?

No, we don't hit this bug on the latest RHEL 8.7 qemu.

