Bug 1982993 - migrate failed on no-socket protocol when open multifd
Summary: migrate failed on no-socket protocol when open multifd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.5
Hardware: All
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: 8.6
Assignee: Juan Quintela
QA Contact: Li Xiaohui
Docs Contact: Jiri Herrmann
URL:
Whiteboard:
Depends On: 2027716
Blocks: 1875540 2021968 2025604
 
Reported: 2021-07-16 08:26 UTC by Yiding Liu (Fujitsu)
Modified: 2022-05-11 16:48 UTC
CC List: 18 users

Fixed In Version: qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949
Doc Type: Bug Fix
Doc Text:
.`multifd` migration now works reliably
Previously, attempting to migrate a virtual machine (VM) using the `multifd` feature of QEMU caused the migration to fail and the VM to terminate unexpectedly. The underlying code has been fixed, and `multifd` migration now works as expected.
Clone Of:
: 2021968 2025604 (view as bug list)
Environment:
Last Closed: 2022-05-10 13:20:14 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links:
Gitlab: redhat/rhel/src/qemu-kvm qemu-kvm merge request 57 (last updated 2021-11-09 15:03:41 UTC)
Red Hat Product Errata: RHSA-2022:1759 (last updated 2022-05-10 13:21:17 UTC)

Description Yiding Liu (Fujitsu) 2021-07-16 08:26:09 UTC
Description of problem:

Migration fails on non-socket protocols when the multifd capability is enabled.


How reproducible: 100%


Steps to Reproduce:
1. /usr/libexec/qemu-kvm -cpu host -m 2g -monitor stdio
2. migrate_set_capability multifd on
3. migrate "exec:gzip -c > STATEFILE.gz"


Actual results:
Segmentation fault (core dumped)

Expected results:
The command should exit normally (either succeeding or failing cleanly) instead of crashing with a segmentation fault.

Additional info:
Upstream fix:
https://lists.gnu.org/archive/html/qemu-devel/2021-07/msg04247.html
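The series above restricts multifd to socket-based migration transports, so a non-socket URI fails cleanly instead of crashing. Below is a minimal sketch of that gating idea, written in Python purely for illustration; the real fix is a C patch to QEMU's migration code, and the scheme list, function names, and error handling here are assumptions, not the patch itself.

# Illustrative sketch only -- not the actual QEMU patch.  Idea of the fix:
# before starting a migration with the multifd capability enabled, check
# that the URI scheme is socket-based and fail cleanly otherwise.

# Assumption: the set of socket-based schemes; the authoritative list is
# whatever the upstream C patch actually checks.
SOCKET_SCHEMES = ("tcp:", "unix:", "vsock:")


def multifd_supported(uri: str) -> bool:
    """Return True if the migration URI can carry multifd's extra channels."""
    return uri.startswith(SOCKET_SCHEMES)


def start_migration(uri: str, multifd: bool) -> None:
    if multifd and not multifd_supported(uri):
        # Clear error (matching the HMP message seen after the fix)
        # instead of a segfault later on.
        raise ValueError("multifd is not supported by current protocol")
    print(f"starting migration to {uri} (multifd={multifd})")


if __name__ == "__main__":
    start_migration("tcp:dst-host:4444", multifd=True)
    try:
        start_migration("exec:gzip -c > STATEFILE.gz", multifd=True)
    except ValueError as err:
        print("error:", err)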

Comment 1 Li Xiaohui 2021-07-19 12:37:45 UTC
I can also get a QEMU core dump when doing offline migration with multifd enabled on the latest rhelav-8.5.0 (qemu-kvm-6.0.0-24.module+el8.5.0+11844+1e3017bd.x86_64).
But as far as I know, we shouldn't do offline migration with multifd enabled, because multifd only supports live migration.
Juan, is that right?


I also tried libvirt to test multifd + offline migration, get libvirt doesn't support such combination(libvirt-client-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64).
Fangge, could you help check whether the below libvirt cmd is right?
[root@ibm-x3250m6-07 home]# virsh save rhel850 rhel850.gzip --verbose --parallel 
error: command 'save' doesn't support option --parallel

Comment 2 Fangge Jin 2021-07-19 12:57:25 UTC
(In reply to Li Xiaohui from comment #1)

> I also tried libvirt to test multifd + offline migration, get libvirt
> doesn't support such
> combination(libvirt-client-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64).
> Fangge, could you help check whether the below libvirt cmd is right?
> [root@ibm-x3250m6-07 home]# virsh save rhel850 rhel850.gzip --verbose
> --parallel 
> error: command 'save' doesn't support option --parallel

In libvirt, save doesn't support option --parallel for now. Is there any
reason to support it?

Comment 3 Li Xiaohui 2021-07-19 13:04:15 UTC
(In reply to Fangge Jin from comment #2)
> (In reply to Li Xiaohui from comment #1)
> 
> > I also tried libvirt to test multifd + offline migration, get libvirt
> > doesn't support such
> > combination(libvirt-client-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64).
> > Fangge, could you help check whether the below libvirt cmd is right?
> > [root@ibm-x3250m6-07 home]# virsh save rhel850 rhel850.gzip --verbose
> > --parallel 
> > error: command 'save' doesn't support option --parallel
> 
> In libvirt, save doesn't support option --parallel for now. Is there any
> reason to support it?

Hi Yiding,
Could you help answer above question?

Comment 4 Yiding Liu (Fujitsu) 2021-07-20 01:34:42 UTC
Hi, Xiaohui

(In reply to Li Xiaohui from comment #3)
> (In reply to Fangge Jin from comment #2)
> > (In reply to Li Xiaohui from comment #1)
> > 
> > > I also tried libvirt to test multifd + offline migration, get libvirt
> > > doesn't support such
> > > combination(libvirt-client-7.5.0-1.module+el8.5.0+11664+59f87560.x86_64).
> > > Fangge, could you help check whether the below libvirt cmd is right?
> > > [root@ibm-x3250m6-07 home]# virsh save rhel850 rhel850.gzip --verbose
> > > --parallel 
> > > error: command 'save' doesn't support option --parallel
> > 
> > In libvirt, save doesn't support option --parallel for now. Is there any
> > reason to support it?

No, we don't need to support it in libvirt if it is not supported there.

QEMU should just output an error, like libvirt does, instead of dumping core.
Otherwise users will be confused by the core dump.

> 
> Hi Yiding,
> Could you help answer above question?

Comment 5 John Ferlan 2021-07-26 19:18:29 UTC
Assigned to Meirav for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Comment 6 Dr. David Alan Gilbert 2021-07-29 10:56:16 UTC
There are some (unreviewed) patches on the upstream list that might fix the crash:

 935   T 16/07 Li Zhijian        (130) [PATCH 1/2] migration: allow multifd for socket protocol only
 936 O T 16/07 Li Zhijian        (  0) └─>[PATCH 2/2] migration: allow enabling mutilfd for specific protocol only
 937 O T 19/07 lizhijian@fujitsu (  0)   └─>Re: [PATCH 2/2] migration: allow enabling mutilfd for specific protocol only

Comment 7 Juan Quintela 2021-09-09 11:17:04 UTC
Hi
The pull request has been sent upstream.

Comment 8 John Ferlan 2021-09-17 15:08:15 UTC
Since this is a Fujitsu related bug, I did not create a RHEL9 specific clone or move the bug to RHEL9. If resolution needs testing in RHEL9, then a clone should be created when the bug is resolved for RHEL8.

Comment 9 Juan Quintela 2021-11-09 14:44:13 UTC
Update Product to Advanced Virtualization.

Normal RHEL doesn't have multifd at all.

Comment 10 Juan Quintela 2021-11-09 15:23:33 UTC
This is the brew for the merge request.

https://bugzilla.redhat.com/show_bug.cgi?id=1982224

xiaohli: Could you test?
         And once there, can you give qa+ back?
         The problem was on AV, not on normal one.

Comment 12 Li Xiaohui 2021-11-10 01:52:14 UTC
(In reply to Juan Quintela from comment #10)
> This is the brew for the merge request.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1982224

Juan, which brew build do you mean?

> 
> xiaohli: Could you test?
>          And once there, can you give qa+ back?
>          The problem was on AV, not on normal one.

Comment 15 Juan Quintela 2021-11-10 13:27:58 UTC
Hi xiaohui

I posted the wrong link.


https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=41116300

This one should work.

Comment 16 Li Xiaohui 2021-11-11 12:49:11 UTC
(In reply to Juan Quintela from comment #15)
> Hi xiaohui
> 
> I posted the wrong link.
> 
> 
> https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=41116300
> 
> THis one should work.

It did work. 


When testing offline migration with multifd enabled, the migration fails and HMP gives an error:
1) src qmp
{"execute":"migrate", "arguments":{"uri": "exec:gzip -c > STATEFILE.gz"}}
{"return": {}}
{"execute":"query-migrate"}
{"return": {"blocked": false, "status": "failed"}}
2) src hmp:
(qemu) multifd is not supported by current protocol


Juan, can't we give a QMP error like HMP's when starting offline migration, rather than just returning '{"return": {}}'?
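(For context on that last point: `migrate` is asynchronous, so the immediate `{"return": {}}` only acknowledges that the command was accepted; the real outcome has to be read back with `query-migrate`. Below is a rough sketch of a QMP client doing exactly that over a UNIX socket. It assumes QEMU was started with something like `-qmp unix:/tmp/qmp.sock,server,nowait`; the socket path, helper names, and polling loop are illustrative, not part of this bug's reproducer.)

#!/usr/bin/env python3
"""Rough illustration (not this bug's reproducer): issue 'migrate' over QMP
and poll 'query-migrate', since the initial {"return": {}} only means the
command was accepted, not that the migration succeeded."""
import json
import socket
import time

QMP_SOCKET = "/tmp/qmp.sock"  # assumption: -qmp unix:/tmp/qmp.sock,server,nowait


def recv_json(sock, buf=b""):
    """Read from the socket until one complete JSON object can be decoded;
    return the object plus any leftover bytes."""
    decoder = json.JSONDecoder()
    while True:
        try:
            text = buf.decode().lstrip()
            obj, end = decoder.raw_decode(text)
            return obj, text[end:].encode()
        except (json.JSONDecodeError, UnicodeDecodeError):
            buf += sock.recv(4096)


def command(sock, buf, name, arguments=None):
    """Send one QMP command and return its reply, skipping async events."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    sock.sendall(json.dumps(cmd).encode() + b"\r\n")
    while True:
        reply, buf = recv_json(sock, buf)
        if "event" not in reply:  # ignore asynchronous QMP events
            return reply, buf


with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
    s.connect(QMP_SOCKET)
    _greeting, buf = recv_json(s)                 # QMP greeting banner
    _, buf = command(s, buf, "qmp_capabilities")  # enter command mode

    reply, buf = command(s, buf, "migrate",
                         {"uri": "exec:gzip -c > STATEFILE.gz"})
    print("migrate reply:", reply)                # typically {"return": {}}

    # The real outcome only shows up here.
    while True:
        status, buf = command(s, buf, "query-migrate")
        state = status.get("return", {}).get("status")
        print("query-migrate status:", state)
        if state in ("completed", "failed", "cancelled"):
            break
        time.sleep(1)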

Comment 17 Juan Quintela 2021-11-12 08:30:37 UTC
Hi Li

Thanks for the testing.

I will change upstream to give you an error.

But should we leave that out of this bugzilla?  Do you want another bugzilla for the error on qmp?

Thanks, Juan.

Comment 18 Li Xiaohui 2021-11-12 10:22:05 UTC
(In reply to Juan Quintela from comment #17)
> Hi Li
> 
> Thanks for the testing.
> 
> I will change upstream to give you an error.
> 
> But should we left that out of this bugzilla? 

Agreed (I mean, fix it upstream), because the layered product can't reproduce it, and QEMU now reports that the migration failed while the VM keeps working on the source host.

To avoid missing this issue, could you tell me which qemu-kvm build fixes it, so that I can test it after the patch is merged?

> Do you want another bugzilla for the error on qmp?
> 
> Thanks, Juan.

Comment 19 Juan Quintela 2021-11-12 11:08:15 UTC
Hi Xiaohui

OK, I will let you know after the patch is merged.

Later, Juan.

Comment 21 John Ferlan 2021-11-23 20:42:14 UTC
Now that the RHEL-AV z-stream bug has been created, moving this bug to RHEL 8.6.0 for resolution/testing.

Comment 23 John Ferlan 2021-11-23 20:45:34 UTC
Oh the irony of moving the bug from RHEL-AV to RHEL - all the flags are lost...  Interestingly, the blocker+ flag stays, but we lose the release+ because qa_ack? and devel_ack? get reset. In any case, can we please just get the qa_ack+ again to make it "official".  Lots of churn with this one just to create an AV z-stream.

Comment 25 John Ferlan 2021-12-22 18:01:48 UTC
Mass update of DTM/ITM to +3 values since the rebase of qemu-6.2 into RHEL 8.6 has been delayed or slowed due to process roadblocks (authentication changes, gating issues). This avoids the DevMissed bot and worse the bot that could come along and strip release+. The +3 was chosen mainly to give a cushion. 

Also added the qemu-6.2 rebase bug 2027716 as a dependent.

Comment 28 Yanan Fu 2021-12-24 02:48:07 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 29 Li Xiaohui 2021-12-27 04:46:22 UTC
Tested again on qemu-kvm-6.2.0-1.module+el8.6.0+13725+61ae1949.x86_64; the test results are the same as in Comment 16.

I will mark the bz verified per Comments 16~19.

Juan, please remember to fix the QMP issue in a later QEMU version, thank you.

Comment 32 errata-xmlrpc 2022-05-10 13:20:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: virt:rhel and virt-devel:rhel security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1759

