Bug 2196289 - Fix number of ready channels on multifd
Summary: Fix number of ready channels on multifd
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Leonardo Bras
QA Contact: Li Xiaohui
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-08 15:59 UTC by Juan Quintela
Modified: 2023-11-07 09:21 UTC
CC List: 8 users

Fixed In Version: qemu-kvm-8.0.0-5.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-07 08:27:35 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-156629 0 None None None 2023-05-08 16:01:17 UTC
Red Hat Product Errata RHSA-2023:6368 0 None None None 2023-11-07 08:28:33 UTC

Description Juan Quintela 2023-05-08 15:59:37 UTC
Description of problem:

When multifd is in use and the network is slow or saturated, the migration thread busy-waits for a channel to become ready.

Version-Release number of selected component (if applicable):

All qemu-kvm versions with multifd enabled.

How reproducible:

100%.  The network needs to be very slow.

Steps to Reproduce:
1. Configure a slow network.
2. Configure the number of multifd channels to a large value.
3. Set the migration bandwidth to a very small value.

Actual results:

The migration thread is busy waiting for a channel to become ready.

Expected results:

The migration thread waits on a semaphore for a channel to become ready, without wasting CPU.

Additional info:
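
For illustration, a minimal, self-contained sketch (hypothetical names, not the actual QEMU code) of the difference between the two behaviours described above: a dispatcher that spins until a channel is free versus one that sleeps on a semaphore posted whenever a channel becomes free.

/* Illustrative sketch only -- a simplified stand-in for the pattern described
 * above, not the QEMU source.  Build with: cc -pthread sketch.c */
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>

/* Problematic pattern: if no channel is free, the loop spins and the
 * migration thread burns a full CPU while the slow network drains. */
static int pick_channel_busy_wait(const bool *channel_free, int nchannels)
{
    for (;;) {
        for (int i = 0; i < nchannels; i++) {
            if (channel_free[i]) {
                return i;
            }
        }
        /* nothing free -> immediately loop again: busy waiting */
    }
}

/* Expected pattern: sleep on a semaphore that the send threads post every
 * time a channel becomes free, so no CPU is consumed while waiting. */
static int pick_channel_sem_wait(const bool *channel_free, int nchannels,
                                 sem_t *channels_ready)
{
    for (;;) {
        sem_wait(channels_ready);       /* blocks instead of spinning */
        for (int i = 0; i < nchannels; i++) {
            if (channel_free[i]) {
                return i;
            }
        }
    }
}

int main(void)
{
    bool channel_free[2] = { false, true };
    sem_t channels_ready;

    sem_init(&channels_ready, 0, 1);    /* one channel is already free */
    printf("busy-wait picked channel %d\n",
           pick_channel_busy_wait(channel_free, 2));
    printf("sem-wait picked channel %d\n",
           pick_channel_sem_wait(channel_free, 2, &channels_ready));
    sem_destroy(&channels_ready);
    return 0;
}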

Comment 1 Juan Quintela 2023-05-08 16:07:30 UTC
There is an up-to-date patch set that fixes this issue:

https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg04562.html

Comment 2 Juan Quintela 2023-05-08 16:26:03 UTC
Upstream commit:

commit d2026ee117147893f8d80f060cede6d872ecbd7f
Author: Juan Quintela <quintela>
Date:   Wed Apr 26 12:20:36 2023 +0200

    multifd: Fix the number of channels ready
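
For context, a reduced, self-contained sketch of the accounting idea the commit title points at (hypothetical names, not the upstream patch): the channels_ready semaphore is posted exactly once per channel that has really finished its job, so its count matches the number of free channels and the wait in the migration thread blocks instead of spinning.

/* Sketch only -- simplified, hypothetical names; not the upstream patch.
 * Invariant illustrated: "channels_ready" is posted once per completed job,
 * so its count equals the number of free channels.
 * Build with: cc -pthread channels_ready.c */
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define NJOBS 4

typedef struct {
    sem_t job_pending;      /* dispatcher -> sender: new work is available   */
    bool  free;             /* stands in for the per-channel busy/free state */
    bool  quit;
} Channel;

static Channel chan;
static sem_t channels_ready;    /* count == number of free channels */

static void *sender_thread(void *opaque)
{
    Channel *c = opaque;

    for (;;) {
        sem_wait(&c->job_pending);
        if (c->quit) {
            break;
        }
        usleep(100 * 1000);             /* pretend to push data over a slow link */
        c->free = true;
        sem_post(&channels_ready);      /* post ONLY once the job is really done */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;

    sem_init(&channels_ready, 0, 1);    /* the single channel starts out free */
    sem_init(&chan.job_pending, 0, 0);
    chan.free = true;

    pthread_create(&tid, NULL, sender_thread, &chan);

    for (int i = 0; i < NJOBS; i++) {
        sem_wait(&channels_ready);      /* blocks; no CPU wasted while waiting */
        chan.free = false;
        printf("dispatched job %d\n", i);
        sem_post(&chan.job_pending);
    }

    sem_wait(&channels_ready);          /* wait for the last job to complete */
    chan.quit = true;
    sem_post(&chan.job_pending);
    pthread_join(tid, NULL);

    sem_destroy(&chan.job_pending);
    sem_destroy(&channels_ready);
    return 0;
}

If the semaphore were posted more times than channels actually become free, the sem_wait() in the dispatcher would return immediately with nothing to do and the loop would degenerate into the busy wait described in the bug.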

Comment 4 Li Xiaohui 2023-05-16 11:04:05 UTC
Discussed the reproduction steps with Juan over Google Chat:
1. set a small migration bandwidth, e.g. 1 MB/s;
2. set very few multifd channels (1-2);


Before the fix, we would see the main migration thread busy waiting, i.e. CPU usage at 100%.
After the fix, the CPU usage of the main migration thread should be small.


I will test following the above steps before and after the fix.



Thank you Juan.

Comment 5 Li Xiaohui 2023-06-06 03:26:37 UTC
Hi Leonardo,
What's our fix plan for this bug? I see the ITR is set to RHEL 9.3.0. Can you help set a proper DTM?

Comment 7 Yanan Fu 2023-06-15 03:28:31 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 10 Li Xiaohui 2023-07-10 10:08:26 UTC
Extending ITM to 20 as the reproduction steps are not clear; I need more time to test and to get confirmation from Juan / Leonardo.

Comment 11 Li Xiaohui 2023-07-11 10:32:14 UTC
Hi all, I did some tests on qemu-kvm-8.0.0-1.el9.x86_64 and qemu-kvm-8.0.0-7.el9.x86_64.

Test steps:
1. Enable the multifd capability and set multifd channels to 1 on the src and dst hosts;
2. Set the migration bandwidth to 0 (no limitation);
3. Run stressapptest in VM;
# stressapptest -M 10000 -s 1000000
4. Start to migrate VM from src to dst host
Note: the NIC supports 200G of bandwidth.
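
For reference, a minimal sketch of how steps 1-2 could be driven over QEMU's QMP socket from C. It is hypothetical and not part of the test procedure above: it assumes the VM was started with "-qmp unix:/tmp/qmp.sock,server=on,wait=off", does no JSON parsing or error handling, and just prints the raw replies.

/* Hypothetical helper: configures multifd and the migration bandwidth over a
 * QMP Unix socket.  Simplified on purpose; not robust against partial reads
 * or asynchronous QMP events.  Build with: cc qmp_setup.c -o qmp_setup */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static void read_reply(int fd)
{
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);

    if (n > 0) {
        buf[n] = '\0';
        printf("%s", buf);
    }
}

static void qmp_cmd(int fd, const char *json)
{
    write(fd, json, strlen(json));
    write(fd, "\n", 1);
    read_reply(fd);
}

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    strncpy(addr.sun_path, "/tmp/qmp.sock", sizeof(addr.sun_path) - 1);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    read_reply(fd);                                   /* QMP greeting */
    qmp_cmd(fd, "{\"execute\": \"qmp_capabilities\"}");

    /* step 1: enable the multifd capability and use a single channel */
    qmp_cmd(fd, "{\"execute\": \"migrate-set-capabilities\", \"arguments\":"
                " {\"capabilities\": [{\"capability\": \"multifd\","
                " \"state\": true}]}}");
    qmp_cmd(fd, "{\"execute\": \"migrate-set-parameters\","
                " \"arguments\": {\"multifd-channels\": 1}}");

    /* step 2: 0 means no bandwidth limit */
    qmp_cmd(fd, "{\"execute\": \"migrate-set-parameters\","
                " \"arguments\": {\"max-bandwidth\": 0}}");

    close(fd);
    return 0;
}

Per step 1, the multifd capability has to be enabled on the destination host as well.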


Before the fix (qemu-kvm-8.0.0-1.el9.x86_64), the CPU usage of the main migration thread on the src host is high:
the live_migration thread is at 85.0%, while the multifdsend thread is at 17.9%;
after 1 second, the live_migration thread drops to 18.0% and the multifdsend thread to 4.0%.

After the fix (qemu-kvm-8.0.0-7.el9.x86_64), the CPU usage of the main migration thread is small:
the live_migration thread is at 8.3% and the multifdsend thread is at 9.7%.



I also tested the above scenario (but with multifd channels set to 10) on qemu-kvm-8.0.0-7.el9.x86_64; the CPU usage is as below:
the live_migration thread is at 9.3%;
4 multifdsend threads are at 2.3%, 3 multifdsend threads are at 2.0%, and 3 multifdsend threads are at 1.7%.


Per the above test results, I think we can mark this bug verified.

Juan, Leonardo, what do you think?

Comment 12 Juan Quintela 2023-07-12 13:06:04 UTC
It looks correct.

Thanks very much.

Comment 13 Li Xiaohui 2023-07-13 02:36:19 UTC
Thanks for the review.

Mark bug verified per Comment 11 and Comment 12.

I will add a case later to monitor the CPU usage of the live_migration thread and the multifdsend threads.
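
A rough, hypothetical sketch (Linux-only, not part of the planned test case) of how such a monitor could sample per-thread CPU time from /proc: it prints the accumulated utime+stime ticks of the live_migration and multifdsend threads of a given QEMU PID, so sampling twice and diffing gives per-thread CPU usage over the interval.

/* Hypothetical monitoring sketch, Linux-only: prints utime+stime clock ticks
 * for every thread of the given QEMU process whose name is "live_migration"
 * or starts with "multifdsend".  Sample twice and diff (divide by
 * sysconf(_SC_CLK_TCK) and the interval) to estimate per-thread CPU usage.
 * Build with: cc thread_cpu.c -o thread_cpu */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static void print_thread_ticks(const char *pid)
{
    char path[256];
    struct dirent *de;
    DIR *dir;

    snprintf(path, sizeof(path), "/proc/%s/task", pid);
    dir = opendir(path);
    if (!dir) {
        perror("opendir");
        return;
    }

    while ((de = readdir(dir)) != NULL) {
        char comm[64] = "";
        unsigned long utime = 0, stime = 0;
        FILE *f;

        if (de->d_name[0] == '.') {
            continue;
        }

        /* thread name */
        snprintf(path, sizeof(path), "/proc/%s/task/%s/comm", pid, de->d_name);
        f = fopen(path, "r");
        if (!f) {
            continue;
        }
        if (fgets(comm, sizeof(comm), f)) {
            comm[strcspn(comm, "\n")] = '\0';
        }
        fclose(f);

        if (strcmp(comm, "live_migration") != 0 &&
            strncmp(comm, "multifdsend", strlen("multifdsend")) != 0) {
            continue;               /* only the migration-related threads */
        }

        /* fields 14 and 15 of /proc/<pid>/task/<tid>/stat are utime, stime */
        snprintf(path, sizeof(path), "/proc/%s/task/%s/stat", pid, de->d_name);
        f = fopen(path, "r");
        if (!f) {
            continue;
        }
        if (fscanf(f, "%*d %*s %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u"
                      " %lu %lu", &utime, &stime) == 2) {
            printf("tid=%s comm=%s cpu_ticks=%lu\n",
                   de->d_name, comm, utime + stime);
        }
        fclose(f);
    }
    closedir(dir);
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <qemu-pid>\n", argv[0]);
        return 1;
    }
    print_thread_ticks(argv[1]);
    return 0;
}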

Comment 15 errata-xmlrpc 2023-11-07 08:27:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368

