Bug 2196289

Summary: Fix number of ready channels on multifd
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm (sub component: Live Migration)
Version: 9.3
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Triaged
Reporter: Juan Quintela <quintela>
Assignee: Leonardo Bras <leobras>
QA Contact: Li Xiaohui <xiaohli>
CC: chayang, jinzhao, juzhang, leobras, nilal, peterx, quintela, virt-maint
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qemu-kvm-8.0.0-5.el9
Doc Type: If docs needed, set a value
Type: Bug
Regression: ---
Last Closed: 2023-11-07 08:27:35 UTC

Description Juan Quintela 2023-05-08 15:59:37 UTC
Description of problem:

When using multifd and the network is slow or saturated, the migration thread busy-waits for a channel to become ready.

Version-Release number of selected component (if applicable):

All qemu-kvm versions with multifd enabled.

How reproducible:

100%.  The network needs to be very slow.

Steps to Reproduce:
1. Configure a slow network.
2. Set the number of multifd channels to a large value.
3. Set the migration bandwidth to a very small value (see the example monitor commands below).
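
For example, with the HMP monitor (the channel count, bandwidth value, and destination URI here are illustrative, not taken from the report):

(qemu) migrate_set_capability multifd on
(qemu) migrate_set_parameter multifd-channels 16
(qemu) migrate_set_parameter max-bandwidth 1M
(qemu) migrate -d tcp:dst-host:4444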

Actual results:

The migration thread busy-waits for a channel to become ready, pinning a CPU.

Expected results:

The migration thread waits on a semaphore for a channel to become ready, without wasting CPU.
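
To make the difference concrete, a minimal sketch using QEMU's semaphore API from "qemu/thread.h" (illustrative only: channel_is_ready() is a hypothetical helper and the loop is simplified, while multifd_send_state->channels_ready, migrate_multifd_channels(), and qemu_sem_wait() are the real identifiers used by migration/multifd.c):

    /* Buggy behaviour: spin over the channels until one becomes free.
     * On a slow or saturated network this loop pegs a CPU at 100%. */
    bool ready = false;
    while (!ready) {
        for (int i = 0; i < migrate_multifd_channels(); i++) {
            if (channel_is_ready(i)) {   /* hypothetical helper */
                ready = true;
                break;
            }
        }
        /* no sleep or wait here: the thread burns CPU until a
         * channel frees up */
    }

    /* Expected behaviour: each multifdsend thread posts channels_ready
     * when it goes idle, so the migration thread sleeps here until a
     * channel is free, consuming no CPU while it waits. */
    qemu_sem_wait(&multifd_send_state->channels_ready);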

Additional info:

Comment 1 Juan Quintela 2023-05-08 16:07:30 UTC
There is an up-to-date patchset upstream that fixes this issue:

https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg04562.html

Comment 2 Juan Quintela 2023-05-08 16:26:03 UTC
Upstream commit:

commit d2026ee117147893f8d80f060cede6d872ecbd7f
Author: Juan Quintela <quintela>
Date:   Wed Apr 26 12:20:36 2023 +0200

    multifd: Fix the number of channels ready

Comment 4 Li Xiaohui 2023-05-16 11:04:05 UTC
Discussed the reproduction steps with Juan over gchat:
1. set a small migration bandwidth, e.g. 1MB/s;
2. set very few multifd channels (1-2);


Before the fix, we would see the main migration thread busy waiting, i.e. its CPU usage is 100%.
After the fix, the CPU usage of the main migration thread should be small.
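
One way to observe this per thread, assuming the QEMU process name is qemu-kvm (top and pidstat are standard tools):

# top -H -p $(pidof qemu-kvm)
# pidstat -t -p $(pidof qemu-kvm) 1

The threads to watch are live_migration and the multifdsend_<n> threads.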


I will test before and after the fix following the above steps.



Thank you Juan.

Comment 5 Li Xiaohui 2023-06-06 03:26:37 UTC
Hi Leonardo,
What's the plan to fix this bug? I see the ITR is set to RHEL 9.3.0. Can you help set a proper DTM?

Comment 7 Yanan Fu 2023-06-15 03:28:31 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 10 Li Xiaohui 2023-07-10 10:08:26 UTC
Extending the ITM to 20 as the reproduction steps are not clear; I need more time to test and to get confirmation from Juan / Leonardo.

Comment 11 Li Xiaohui 2023-07-11 10:32:14 UTC
Hi all, I did some tests on qemu-kvm-8.0.0-1.el9.x86_64 and qemu-kvm-8.0.0-7.el9.x86_64.

Test steps:
1. Enable the multifd capability and set multifd channels to 1 on the src and dst hosts;
2. Set the migration bandwidth to 0 (no limitation);
3. Run stressapptest in the VM:
# stressapptest -M 10000 -s 1000000
4. Start migrating the VM from the src host to the dst host (example monitor commands below).
Note: the NIC supports 200G bandwidth.
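
The monitor commands for steps 1 and 2 would look roughly like this (HMP syntax, run on both hosts; the values come from the steps above, with 0 meaning no bandwidth limit per step 2):

(qemu) migrate_set_capability multifd on
(qemu) migrate_set_parameter multifd-channels 1
(qemu) migrate_set_parameter max-bandwidth 0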


Before the fix (qemu-kvm-8.0.0-1.el9.x86_64), the main migration thread on the src host is busy:
the live_migration thread is at 85.0%, while the multifdsend thread is at 17.9%;
after 1 second, the live_migration thread drops to 18.0% and the multifdsend thread to 4.0%.

After the fix (qemu-kvm-8.0.0-7.el9.x86_64), the CPU usage of the main migration thread is small:
the live_migration thread is at 8.3%, the multifdsend thread at 9.7%.



I also tested the above scenario (but with multifd channels set to 10) on qemu-kvm-8.0.0-7.el9.x86_64; the CPU usage is as below:
the live_migration thread is at 9.3%;
4 multifdsend threads are at 2.3%, 3 multifdsend threads at 2.0%, and 3 multifdsend threads at 1.7%.


Per the above test results, I think we can mark this bug verified.

Juan, Leonardo, what do you think?

Comment 12 Juan Quintela 2023-07-12 13:06:04 UTC
It looks correct.

Thanks very much.

Comment 13 Li Xiaohui 2023-07-13 02:36:19 UTC
Thanks for the review.

Mark bug verified per Comment 11 and Comment 12.

I will add a case later to monitor the CPU usage of the live_migration thread and the multifdsend threads.

Comment 15 errata-xmlrpc 2023-11-07 08:27:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: qemu-kvm security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6368