Description of problem:
Multifd sync optimization

Version-Release number of selected component (if applicable):
9.3

How reproducible:
It is an optimization; released versions always show the "pessimized" behaviour.

Steps to Reproduce:
1. Do a normal migration with multifd enabled

Actual results:
We synchronize every channel 10 times per second (once for each RAM section).

Expected results:
We synchronize once every time we go through all the guest memory (i.e. every several seconds/minutes, depending on guest RAM size and network speed).

Additional info:
This is the upstream patchset that implements it: https://lists.gnu.org/archive/html/qemu-devel/2023-04/msg01488.html
Commit ids:

commit 294e5a4034e81b3d8db03b4e0f691386f20d6ed3
Author: Juan Quintela <quintela>
Date:   Tue Jun 21 13:36:11 2022 +0200

    multifd: Only flush once each full round of memory

commit b05292c237030343516d073b1a1e5f49ffc017a8
Author: Juan Quintela <quintela>
Date:   Tue Jun 21 12:21:32 2022 +0200

    multifd: Protect multifd_send_sync_main() calls

commit 77c259a4cb1c9799754b48f570301ebf1de5ded8
Author: Juan Quintela <quintela>
Date:   Tue Jun 21 12:13:14 2022 +0200

    multifd: Create property multifd-flush-after-each-section
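To check whether these commits are already present in a downstream tree, searching by commit subject works, since backported commits get new hashes (the branch name below is only a placeholder):

$ git log --oneline origin/rhel-9.3.0 --grep='multifd: Only flush once'
$ git log --oneline origin/rhel-9.3.0 --grep='multifd-flush-after-each-section'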
Hi Juan, can you check whether the fix has been backported downstream yet? If not, when will it be? Please also set a proper DTM, since the current one has passed.
Not downstream yet; I am going to post it this week, sorry.
Hi Juan, I have added the RFE keyword to this bug since it is an optimization per your description. Please correct me if that is not right.
Adding a needinfo to Nitesh so we don't miss the ITM and release+ per Comment 9, since we don't know when Juan will be back.
(In reply to Li Xiaohui from comment #10)
> Adding a needinfo to Nitesh so we don't miss the ITM and release+ per
> Comment 9, since we don't know when Juan will be back.

I think we should wait for an update till next week. If we don't have any updates by then, we can discuss whether it's still possible to get this into 9.3 or whether we want to move it to 9.4. What do you think?
(In reply to Nitesh Narayan Lal from comment #11)
> (In reply to Li Xiaohui from comment #10)
> > Adding a needinfo to Nitesh so we don't miss the ITM and release+ per
> > Comment 9, since we don't know when Juan will be back.
>
> I think we should wait for an update till next week. If we don't have any
> updates by then, we can discuss whether it's still possible to get this
> into 9.3 or whether we want to move it to 9.4. What do you think?

No problem.
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=53970406

Here is the brew. I think it is done.
Xiaohui, could you test, please?
New brew: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=53984486
New gitlab merge request: https://gitlab.com/redhat/centos-stream/src/qemu-kvm/-/merge_requests/186
(In reply to Juan Quintela from comment #15)
> Xiaohui, could you test, please?

Of course, I will provide the test results after testing.

Juan, could you give some guidance on how to test the fix? What should we expect to see after the fix, and what would be seen before it?
Hi Xiaohui,

If you are using multiple channels (let's say 16), and especially if the network is fast and you have lots of memory, you should see that you are not using all the available network bandwidth. Right now we flush and synchronize all channels 10 times a second. With the change we flush only once per iteration over RAM (for a 1TB guest, probably every minute or so). Zero copy should be more affected than normal multifd.

If this is enough to see a difference, great. Otherwise let me know and I will try to come up with a better test.

Thanks, Juan.
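For reference, a minimal monitor (HMP) sequence for such a test would look roughly like the following; host names and ports are placeholders, and the destination is assumed to be started with '-incoming defer' so the same multifd settings can be applied there before the connection is made:

On the destination monitor:
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_parameter multifd-channels 16
(qemu) migrate_incoming tcp:0:4444

On the source monitor:
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_parameter multifd-channels 16
(qemu) migrate_set_parameter max-bandwidth 0
(qemu) migrate -d tcp:DST_HOST:4444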
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.
Hi,

You need to add "-global migration.multifd-flush-after-each-section=off" to the command line.

Since we decided not to add a 9.3 machine type, we can't enable it by default.

property=off -> new behaviour
property=on  -> old behaviour
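For example, the extra option just goes on the regular qemu-kvm command line; everything other than the -global option below is a placeholder, and presumably both source and destination should use the same setting so the stream format matches:

/usr/libexec/qemu-kvm -machine pc-q35-rhel9.2.0 -m 400G -smp 64 \
    ... \
    -global migration.multifd-flush-after-each-section=off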
Hi Juan,

(In reply to Juan Quintela from comment #24)
> Hi,
>
> You need to add "-global migration.multifd-flush-after-each-section=off"
> to the command line.
>
> Since we decided not to add a 9.3 machine type, we can't enable it by
> default.

So you mean the rhel9.3 machine type will set multifd-flush-after-each-section to off by default in the future?

But right now we don't have the rhel9.3 machine type on RHEL 9.3 hosts; the highest machine type is rhel9.2. multifd-flush-after-each-section is on by default with the rhel9.2 machine type, so we need to set it to off explicitly, right?
Note: by 'default' I mean without adding 'multifd-flush-after-each-section=xx' to the qemu command line.

What's more, will libvirt support multifd-flush-after-each-section=off on RHEL 9.3? I want to know when libvirt will support this optimization.

> property=off -> new behaviour
> property=on  -> old behaviour
Hi Juan,

I have tried some tests on qemu-kvm-8.0.0-7.el9.x86_64 and qemu-kvm-8.0.0-9.el9.x86_64.

Hosts info: 256 cpus and 1510G memory; NIC speed between src and dst host is 200G.
Guest info: VM with 400G memory and 64 cpus.
In the VM, use stressapptest to generate some memory load: # stressapptest -M 20000 -s 1000000

QMP cmd: enable the multifd capability and set multifd channels to 20. BTW, set max-bandwidth to 0 on the src host.

1. On the fixed version -> qemu-kvm-8.0.0-9.el9.x86_64
(1) With '-global migration.multifd-flush-after-each-section=off' on the qemu cmd:
    migration finishes quickly (total time: 51701 ms).
(2) With '-global migration.multifd-flush-after-each-section=on' on the qemu cmd:
    migration needs much more time to finish; the total time is much longer than with multifd-flush-after-each-section=off -> total time: 305990 ms.

Comparing the throughput while migration is active, (1) is better than (2) in bandwidth utilization. But (1) doesn't use all the available network bandwidth (200G); the max throughput is nearly 70G.

2. On the unfixed version -> qemu-kvm-8.0.0-7.el9.x86_64
(1) When I try to boot the VM with -global migration.multifd-flush-after-each-section=xx, qemu prompts:
    (qemu) 2023-07-27T10:11:19.936315Z qemu-kvm: can't apply global migration.multifd-flush-after-each-section=on: Property 'migration.multifd-flush-after-each-section' not found

    So the parameter migration.multifd-flush-after-each-section is new and introduced by qemu-kvm-8.0.0-9.el9?
(2) Without -global migration.multifd-flush-after-each-section=xx, the migration total time and bandwidth utilization are similar to the '-global migration.multifd-flush-after-each-section=off' result on qemu-kvm-8.0.0-9.el9.x86_64:
    migration total time: 48392 ms; the bandwidth utilization also looks good.

Regarding the test results of 1-(1) and 2-(2), I don't see any migration performance improvement (total time or bandwidth utilization) on the fixed qemu version (qemu-kvm-8.0.0-9.el9.x86_64). I expected the migration performance to be better after the fix, but there is no performance difference, which I don't quite understand. Can you help explain?

So why do we introduce the new parameter migration.multifd-flush-after-each-section? What I can see is that on the fixed qemu version (qemu-kvm-8.0.0-9.el9.x86_64) the migration performance is better with multifd-flush-after-each-section set to off than to on.
(In reply to Li Xiaohui from comment #25)
> Hi Juan,
>
> (In reply to Juan Quintela from comment #24)
> > Hi,
> >
> > You need to add "-global migration.multifd-flush-after-each-section=off"
> > to the command line.
> >
> > Since we decided not to add a 9.3 machine type, we can't enable it by
> > default.
>
> So you mean the rhel9.3 machine type will set
> multifd-flush-after-each-section to off by default in the future?

Yeap. Until we have a new machine type we can't enable it by default; it would break migration from previous qemu.

> But right now we don't have the rhel9.3 machine type on RHEL 9.3 hosts; the
> highest machine type is rhel9.2. multifd-flush-after-each-section is on by
> default with the rhel9.2 machine type, so we need to set it to off
> explicitly, right?

To use it, we need to do that.

> Note: by 'default' I mean without adding
> 'multifd-flush-after-each-section=xx' to the qemu command line.

Exactly. This "improvement" is not used by default until we have a new machine type. Until then, it has to be enabled by setting 'multifd-flush-after-each-section=off'.

> What's more, will libvirt support multifd-flush-after-each-section=off on
> RHEL 9.3?

I don't think we are going to use that, unless CNV/OpenStack are going to use it.

> I want to know when libvirt will support this optimization.

We don't know. @jdenemar, any idea? I know we are asking late, but it was "surprising" that we don't have the new machine type.
(In reply to Li Xiaohui from comment #26)
> Hi Juan,
>
> I have tried some tests on qemu-kvm-8.0.0-7.el9.x86_64 and
> qemu-kvm-8.0.0-9.el9.x86_64.
>
> Hosts info: 256 cpus and 1510G memory; NIC speed between src and dst host
> is 200G.
> Guest info: VM with 400G memory and 64 cpus.
> In the VM, use stressapptest to generate some memory load:
> # stressapptest -M 20000 -s 1000000
>
> QMP cmd: enable the multifd capability and set multifd channels to 20.
> BTW, set max-bandwidth to 0 on the src host.
>
> 1. On the fixed version -> qemu-kvm-8.0.0-9.el9.x86_64
> (1) With '-global migration.multifd-flush-after-each-section=off' on the
> qemu cmd: migration finishes quickly (total time: 51701 ms).
> (2) With '-global migration.multifd-flush-after-each-section=on' on the
> qemu cmd: migration needs much more time to finish; the total time is much
> longer than with multifd-flush-after-each-section=off -> total time:
> 305990 ms.
>
> Comparing the throughput while migration is active, (1) is better than (2)
> in bandwidth utilization. But (1) doesn't use all the available network
> bandwidth (200G); the max throughput is nearly 70G.

Two things: first, the speedup is considerable, 51 seconds vs 305 seconds. Second, could you add the downtime of both cases, and tell me how long you wait before launching the migrate command?

> 2. On the unfixed version -> qemu-kvm-8.0.0-7.el9.x86_64
> (1) When I try to boot the VM with -global
> migration.multifd-flush-after-each-section=xx, qemu prompts:
> (qemu) 2023-07-27T10:11:19.936315Z qemu-kvm: can't apply global
> migration.multifd-flush-after-each-section=on: Property
> 'migration.multifd-flush-after-each-section' not found
>
> So the parameter migration.multifd-flush-after-each-section is new and
> introduced by qemu-kvm-8.0.0-9.el9?

Yeap, it is new, added with the series for this bugzilla.

> (2) Without -global migration.multifd-flush-after-each-section=xx, the
> migration total time and bandwidth utilization are similar to the '-global
> migration.multifd-flush-after-each-section=off' result on
> qemu-kvm-8.0.0-9.el9.x86_64:
> migration total time: 48392 ms; the bandwidth utilization also looks good.

Can you check migration from old <-> new with migration.multifd-flush-after-each-section=on, and see that it works?

> Regarding the test results of 1-(1) and 2-(2), I don't see any migration
> performance improvement (total time or bandwidth utilization) on the fixed
> qemu version (qemu-kvm-8.0.0-9.el9.x86_64). I expected the migration
> performance to be better after the fix, but there is no performance
> difference, which I don't quite understand. Can you help explain?

I will look at this. Either I messed things up during the backport, or this doesn't make any sense.

> So why do we introduce the new parameter
> migration.multifd-flush-after-each-section? What I can see is that on the
> fixed qemu version (qemu-kvm-8.0.0-9.el9.x86_64) the migration performance
> is better with multifd-flush-after-each-section set to off than to on.

The optimization is the property set to "off".

Regarding your previous result: could you tell me how long the migration takes for qemu-kvm-8.0.0-7.el9.x86_64, and whether it gives you ~50 seconds or around 300 seconds?
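To be explicit about the old <-> new combination (assuming the old binary simply has no such property, as your 2-(1) result shows): start the qemu-kvm-8.0.0-7.el9 side as usual, give only the qemu-kvm-8.0.0-9.el9 side '-global migration.multifd-flush-after-each-section=on', and check both directions, old -> new and new -> old; with =on both sides should stay on the old per-section flush behaviour.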
Hi Juan, the following is the migration information for 1-(1), 1-(2), and 2-(2) of Comment 28. I'm not sure whether the data below answers your questions; please comment again if it doesn't.

1-(1)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 51701 ms
downtime: 1643 ms
setup: 1382 ms
transferred ram: 52067927 kbytes
throughput: 8476.88 mbps
remaining ram: 0 kbytes
total ram: 419451592 kbytes
duplicate: 97827864 pages
skipped: 0 pages
normal: 12769250 pages
normal bytes: 51077000 kbytes
dirty sync count: 5
page size: 4 kbytes
multifd bytes: 51208105 kbytes
pages-per-second: 1988368
precopy ram: 859704 kbytes
downtime ram: 118 kbytes

1-(2)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 305990 ms
downtime: 8383 ms
setup: 1966 ms
transferred ram: 86113793 kbytes
throughput: 2320.38 mbps
remaining ram: 0 kbytes
total ram: 419451592 kbytes
duplicate: 98056257 pages
skipped: 0 pages
normal: 21249711 pages
normal bytes: 84998844 kbytes
dirty sync count: 8
page size: 4 kbytes
multifd bytes: 85251960 kbytes
pages-per-second: 1979446
precopy ram: 861627 kbytes
downtime ram: 205 kbytes

2-(2)
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 48392 ms
downtime: 3920 ms
setup: 679 ms
transferred ram: 49725847 kbytes
throughput: 8537.76 mbps
remaining ram: 0 kbytes
total ram: 419451592 kbytes
duplicate: 97840241 pages
skipped: 0 pages
normal: 12179958 pages
normal bytes: 48719832 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 48865917 kbytes
pages-per-second: 1916735
precopy ram: 859724 kbytes
downtime ram: 205 kbytes
Keeping the needinfo on @jdenemar for Comment 27.
Hi, I also tested without the migration.multifd-flush-after-each-section parameter on qemu-kvm-8.0.0-9.el9.x86_64 and found that the migration total time is nearly 5 minutes; the overall data looks like the result with migration.multifd-flush-after-each-section=on.

So I guess migration.multifd-flush-after-each-section is on by default on qemu-kvm-8.0.0-9.el9.x86_64.
The migration info of Comment 31:

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: completed
total time: 281721 ms
downtime: 1158 ms
setup: 1128 ms
transferred ram: 76548070 kbytes
throughput: 2234.87 mbps
remaining ram: 0 kbytes
total ram: 419451592 kbytes
duplicate: 98003091 pages
skipped: 0 pages
normal: 18864139 pages
normal bytes: 75456556 kbytes
dirty sync count: 6
page size: 4 kbytes
multifd bytes: 75686704 kbytes
pages-per-second: 2015168
precopy ram: 861211 kbytes
downtime ram: 154 kbytes
(In reply to Li Xiaohui from comment #31)
> Hi, I also tested without the migration.multifd-flush-after-each-section
> parameter on qemu-kvm-8.0.0-9.el9.x86_64 and found that the migration
> total time is nearly 5 minutes; the overall data looks like the result
> with migration.multifd-flush-after-each-section=on.
>
> So I guess migration.multifd-flush-after-each-section is on by default on
> qemu-kvm-8.0.0-9.el9.x86_64.

Then it is right.
OK, seeing all the comments, I think this bug is fixed and correct, no?
/me rereads all the numbers again. Notice that I have rearranged them:

1 - On the unfixed version -> qemu-kvm-8.0.0-7.el9.x86_64:

total time: 48392 ms
downtime: 3920 ms
transferred ram: 49725847 kbytes
throughput: 8537.76 mbps
duplicate: 97840241 pages
normal: 12179958 pages
dirty sync count: 4
multifd bytes: 48865917 kbytes

2a - On the fixed version -> qemu-kvm-8.0.0-9.el9.x86_64

with '-global migration.multifd-flush-after-each-section=off'

total time: 51701 ms
downtime: 1643 ms
transferred ram: 52067927 kbytes
throughput: 8476.88 mbps
duplicate: 97827864 pages
normal: 12769250 pages
dirty sync count: 5
multifd bytes: 51208105 kbytes

2b - with '-global migration.multifd-flush-after-each-section=off'

total time: 305990 ms
downtime: 8383 ms
transferred ram: 86113793 kbytes
throughput: 2320.38 mbps
duplicate: 98056257 pages
normal: 21249711 pages
normal bytes: 84998844 kbytes
dirty sync count: 8
multifd bytes: 85251960 kbytes

1 and 2b should be around the same values, but I see that 2b is way, way worse. Can I ask how many times the test has been run? Just wondering if the test is not good enough to detect this problem.

The diffs between 2a and 2b are what I would expect from the change. But between 1 and 2b there should be almost no difference, so I am taking another look at the values. I would appreciate it if you could run each of 1 and 2b 3 or 4 times and see how consistent the values are between iterations.

I am taking a look at the code right now.
(In reply to Juan Quintela from comment #35)
> /me rereads all the numbers again. Notice that I have rearranged them:
>
> 1 - On the unfixed version -> qemu-kvm-8.0.0-7.el9.x86_64:
>
> total time: 48392 ms
> downtime: 3920 ms
> transferred ram: 49725847 kbytes
> throughput: 8537.76 mbps
> duplicate: 97840241 pages
> normal: 12179958 pages
> dirty sync count: 4
> multifd bytes: 48865917 kbytes
>
> 2a - On the fixed version -> qemu-kvm-8.0.0-9.el9.x86_64
>
> with '-global migration.multifd-flush-after-each-section=off'
>
> total time: 51701 ms
> downtime: 1643 ms
> transferred ram: 52067927 kbytes
> throughput: 8476.88 mbps
> duplicate: 97827864 pages
> normal: 12769250 pages
> dirty sync count: 5
> multifd bytes: 51208105 kbytes
>
> 2b - with '-global migration.multifd-flush-after-each-section=off'

I think that is a typo by you; it should be:
2b - with '-global migration.multifd-flush-after-each-section=on'

> total time: 305990 ms
> downtime: 8383 ms
> transferred ram: 86113793 kbytes
> throughput: 2320.38 mbps
> duplicate: 98056257 pages
> normal: 21249711 pages
> normal bytes: 84998844 kbytes
> dirty sync count: 8
> multifd bytes: 85251960 kbytes
>
> 1 and 2b should be around the same values, but I see that 2b is way, way
> worse. Can I ask how many times the test has been run? Just wondering if
> the test is not good enough to detect this problem.
>
> The diffs between 2a and 2b are what I would expect from the change. But
> between 1 and 2b there should be almost no difference, so I am taking
> another look at the values. I would appreciate it if you could run each
> of 1 and 2b 3 or 4 times and see how consistent the values are between
> iterations.

I will do it now.

> I am taking a look at the code right now.
Hi Juan,

(In reply to Li Xiaohui from comment #36)
> (In reply to Juan Quintela from comment #35)
> > /me rereads all the numbers again. Notice that I have rearranged them:
> >
> > 1 - On the unfixed version -> qemu-kvm-8.0.0-7.el9.x86_64:
> >
> > total time: 48392 ms
> > downtime: 3920 ms
> > transferred ram: 49725847 kbytes
> > throughput: 8537.76 mbps
> > duplicate: 97840241 pages
> > normal: 12179958 pages
> > dirty sync count: 4
> > multifd bytes: 48865917 kbytes
> >
> > 2a - On the fixed version -> qemu-kvm-8.0.0-9.el9.x86_64
> >
> > with '-global migration.multifd-flush-after-each-section=off'
> >
> > total time: 51701 ms
> > downtime: 1643 ms
> > transferred ram: 52067927 kbytes
> > throughput: 8476.88 mbps
> > duplicate: 97827864 pages
> > normal: 12769250 pages
> > dirty sync count: 5
> > multifd bytes: 51208105 kbytes
> >
> > 2b - with '-global migration.multifd-flush-after-each-section=off'
>
> I think that is a typo by you; it should be:
> 2b - with '-global migration.multifd-flush-after-each-section=on'
>
> > total time: 305990 ms
> > downtime: 8383 ms
> > transferred ram: 86113793 kbytes
> > throughput: 2320.38 mbps
> > duplicate: 98056257 pages
> > normal: 21249711 pages
> > normal bytes: 84998844 kbytes
> > dirty sync count: 8
> > multifd bytes: 85251960 kbytes
> >
> > 1 and 2b should be around the same values, but I see that 2b is way, way
> > worse. Can I ask how many times the test has been run? Just wondering if
> > the test is not good enough to detect this problem.
> >
> > The diffs between 2a and 2b are what I would expect from the change. But
> > between 1 and 2b there should be almost no difference, so I am taking
> > another look at the values. I would appreciate it if you could run each
> > of 1 and 2b 3 or 4 times and see how consistent the values are between
> > iterations.

I repeated the test 6 times each for 1 (qemu-kvm-8.0.0-7.el9.x86_64) and 2b (qemu-kvm-8.0.0-9.el9.x86_64 with '-global migration.multifd-flush-after-each-section=on'):

For 1:
Total time: 49438   Downtime: 4900
Total time: 42173   Downtime: 6692
Total time: 51555   Downtime: 632
Total time: 72926   Downtime: 8296
Total time: 76313   Downtime: 5750
Total time: 69685   Downtime: 6320

For 2b:
Total time: 50232   Downtime: 3383
Total time: 56592   Downtime: 7028
Total time: 263880  Downtime: 872
Total time: 289918  Downtime: 665
Total time: 72666   Downtime: 1242
Total time: 251911  Downtime: 813

> > I am taking a look at the code right now.
(In reply to Li Xiaohui from comment #37)
> Hi Juan,
>
> (In reply to Li Xiaohui from comment #36)
> > (In reply to Juan Quintela from comment #35)
> > > /me rereads all the numbers again. Notice that I have rearranged them:
> > >
> > > 1 - On the unfixed version -> qemu-kvm-8.0.0-7.el9.x86_64:
> > >
> > > total time: 48392 ms
> > > downtime: 3920 ms
> > > transferred ram: 49725847 kbytes
> > > throughput: 8537.76 mbps
> > > duplicate: 97840241 pages
> > > normal: 12179958 pages
> > > dirty sync count: 4
> > > multifd bytes: 48865917 kbytes
> > >
> > > 2a - On the fixed version -> qemu-kvm-8.0.0-9.el9.x86_64
> > >
> > > with '-global migration.multifd-flush-after-each-section=off'
> > >
> > > total time: 51701 ms
> > > downtime: 1643 ms
> > > transferred ram: 52067927 kbytes
> > > throughput: 8476.88 mbps
> > > duplicate: 97827864 pages
> > > normal: 12769250 pages
> > > dirty sync count: 5
> > > multifd bytes: 51208105 kbytes
> > >
> > > 2b - with '-global migration.multifd-flush-after-each-section=off'
> >
> > I think that is a typo by you; it should be:
> > 2b - with '-global migration.multifd-flush-after-each-section=on'
> >
> > > total time: 305990 ms
> > > downtime: 8383 ms
> > > transferred ram: 86113793 kbytes
> > > throughput: 2320.38 mbps
> > > duplicate: 98056257 pages
> > > normal: 21249711 pages
> > > normal bytes: 84998844 kbytes
> > > dirty sync count: 8
> > > multifd bytes: 85251960 kbytes
> > >
> > > 1 and 2b should be around the same values, but I see that 2b is way,
> > > way worse. Can I ask how many times the test has been run? Just
> > > wondering if the test is not good enough to detect this problem.
> > >
> > > The diffs between 2a and 2b are what I would expect from the change.
> > > But between 1 and 2b there should be almost no difference, so I am
> > > taking another look at the values. I would appreciate it if you could
> > > run each of 1 and 2b 3 or 4 times and see how consistent the values
> > > are between iterations.
>
> I repeated the test 6 times each for 1 (qemu-kvm-8.0.0-7.el9.x86_64) and 2b
> (qemu-kvm-8.0.0-9.el9.x86_64 with '-global
> migration.multifd-flush-after-each-section=on'):
>
> For 1:
> Total time: 49438   Downtime: 4900
> Total time: 42173   Downtime: 6692
> Total time: 51555   Downtime: 632
> Total time: 72926   Downtime: 8296
> Total time: 76313   Downtime: 5750
> Total time: 69685   Downtime: 6320
>
> For 2b:
> Total time: 50232   Downtime: 3383
> Total time: 56592   Downtime: 7028
> Total time: 263880  Downtime: 872
> Total time: 289918  Downtime: 665
> Total time: 72666   Downtime: 1242
> Total time: 251911  Downtime: 813

Completely unstable, so I have to think of another way of testing it. Sniff.

> > > I am taking a look at the code right now.
Hi Juan, can we mark this bug FailQA and reassign it to you? The fix isn't working well per the current test results.
Juan has replied to the question of Comment 42 through Slack, so I am reassigning this bug to Juan.

Xiaohui Li
Hi, how is https://bugzilla.redhat.com/show_bug.cgi?id=2196295 going now?
4:09
Can we mark this bug failQA as I don't think the current fix works well?

Juan Quintela  7:33 PM
It is not urgent.
7:33
We can move it to 9.4.
7:33
I still think that the problem is in the test, but I haven't come up with anything better yet.
Hi Juan, Can you help change the ITR to 9.4.0?