Bug 1717373 - QEMU support for KVM_CLEAR_DIRTY_LOG
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: Unspecified
OS: Unspecified
Importance: medium unspecified
Target Milestone: rc
Target Release: ---
Assignee: Peter Xu
QA Contact: Li Xiaohui
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-06-05 11:22 UTC by Peter Xu
Modified: 2020-05-05 09:46 UTC
CC List: 9 users

Fixed In Version: qemu-kvm-4.2.0-1.module+el8.2.0+4793+b09dd2fb
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:46:14 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
AA-src-get-dirty-log (60.18 KB, text/plain), 2020-01-17 10:20 UTC, Li Xiaohui
BB-src-get-dirty-log (105.41 KB, text/plain), 2020-01-17 10:22 UTC, Li Xiaohui

Description Peter Xu 2019-06-05 11:22:05 UTC
KVM_CLEAR_DIRTY_LOG is a new KVM ioctl, introduced in Linux 5.0, that allows clearing the KVM dirty log for guests separately from retrieving it.  It brings at least two direct benefits:

1. Huge guests can avoid hangs due to slowness of KVM_GET_DIRTY_LOG

2. Guests with high dirty rate can be migrated faster.

Testing has shown that in some scenarios the total migration time can be drastically reduced with this new approach [1].

This new kernel feature requires a corresponding QEMU change to take effect, and it can remain transparent to upper layers.  This bug tracks the QEMU counterpart of this work.

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03621.html
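
For context, the flow that QEMU needs to adopt looks roughly like the sketch below.  This is a minimal illustration of the kernel API only, not QEMU's actual code: it assumes a Linux 5.0+ kernel that defines KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, and vm_fd, slot, and the bitmap handling are placeholders.

/* Minimal sketch (not QEMU code): enable manual clearing, then split
 * the old "get+clear" into a get followed by a separate clear. */
#include <linux/kvm.h>
#include <sys/ioctl.h>

int enable_manual_clear(int vm_fd)
{
    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT,
        .args = { 1 },
    };
    /* After this, KVM_GET_DIRTY_LOG only fetches the bitmap; pages are
     * write-protected again only by KVM_CLEAR_DIRTY_LOG, on the ranges
     * userspace chooses. */
    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

int sync_then_clear(int vm_fd, unsigned int slot, void *bitmap,
                    unsigned long slot_npages)
{
    struct kvm_dirty_log get = { .slot = slot };
    struct kvm_clear_dirty_log clear = {
        .slot = slot,
        .first_page = 0,
        .num_pages = slot_npages,  /* may be a smaller, aligned sub-range */
    };

    get.dirty_bitmap = bitmap;
    if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &get) < 0)
        return -1;

    /* ... transfer the pages marked dirty in the bitmap ... */

    clear.dirty_bitmap = bitmap;
    return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
}

Because the clear can be issued per sub-range after the corresponding pages have been sent, the re-protection work (and the resulting vCPU write faults) is spread out over the migration instead of happening all at once inside KVM_GET_DIRTY_LOG.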

Comment 1 Peter Xu 2019-06-05 11:24:42 UTC
Latest upstream work:

https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg00077.html

Comment 8 Li Xiaohui 2019-09-11 11:28:21 UTC
Hi Danilo, 
Since this bz changed the Internal Target Release from 8.1.0 to 8.2.0, could you help drop it from the errata? Thanks a lot.

Comment 9 Danilo de Paula 2019-09-16 15:57:29 UTC
I can drop it.

I'm also moving it back to POST and setting Fixed-in to qemu-4.1 so we remember to bring it back when checking for upstream fixes for AV-8.2.

Comment 12 Li Xiaohui 2019-11-27 11:14:21 UTC
Hi Peter,
I saw your test scenario in the documents above and have some questions:
1. Do we need to enable/disable kvm_clear_dirty_log manually via some QMP commands?
2. Does the kvm_clear_dirty_log function apply not only to common migration but also to sub-feature migration (postcopy, xbzrle, rdma, and so on)?
3. What environment shall I test in? I see you tested with guest memory = 13G and bandwidth = 10G, but most environments can't meet this requirement. If I test in a normal environment (such as bandwidth = 1G, guest memory <= 8G, dirty rate = 800MBps), could I still see the almost 40% reduction in total migration time?
   -> ah, a small question: how do I generate an accurate dirty rate in the guest manually?
4. Like question 3, how should the reduction in total migration time be defined for different environments?
5. "We should expect the guest to have these with CLEAR_LOG: (1) not hang during log_sync" -> Can the log_sync data be seen via QMP commands?

Given the above questions, I would much appreciate it if you could give QE simple documentation for testing this function.

Comment 13 Peter Xu 2019-11-27 16:02:08 UTC
(In reply to Li Xiaohui from comment #12)
> Hi Peter,
> I saw your test scenario in the documents above and have some questions:
> 1. Do we need to enable/disable kvm_clear_dirty_log manually via some QMP commands?

No.  If the host has the clear dirty log feature, then QEMU will automatically use it.  Otherwise it will still use the old interface to track dirty pages.

To verify whether the clear dirty log feature is enabled, you can run this command on the source before migration starts:

# trace-cmd record -p function -l kvm_clear_dirty_log_protect

[1]

Then, if clear dirty log is enabled, you should see something captured after the migration completes (it'll be triggered regularly during the whole migration process).
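
Another way to check host support, independent of a running migration, is to query the KVM capability directly.  A minimal standalone sketch (not part of QEMU), assuming kernel headers new enough to define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT:

/* Probe /dev/kvm for the manual clear-dirty-log capability. */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR);
    if (kvm < 0) {
        perror("open /dev/kvm");
        return 1;
    }
    /* KVM_CHECK_EXTENSION returns a positive value when supported. */
    int ret = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT);
    printf("manual dirty log clear: %s\n",
           ret > 0 ? "supported" : "not supported");
    return 0;
}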

> 2. Does the kvm_clear_dirty_log function apply not only to common migration
> but also to sub-feature migration (postcopy, xbzrle, rdma, and so on)?

It should apply to most of the precopy features like xbzrle and rdma.  However, it is not needed for postcopy, because postcopy no longer needs the source VM to track dirty bits (all the dirty pages will land directly on the destination after postcopy starts), so there's no reason to use the clear dirty log interface there either.

About the test matrix: I would suggest you only do a basic precopy test with this, and we should not need to cover all the complicated migration features against clear dirty log.

> 3. What environment shall I test in? I see you tested with guest memory =
> 13G and bandwidth = 10G, but most environments can't meet this requirement.
> If I test in a normal environment (such as bandwidth = 1G, guest memory <=
> 8G, dirty rate = 800MBps), could I still see the almost 40% reduction in
> total migration time?

First, if your bandwidth is 1Gbps then you can't dirty memory at 800MBps, otherwise the migration won't converge.  1Gbps is only 125MBps, so you'll need to choose a dirty rate below that, like 80MBps.

I think it would be fine to test with lower bandwidth/mem but I'm not sure whether you can still get the same numbers.  It would be interesting to know your numbers.

>    -> ah, a small question: how do I generate an accurate dirty rate in the
> guest manually?

You can consider using my tool:

https://github.com/xzpeter/clibs/blob/master/bsd/mig_mon/mig_mon.c

The command I used in my test was:

# mig_mon mm_dirty 10240 900 random

Note two things when you run dirty tests:

  1. The dirty rate cannot be as big as you want, because the system has a memory bandwidth limit.

  2. When you dirty the test memory for the first time, the pages need to be faulted in first, so to make the dirty rate closer to a constant value you'll need to pre-fault the memory region by writing to it sequentially for the first round.

For (1), if your network bandwidth is 1Gbps then you should only use a dirty rate below 128MBps, and that will never be a problem on any modern host (memory controller bandwidth is far higher than this).

For (2), if you use my test tool this is already done for you.  An example:

xz-x1:mig_mon [master]$ ./mig_mon mm_dirty 1000 100 random
Test memory size:       1000 (MB)
Page size:              4096 (Bytes)
Dirty memory rate:      100 (MB/s)
Dirty pattern:  random
+------------------------+
|   Start Dirty Memory   |
+------------------------+
Finished pre-heat of first round, starting to use random access      <------------------ [a]
Dirty rate: 1000 (MB/s), duration: 1000 (ms)                         <------------------ [b]
Dirty rate: 100 (MB/s), duration: 1000 (ms)                          <------------------ [c]
Dirty rate: 100 (MB/s), duration: 1000 (ms)
Dirty rate: 100 (MB/s), duration: 1000 (ms)

Line [a] above is the pre-fault procedure, where the 1G memory region is prefaulted before entering the constant dirty rate phase.  At [b] the measured dirty rate is higher than specified (1000MB/s rather than 100MB/s) because the tool is still prefaulting.  From [c] onwards the dirty rate settles at the predefined 100MB/s.  You should always wait until after [c], when the dirty rate has become constant, before starting the migration.
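
The prefault-then-dirty pattern itself can be sketched as below.  This is a rough illustration of the technique only, not mig_mon's actual implementation; mem_size and rate_bytes_per_sec are illustrative parameters.

#include <stdlib.h>
#include <time.h>

#define PAGE_SIZE 4096UL

static void dirty_loop(size_t mem_size, size_t rate_bytes_per_sec)
{
    char *buf = malloc(mem_size);
    unsigned char val = 1;

    if (!buf)
        return;

    /* Round 1: sequential writes fault in every page, so later rounds
     * measure pure dirtying rather than page faults (line [a] above). */
    for (size_t off = 0; off < mem_size; off += PAGE_SIZE)
        buf[off] = val;

    for (;;) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        /* Dirty one second's worth of pages at random offsets; a single
         * byte write is enough to dirty a whole page. */
        val++;
        for (size_t n = 0; n < rate_bytes_per_sec / PAGE_SIZE; n++) {
            size_t page = (size_t)rand() % (mem_size / PAGE_SIZE);
            buf[page * PAGE_SIZE] = val;
        }

        /* Sleep away the rest of the second to hold the rate constant. */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        long elapsed_ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
                        + (t1.tv_nsec - t0.tv_nsec);
        if (elapsed_ns < 1000000000L) {
            struct timespec rest = { 0, 1000000000L - elapsed_ns };
            nanosleep(&rest, NULL);
        }
    }
}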

> 4. Like question 3, how should the reduction in total migration time be
> defined for different environments?

It's the total migration time for the same machine with the same workload.

With clear dirty log, precopy migration can finish faster, especially when there is a high dirty rate workload in the guest.  If you want to compare the same QEMU, you can switch between (1) a kernel that does not support clear dirty log, and (2) a kernel that does.  You can use the method above to detect whether clear dirty log is enabled.

> 5. "We should expect the guest to have these with CLEAR_LOG: (1) not hang
> during log_sync" -> Can the log_sync data be seen via QMP commands?

log_sync stands for the KVM ioctl called KVM_GET_DIRTY_LOG.  It could be slow in the past because the get and the clear of the dirty log both had to happen in that single ioctl.

After KVM_CLEAR_DIRTY_LOG was introduced, the clear operation is done in the new ioctl at a finer granularity, so KVM_GET_DIRTY_LOG can be faster.

There is no way to see log_sync data via QMP.  However, you can observe and compare the time the KVM_GET_DIRTY_LOG ioctls take using this command:

# strace -Tf -e ioctl -p $QEMU_SOURCE_PID 2>&1 | grep KVM_GET_DIRTY_LOG

[2]

You should run this command on the source host before migration starts; then you should see something like this:

[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000233>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000241>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000227>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000232>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000234>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000249>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000229>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000243>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000244>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000247>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000230>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000273>

The last column shows the time used by KVM_GET_DIRTY_LOG.  On this host clear log is enabled, so it's very fast.  If you use an old kernel or an old QEMU that does not support clear dirty log, some of these numbers could be bigger (especially if your guest has a very big memory).

> 
> Given the above questions, I would much appreciate it if you could give QE
> simple documentation for testing this function.

Please see [1] and [2] above on how to verify that clear dirty log is in use.

The rest of the procedure is the same as in previous precopy migration tests.  We just need to make sure clear dirty log is enabled; logically, the migration should complete faster with high dirty rates.

Comment 14 Li Xiaohui 2020-01-17 10:13:27 UTC
Peter, thank you very much for providing testing guidance. 

I verified this bz in three situations (only the kernel or qemu-kvm version differs); the guest and test steps are the same in each:
1. Three test situations:
AA: hosts-> kernel-4.18.0-169.el8.x86_64&qemu-img-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
BB: hosts-> kernel-4.18.0-147.el8.x86_64&qemu-img-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
CC: hosts-> kernel-4.18.0-169.el8.x86_64&qemu-img-4.2.0-0.module+el8.2.0+4755+35143c23.x86_64

2. Test steps:
(1) boot a guest (rhel8.2.0) on the src host;
(2) boot a guest with "-incoming tcp..." on the dst host;
(3) set migration max-bandwidth to 125M in src qmp (the max bandwidth of the hosts is 1000Mbps);
(4) run dirty tests in the guest:
[root@vm-198-20 home]# ./mig_mon mm_dirty 1000 100 random
...
Dirty rate: 100 (MB/s), duration: 1000 (ms)
Dirty rate: 100 (MB/s), duration: 1000 (ms)
Dirty rate: 100 (MB/s), duration: 1000 (ms)
(5) monitor kvm_clear_dirty_log via trace-cmd on the src host:
[root@dell-per430-12 ~]# trace-cmd record -p function -l kvm_clear_dirty_log_protect
(6) monitor KVM_GET_DIRTY_LOG via strace on the src host:
[root@dell-per430-12 ~]# strace -Tf -e ioctl -p $QEMU_SOURCE_PID 2>&1 | grep KVM_GET_DIRTY_LOG
(7) after the dirty test is stable in step (4), start migrating the guest from the src to the dst host;
(8) after migration finishes, stop (5) & (6), check the total migration time and the KVM_GET_DIRTY_LOG data, and check whether kvm_clear_dirty_log is enabled.

3. Test results:
There is almost no difference between situations AA & CC.
But comparing AA & BB: the total migration time and the KVM_GET_DIRTY_LOG timings differ, and clear_dirty_log is enabled in AA but disabled in BB. I tested AA & BB three times each; their total migration times are listed below, and one set of KVM_GET_DIRTY_LOG data for each of AA & BB is attached (it is easy to see that the numbers are bigger in BB than in AA).
AA migration total time (ms)    BB migration total time (ms)
60891                           101560
72732                           90464
73550                           86622


From the above test results, I believe migration is faster when clear_dirty_log is enabled.
But of course, maybe I should test a higher dirty rate when high-performance hosts are available.
What do you think, Peter?

Here I have only one question:
this bz was fixed in qemu-kvm-4.2.0-1.module+el8.2.0, so when I test in the CC situation, why is clear_dirty_log enabled, and why can't I find a difference between AA & CC?

Comment 15 Li Xiaohui 2020-01-17 10:20:51 UTC
Created attachment 1653012 [details]
AA-src-get-dirty-log

Comment 16 Li Xiaohui 2020-01-17 10:22:05 UTC
Created attachment 1653013 [details]
BB-src-get-dirty-log

Comment 17 Peter Xu 2020-01-20 05:44:16 UTC
(In reply to Li Xiaohui from comment #14)
> 1. Three test situations:
> AA: hosts-> kernel-4.18.0-169.el8.x86_64 & qemu-img-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
> BB: hosts-> kernel-4.18.0-147.el8.x86_64 & qemu-img-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
> CC: hosts-> kernel-4.18.0-169.el8.x86_64 & qemu-img-4.2.0-0.module+el8.2.0+4755+35143c23.x86_64

[...]
 
> AA migration total time (ms)    BB migration total time (ms)
> 60891                           101560
> 72732                           90464
> 73550                           86622
>
> From the above test results, I believe migration is faster when
> clear_dirty_log is enabled.

Yes, I think the first rhel8 kernel that supports the feature is kernel-4.18.0-147.8.el8.  So in your AA test it has it (kernel-4.18.0-169.el8.x86_64), while for BB it does not (kernel-4.18.0-147.el8.x86_64).  Looks sane.

> But of course, maybe I should test a higher dirty rate when
> high-performance hosts are available.
> What do you think, Peter?

You can continue with more tests, but I'd say your numbers above already prove it to a good extent.

>
> Here I have only one question:
> this bz was fixed in qemu-kvm-4.2.0-1.module+el8.2.0, so when I test in the
> CC situation, why is clear_dirty_log enabled, and why can't I find a
> difference between AA & CC?

I cannot even find the tag for qemu-kvm-4.2.0-1.module+el8.2.0, so I cannot tell which version you're actually using in CC (qemu-img-4.2.0-0.module+el8.2.0+4755+35143c23):

xz-x1:virt-rhel8-qemu-kvm [rhel8.2-av]$ git tag | grep qemu-kvm-4.2.0-
qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc
qemu-kvm-4.2.0-5.module+el8.2.0+5389+367d9739
qemu-kvm-4.2.0-6.module+el8.2.0+5451+991cea0d

If you want to verify this using an old QEMU that does not support it, you could consider an 8.0-av package (e.g., qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3); I am pretty sure the feature is not there.  Note that even most of the 8.1-av packages should already have support for this feature.

Further verification should be low priority, since, again, AFAICT your numbers already prove that it's working as expected.

Comment 18 Ademar Reis 2020-02-05 22:58:40 UTC
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Comment 19 Li Xiaohui 2020-02-12 04:51:30 UTC
Test steps are as in Comment 14, but on hosts with max bandwidth 10000Mbps and qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64, running "./mig_mon mm_dirty 7168 700 random". Comparing kernel-4.18.0-147 & kernel-4.18.0-175, the results are OK, so I'm marking this bz verified. Thanks.

Comment 21 errata-xmlrpc 2020-05-05 09:46:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

