KVM_CLEAR_DIRTY_LOG is a new KVM ioctl, introduced in Linux 5.0, that allows the KVM dirty log of a guest to be cleared separately from reading it. It brings at least two direct benefits:
1. Huge guests can avoid hangs caused by the slowness of KVM_GET_DIRTY_LOG.
2. Guests with a high dirty rate can be migrated faster.
Testing has shown that in some scenarios the total migration time can be drastically reduced with this new approach [1]. The kernel feature requires a QEMU-side change to actually take effect, and it is transparent to upper layers. This bug tracks the QEMU counterpart of that work.

[1] https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03621.html
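As a side note, here is a minimal sketch of how userspace opts in to this kernel feature. It is only an illustration (not QEMU's actual code); it assumes vm_fd is an already-created KVM VM fd, and the constant names are the ones exposed by recent linux/kvm.h headers:

/*
 * Illustration only: probe and enable the manual dirty-log-protect
 * capability so that KVM_CLEAR_DIRTY_LOG can be used afterwards.
 */
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

static int enable_clear_dirty_log(int vm_fd)
{
    /* Non-zero means the kernel supports KVM_CLEAR_DIRTY_LOG. */
    int caps = ioctl(vm_fd, KVM_CHECK_EXTENSION,
                     KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
    if (caps <= 0) {
        fprintf(stderr, "clear dirty log not supported, old path used\n");
        return 0;
    }

    /*
     * Opt in: afterwards KVM_GET_DIRTY_LOG only reports dirty pages,
     * while clearing/write-protection is done separately (and in
     * smaller chunks) via KVM_CLEAR_DIRTY_LOG.
     */
    struct kvm_enable_cap cap = {
        .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
        .args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE,
    };
    return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}

Once enabled, KVM_GET_DIRTY_LOG no longer write-protects pages on its own, which is what allows the clearing to be split out and batched in smaller chunks.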
Latest upstream work: https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg00077.html
Hi Danilo, since this bz changed its Internal Target Release from 8.1.0 to 8.2.0, could you help drop it from the Errata? Thanks a lot.
I can drop it. I'm also moving it back to POST and setting Fixed-in to qemu-4.1 so we remember to bring it back when checking for upstream fixes for AV-8.2.
Hi Peter, looking at your test scenario in the documents above, I have some questions:
1. Do we need to enable/disable kvm_clear_dirty_log manually via some QMP commands?
2. Does the kvm_clear_dirty_log function apply not only to plain migration but also to the migration sub-features (postcopy, xbzrle, rdma, and so on)?
3. What environment should I test in? I see you tested with guest memory = 13G and bandwidth = 10G, but most environments can't meet this requirement. If I test in a normal environment (e.g. bandwidth = 1G, guest memory <= 8G, dirty rate = 800MBps), will I still see roughly the 40% reduction in total migration time?
-> A small related question: how can I generate an accurate dirty rate in the guest manually?
4. Similar to question 3, how should the reduction in total migration time be judged for different environments?
5. "We should expect the guest to have these with CLEAR_LOG: (1) not hang during log_sync" -> can the log_sync data be seen via QMP commands?
Given the questions above, we would much appreciate a simple document from you guiding QE through testing this feature.
(In reply to Li Xiaohui from comment #12)
> 1. Do we need to enable/disable kvm_clear_dirty_log manually via some QMP commands?

No. If the host has the clear dirty log feature, then QEMU will automatically use it. Otherwise it will still use the old interface to track dirty pages.

To verify whether the clear dirty log feature is enabled, you can run this command on the source before migration starts:

# trace-cmd record -p function -l kvm_clear_dirty_log_protect    [1]

If clear dirty log is enabled, you should see something captured after the migration completes (it is triggered regularly throughout the migration process).

> 2. Does the kvm_clear_dirty_log function apply not only to plain migration but also to the migration sub-features (postcopy, xbzrle, rdma, and so on)?

It should apply to most of the precopy features like xbzrle and rdma. However, it is not needed for postcopy, because postcopy no longer needs the source VM to track dirty bits (all the dirty pages are served directly on the destination after postcopy starts), so there is no reason to use the clear dirty log interface there either.

About the test matrix: I would suggest you only do a basic precopy test with this; we should not need to cover all the complicated migration features against clear dirty log.

> 3. What environment should I test in? I see you tested with guest memory = 13G and bandwidth = 10G, but most environments can't meet this requirement. If I test in a normal environment (e.g. bandwidth = 1G, guest memory <= 8G, dirty rate = 800MBps), will I still see roughly the 40% reduction in total migration time?

Firstly, if your bandwidth is 1Gbps then you can't dirty the memory at 800MBps, otherwise the migration won't converge. You'll need to choose something below 1Gbps (125MBps), like 80MBps.

I think it would be fine to test with lower bandwidth/memory, but I'm not sure whether you can still get the same numbers. It would be interesting to know your numbers.

> -> A small related question: how can I generate an accurate dirty rate in the guest manually?

You can consider using my tool: https://github.com/xzpeter/clibs/blob/master/bsd/mig_mon/mig_mon.c

The command I used in my test was:

# mig_mon mm_dirty 10240 900 random

Note two things when you run dirty tests:
1. The dirty rate cannot be as big as you want, because there is a memory bandwidth limit on the system.
2. When you dirty the test memory for the first time, the pages need to be faulted in first, so to make the dirty rate closer to a constant value you first need to pre-fault the memory region by writing to it sequentially for the first round.

For (1), if your network bandwidth is 1Gbps then you should only use a dirty rate below 128MBps, which will never be a problem on any modern host (memory controller bandwidth is far higher than that). For (2), if you use my test tool this is already done for you.
An example:

xz-x1:mig_mon [master]$ ./mig_mon mm_dirty 1000 100 random
Test memory size: 1000 (MB)
Page size: 4096 (Bytes)
Dirty memory rate: 100 (MB/s)
Dirty pattern: random
+------------------------+
|   Start Dirty Memory   |
+------------------------+
Finished pre-heat of first round, starting to use random access   <------ [a]
Dirty rate: 1000 (MB/s), duration: 1000 (ms)                      <------ [b]
Dirty rate: 100 (MB/s), duration: 1000 (ms)                       <------ [c]
Dirty rate: 100 (MB/s), duration: 1000 (ms)
Dirty rate: 100 (MB/s), duration: 1000 (ms)

Everything up to line [a] is the pre-fault step, in which the 1G memory region is pre-faulted before the constant dirty rate starts. At [b] the reported dirty rate (1000MB/s) is higher than the specified 100MB/s because it is still prefaulting. From [c] onwards the dirty rate settles at the predefined 100MB/s. You should always wait and only start the migration after [c], when you see the dirty rate has become constant.

> 4. Similar to question 3, how should the reduction in total migration time be judged for different environments?

Compare the total migration time for the same machine with the same workload. With clear dirty log, precopy migration should finish faster, especially when there is a high-dirty-rate workload in the guest.

If you want to compare with the same QEMU, you can switch between (1) a kernel that does not support clear dirty log and (2) a kernel that does. You can use the method above to detect whether clear dirty log is enabled.

> 5. "We should expect the guest to have these with CLEAR_LOG: (1) not hang during log_sync" -> can the log_sync data be seen via QMP commands?

log_sync refers to the KVM ioctl KVM_GET_DIRTY_LOG. It could be slow in the past because both the GET and the CLEAR operations had to be done in that single ioctl. Now that KVM_CLEAR_DIRTY_LOG has been introduced, the CLEAR operation is done through the new ioctl at a finer granularity, so GET_LOG can be faster.

There is no way to see the log_sync data via QMP. However, you can observe and compare the time that the KVM_GET_DIRTY_LOG ioctls take using this command:

# strace -Tf -e ioctl -p $QEMU_SOURCE_PID 2>&1 | grep KVM_GET_DIRTY_LOG    [2]

You should run this command on the source host before migration starts; then you should see something like this:

[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000233>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000241>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000227>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000232>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000234>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000249>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000229>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000243>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000244>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000247>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000230>
[pid 12545] ioctl(20, KVM_GET_DIRTY_LOG, 0x7f02827f8210) = 0 <0.000273>

The last column shows the time taken by each KVM_GET_DIRTY_LOG call. On this host clear log is enabled, so it is very fast. With an old kernel or an old QEMU that does not support clear dirty log, some of these numbers can be much bigger (especially if the guest has a very large memory).

> Given the questions above, we would much appreciate a simple document from you guiding QE through testing this feature.
Please see [1] and [2] above on how to verify that clear dirty log is in use. The rest of the procedure is the same as the previous precopy migration tests. We just need to make sure clear dirty log is enabled; logically, the migration should then complete faster under high dirty rates.
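To make the log_sync/CLEAR_LOG split above more concrete, below is a minimal, hedged sketch of the two-step flow (not QEMU's real code). It assumes manual dirty-log protect was already enabled on vm_fd (as in the earlier sketch), that bitmap is large enough to cover the whole slot, and that first_page is 64-aligned as the KVM API requires:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Fetch the dirty bitmap for one memory slot, then clear (write-protect)
 * only a sub-range of it, instead of the whole slot at once. */
static int sync_and_clear_chunk(int vm_fd, uint32_t slot, uint64_t *bitmap,
                                uint64_t first_page, uint32_t num_pages)
{
    /* With manual protect enabled, GET_DIRTY_LOG only copies the bitmap;
     * it no longer clears or write-protects anything by itself. */
    struct kvm_dirty_log get = {
        .slot = slot,
        .dirty_bitmap = bitmap,
    };
    if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &get) < 0) {
        return -1;
    }

    /* Clear only the pages that are about to be resent.  Bit 0 of the
     * bitmap passed here corresponds to first_page. */
    struct kvm_clear_dirty_log clear = {
        .slot = slot,
        .first_page = first_page,
        .num_pages = num_pages,
        .dirty_bitmap = (uint8_t *)bitmap + first_page / 8,
    };
    return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
}

Since the write-protection cost is now paid in these smaller CLEAR calls spread over the migration instead of inside every GET, the KVM_GET_DIRTY_LOG times seen in the strace output above can stay very small even for guests with large memory.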
Peter, thank you very much for providing the testing guidance.

I verified this bz in three situations (only the kernel or qemu-kvm version differs); the guest and test steps are the same in all of them.

1. The three test situations:
AA: hosts -> kernel-4.18.0-169.el8.x86_64 & qemu-img-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
BB: hosts -> kernel-4.18.0-147.el8.x86_64 & qemu-img-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
CC: hosts -> kernel-4.18.0-169.el8.x86_64 & qemu-img-4.2.0-0.module+el8.2.0+4755+35143c23.x86_64

2. Test steps:
(1) Boot a guest (rhel8.2.0) on the src host;
(2) Boot a guest with "-incoming tcp..." on the dst host;
(3) Set migration max-bandwidth to 125M in the src qmp (the max bandwidth of the hosts is 1000Mbps);
(4) Run the dirty test in the guest:
[root@vm-198-20 home]# ./mig_mon mm_dirty 1000 100 random
...
Dirty rate: 100 (MB/s), duration: 1000 (ms)
Dirty rate: 100 (MB/s), duration: 1000 (ms)
Dirty rate: 100 (MB/s), duration: 1000 (ms)
(5) Monitor kvm_clear_dirty_log via trace-cmd on the src host:
[root@dell-per430-12 ~]# trace-cmd record -p function -l kvm_clear_dirty_log_protect
(6) Monitor kvm_get_dirty_log via strace on the src host:
[root@dell-per430-12 ~]# strace -Tf -e ioctl -p $QEMU_SOURCE_PID 2>&1 | grep KVM_GET_DIRTY_LOG
(7) After the dirty test is stable in step (4), migrate the guest from the src host to the dst host;
(8) After migration finishes, stop (5) & (6), check the migration total time and the kvm_get_dirty_log data, and check whether kvm_clear_dirty_log is enabled.

3. Test results:
There is almost no difference between situations AA & CC. Comparing AA & BB: the migration total time and the number of kvm_get_dirty_log calls differ, and clear_dirty_log is enabled in AA but disabled in BB. I tested AA & BB three times; their migration total times are listed below, and one set of kvm_get_dirty_log data for AA & BB is attached (it is easy to see that the numbers are bigger in BB than in AA).

AA migration total time (ms) | BB migration total time (ms)
60891                        | 101560
72732                        | 90464
73550                        | 86622

From the above test results I can believe that migration is faster when clear_dirty_log is enabled. Of course, maybe I should test higher dirty rates when higher-performance hosts are available. What do you think, Peter?

Here I have only one question: this bz is fixed in qemu-kvm-4.2.0-1.module+el8.2.0, but when I test the CC situation, why do I find clear_dirty_log enabled and no difference between AA & CC?
Created attachment 1653012 [details] AA-src-get-dirty-log
Created attachment 1653013 [details] BB-src-get-dirty-log
(In reply to Li Xiaohui from comment #14)
> 1. The three test situations:
> AA: hosts -> kernel-4.18.0-169.el8.x86_64 & qemu-img-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
> BB: hosts -> kernel-4.18.0-147.el8.x86_64 & qemu-img-4.2.0-6.module+el8.2.0+5453+31b2b136.x86_64
> CC: hosts -> kernel-4.18.0-169.el8.x86_64 & qemu-img-4.2.0-0.module+el8.2.0+4755+35143c23.x86_64
[...]
> AA migration total time (ms) | BB migration total time (ms)
> 60891                        | 101560
> 72732                        | 90464
> 73550                        | 86622
>
> From the above test results I can believe that migration is faster when clear_dirty_log is enabled.

Yes. I think the first rhel8 kernel that supports the feature is kernel-4.18.0-147.8.el8. So your AA test has it (kernel-4.18.0-169.el8.x86_64), while BB does not (kernel-4.18.0-147.el8.x86_64). Looks sane.

> Of course, maybe I should test higher dirty rates when higher-performance hosts are available. What do you think, Peter?

You can continue with more tests, but I'd say your numbers above already prove it to a good extent.

> Here I have only one question: this bz is fixed in qemu-kvm-4.2.0-1.module+el8.2.0, but when I test the CC situation, why do I find clear_dirty_log enabled and no difference between AA & CC?

I cannot even find the tag for qemu-kvm-4.2.0-1.module+el8.2.0, so I cannot tell which version you are using in CC (qemu-img-4.2.0-0.module+el8.2.0+4755+35143c23):

xz-x1:virt-rhel8-qemu-kvm [rhel8.2-av]$ git tag | grep qemu-kvm-4.2.0-
qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc
qemu-kvm-4.2.0-5.module+el8.2.0+5389+367d9739
qemu-kvm-4.2.0-6.module+el8.2.0+5451+991cea0d

If you want to verify this with an old QEMU that does not support it, you could consider using an 8.0-av package (e.g., qemu-kvm-4.0.0-6.module+el8.1.0+3736+a2aefea3); I am pretty sure the feature is not there. Note that even most of the 8.1-av packages should already have support for this feature.

Further verification should be low priority since, again, AFAICT your numbers already show that it is working as expected.
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks
Test steps are the same as in comment 14, but on hosts with max-bandwidth 10000Mbps and qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64, running "./mig_mon mm_dirty 7168 700 random". Comparing kernel-4.18.0-147 & kernel-4.18.0-175, the results look good, so I am marking this bz verified. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017