Description of problem:
The postcopy speed limit does not take effect if it is set during the post-copy phase. If it is set during the pre-copy phase, or before migration starts, it takes effect after switching to post-copy mode.

Version-Release number of selected component (if applicable):
qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64
libvirt-5.0.0-6.module+el8+2860+4e0fe96a.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start a VM and load stress in the VM.
2. Do live migration with postcopy enabled:
# virsh migrate avocado-vt-vm1 qemu+ssh://hp-ml150gen9-01.rhts.eng.bos.redhat.com/system --live --verbose --p2p --persistent --postcopy --postcopy-bandwidth 10
3. Switch to postcopy mode:
# virsh migrate-postcopy avocado-vt-vm1
4. Get domain job info:
# virsh domjobinfo avocado-vt-vm1
Job type:          Unbounded
Operation:         Outgoing migration
Time elapsed:      14139 ms
Data processed:    25.847 GiB
Data remaining:    467.805 MiB
Data total:        1.005 GiB
Memory processed:  25.847 GiB
Memory remaining:  467.805 MiB
Memory total:      1.005 GiB
Memory bandwidth:  10.020 MiB/s  =====> around 10 MiB/s
Dirty rate:        28584 pages/s
Page size:         4096 bytes
Iteration:         58
Postcopy requests: 23
Constant pages:    152210
Normal pages:      6762014
Normal data:       25.795 GiB
Expected downtime: 47792 ms
Setup time:        8 ms
5. Lower the postcopy speed limit to 5 MiB/s:
# virsh migrate-setspeed avocado-vt-vm1 5 --postcopy
# virsh migrate-getspeed avocado-vt-vm1 --postcopy
5
6. Query domain job info:
Job type:          Unbounded
Operation:         Outgoing migration
Time elapsed:      14233 ms
Data processed:    25.848 GiB
Data remaining:    466.766 MiB
Data total:        1.005 GiB
Memory processed:  25.848 GiB
Memory remaining:  466.766 MiB
Memory total:      1.005 GiB
Memory bandwidth:  10.020 MiB/s  =====> still around 10 MiB/s
Dirty rate:        28584 pages/s
Page size:         4096 bytes
Iteration:         58
Postcopy requests: 25
Constant pages:    152220
Normal pages:      6762270
Normal data:       25.796 GiB
Expected downtime: 47792 ms
Setup time:        8 ms
7. Wait some time and query domain job info again; the bandwidth is still around 10 MiB/s.
8. Try to raise the speed limit to 20 MiB/s; it still doesn't work.

Actual results:
As above.

Expected results:
Setting the postcopy speed limit during the post-copy phase should take effect.

Additional info:
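For reference (an assumption on my side, not verified against the libvirt source): "virsh migrate-setspeed --postcopy" should map onto QEMU's "max-postcopy-bandwidth" migration parameter, which takes bytes/second, so the same limit can be inspected or set directly through QMP while the migration is running. A rough sketch using the same domain name as above:

Inspect the current migration parameters on the source host:
# virsh qemu-monitor-command avocado-vt-vm1 --pretty '{"execute": "query-migrate-parameters"}'

Set the postcopy limit to 5 MiB/s (5242880 bytes/s) directly via QMP:
# virsh qemu-monitor-command avocado-vt-vm1 --pretty '{"execute": "migrate-set-parameters", "arguments": {"max-postcopy-bandwidth": 5242880}}'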
Fangge:
Can you please describe:
a) The guest that you're running, including the size of the VM (in GB of RAM) and the program it's running
b) How you're triggering the postcopy switchover
c) The network connection between the hosts
d) How long the postcopy phase is
e) Which source and destination host machines you're using
(In reply to Dr. David Alan Gilbert from comment #1)
> a) The guest that you're running, including the size of the VM (in GB of RAM) and the program it's running

The size of the VM is 1048576 KiB. I run stress in the VM:
# stress --cpu 8 --io 4 --vm 4 --vm-bytes 128M

> b) How you're triggering the postcopy switchover

I use the virsh command "virsh migrate-postcopy $guest", which calls the QMP command migrate-start-postcopy.

> c) The network connection between the hosts

The maximum network speed is 1000 Mb/s. The NIC info is as below:
Src:
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 14:58:d0:d3:31:ab brd ff:ff:ff:ff:ff:ff
    inet 10.16.184.37/22 brd 10.16.187.255 scope global dynamic noprefixroute eno1
       valid_lft 73170sec preferred_lft 73170sec
Dest:
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether a0:2b:b8:31:26:a4 brd ff:ff:ff:ff:ff:ff
    inet 10.16.65.242/21 brd 10.16.71.255 scope global dynamic noprefixroute eno1
       valid_lft 73352sec preferred_lft 73352sec

> d) How long the postcopy phase is

I didn't pay attention to the time, but the total migration time is:
# virsh domjobinfo avocado-vt-vm1 --completed
Job type:          Completed
Operation:         Outgoing migration
Time elapsed:      72105 ms  ==> total time
Time elapsed w/o network: 72097 ms
Data processed:    3.133 GiB
Data remaining:    0.000 B
Data total:        1.005 GiB
Memory processed:  3.133 GiB
Memory remaining:  0.000 B
Memory total:      1.005 GiB
Memory bandwidth:  44.819 MiB/s
Dirty rate:        0 pages/s
Page size:         4096 bytes
Iteration:         7
Postcopy requests: 594
Constant pages:    48095
Normal pages:      819479
Normal data:       3.126 GiB
Total downtime:    275 ms
Downtime w/o network: 267 ms
Setup time:        28 ms
So the post-copy phase was roughly 72105 - 14139 = 57966 ms, though that estimate is not very accurate.

> e) Which source and destination host machines you're using

Src: hp-dl120gen9-01.khw.lab.eng.bos.redhat.com
Dest: hp-ml150gen9-01.rhts.eng.bos.redhat.com
I've checked the code: the postcopy bandwidth setting is only read at the point of switchover from precopy to postcopy. Yes, we can fix that to allow it to be changed during postcopy as well.
Need to fix migrate_params_apply so that it doesn't call qemu_file_set_rate_limit for max_bandwidth while postcopy is active, but does call it in the max_postcopy_bandwidth case.
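A rough sketch of the kind of change described above (simplified fragment of migrate_params_apply() in migration/migration.c, not the literal upstream patch; names such as migration_in_postcopy(), qemu_file_set_rate_limit() and XFER_LIMIT_RATIO follow the QEMU migration code, but the surrounding code and exact scaling are abbreviated here):

    if (params->has_max_bandwidth) {
        s->parameters.max_bandwidth = params->max_bandwidth;
        if (s->to_dst_file && !migration_in_postcopy()) {
            /* precopy limit: only push it to the stream while still in precopy */
            qemu_file_set_rate_limit(s->to_dst_file,
                                     s->parameters.max_bandwidth / XFER_LIMIT_RATIO);
        }
    }

    if (params->has_max_postcopy_bandwidth) {
        s->parameters.max_postcopy_bandwidth = params->max_postcopy_bandwidth;
        if (s->to_dst_file && migration_in_postcopy()) {
            /* postcopy limit: apply immediately if postcopy is already running,
             * instead of only reading it at the precopy->postcopy switchover */
            qemu_file_set_rate_limit(s->to_dst_file,
                                     s->parameters.max_postcopy_bandwidth / XFER_LIMIT_RATIO);
        }
    }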
Posted upstream: migration/postcopy: Update the bandwidth during postcopy
Merged upstream as c38c1c142e64901b09f5; it'll be in qemu 4.0.
Hi all,
I verified this bz in environment [1]; the test steps are like polarion case RHEL-150076 [2]. The issue is gone.

Environment [1]:
Src and dst host info: kernel-modules-4.18.0-95.el8.x86_64 & qemu-img-4.0.0-3.module+el8.1.0+3265+26c4ed71.x86_64
Guest info: kernel-4.18.0-100.el8.x86_64

Polarion case [2]: https://polarion.engineering.redhat.com/polarion/#/project/RedHatEnterpriseLinux7/workitem?id=RHEL-150076

Best regards,
Li Xiaohui
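For anyone reproducing the verification without access to the polarion case, a rough outline of the flow (sketch only; the destination host is a placeholder and the domain name is taken from the report above):

# virsh migrate avocado-vt-vm1 qemu+ssh://<dest>/system --live --verbose --p2p --persistent --postcopy --postcopy-bandwidth 10
# virsh migrate-postcopy avocado-vt-vm1
# virsh migrate-setspeed avocado-vt-vm1 5 --postcopy
# virsh domjobinfo avocado-vt-vm1 | grep 'Memory bandwidth'

With the fix, "Memory bandwidth" should drop towards the new 5 MiB/s limit instead of staying at the previous value.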
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723