Description of problem:
Run migrate_pause during the postcopy phase, then do migrate recovery; after that, the postcopy migration consumes the entire bandwidth of the NIC and does not honour the speed limit.

Version-Release number of selected component (if applicable):
src & dst host: kernel-4.18.0-138.el8.x86_64 & qemu-kvm-4.1.0-6.module+el8.1.0+4164+854d66f5.x86_64
guest info: kernel-4.18.0-141.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest on the src and dst hosts (guest on the dst host started with "-incoming tcp:0:4444").
2. Enable postcopy mode on both src and dst hosts, and set max-postcopy-bandwidth:
(1) src hmp:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate_set_parameter max-postcopy-bandwidth 5M
(qemu) info migrate_parameters
...
max-bandwidth: 33554432 bytes/second
downtime-limit: 300 milliseconds
x-checkpoint-delay: 20000
block-incremental: off
multifd-channels: 2
xbzrle-cache-size: 67108864
max-postcopy-bandwidth: 5242880
tls-authz: '(null)'
(2) dst hmp:
(qemu) migrate_set_capability postcopy-ram on
3. Start the postcopy migration and then pause it on the src host:
(qemu) migrate_start_postcopy
(qemu) info migrate
...
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off return-path: off pause-before-switchover: off multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off
Migration status: postcopy-active
total time: 11569 milliseconds
expected downtime: 317927 milliseconds
setup: 18 milliseconds
transferred ram: 297187 kbytes
throughput: 42.04 mbps --> the real-time throughput is correct (~5 MB/s, matching the 5M limit)
remaining ram: 1611264 kbytes
total ram: 4211528 kbytes
duplicate: 581332 pages
skipped: 0 pages
normal: 72877 pages
normal bytes: 291508 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 1440
dirty pages rate: 123652 pages
postcopy request count: 483
(qemu) migrate_pause
4. Recover the postcopy migration:
(1) dst host:
(qemu) migrate_recover tcp:10.66.8.208:4444
(2) src host:
(qemu) migrate -r tcp:10.66.8.208:4444
5. Check the migration status after step 4.

Actual results:
After step 5, the real-time throughput consumes the entire bandwidth of the NIC and does not honour the max-postcopy-bandwidth limit:
(qemu) info migrate
...
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off release-ram: off return-path: off pause-before-switchover: off multifd: off dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off
Migration status: postcopy-active
total time: 199394 milliseconds
expected downtime: 14424 milliseconds
setup: 18 milliseconds
transferred ram: 562338 kbytes
throughput: 926.57 mbps --> the real-time throughput consumes the entire bandwidth of the NIC
remaining ram: 1324472 kbytes
total ram: 4211528 kbytes
duplicate: 586883 pages
skipped: 0 pages
normal: 139023 pages
normal bytes: 556092 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 29210
dirty pages rate: 123652 pages
postcopy request count: 893
(qemu) info migrate
...
total time: 200242 milliseconds
expected downtime: 14063 milliseconds
setup: 18 milliseconds
transferred ram: 658229 kbytes
throughput: 950.33 mbps --> the real-time throughput consumes the entire bandwidth of the NIC
remaining ram: 1216024 kbytes
total ram: 4211528 kbytes
...
dirty pages rate: 123652 pages
postcopy request count: 1066
(qemu) info migrate_parameters
...
max-bandwidth: 33554432 bytes/second
downtime-limit: 300 milliseconds
x-checkpoint-delay: 20000
block-incremental: off
multifd-channels: 2
xbzrle-cache-size: 67108864
max-postcopy-bandwidth: 5242880 --> the max-postcopy-bandwidth parameter is still set correctly
tls-authz: '(null)'
(qemu) info migrate
...
total time: 203634 milliseconds
expected downtime: 14082 milliseconds
setup: 18 milliseconds
transferred ram: 1041486 kbytes
throughput: 949.07 mbps --> the real-time throughput consumes the entire bandwidth of the NIC
remaining ram: 798336 kbytes
total ram: 4211528 kbytes
dirty pages rate: 123652 pages
postcopy request count: 3066

Expected results:
After recovering the postcopy migration, the speed should honour the max-postcopy-bandwidth limit.

Additional info:
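For automation, the same flow can be driven over QMP instead of HMP. A minimal sketch of the equivalent commands (command names and arguments per the QEMU QMP schema; the tcp uri is the one used above, and the initial "migrate" step, implied above, is shown explicitly):

dst (before accepting the incoming migration):
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "postcopy-ram", "state": true}]}}

src:
{"execute": "migrate-set-capabilities", "arguments": {"capabilities": [{"capability": "postcopy-ram", "state": true}]}}
{"execute": "migrate-set-parameters", "arguments": {"max-postcopy-bandwidth": 5242880}}
{"execute": "migrate", "arguments": {"uri": "tcp:10.66.8.208:4444"}}
{"execute": "migrate-start-postcopy"}
{"execute": "migrate-pause"}

dst (after the pause):
{"execute": "migrate-recover", "arguments": {"uri": "tcp:10.66.8.208:4444"}}

src (to resume):
{"execute": "migrate", "arguments": {"uri": "tcp:10.66.8.208:4444", "resume": true}}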
Posted fix upstream. https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg01141.html
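For reference, the gist of that fix, sketched here rather than quoted verbatim (the linked thread has the authoritative patch): when migrate_fd_connect() in migration/migration.c restarts the migration thread for a postcopy recovery, it set the QEMUFile rate limit to INT64_MAX (i.e. unlimited) instead of re-applying max-postcopy-bandwidth, which is why the resumed migration saturates the NIC. The fixed path looks roughly like:

void migrate_fd_connect(MigrationState *s, Error *error_in)
{
    int64_t rate_limit;
    bool resume = s->state == MIGRATION_STATUS_POSTCOPY_PAUSED;
    ...
    if (resume) {
        /* Resumed postcopy: throttle to max-postcopy-bandwidth
         * (this branch used to pass INT64_MAX, i.e. no limit). */
        rate_limit = s->parameters.max_postcopy_bandwidth /
                     XFER_LIMIT_RATIO;
    } else {
        /* Fresh migration: honour max-bandwidth, as before. */
        rate_limit = s->parameters.max_bandwidth / XFER_LIMIT_RATIO;
    }
    qemu_file_set_rate_limit(s->to_dst_file, rate_limit);
    ...
}

(XFER_LIMIT_RATIO scales bytes/second down to the 100 ms accounting window the migration thread uses when checking the limit.)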
Verified this bz on hosts (kernel-4.18.0-177.el8.x86_64 & qemu-kvm-4.2.0-9.module+el8.2.0+5699+b5331ee5.x86_64), following the test steps in Comment 0. After recovery the throughput now stays near the configured limit, so moving this bz to VERIFIED:
(qemu) migrate -r tcp:10.73.33.186:5555
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: postcopy-active
total time: 676350 milliseconds
expected downtime: 1505928 milliseconds
setup: 51 milliseconds
transferred ram: 646343 kbytes
throughput: 41.94 mbps
remaining ram: 5343656 kbytes
total ram: 8405832 kbytes
duplicate: 1407747 pages
skipped: 0 pages
normal: 158183 pages
normal bytes: 632732 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 359590
dirty pages rate: 221440 pages
postcopy request count: 803
(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: postcopy-active
total time: 677277 milliseconds
expected downtime: 1498033 milliseconds
setup: 51 milliseconds
transferred ram: 650963 kbytes
throughput: 42.17 mbps
remaining ram: 4656988 kbytes
total ram: 8405832 kbytes
duplicate: 1578636 pages
skipped: 0 pages
normal: 158961 pages
normal bytes: 635844 kbytes
dirty sync count: 2
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 3210
dirty pages rate: 221440 pages
postcopy request count: 803
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017