Description of problem:

Quoting from the upstream bug report
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg06262.html

  Currently multifd migration has not been limited and it will consume
  the whole bandwidth of the NIC. These two patches add a speed
  limitation to it.

Version-Release number of selected component (if applicable):
qemu-kvm-3.1.0-30.el8

How reproducible:
I've not attempted to reproduce this myself. This is a speculative bug
report based on the upstream bug report / patch posting.

Steps to Reproduce:
1. Start a large guest (perhaps 8 CPUs, 16 GB of RAM)
2. Run a stress testing app in the guest that dirties memory as fast as
   possible, running on every CPU in the guest
3. Set a migration bandwidth limit of 50 MB/s
4. Start a migration with multifd mode enabled

Actual results:
Migration consumes the entire NIC bandwidth

Expected results:
Migration is limited by the bandwidth setting

Additional info:
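A concrete (untested) sketch of steps 2-4, assuming the stress utility is
available in the guest and an HMP monitor on the source QEMU; the
destination address is a placeholder, and on qemu-kvm-3.1.0 the capability
may still be spelled x-multifd rather than multifd:

  # inside the guest: dirty memory as fast as possible on all 8 vCPUs
  stress --vm 8 --vm-bytes 1G --vm-keep

  # on the source host's QEMU monitor:
  (qemu) migrate_set_capability multifd on
  (qemu) migrate_set_parameter multifd-channels 2
  (qemu) migrate_set_speed 50M
  (qemu) migrate -d tcp:192.0.2.10:4444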
There is a patch upstream for this. Adding it to the queue.
I can reproduce this issue on a host with kernel-4.18.0-125.el8.x86_64 & qemu-4.1.0-rc3.

(qemu) info migrate
globals: ...
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off multifd: on dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off
Migration status: completed
total time: 1432 milliseconds
downtime: 258 milliseconds
setup: 41 milliseconds
transferred ram: 2650508 kbytes
throughput: 15611.23 mbps    ----> about 2 GB/s > 100 MB/s
remaining ram: 0 kbytes
total ram: 8405832 kbytes
duplicate: 1471276 pages
skipped: 0 pages
normal: 657703 pages
normal bytes: 2630812 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 2637576 kbytes
pages-per-second: 827188

(qemu) info migrate_parameters
...
max-bandwidth: 104857600 bytes/second
downtime-limit: 300 milliseconds
x-checkpoint-delay: 20000
block-incremental: off
multifd-channels: 2
xbzrle-cache-size: 67108864
max-postcopy-bandwidth: 0
tls-authz: '(null)'
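Spelling out the overshoot: 15611.23 Mbps / 8 ≈ 1951 MB/s ≈ 1.9 GB/s, while
max-bandwidth is 104857600 bytes/second = 100 MiB/s, so the multifd
migration ran at roughly 19x the configured limit.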
And it did not consume the entire bandwidth of the NIC, which is 40 Gbps (5 GB/s):

[root@dell-per430-10 work]# ethtool enp4s0f1
Settings for enp4s0f1:
	Supported ports: [ FIBRE ]
	Supported link modes:   40000baseCR4/Full
	Supported pause frame use: Symmetric
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  40000baseCR4/Full
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: 40000Mb/s
	Duplex: Full
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: yes
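That is, 40000 Mb/s / 8 = 5 GB/s of link capacity, so the observed ~1.9 GB/s
was not NIC-limited; the problem is purely that the 100 MiB/s max-bandwidth
setting was ignored.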
upstream commit 1b81c974ccfd536aceef840e220912b142a7dda0
brew: 23335260
We need QA_ACK, please.
Tested this bz via case RHEL-174630 on environment [1]; the result is as expected.

environment [1]:
src & dst host info: kernel-4.18.0-144.el8.x86_64 & qemu-img-4.1.0-10.module+el8.1.0+4234+33aa4f57.x86_64
guest info: kernel-4.18.0-141.el8.x86_64

The important test steps and results:

(qemu) migrate_set_speed 50M
(qemu) info migrate_parameters
...
max-bandwidth: 52428800 bytes/second

(qemu) migrate -d tcp:192.168.11.21:5555
(qemu) info migrate
...
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off multifd: on dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off
Migration status: active
total time: 1440 milliseconds
expected downtime: 300 milliseconds
setup: 41 milliseconds
transferred ram: 77809 kbytes
throughput: 420.72 mbps
...

(qemu) migrate_set_speed 200M
(qemu) info migrate_parameters
...
max-bandwidth: 209715200 bytes/second

(qemu) info migrate
...
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off multifd: on dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off
Migration status: active
total time: 18736 milliseconds
expected downtime: 300 milliseconds
setup: 41 milliseconds
transferred ram: 1099571 kbytes
throughput: 1682.66 mbps
...

According to the above results, setting this bz to VERIFIED. Thanks.
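Sanity-checking the numbers: 420.72 Mbps / 8 ≈ 52.6 MB/s, right at the
52428800 bytes/second (50 MiB/s) limit; 1682.66 Mbps / 8 ≈ 210 MB/s, right
at the 209715200 bytes/second (200 MiB/s) limit. Multifd migration now
honors max-bandwidth.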
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3723