Bug 1734316 - multifd migration does not honour speed limits, consumes entire bandwidth of NIC
Summary: multifd migration does not honour speed limits, consumes entire bandwidth of NIC
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Juan Quintela
QA Contact: Li Xiaohui
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-30 08:50 UTC by Daniel Berrangé
Modified: 2019-11-06 07:18 UTC (History)
9 users

Fixed In Version: qemu-kvm-4.1.0-9.module+el8.1.0+4210+23b2046a
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-06 07:18:12 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:3723 0 None None None 2019-11-06 07:18:45 UTC

Description Daniel Berrangé 2019-07-30 08:50:52 UTC
Description of problem:

Quoting from the upstream bug report

  https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg06262.html

Currently multifd migration is not rate-limited, so it consumes the whole
bandwidth of the NIC. These two patches add a speed limit to it.

Version-Release number of selected component (if applicable):
qemu-kvm-3.1.0-30.el8

How reproducible:
I've not attempted to reproduce myself. This is a speculative bug report based on the upstream bug report / patch posting.

Steps to Reproduce:
1. Start a large guest (perhaps 8 CPUs, 16 GB of RAM)
2. Run a stress testing app in the guest that dirties memory as fast as possible, running on every CPU in the guest
3. Set a migration bandwidth limit of 50 MB/s
4. Start a migration with multifd mode enabled
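The steps above can be sketched as QMP payloads (the report itself uses the HMP equivalents `migrate_set_capability`, `migrate_set_speed`, and `migrate`; the destination URI below is an example, and actually delivering the commands to a QEMU monitor socket is left out):

```python
import json

def qmp(execute, **arguments):
    """Build a QMP command dict of the shape QEMU's monitor expects."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return cmd

# Step 3: limit migration bandwidth to 50 MB/s (QMP takes bytes/second).
set_limit = qmp("migrate-set-parameters", **{"max-bandwidth": 50 * 1024 * 1024})

# Step 4: enable the multifd capability and start the migration
# (destination address is an example, matching the one used later in this bug).
enable_multifd = qmp("migrate-set-capabilities",
                     capabilities=[{"capability": "multifd", "state": True}])
start = qmp("migrate", uri="tcp:192.168.11.21:5555")

for c in (set_limit, enable_multifd, start):
    print(json.dumps(c))
```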

Actual results:
Migration consumes entire NIC bandwidth

Expected results:
Migration is limited by bandwidth setting

Additional info:

Comment 1 Juan Quintela 2019-07-30 15:59:53 UTC
There is a patch upstream for this.

Adding it to the queue.

Comment 2 Li Xiaohui 2019-08-06 08:05:04 UTC
I can reproduce this issue on host(kernel-4.18.0-125.el8.x86_64 & qemu-4.1.0-rc3).

(qemu) info migrate
globals:
...
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off block: off return-path: off pause-before-switchover: off multifd: on dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off 
Migration status: completed
total time: 1432 milliseconds
downtime: 258 milliseconds
setup: 41 milliseconds
transferred ram: 2650508 kbytes
throughput: 15611.23 mbps              ----> about 2 GB/s, far above the 100 MB/s limit
remaining ram: 0 kbytes
total ram: 8405832 kbytes
duplicate: 1471276 pages
skipped: 0 pages
normal: 657703 pages
normal bytes: 2630812 kbytes
dirty sync count: 4
page size: 4 kbytes
multifd bytes: 2637576 kbytes
pages-per-second: 827188
    
(qemu) info migrate_parameters 
...
max-bandwidth: 104857600 bytes/second
downtime-limit: 300 milliseconds
x-checkpoint-delay: 20000
block-incremental: off
multifd-channels: 2
xbzrle-cache-size: 67108864
max-postcopy-bandwidth: 0
tls-authz: '(null)'
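A quick arithmetic check on the figures above (assuming `info migrate` reports throughput in megabits per second and `max-bandwidth` is in bytes per second):

```python
# 15611.23 mbps is the throughput reported by `info migrate`;
# 104857600 bytes/s is the max-bandwidth shown by `info migrate_parameters`.
observed_mbps = 15611.23
limit_mbps = 104857600 * 8 / 1e6   # bytes/s -> megabits/s, ~838.9 Mbps

# The migration ran more than an order of magnitude over the configured limit.
print(f"observed/limit = {observed_mbps / limit_mbps:.1f}x")
```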

Comment 3 Li Xiaohui 2019-08-06 08:13:26 UTC
Note: it did not consume the entire bandwidth of the NIC, which is 40 Gbps (5 GB/s):
[root@dell-per430-10 work]# ethtool enp4s0f1
Settings for enp4s0f1:
	Supported ports: [ FIBRE ]
	Supported link modes:   40000baseCR4/Full 
	Supported pause frame use: Symmetric
	Supports auto-negotiation: Yes
	Supported FEC modes: Not reported
	Advertised link modes:  40000baseCR4/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported
	Speed: 40000Mb/s
	Duplex: Full
	Port: Direct Attach Copper
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: on
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x00000007 (7)
			       drv probe link
	Link detected: yes

Comment 4 Juan Quintela 2019-08-21 13:48:33 UTC
upstream commit 1b81c974ccfd536aceef840e220912b142a7dda0

Comment 8 Juan Quintela 2019-09-04 11:57:23 UTC
brew: 23335260

Comment 9 Danilo de Paula 2019-09-04 14:11:01 UTC
We need QA_ACK, please.

Comment 12 Li Xiaohui 2019-09-17 11:33:29 UTC
Tested this bz via case RHEL-174630 on environment [1]; the results are correct:

environment[1]:
src&dst host info: kernel-4.18.0-144.el8.x86_64 & qemu-img-4.1.0-10.module+el8.1.0+4234+33aa4f57.x86_64
guest info: kernel-4.18.0-141.el8.x86_64

The important test steps and results:
(qemu) migrate_set_speed 50M
(qemu) info migrate_parameters 
...
max-bandwidth: 52428800 bytes/second
(qemu) migrate -d tcp:192.168.11.21:5555  
(qemu) info migrate
...
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off multifd: on dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off 
Migration status: active
total time: 1440 milliseconds
expected downtime: 300 milliseconds
setup: 41 milliseconds
transferred ram: 77809 kbytes
throughput: 420.72 mbps
...
(qemu) migrate_set_speed 200M
(qemu) info migrate_parameters 
...
max-bandwidth: 209715200 bytes/second
(qemu) info migrate
...
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off release-ram: off return-path: off pause-before-switchover: off multifd: on dirty-bitmaps: off postcopy-blocktime: off late-block-activate: off x-ignore-shared: off 
Migration status: active
total time: 18736 milliseconds
expected downtime: 300 milliseconds
setup: 41 milliseconds
transferred ram: 1099571 kbytes
throughput: 1682.66 mbps
...
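As a cross-check on the verified results (same unit assumptions as before: `max-bandwidth` in bytes per second, throughput in megabits per second), the observed throughput now tracks the configured limit closely:

```python
# (limit taken from info migrate_parameters, throughput from info migrate)
cases = [
    (52428800, 420.72),     # migrate_set_speed 50M
    (209715200, 1682.66),   # migrate_set_speed 200M
]
for limit_bytes, observed_mbps in cases:
    limit_mbps = limit_bytes * 8 / 1e6
    # With the fix, observed throughput sits within about 1% of the limit.
    print(f"{limit_mbps:.2f} Mbps limit -> {observed_mbps} Mbps observed")
```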

According to the above results, setting this bz to verified. Thanks.

Comment 14 errata-xmlrpc 2019-11-06 07:18:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723

