Bug 1289288 - Live Migration dynamic cpu throttling for auto-convergence (libvirt)
Summary: Live Migration dynamic cpu throttling for auto-convergence (libvirt)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1289285 1289290 1289291
Blocks: 1305606 1313485 1358141
Reported: 2015-12-07 20:17 UTC by Hai Huang
Modified: 2016-11-03 18:35 UTC

Fixed In Version: libvirt-2.0.0-1.el7
Doc Type: Enhancement
Doc Text:
Clone Of: 1289285
Clones: 1358141
Environment:
Last Closed: 2016-11-03 18:35:19 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2016:2577 | 0 | normal | SHIPPED_LIVE | Moderate: libvirt security, bug fix, and enhancement update | 2016-11-03 12:07:06 UTC

Description Hai Huang 2015-12-07 20:17:07 UTC
+++ This bug was initially created as a clone of Bug #1289285 +++

Description of problem:

With extremely write-intensive memory workloads, a normal live migration never completes because the guest writes to memory faster than QEMU can transfer the changes to the destination system. The migration then runs indefinitely, never making enough progress to stop the guest and proceed to the non-live "finishing up" phase.

This feature provides a way to slow down guest execution, which should in turn slow down the guest's memory write rate. As time passes, auto-converge keeps increasing the amount of guest CPU throttling until the memory write rate drops enough for the guest to be stopped and the migration to finish.

As of QEMU 2.5, dynamic throttling has been added to auto-converge, dramatically increasing its effectiveness.
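
The escalation described above can be sketched as a simple schedule. This is only an illustration, not QEMU's code: the function name is hypothetical, the cap of 99 is an assumption taken from the "Auto converge throttle: 99" value reported later in this bug, and the sketch does not model when QEMU decides to take the next step.

```python
def throttle_schedule(initial, increment, cap=99):
    """Yield successive throttle percentages until the cap is reached.

    Hypothetical helper: 'cap' is an assumption based on the domjobinfo
    output in this report, not a value read from QEMU.
    """
    throttle = initial
    while True:
        yield min(throttle, cap)
        if throttle >= cap:
            return
        throttle += increment


# Values used with --auto-converge-initial 30 --auto-converge-increment 15
steps = list(throttle_schedule(30, 15))
print(steps)
```

With an initial value of 30 and an increment of 15, the schedule passes through 60 and ends at the cap, which is consistent with the two throttle values seen in the domjobinfo output in comment 9.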

This feature will be available in RHEL 7.3 qemu-kvm-rhev with the rebase to QEMU 2.5.

The QEMU feature page can be found at:
http://wiki.qemu.org/Features/AutoconvergeLiveMigration


Version-Release number of selected component (if applicable):

  qemu-kvm-rhev


How reproducible:
Always.


Steps to Reproduce:
Please refer to the qemu feature page above.


Actual results:
Live migration fails due to a high page dirty rate (i.e. intensive memory writes).


Expected results:
Live migration completes successfully.


Additional info:

Comment 1 Michal Skrivanek 2015-12-21 11:19:08 UTC
What are you planning to do in libvirt?

Comment 2 Jiri Denemark 2015-12-21 12:44:50 UTC
We will need to add support for setting the migration parameters and reporting the current throttling value via virDomainGetJobStats.

Comment 4 Jiri Denemark 2016-06-21 11:47:52 UTC
Patches sent upstream for review: https://www.redhat.com/archives/libvir-list/2016-June/msg01336.html

Comment 5 Jiri Denemark 2016-06-22 14:13:01 UTC
This feature is now implemented upstream by v1.3.5-373-g5a23594..v1.3.5-383-gd85c3a5:

commit 5a235947c29be138f907c64a0ded255faa1da6fd
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 20 15:51:13 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    qemu: Make qemuMonitorSetMigrationCompression saner
    
    Checking whether the function has anything to do is better done in the
    function rather then requiring callers to do that.
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit dbcbc86648c1e234523c265ac433056c0d069596
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 20 15:47:28 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    qemu: Decouple migration parameters from compression settings
    
    Compression parameters are not the only migration parameters.
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit b1473708d832a84f5746e96aaa5790ebb9ac17d9
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 20 15:47:46 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    qemu: Rename qemuMonitorMigrationCompression
    
    qemuMonitorMigrationParams is a better name for a structure which
    contains various migration parameters. While doing that, we should use
    full names for individual parameters.
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit eb3e0184146697f53cb1e31c9a330b039d9cd2e4
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 20 16:54:24 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    qemu: Rework qemuMonitorJSONGetMigrationParams
    
    We should not require any parameters to be present. After all we have
    the *_set bools to express that some parameters were not set.
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit bd3da5169771ae8a6ea94bf5990700c3b53f21cd
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 20 16:55:07 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    qemu: Rework qemuMonitorJSONSetMigrationParams
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit 15f42cba7e8e37b9f97d31c482eab0f811022d0a
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 20 17:07:55 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    test: Rework qemuMonitorJSONGetMigrationParams test
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit f6e12b40295b1601b9911f6ccb46e09bf8e47e85
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 20 15:45:59 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    Add auto convergence migration parameters
    
    They can be used to tune auto-convergence algorithm (which is enabled
    with VIR_MIGRATE_AUTO_CONVERGE).
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit 8d58952bedb8d00bc20604cb2a8a57fb4438532d
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 20 16:27:07 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    qemu: Introduce qemuMigrationSetParams
    
    Several places in the code update qemuMonitorMigrationParams structure
    and qemuMigrationSetParams is then used to set them all at once.
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit abaa11006f26a05899b2d2d9ef8f3cf08878f6d2
Author:     Jiri Denemark <jdenemar>
AuthorDate: Mon Jun 20 17:10:32 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    qemu: Add support for cpu throttling parameters
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit 445853e1baf6c8ac2df08b31185103077a8749ed
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jun 21 10:06:29 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    qemu: Implement auto convergence migration parameters
    
    Signed-off-by: Jiri Denemark <jdenemar>

commit d85c3a54517ffb0d1b056684069dfc8fc7bc9b8d
Author:     Jiri Denemark <jdenemar>
AuthorDate: Tue Jun 21 13:40:33 2016 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Wed Jun 22 15:54:21 2016 +0200

    Report auto convergence throttle rate in migration stats
    
    Signed-off-by: Jiri Denemark <jdenemar>

Comment 7 zhe peng 2016-07-14 10:07:19 UTC
Hi Jiri,
I tested this feature with:
libvirt-2.0.0-1.el7.x86_64
qemu-kvm-rhev-2.6.0-13.el7.x86_64

cmdline:
# virsh migrate --live rhel7 qemu+ssh://$target_ip/system --verbose --unsafe --compressed --auto-converge --auto-converge-initial 50 --auto-converge-increment 3
error: internal error: unable to execute QEMU command 'migrate-set-parameters': Invalid parameter 'cpu-throttle-initial'

libvirtd log:
2016-07-14 09:41:21.608+0000: 26999: debug : qemuMonitorJSONCheckError:376 : unable to execute QEMU command {"execute":"migrate-set-parameters","arguments":{"cpu-throttle-initial":50,"cpu-throttle-increment":3},"id":"libvirt-78"}: {"id":"libvirt-78","error":{"class":"GenericError","desc":"Invalid parameter 'cpu-throttle-initial'"}}
2016-07-14 09:41:21.608+0000: 26999: error : qemuMonitorJSONCheckError:387 : internal error: unable to execute QEMU command 'migrate-set-parameters': Invalid parameter 'cpu-throttle-initial

I checked qemu-kvm directly and found that the parameter is named "x-cpu-throttle-initial":
(qemu) migrate_set_parameter x-cpu
x-cpu-throttle-increment  x-cpu-throttle-initial
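
To make the mismatch concrete, the following sketch rebuilds the QMP command from the libvirtd log and shows the variant this qemu-kvm-rhev build would accept. The function name and its `prefix` argument are hypothetical and exist only for this illustration; this is not how libvirt constructs the command.

```python
import json


def migrate_set_parameters(initial, increment, prefix="", cmd_id="libvirt-78"):
    """Build a 'migrate-set-parameters' QMP command as a JSON string.

    'prefix' is a hypothetical knob for this sketch: "" gives the names
    libvirt sent, "x-" gives the names the tested QEMU build exposed.
    """
    arguments = {
        f"{prefix}cpu-throttle-initial": initial,
        f"{prefix}cpu-throttle-increment": increment,
    }
    command = {
        "execute": "migrate-set-parameters",
        "arguments": arguments,
        "id": cmd_id,
    }
    return json.dumps(command, separators=(",", ":"))


sent = migrate_set_parameters(50, 3)                  # rejected in the log above
accepted = migrate_set_parameters(50, 3, prefix="x-")  # names seen in the monitor
print(sent)
print(accepted)
```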

Comment 8 Jiri Denemark 2016-07-19 07:52:15 UTC
Please file a bug for qemu-kvm-rhev requesting removal of the "x-" prefix from these migration parameters.

Comment 9 zhe peng 2016-08-09 10:11:52 UTC
Tested with builds:
libvirt-2.0.0-4.el7.x86_64
qemu-kvm-rhev-2.6.0-19.el7.x86_64

Steps:
 1. Prepare two hosts and disable libvirtd keepalive on both.
 2. Define a guest and start it.
 3. In the guest, run stressapptest
    (test 256 MB, running 8 "warm copy" threads, exit after 60 seconds):
# stressapptest -s 60 -M 256 -m 8 -W
 4. Open virt-viewer to connect to the guest.
 5. On the source host, start the live migration:
# virsh migrate --live rhel7 qemu+ssh://$target_ip/system --verbose --compressed --auto-converge --auto-converge-initial 30 --auto-converge-increment 15

During the migration, check the job statistics on the source host:
# virsh domjobinfo rhel7
Job type:         Unbounded   
Time elapsed:     37510        ms
Data processed:   4.036 GiB
Data remaining:   96.113 MiB
Data total:       2.126 GiB
Memory processed: 4.036 GiB
Memory remaining: 96.113 MiB
Memory total:     2.126 GiB
Memory bandwidth: 103.643 MiB/s
Dirty rate:       31132        pages/s
Iteration:        12          
Constant pages:   116532      
Normal pages:     1011937     
Normal data:      3.860 GiB
Expected downtime: 1173         ms
Setup time:       9            ms
Compression cache: 64.000 MiB
Compressed data:  171.002 MiB
Compressed pages: 54565        
Compression cache misses: 459312       
Compression overflows: 77701        
Auto converge throttle: 60
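
As a rough reading of the numbers above, the sketch below parses three of the reported fields and compares the dirty rate with the migration bandwidth. It assumes 4 KiB guest pages, which the report does not state, and the parser is a hypothetical helper rather than anything shipped with virsh.

```python
# Three lines copied from the domjobinfo output above.
SAMPLE = """\
Memory bandwidth: 103.643 MiB/s
Dirty rate:       31132        pages/s
Auto converge throttle: 60
"""


def parse_domjobinfo(text):
    """Map each 'Key: value [unit]' line to a list of value tokens."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.split()
    return fields


PAGE_KIB = 4  # assumption: 4 KiB pages

info = parse_domjobinfo(SAMPLE)
dirty_mib_s = int(info["Dirty rate"][0]) * PAGE_KIB / 1024
bandwidth_mib_s = float(info["Memory bandwidth"][0])
throttle = int(info["Auto converge throttle"][0])

print(f"dirty rate: {dirty_mib_s:.3f} MiB/s, bandwidth: {bandwidth_mib_s} MiB/s")
print(f"still dirtying faster than transfer at throttle {throttle}%: "
      f"{dirty_mib_s > bandwidth_mib_s}")
```

Under the 4 KiB assumption the guest was dirtying about 121.6 MiB/s against roughly 103.6 MiB/s of transfer bandwidth, so at a throttle of 60 the migration could not yet converge.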

If I set a short run time in stressapptest, such as 60 seconds, the migration finishes without error. With a longer run time of 100 seconds:
# stressapptest -s 100 -M 256 -m 8 -W
the migration fails:
Migration: [100 %]error: operation failed: migration job: unexpectedly failed.
Before the failure I checked domjobinfo; CPU throttling had reached 99%, as shown below:
Job type:         Unbounded   
Time elapsed:     79911        ms
Data processed:   5.925 GiB
Data remaining:   0.000 B
Data total:       2.126 GiB
Memory processed: 5.925 GiB
Memory remaining: 0.000 B
Memory total:     2.126 GiB
Memory bandwidth: 109.919 MiB/s
Dirty rate:       1            pages/s
Iteration:        22          
Constant pages:   148060      
Normal pages:     1468512     
Normal data:      5.602 GiB
Expected downtime: 1062         ms
Setup time:       9            ms
Compression cache: 64.000 MiB
Compressed data:  318.332 MiB
Compressed pages: 101161       
Compression cache misses: 861278       
Compression overflows: 132799       
Auto converge throttle: 99   

I will provide the libvirtd log later.

Hi Jiri, please help check this issue. Thanks very much!

Comment 10 zhe peng 2016-08-09 10:23:20 UTC
Sorry Jiri, I made a mistake. The migration now always finishes.
We need to disable keepalive everywhere.
virsh command:
# virsh -k0 migrate --live rhel7 qemu+ssh://10.66.144.33/system --verbose --unsafe --compressed --auto-converge --auto-converge-initial 30 --auto-converge-increment 15

Based on comments 9 and 10, moving this bug to VERIFIED.

Comment 12 errata-xmlrpc 2016-11-03 18:35:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html

