Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
For bugs related to the Red Hat Enterprise Linux 5 product line. The current stable release is 5.10. For Red Hat Enterprise Linux 6 and later, please visit Red Hat JIRA https://issues.redhat.com/secure/CreateIssue!default.jspa?pid=12332745 to report new issues.

Bug 536943

Summary: RFE: migration enhancements - make sure live migration ends
Product: Red Hat Enterprise Linux 5
Reporter: Dor Laor <dlaor>
Component: libvirt
Assignee: Daniel Veillard <veillard>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Docs Contact:
Priority: high
Version: 5.4
CC: berrange, dallan, jdenemar, jjarvis, juzhang, kelvin.zhao, llim, sputhenp, syeghiay, tburke, virt-maint, weizhan, xen-maint
Target Milestone: rc
Keywords: FutureFeature
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: libvirt-0.8.2-1.el5
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: RHEVmigration
Environment:
Last Closed: 2011-01-13 22:53:15 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 3 Jiri Denemark 2010-09-02 11:59:29 UTC
Fixed in libvirt-0.8.2-1.el5

Comment 9 Jiri Denemark 2010-10-27 09:11:03 UTC
Sure, 5.6 doesn't support JSON, so you won't see the errors caused by the broken JSON implementation. Even with JSON, the log content mentioned in comment 21 is not the expected behavior. The expected outcome is described in the other comments.

Basically, live migration works as follows: first, all of the guest's memory is transferred to the target host. In each following iteration, only the memory pages that changed since the previous iteration are transferred. When the amount of changed pages is small enough to be transferred within maxdowntime, the guest is paused and the migration is finished offline.

To test it, you need to start a guest which continuously changes a lot of memory pages so that the migration never reaches the last step. Once you increase maxdowntime enough, the migration will finish.
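The convergence behavior described above can be sketched as a toy simulation. All the numbers below (page counts, bandwidth, dirty rate) are made-up illustrative assumptions, not QEMU's actual values or algorithm:

```python
# Toy model of pre-copy live migration convergence (illustrative only;
# page counts, bandwidth, and dirty rate are hypothetical assumptions).

def migrate(total_pages, dirty_per_iter, pages_per_sec, max_downtime_s,
            max_iters=100):
    """Return the iteration at which the remaining dirty pages can be
    sent within max_downtime_s, or None if migration never converges."""
    dirty = total_pages              # first pass sends all guest memory
    for i in range(1, max_iters + 1):
        send_time = dirty / pages_per_sec
        if send_time <= max_downtime_s:
            return i                 # pause guest, finish offline
        # while this pass was being sent, the guest dirtied more pages
        dirty = min(dirty_per_iter, total_pages)
    return None                      # guest dirties memory too fast

# A calm guest converges quickly...
assert migrate(100_000, dirty_per_iter=1_000,
               pages_per_sec=50_000, max_downtime_s=0.03) is not None
# ...a guest that keeps dirtying pages never does...
assert migrate(100_000, dirty_per_iter=10_000,
               pages_per_sec=50_000, max_downtime_s=0.03) is None
# ...until maxdowntime is raised, as described in this comment.
assert migrate(100_000, dirty_per_iter=10_000,
               pages_per_sec=50_000, max_downtime_s=0.5) is not None
```

The third case mirrors the suggested test: a busy guest keeps the migration looping until a larger maxdowntime lets the final pause-and-copy step fit.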

However, the implementation in qemu is black magic and cannot really be done accurately, which makes testing this a bit harder.

Comment 10 Daniel Berrangé 2010-10-27 09:54:33 UTC
To add even more complexity to this, it was recently discovered that the way QEMU itself handles 'max migration downtime' is completely & utterly broken. Even if you set max downtime of 25ms, migrating a 200 GB guest may still see a downtime of 30 *Minutes*. Basically this is untestable as far as I can see, unless you can attach a debugger to QEMU and watch for QEMU processing the monitor command from libvirt.

Comment 11 weizhang 2010-10-28 02:04:42 UTC
Hi Jiri,

I tested setting the bandwidth in virt-manager and it seems to work. I set the bandwidth to 1M and 50M, and there is an obvious difference in migration speed. To test setmaxdowntime I also used the bandwidth setting: with the bandwidth set to 1M the migration takes a long time, but after I run setmaxdowntime with 1000 it finishes immediately. Is this the right method to verify the bug?
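The timing difference observed here is largely transfer arithmetic. As a rough sanity check (the 2 GiB guest RAM size is a hypothetical figure, not taken from this bug):

```python
# Rough transfer-time estimate for the first pre-copy pass
# (2 GiB of guest RAM is a hypothetical figure for illustration).
guest_ram_mib = 2048

t_1m = guest_ram_mib / 1     # seconds at a 1 MiB/s bandwidth cap
t_50m = guest_ram_mib / 50   # seconds at a 50 MiB/s bandwidth cap

print(f"1 MiB/s: ~{t_1m:.0f}s, 50 MiB/s: ~{t_50m:.0f}s")
```

At 1 MiB/s the first pass alone takes over half an hour, which is why capping the bandwidth keeps the migration running long enough to exercise setmaxdowntime mid-flight.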

Comment 12 Dave Allan 2010-10-28 03:52:52 UTC
weizhang, see Daniel's comment #10.

Comment 13 weizhang 2010-10-28 06:35:33 UTC
(In reply to comment #10)
> To add even more complexity to this, it was recently discovered that the way
> QEMU itself handles 'max migration downtime' is completely & utterly broken.
> Even if you set max downtime of 25ms, migrating a 200 GB guest may still see a
> downtime of 30 *Minutes*. Basically this is untestable as far as I can see,
> unless you can attach a debugger to QEMU and watch for QEMU processing the
> monitor command from libvirt.

How do I "attach a debugger to QEMU and watch for QEMU processing the monitor command from libvirt"? Thanks.

Comment 14 weizhang 2010-10-28 10:04:24 UTC
test steps:

1. set log_outputs="1:file:/var/log/libvirt/libvirt.log" in the libvirtd.conf file and restart libvirtd
2. start a migration in virt-manager with bandwidth = 1M
3. while the migration is in progress, run
#virsh migrate-setmaxdowntime mig 1000
4. check the log output

Expected result: the following messages appear in /var/log/libvirt/libvirt.log and no errors are reported during migration

...

 17:09:55.003: debug : qemuMonitorCommandWithHandler:231 : Send command 'migrate_set_speed 1m' for write with FD -1

...

17:09:58.349: debug : virDomainMigrateSetMaxDowntime:12141 : domain=0x1abdd4b0, downtime=1000, flags=0
17:09:58.349: debug : qemuDomainMigrateSetMaxDowntime:11611 : Requesting migration downtime change to 1000ms

...

17:09:58.388: debug : qemuDomainWaitForMigrationComplete:4843 : Setting migration downtime to 1000ms
17:09:58.388: debug : qemuMonitorSetMigrationDowntime:1262 : mon=0x1abbb770 downtime=1000
17:09:58.388: debug : qemuMonitorCommandWithHandler:231 : Send command 'migrate_set_downtime 1000ms' for write with FD -1

...
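The log-based verification above can be automated with a minimal check that the expected monitor commands actually appear in the libvirt debug log. The sample text below is taken from the excerpt quoted in this comment; the helper function name is my own:

```python
# Check that a libvirt debug log shows the expected QEMU monitor
# commands for this test (sample lines are from the excerpt above).
sample = """
17:09:55.003: debug : qemuMonitorCommandWithHandler:231 : Send command 'migrate_set_speed 1m' for write with FD -1
17:09:58.388: debug : qemuMonitorCommandWithHandler:231 : Send command 'migrate_set_downtime 1000ms' for write with FD -1
"""

def saw_commands(log_text):
    """True if both the speed cap and the downtime change were sent."""
    return ("migrate_set_speed" in log_text
            and "migrate_set_downtime 1000ms" in log_text)

assert saw_commands(sample)
```

In practice the same two substring checks could be run against the full /var/log/libvirt/libvirt.log after step 4.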

Comment 15 weizhang 2010-10-29 02:33:12 UTC
This bug can be verified with:

# uname -r
2.6.18-228.el5
# rpm -qa libvirt
libvirt-0.8.2-8.el5
# rpm -qa |grep kvm
kvm-83-205.el5
kvm-qemu-img-83-205.el5
kmod-kvm-83-205.el5

Comment 17 errata-xmlrpc 2011-01-13 22:53:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0060.html