Bug 1368422
| Summary: | Post-copy migration fails with XBZRLE compression | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Milan Zamazal <mzamazal> | |
| Component: | qemu-kvm-rhev | Assignee: | Dr. David Alan Gilbert <dgilbert> | |
| Status: | CLOSED ERRATA | QA Contact: | xianwang <xianwang> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | high | |||
| Version: | 7.3 | CC: | chayang, dgilbert, hhuang, jherrman, juzhang, michal.skrivanek, mrezanin, mtessun, mzamazal, qizhu, qzhang, virt-maint, xianwang | |
| Target Milestone: | rc | Keywords: | ZStream | |
| Target Release: | 7.4 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | qemu-kvm-rhev-2.8.0-1 | Doc Type: | Bug Fix | |
| Doc Text: | Using post-copy migration with XOR-based zero run-length encoding (XBZRLE) compression previously caused the migration to fail and the guest to stay in a paused state. This update disables XBZRLE page compression for post-copy migration, and thus avoids the described problem. | Story Points: | --- | |
| Clone Of: | ||||
| : | 1395360 (view as bug list) | Environment: | ||
| Last Closed: | 2017-08-01 23:34:44 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1395265, 1395360, 1401400 | |||
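The Doc Text above refers to XOR-based zero run-length encoding (XBZRLE), which transmits only the XOR difference between a cached old copy of a page and its new contents, run-length encoding the unchanged (zero) runs. The sketch below is purely illustrative, not QEMU's actual encoder; the function names and the simple (zero-run, nonzero-run, bytes) wire format are assumptions made for this example. It does show why the technique depends on the receiver holding the previous page contents, which is the assumption that breaks under post-copy:

```c
/* Illustrative XBZRLE-style delta codec (assumed format, not QEMU's):
 * the encoder compares the cached old page with the new page and emits
 * (zero_run_len, nonzero_run_len, changed bytes...) records. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns encoded length, or -1 if the delta would not fit in dlen
 * (in that case a real implementation sends the page raw instead). */
static int xbzrle_encode(const uint8_t *old, const uint8_t *new_,
                         size_t plen, uint8_t *dst, size_t dlen)
{
    size_t i = 0, d = 0;
    while (i < plen) {
        size_t zrun = 0, nzrun = 0;
        /* Count unchanged bytes (capped at 255 per record). */
        while (i + zrun < plen && old[i + zrun] == new_[i + zrun] && zrun < 255)
            zrun++;
        i += zrun;
        size_t start = i;
        /* Count changed bytes. */
        while (i + nzrun < plen && old[i + nzrun] != new_[i + nzrun] && nzrun < 255)
            nzrun++;
        i += nzrun;
        if (d + 2 + nzrun > dlen)
            return -1;
        dst[d++] = (uint8_t)zrun;
        dst[d++] = (uint8_t)nzrun;
        memcpy(dst + d, new_ + start, nzrun);
        d += nzrun;
    }
    return (int)d;
}

/* Decode in place over the cached old page -- this is exactly the step
 * that needs the receiver to already have the previous page contents. */
static void xbzrle_decode(uint8_t *page, const uint8_t *src, int slen)
{
    int s = 0;
    size_t p = 0;
    while (s < slen) {
        p += src[s++];              /* skip the unchanged run */
        uint8_t nzrun = src[s++];
        memcpy(page + p, src + s, nzrun);
        p += nzrun;
        s += nzrun;
    }
}
```

During post-copy the destination may fault in pages it never received a pre-copy version of, so there is no cached old page to decode against; that mismatch is why the fixed version disables XBZRLE once post-copy starts.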
This bug has been verified on both ppc and x86.
Bug reproduced on the PPC platform:
Version-Release number of selected component (if applicable):
Host:
kernel:3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.6.0-22.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
Guest:
3.10.0-558.el7.ppc64le
Steps to Reproduce:
1. Boot a VM on the src host with the following QEMU command line:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1' \
-sandbox off \
-nodefaults \
-machine pseries-rhel7.3.0 \
-vga std \
-device virtio-serial-pci,id=virtio_serial_pci0,bus=pci.0,addr=03 \
-device virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 \
-chardev socket,id=devorg.qemu.guest_agent.0,path=/tmp/virtio_port-org.qemu.guest_agent.0-20160516-164929-dHQ00mMM,server,nowait \
-device virtserialport,chardev=devorg.qemu.guest_agent.0,name=org.qemu.guest_agent.0,id=org.qemu.guest_agent.0,bus=virtio_serial_pci0.0 \
-device nec-usb-xhci,id=usb1,addr=1d.7,multifunction=on,bus=pci.0 \
-drive file=/root/RHEL.7.3.qcow2,if=none,id=blk1 \
-device virtio-blk-pci,scsi=off,drive=blk1,id=blk-disk1,bootindex=1 \
-drive id=drive_cd1,if=none,snapshot=off,aio=native,cache=none,media=cdrom,file=/root/RHEL-7.3-20161019.0-Server-ppc64le-dvd1.iso \
-device scsi-cd,id=cd1,drive=drive_cd1,bootindex=2 \
-device virtio-net-pci,mac=9a:7b:7c:7d:7e:71,id=idtlLxAk,vectors=4,netdev=idlkwV8e,bus=pci.0,addr=05 \
-netdev tap,id=idlkwV8e,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-m 8G \
-smp 2 \
-cpu host \
-device usb-kbd \
-device usb-tablet \
-qmp tcp:0:8881,server,nowait \
-vnc :1 \
-msg timestamp=on \
-rtc base=localtime,clock=vm,driftfix=slew \
-boot order=cdn,once=c,menu=off,strict=off \
-monitor stdio \
-enable-kvm
2. Boot a VM on the dst host with the same QEMU command line as the src host, appending "-incoming tcp:0:5801".
3. Run "test", a memory-intensive program in the guest that produces dirty pages during migration; its source is given under "Additional info".
#gcc test.c -o test
#./test
4. Set migration configuration in HMP and do migration
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.19.112.39:5801
5. Check the migration status; once dirty pages are being produced, switch to post-copy.
(qemu) info migrate
capabilities: xbzrle: on rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
dirty sync count: 7
dirty pages rate: 13587 pages
.......other info....
(qemu) migrate_start_postcopy
Actual results:
The migration fails and the VM gets paused.
In src HMP:
(qemu) migrate_start_postcopy
(qemu) 2017-02-13T06:56:00.913043Z qemu-kvm: RP: Sibling indicated error 1
2017-02-13T06:56:01.105488Z qemu-kvm: socket_writev_buffer: Got err=104 for (32768/18446744073709551615)
(qemu) info migrate
capabilities: xbzrle: on rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on
Migration status: failed
total time: 0 milliseconds
(qemu) info status
VM status: paused (postmigrate)
While in dst HMP:
(qemu) 2017-02-13T06:56:00.911136Z qemu-kvm: Unknown combination of migration flags: 0x40 (postcopy m)
2017-02-13T06:56:00.911222Z qemu-kvm: error while loading state section id 2(ram)
2017-02-13T06:56:00.911233Z qemu-kvm: postcopy_ram_listen_thread: loadvm failed: -22
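The destination-side log above shows the RAM loader rejecting flag 0x40 as an "unknown combination" and aborting the load with -22 (-EINVAL). The sketch below is an assumption-laden illustration of that kind of flag dispatch, not QEMU's actual code: the flag names and values here are invented to mirror the log, and the real loader handles many more flags. It only illustrates how a page type that is valid in pre-copy can be unloadable in post-copy:

```c
/* Illustrative sketch of the destination-side failure mode: during
 * postcopy the loader cannot accept an XBZRLE-encoded page because
 * there is no cached old page to decode against, so it fails with
 * -EINVAL (-22 on Linux, matching "loadvm failed: -22" in the log).
 * Flag names/values are assumptions for this example only. */
#include <assert.h>
#include <errno.h>

enum {
    RAM_FLAG_PAGE   = 0x08,  /* a raw page follows */
    RAM_FLAG_XBZRLE = 0x40,  /* an XBZRLE-encoded page follows */
};

static int load_page(unsigned flags, int in_postcopy)
{
    switch (flags & (RAM_FLAG_PAGE | RAM_FLAG_XBZRLE)) {
    case RAM_FLAG_PAGE:
        return 0;                /* raw pages load in either mode */
    case RAM_FLAG_XBZRLE:
        if (in_postcopy)
            return -EINVAL;      /* no cached old page to XOR against */
        return 0;
    default:
        return -EINVAL;          /* unknown combination of flags */
    }
}
```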
Additional info:
(1) the program specified in step 3:
#gcc test.c -o test
#./test
#cat test.c
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>

static void wakeup(int sig)
{
    (void)sig;
    exit(0);
}

int main(void)
{
    /* Exit automatically after 120 seconds. */
    signal(SIGALRM, wakeup);
    alarm(120);

    /* 40960 * 4096 bytes = 160 MiB of zeroed heap. */
    char *buf = calloc(40960, 4096);

    while (1) {
        int i;
        /* Touch one byte every 1 KiB so every page keeps getting dirtied. */
        for (i = 0; i < 40960 * 4; i++) {
            buf[i * 4096 / 4]++;
        }
        printf(".");
    }
}
Bug verified on the PPC platform
Bug is verified with the following versions:
Host:
kernel:3.10.0-558.el7.ppc64le
qemu-kvm-rhev-2.8.0-1.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
Guest:
3.10.0-558.el7.ppc64le
Steps:
Same as in the reproduction above.
Actual results:
The migration succeeded and the VM is running.
In src HMP:
(qemu) info migrate
capabilities: xbzrle: on rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
Migration status: completed
dirty sync count: 22
postcopy request count: 1492
In dst HMP:
(qemu) info status
VM status: running
Bug verified on the x86 platform
Bug is verified with the following versions:
Host:
3.10.0-563.el7.x86_64
qemu-kvm-rhev-2.8.0-1.el7.x86_64
Guest:
3.10.0-514.10.1.el7.x86_64
Steps:
Same as in the reproduction above.
Actual results:
The migration succeeded and the VM is running.
In src HMP:
(qemu) info migrate
capabilities: xbzrle: on rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: on x-colo: off
Migration status: completed
dirty sync count: 14
postcopy request count: 3010
In dst HMP:
(qemu) info status
VM status: running
So, this bug is fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2017:2392
Description of problem:
When I migrate a VM with XBZRLE compression enabled and I switch the migration to post-copy mode after several unsuccessful iterations of the migration, the migration fails and the VM remains in a paused state.

Version-Release number of selected component (if applicable):
2.6.0-20.el7.x86_64

How reproducible:
Most of the time.

Steps to Reproduce:
1. Run a VM: virsh create DOMAIN.xml
2. Run a memory-intensive application in the VM.
3. Limit migration bandwidth to prevent the pre-copy migration from succeeding, e.g.: virsh migrate-setspeed DOMAIN 10
4. Migrate the VM with XBZRLE compression and post-copy enabled: virsh migrate DOMAIN qemu+tcp://root@HOST/system --verbose --live --compressed --comp-methods xbzrle --postcopy
5. Wait a couple of iterations, then switch to post-copy from another shell: virsh migrate-postcopy DOMAIN
6. The migration fails with an error like this:
qemu-kvm: Unknown combination of migration flags: 0x40 (postcopy mode)
qemu-kvm: error while loading state section id 2(ram)
qemu-kvm: postcopy_ram_listen_thread: loadvm failed: -22
and the VM gets paused.

Actual results:
The migration fails and the VM gets paused.

Expected results:
The migration succeeds.