Bug 1038540 - qemu-kvm aborted while cancel migration then restart it (with page delta compression)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Hai Huang
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-12-05 10:05 UTC by mazhang
Modified: 2016-09-20 04:40 UTC (History)

Fixed In Version: qemu-kvm-1.5.3-47.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-13 12:57:35 UTC
Target Upstream Version:


Attachments
backtrace (43.16 KB, text/x-log)
2013-12-05 10:05 UTC, mazhang
Full log when reproduce the bug (43.71 KB, text/plain)
2014-02-19 06:23 UTC, Qunfang Zhang

Description mazhang 2013-12-05 10:05:24 UTC
Created attachment 833065 [details]
backtrace

Description of problem:
While running a migration test with page delta compression (XBZRLE) enabled, issuing migrate_cancel and then restarting the migration on the destination host causes qemu-kvm to abort.

Version-Release number of selected component (if applicable):

Host:
qemu-img-1.5.3-21.el7.x86_64
qemu-kvm-common-rhev-1.5.3-21.el7.x86_64
qemu-kvm-rhev-debuginfo-1.5.3-21.el7.x86_64
qemu-kvm-rhev-1.5.3-21.el7.x86_64
kernel-3.10.0-57.el7.x86_64

Guest:
win8.2-32
virtio-win-prewhql-74

How reproducible:
100%


Steps to Reproduce:
1. Start qemu-kvm with the following command line:
#gdb --args /usr/libexec/qemu-kvm \
-M pc \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1,maxcpus=16 \
-enable-kvm \
-name win8-32 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-rtc base=localtime,clock=host,driftfix=slew \
-nodefaults \
-monitor stdio \
-qmp tcp:0:6666,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-drive file=iscsi://10.66.4.216/iqn.2001-04.com.example:storage.disk1.mazhang/1,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop \
-device virtio-blk-pci,bus=pci.0,addr=0x7,scsi=off,drive=drive-data-disk,id=data-disk \
-device virtio-balloon-pci,bus=pci.0,id=balloon0 \
-device virtio-serial-pci,id=virtio-serial1 \
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial1.0,nr=3,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
-spice port=5900,disable-ticketing,seamless-migration=on \
-vga qxl \
-global qxl-vga.vram_size=67108864 \
-device intel-hda,id=sound0,bus=pci.0 -device hda-duplex

2. Migrate the guest to another host.
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 2G
(qemu) migrate -d tcp:10.66.106.40:5800

3. After the migration finishes, migrate the guest back and test migrate_cancel.
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 2G
(qemu) migrate -d tcp:10.66.106.39:5800
(qemu) migrate_cancel 

4. Restart the migration. After a short while, qemu-kvm aborts.
(qemu) migrate -d tcp:10.66.106.39:5800
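
Steps 3 and 4 can also be driven over the QMP socket that the command line above opens (-qmp tcp:0:6666,server,nowait). The following is only a sketch: it assumes the QMP counterparts of the monitor commands (migrate-set-capabilities, migrate-set-cache-size, migrate, migrate_cancel). The helper names repro_commands and run, and the cancel_delay parameter, are made up for illustration; this report does not say how long to wait before cancelling.

```python
import json
import socket
import time

GIB = 1024 ** 3


def repro_commands(dest_uri, cache_bytes=2 * GIB):
    """Return the QMP commands mirroring steps 3 and 4, in order."""
    enable_xbzrle = {
        "execute": "migrate-set-capabilities",
        "arguments": {"capabilities": [{"capability": "xbzrle", "state": True}]},
    }
    set_cache = {
        "execute": "migrate-set-cache-size",
        "arguments": {"value": cache_bytes},  # 2G expressed in bytes
    }
    migrate = {"execute": "migrate", "arguments": {"uri": dest_uri}}
    cancel = {"execute": "migrate_cancel"}
    # migrate, cancel, then migrate again: the second migrate is step 4.
    return [enable_xbzrle, set_cache, migrate, cancel, migrate]


def run(host, port, commands, cancel_delay=5.0):
    """Send the commands to a QMP socket. cancel_delay is an assumed value."""
    with socket.create_connection((host, port)) as sock:
        chan = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(chan.readline())  # QMP greeting banner

        def execute(cmd):
            chan.write(json.dumps(cmd) + "\n")
            chan.flush()
            while True:
                line = chan.readline()
                if not line:
                    raise ConnectionError("QMP socket closed (qemu-kvm may have aborted)")
                reply = json.loads(line)
                if "event" not in reply:  # skip asynchronous events
                    return reply

        execute({"execute": "qmp_capabilities"})  # leave negotiation mode
        for cmd in commands:
            if cmd["execute"] == "migrate_cancel":
                time.sleep(cancel_delay)  # let the first migration get under way
            print(cmd["execute"], "->", execute(cmd))
```

A call such as run("127.0.0.1", 6666, repro_commands("tcp:10.66.106.39:5800")) would be issued on the host the guest currently runs on. The receiving qemu-kvm has to be listening for the incoming migration again before the second migrate is sent.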


Actual results:
qemu-kvm aborts.

Expected results:
qemu-kvm keeps running and the migration completes.

Additional info:
The problem does not reproduce without page delta compression.

Comment 2 Orit Wasserman 2013-12-17 11:57:30 UTC
Which qemu aborted, source or destination?
What was the error message?

Comment 3 mazhang 2013-12-20 02:15:58 UTC
1. This was a ping-pong migration test. The first migration did not abort; when migrating the guest back, the source qemu-kvm aborted.
2. For the error message, please see the attachment.

Comment 4 Miroslav Rezanina 2014-02-12 12:02:11 UTC
Fix included in qemu-kvm-1.5.3-47.el7

Comment 6 Qunfang Zhang 2014-02-19 06:20:31 UTC
This bug was reproduced on qemu-kvm-1.5.3-46.el7.x86_64 and verified as fixed on qemu-kvm-1.5.3-48.el7.x86_64.

On the old version qemu-kvm-1.5.3-46.el7.x86_64:

1. Boot up a guest:

(gdb) r -cpu SandyBridge -M pc -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name rhel6.4-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive file=/home/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:8a,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait -device virtserialport,chardev=channel1,name=port1,bus=virtio-serial0.0,id=port1 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=port2,bus=virtio-serial0.0,id=port2 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -vnc :10 -vga std  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6  -drive if=none,id=drive-fdc0-0-0,format=raw,cache=none -global isa-fdc.driveA=drive-fdc0-0-0 -qmp tcp:0:5555,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot up the guest on destination host with "-incoming tcp:0:5800".

3. On source host:
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 2G
(qemu) migrate -d tcp:$dst_host_ip:5800
(qemu) migrate_cancel 

4. Restart qemu on the destination host with the same command line, then migrate again.
(qemu) migrate -d tcp:$dst_host_ip:5800

Result: 
Guest aborted.

*** Error in `/usr/libexec/qemu-kvm': double free or corruption (out): 0x00005555567ab910 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7d52d)[0x7ffff2ce052d]
/lib64/libglib-2.0.so.0(g_free+0xf)[0x7ffff74f381f]
/usr/libexec/qemu-kvm(+0x225d3b)[0x555555779d3b]
/usr/libexec/qemu-kvm(+0x226944)[0x55555577a944]
/usr/libexec/qemu-kvm(qemu_savevm_state_complete+0x93)[0x5555557e9553]
/usr/libexec/qemu-kvm(+0x1b4989)[0x555555708989]
/lib64/libpthread.so.0(+0x7df3)[0x7ffff604ddf3]
/lib64/libc.so.6(clone+0x6d)[0x7ffff2d5939d]
======= Memory map: ========
555555554000-555555991000 r-xp 00000000 fd:01 68798346                   /usr/libexec/qemu-kvm
555555b90000-555555c5f000 r--p 0043c000 fd:01 68798346                   /usr/libexec/qemu-kvm
555555c5f000-555555ca5000 rw-p 0050b000 fd:01 68798346                   /usr/libexec/qemu-kvm
555555ca5000-55555752a000 rw-p 00000000 00:00 0                          [heap]
......

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffed9bfd700 (LWP 15112)]
0x00007ffff2c98989 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff2c98989 in raise () from /lib64/libc.so.6
#1  0x00007ffff2c9a098 in abort () from /lib64/libc.so.6
#2  0x00007ffff2cd9177 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff2ce052d in _int_free () from /lib64/libc.so.6
#4  0x00007ffff74f381f in g_free () from /lib64/libglib-2.0.so.0
#5  0x0000555555779d3b in migration_end () at /usr/src/debug/qemu-1.5.3/arch_init.c:618
#6  0x000055555577a944 in ram_save_complete (f=0x555556829c10, opaque=<optimized out>)
    at /usr/src/debug/qemu-1.5.3/arch_init.c:781
#7  0x00005555557e9553 in qemu_savevm_state_complete (f=0x555556829c10)
    at /usr/src/debug/qemu-1.5.3/savevm.c:1954
#8  0x0000555555708989 in migration_thread (opaque=<optimized out>) at migration.c:606
#9  0x00007ffff604ddf3 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff2d5939d in clone () from /lib64/libc.so.6
(gdb) 
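
The glibc backtrace and the gdb backtrace above describe the same frames. Each qemu-kvm(+0x...) entry is an offset from the load base of the binary, which is the start of the first qemu-kvm mapping in the memory map (0x555555554000). The sketch below checks that arithmetic against the addresses quoted above; the helper name resolve is made up for illustration.

```python
import re

# Start of the first /usr/libexec/qemu-kvm mapping in the memory map above.
LOAD_BASE = 0x555555554000

# Frames copied from the glibc backtrace that carry a binary-relative offset.
GLIBC_FRAMES = [
    "/usr/libexec/qemu-kvm(+0x225d3b)[0x555555779d3b]",  # gdb frame #5, migration_end
    "/usr/libexec/qemu-kvm(+0x226944)[0x55555577a944]",  # gdb frame #6, ram_save_complete
    "/usr/libexec/qemu-kvm(+0x1b4989)[0x555555708989]",  # gdb frame #8, migration_thread
]

FRAME_RE = re.compile(r"\(\+0x([0-9a-f]+)\)\[0x([0-9a-f]+)\]")


def resolve(frame, base=LOAD_BASE):
    """Return (base + offset, address reported in brackets) for one frame."""
    match = FRAME_RE.search(frame)
    if match is None:
        raise ValueError("not a module+offset frame: " + frame)
    offset = int(match.group(1), 16)
    reported = int(match.group(2), 16)
    return base + offset, reported


if __name__ == "__main__":
    for frame in GLIBC_FRAMES:
        computed, reported = resolve(frame)
        print(f"{frame}: computed {computed:#x}, reported {reported:#x}")
```

For all three frames the computed address equals the bracketed one, and those are the addresses gdb reports for migration_end, ram_save_complete and migration_thread.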


================

Verified as fixed on qemu-kvm-1.5.3-48.el7.x86_64:

Ran ping-pong migration 6 times with the same steps; every migration finished successfully. No abort occurred and the guest works well.

So this bug is fixed.

Comment 7 Qunfang Zhang 2014-02-19 06:23:02 UTC
Created attachment 864974 [details]
Full log when reproduce the bug

Comment 8 Qunfang Zhang 2014-02-19 06:23:37 UTC
Setting to VERIFIED according to comment 6.

Comment 10 Ludek Smid 2014-06-13 12:57:35 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

