Bug 1038540

Summary: qemu-kvm aborted when cancelling a migration and then restarting it (with page delta compression)
Product: Red Hat Enterprise Linux 7 Reporter: mazhang <mazhang>
Component: qemu-kvm Assignee: Hai Huang <hhuang>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: high    
Version: 7.0 CC: acathrow, hhuang, juzhang, mazhang, michen, mrezanin, qzhang, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: qemu-kvm-1.5.3-47.el7 Doc Type: Bug Fix
Last Closed: 2014-06-13 12:57:35 UTC Type: Bug
Attachments:
Description                        Flags
backtrace                          none
Full log when reproduce the bug    none

Description mazhang 2013-12-05 10:05:24 UTC
Created attachment 833065
backtrace

Description of problem:
While running a migration test with page delta compression (XBZRLE) enabled, issuing migrate_cancel, then restarting qemu-kvm on the destination host and starting the migration again, made qemu-kvm abort.

Version-Release number of selected component (if applicable):

Host:
qemu-img-1.5.3-21.el7.x86_64
qemu-kvm-common-rhev-1.5.3-21.el7.x86_64
qemu-kvm-rhev-debuginfo-1.5.3-21.el7.x86_64
qemu-kvm-rhev-1.5.3-21.el7.x86_64
kernel-3.10.0-57.el7.x86_64

Guest:
win8.2-32
virtio-win-prewhql-74

How reproducible:
100%


Steps to Reproduce:
1. Start qemu-kvm with the following command line:
#gdb --args /usr/libexec/qemu-kvm \
-M pc \
-cpu SandyBridge \
-m 4G \
-smp 4,sockets=2,cores=2,threads=1,maxcpus=16 \
-enable-kvm \
-name win8-32 \
-uuid 990ea161-6b67-47b2-b803-19fb01d30d12 \
-smbios type=1,manufacturer='Red Hat',product='RHEV Hypervisor',version=el6,serial=koTUXQrb,uuid=feebc8fd-f8b0-4e75-abc3-e63fcdb67170 \
-k en-us \
-rtc base=localtime,clock=host,driftfix=slew \
-nodefaults \
-monitor stdio \
-qmp tcp:0:6666,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-drive file=iscsi://10.66.4.216/iqn.2001-04.com.example:storage.disk1.mazhang/1,if=none,id=drive-data-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop \
-device virtio-blk-pci,bus=pci.0,addr=0x7,scsi=off,drive=drive-data-disk,id=data-disk \
-device virtio-balloon-pci,bus=pci.0,id=balloon0 \
-device virtio-serial-pci,id=virtio-serial1 \
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial1.0,nr=3,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
-spice port=5900,disable-ticketing,seamless-migration=on \
-vga qxl \
-global qxl-vga.vram_size=67108864 \
-device intel-hda,id=sound0,bus=pci.0 -device hda-duplex

2. Migrate the guest to another host:
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 2G
(qemu) migrate -d tcp:10.66.106.40:5800

3. After the migration finishes, migrate the guest back and test migrate_cancel:
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 2G
(qemu) migrate -d tcp:10.66.106.39:5800
(qemu) migrate_cancel 

4. Restart the migration; after a short while, qemu-kvm aborts:
(qemu) migrate -d tcp:10.66.106.39:5800


Actual results:
qemu-kvm aborted.

Expected results:
qemu-kvm keeps running normally and the migration completes.

Additional info:
The problem cannot be reproduced without page delta compression.
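Page delta compression here is the xbzrle migration capability enabled in the steps above. XBZRLE (XOR-based zero run-length encoding) keeps a cache of previously sent pages, sized with migrate_set_cache_size, and when a cached page is sent again it transmits only the bytes that changed. The following is a simplified C sketch of that idea, assuming one-byte run lengths and hypothetical function names (delta_encode, delta_decode); it is not QEMU's xbzrle.c and does not reproduce its wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode new_page as a delta against old_page (both len bytes long).
 * Output is a sequence of records:
 *   [zero run length][nonzero run length][nonzero run bytes from new_page]
 * A zero run is a stretch where old and new bytes are equal (their XOR is 0);
 * a nonzero run is a stretch where they differ. Lengths are capped at 255 so
 * each fits in one byte (a simplification made for this sketch).
 * Returns the encoded size, or -1 if it would not fit in out_cap bytes. */
static long delta_encode(const uint8_t *old_page, const uint8_t *new_page,
                         size_t len, uint8_t *out, size_t out_cap)
{
    assert(old_page != NULL && new_page != NULL && out != NULL);
    size_t i = 0, o = 0;
    while (i < len) {
        size_t zrun = 0;
        while (i + zrun < len && zrun < 255 &&
               old_page[i + zrun] == new_page[i + zrun]) {
            zrun++;
        }
        i += zrun;
        size_t nzrun = 0;
        while (i + nzrun < len && nzrun < 255 &&
               old_page[i + nzrun] != new_page[i + nzrun]) {
            nzrun++;
        }
        if (o + 2 + nzrun > out_cap) {
            return -1; /* delta too large: caller would send the whole page */
        }
        out[o++] = (uint8_t)zrun;
        out[o++] = (uint8_t)nzrun;
        if (nzrun > 0) {
            memcpy(out + o, new_page + i, nzrun);
        }
        o += nzrun;
        i += nzrun;
    }
    return (long)o;
}

/* Rebuild the new page in place: page holds the old (cached) contents. */
static int delta_decode(const uint8_t *in, size_t in_len,
                        uint8_t *page, size_t len)
{
    size_t i = 0, p = 0;
    while (i < in_len) {
        if (i + 2 > in_len) {
            return -1; /* truncated record header */
        }
        size_t zrun = in[i++];
        size_t nzrun = in[i++];
        if (p + zrun + nzrun > len || i + nzrun > in_len) {
            return -1; /* record runs past the page or the input */
        }
        p += zrun; /* unchanged bytes: keep the old contents */
        if (nzrun > 0) {
            memcpy(page + p, in + i, nzrun); /* changed bytes: overwrite */
        }
        p += nzrun;
        i += nzrun;
    }
    return 0;
}
```

For a 64-byte page that differs from its cached copy in only three bytes, this sketch emits 9 bytes; a real 4 KiB page with a few changed bytes shrinks similarly, which is the point of keeping the page cache.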

Comment 2 Orit Wasserman 2013-12-17 11:57:30 UTC
Which qemu aborted, the source or the destination?
What was the error message?

Comment 3 mazhang 2013-12-20 02:15:58 UTC
1. This was a ping-pong migration test. The first migration did not abort; when migrating the guest back, the source qemu-kvm aborted.
2. For the error message, please see the attachment.

Comment 4 Miroslav Rezanina 2014-02-12 12:02:11 UTC
Fix included in qemu-kvm-1.5.3-47.el7

Comment 6 Qunfang Zhang 2014-02-19 06:20:31 UTC
This bug can be reproduced on qemu-kvm-1.5.3-46.el7.x86_64 and is verified as fixed on qemu-kvm-1.5.3-48.el7.x86_64.

On the old version qemu-kvm-1.5.3-46.el7.x86_64:

1. Boot up a guest:

(gdb) r -cpu SandyBridge -M pc -enable-kvm -m 4096 -smp 2,sockets=2,cores=1,threads=1 -name rhel6.4-64 -uuid 9a0e67ec-f286-d8e7-0548-0c1c9ec93009 -nodefconfig -nodefaults -monitor stdio -rtc base=utc,clock=host,driftfix=slew -no-kvm-pit-reinjection -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive file=/home/RHEL-Server-7.0-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d5:51:8a,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=channel1,path=/tmp/helloworld1,server,nowait -device virtserialport,chardev=channel1,name=port1,bus=virtio-serial0.0,id=port1 -chardev socket,id=channel2,path=/tmp/helloworld2,server,nowait -device virtserialport,chardev=channel2,name=port2,bus=virtio-serial0.0,id=port2 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -vnc :10 -vga std  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6  -drive if=none,id=drive-fdc0-0-0,format=raw,cache=none -global isa-fdc.driveA=drive-fdc0-0-0 -qmp tcp:0:5555,server,nowait -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0

2. Boot up the guest on the destination host with "-incoming tcp:0:5800".

3. On source host:
(qemu) migrate_set_capability xbzrle on
(qemu) migrate_set_cache_size 2G
(qemu) migrate -d tcp:$dst_host_ip:5800
(qemu) migrate_cancel 

4. Restart qemu-kvm on the destination host and migrate again:
(qemu) migrate -d tcp:$dst_host_ip:5800

Result: 
The source qemu-kvm aborted:

*** Error in `/usr/libexec/qemu-kvm': double free or corruption (out): 0x00005555567ab910 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7d52d)[0x7ffff2ce052d]
/lib64/libglib-2.0.so.0(g_free+0xf)[0x7ffff74f381f]
/usr/libexec/qemu-kvm(+0x225d3b)[0x555555779d3b]
/usr/libexec/qemu-kvm(+0x226944)[0x55555577a944]
/usr/libexec/qemu-kvm(qemu_savevm_state_complete+0x93)[0x5555557e9553]
/usr/libexec/qemu-kvm(+0x1b4989)[0x555555708989]
/lib64/libpthread.so.0(+0x7df3)[0x7ffff604ddf3]
/lib64/libc.so.6(clone+0x6d)[0x7ffff2d5939d]
======= Memory map: ========
555555554000-555555991000 r-xp 00000000 fd:01 68798346                   /usr/libexec/qemu-kvm
555555b90000-555555c5f000 r--p 0043c000 fd:01 68798346                   /usr/libexec/qemu-kvm
555555c5f000-555555ca5000 rw-p 0050b000 fd:01 68798346                   /usr/libexec/qemu-kvm
555555ca5000-55555752a000 rw-p 00000000 00:00 0                          [heap]
......

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffed9bfd700 (LWP 15112)]
0x00007ffff2c98989 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff2c98989 in raise () from /lib64/libc.so.6
#1  0x00007ffff2c9a098 in abort () from /lib64/libc.so.6
#2  0x00007ffff2cd9177 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffff2ce052d in _int_free () from /lib64/libc.so.6
#4  0x00007ffff74f381f in g_free () from /lib64/libglib-2.0.so.0
#5  0x0000555555779d3b in migration_end () at /usr/src/debug/qemu-1.5.3/arch_init.c:618
#6  0x000055555577a944 in ram_save_complete (f=0x555556829c10, opaque=<optimized out>)
    at /usr/src/debug/qemu-1.5.3/arch_init.c:781
#7  0x00005555557e9553 in qemu_savevm_state_complete (f=0x555556829c10)
    at /usr/src/debug/qemu-1.5.3/savevm.c:1954
#8  0x0000555555708989 in migration_thread (opaque=<optimized out>) at migration.c:606
#9  0x00007ffff604ddf3 in start_thread () from /lib64/libpthread.so.0
#10 0x00007ffff2d5939d in clone () from /lib64/libc.so.6
(gdb) 
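The glibc message reports a double free or heap corruption, and the backtrace shows g_free() aborting inside migration_end(), reached from ram_save_complete() when the restarted migration completes after migrate_cancel. The sketch below illustrates, in generic C, one common way such a double free arises and the usual defensive pattern: buffers kept in global state across migration attempts are reset to NULL right after being freed, so a second teardown (cancel, then a later completion) is harmless. The struct, the function names (mig_buffers_setup, mig_decoded_buf, mig_buffers_teardown) and MIG_PAGE_SIZE are hypothetical; this is not QEMU's code and not the actual change shipped in qemu-kvm-1.5.3-47.el7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MIG_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Hypothetical buffers that outlive a single migration attempt because
 * they live in global state (not QEMU's actual structure). */
typedef struct {
    unsigned char *cache;       /* page cache, allocated per outgoing migration */
    unsigned char *encoded_buf; /* scratch buffer for encoding one page */
    unsigned char *decoded_buf; /* allocated lazily when receiving pages */
} MigBuffers;

static MigBuffers mig;

/* Allocate the outgoing-side buffers at the start of a migration.
 * On failure the caller is expected to call mig_buffers_teardown(). */
static bool mig_buffers_setup(size_t cache_pages)
{
    assert(cache_pages > 0);
    mig.cache = calloc(cache_pages, MIG_PAGE_SIZE);
    mig.encoded_buf = malloc(MIG_PAGE_SIZE);
    return mig.cache != NULL && mig.encoded_buf != NULL;
}

/* Return the receive-side buffer, allocating it on first use. */
static unsigned char *mig_decoded_buf(void)
{
    if (mig.decoded_buf == NULL) {
        mig.decoded_buf = malloc(MIG_PAGE_SIZE);
    }
    return mig.decoded_buf;
}

/* Release everything. Each pointer is reset to NULL right after free(),
 * so calling this twice never frees the same block twice (free(NULL) is a
 * no-op). Omitting the resets is the bug pattern: a buffer that the next
 * setup does not reallocate (here decoded_buf) would keep a dangling
 * pointer and be passed to free() again at the next teardown. */
static void mig_buffers_teardown(void)
{
    free(mig.cache);
    mig.cache = NULL;
    free(mig.encoded_buf);
    mig.encoded_buf = NULL;
    free(mig.decoded_buf);
    mig.decoded_buf = NULL;
}
```

A sequence mirroring the reproduction would be: receive pages (mig_decoded_buf), set up an outgoing migration, tear down on cancel, set up again, and tear down on completion; with the resets in place every step is safe.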


================

Verified as fixed on qemu-kvm-1.5.3-48.el7.x86_64:

Ran ping-pong migration 6 times with the same steps; every migration finished successfully, no abort occurred, and the guest works well.

So this bug is fixed.

Comment 7 Qunfang Zhang 2014-02-19 06:23:02 UTC
Created attachment 864974
Full log when reproduce the bug

Comment 8 Qunfang Zhang 2014-02-19 06:23:37 UTC
Setting to VERIFIED according to comment 6.

Comment 10 Ludek Smid 2014-06-13 12:57:35 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.