Bug 1265179

Summary: With dataplane, migrating to a remote NBD disk after drive-mirror makes qemu core dump (on both the src and dst hosts)

Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: medium
Reporter: Pei Zhang <pezhang>
Assignee: Stefan Hajnoczi <stefanha>
QA Contact: FuXiangChun <xfu>
CC: chayang, huding, juzhang, knoel, lmiksik, michen, mrezanin, qizhu, virt-maint, xfu
Target Milestone: rc
Type: Bug
Doc Type: Bug Fix
Fixed In Version: qemu-kvm-rhev-2.6.0-9.el7
Last Closed: 2016-11-07 20:39:56 UTC

Description Pei Zhang 2015-09-22 10:17:56 UTC
Description of problem:
With dataplane enabled, drive-mirror to an empty remote NBD disk; after the mirror job becomes ready, start a migration. qemu core dumps (on both the src and dst hosts).


Version-Release number of selected component (if applicable):
Src host:
kernel: 3.10.0-318.el7.x86_64
qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-24.el7.x86_64

Dst host:
kernel: 3.10.0-318.el7.x86_64
qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-24.el7.x86_64


How reproducible:
100%


Steps to Reproduce:
1. On the src host, boot the guest with dataplane:
# /usr/libexec/qemu-kvm -name rhel6.7 -machine pc-i440fx-rhel7.2.0,accel=kvm \
-cpu SandyBridge -m 4G,slots=256,maxmem=40G -numa node \
-smp 4,sockets=2,cores=2,threads=1 \
-uuid 82b1a01e-5f6c-4f5f-8d27-3855a74e6b61 \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=12:54:00:5c:88:61 \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16 \
-spice port=5901,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
-monitor stdio \
-serial unix:/tmp/monitor,server,nowait \
-qmp tcp:0:5555,server,nowait \
-object iothread,id=iothread0 \
-drive file=/home/rhel6.7_virtio.qcow2,format=qcow2,if=none,id=drive-virtio-blk0,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-blk0,id=virtio-blk0,iothread=iothread0
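
Note: the iothread=iothread0 property on the virtio-blk device is what enables dataplane here; it moves the device's request processing into the IOThread created by -object iothread,id=iothread0. As an optional sanity check (not part of the original report), the iothread can be listed over QMP; the thread-id below is illustrative:

{ "execute": "query-iothreads" }
{"return": [{"id": "iothread0", "thread-id": 26800}]}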

2. On the dst host, create an empty disk and boot the guest with -incoming:
# qemu-img create -f qcow2 /home/rhel6.7_virtio.qcow2 20G
# (same command line as in step 1, plus:)
  -incoming tcp:0:6666

3. On the dst host, start an NBD server and export the empty disk:
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.66.9.120", "port": "3333" } } } }
{"return": {}}

{ "execute": "nbd-server-add", "arguments": { "device": "drive-virtio-blk0","writable": true } }
{"return": {}}

4. On the src host, drive-mirror to the remote NBD disk exported in step 3:
{ "execute": "drive-mirror", "arguments": { "device": "drive-virtio-blk0","target": "nbd://10.66.9.120:3333/drive-virtio-blk0", "sync": "full","format": "raw", "mode": "existing" } }
{"return": {}}

5. On the src host, after the mirror job becomes ready, start the migration:
{"timestamp": {"seconds": 1442915362, "microseconds": 512262}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-blk0", "len": 2859008000, "offset": 2859008000, "speed": 0, "type": "mirror"}}

(qemu) migrate -d tcp:10.66.9.120:6666
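
The crash happens while this migration is still in flight. As an optional check (not part of the original report), progress can be watched from the same monitor with the standard HMP command:

(qemu) info migrate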


Actual results:
qemu core dumped.

src host:
(qemu) Co-routine re-entered recursively
Aborted (core dumped)

dst host:
(qemu) qemu-kvm: Unknown combination of migration flags: 0
qemu-kvm: error while loading state section id 2(ram)
qemu-kvm: load of migration failed: Invalid argument


Expected results:
1. The migration should complete successfully.
2. qemu should not core dump.

Additional info:
1. Without dataplane, the VM storage migration works well.
2. gdb info:
**********on src host:
(qemu) [New Thread 0x7ffed95fe700 (LWP 26916)]
Co-routine re-entered recursively

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe7965700 (LWP 26828)]
0x00007ffff06ee5f7 in raise () from /lib64/libc.so.6
......

(gdb) bt full
#0  0x00007ffff06ee5f7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007ffff06efce8 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00005555557ed8eb in qemu_coroutine_enter (co=0x555558827c20, opaque=0x0)
    at qemu-coroutine.c:111
        self = <optimized out>
        ret = <optimized out>
#3  0x00005555557e9826 in aio_dispatch_clients (ctx=ctx@entry=0x5555569d89a0, 
    client_mask=client_mask@entry=-1) at aio-posix.c:171
        tmp = <optimized out>
        revents = <optimized out>
        dispatch = <optimized out>
        node = 0x555556937480
        progress = <optimized out>
#4  0x00005555557e9cdb in aio_poll_clients (ctx=0x5555569d89a0, 
    blocking=<optimized out>, client_mask=client_mask@entry=-1)
    at aio-posix.c:300
        node = 0x0
        was_dispatching = false
        i = <optimized out>
        ret = <optimized out>
        progress = false
        timeout = <optimized out>
        __PRETTY_FUNCTION__ = "aio_poll_clients"
#5  0x00005555556ca68e in aio_poll (blocking=<optimized out>, 
    ctx=<optimized out>) at /usr/src/debug/qemu-2.3.0/include/block/aio.h:255
No locals.
#6  iothread_run (opaque=0x555556a06000) at iothread.c:44
        iothread = 0x555556a06000
        blocking = <optimized out>
#7  0x00007ffff6bc4dc5 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#8  0x00007ffff07af1cd in clone () from /lib64/libc.so.6
No symbol table info available.

**********on dst host:
(qemu) qemu-kvm: Unknown combination of migration flags: 0
qemu-kvm: error while loading state section id 2(ram)
qemu-kvm: load of migration failed: Invalid argument
Detaching after fork from child process 15053.
[Thread 0x7fffe3f6d700 (LWP 14759) exited]
[Thread 0x7fffe476e700 (LWP 14758) exited]
[Thread 0x7fffe4f6f700 (LWP 14757) exited]
[Thread 0x7fffe5770700 (LWP 14756) exited]
[Thread 0x7fffe7964700 (LWP 14745) exited]
[Thread 0x7fffe8165700 (LWP 14744) exited]
[Thread 0x7ffff7f93c80 (LWP 14740) exited]
[Inferior 1 (process 14740) exited with code 01]
......

(gdb) bt full
No stack.
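
Note: the src-side abort corresponds to the recursion guard in qemu_coroutine_enter() (frame #2 above, qemu-coroutine.c:111). A paraphrased sketch of that guard from the qemu 2.3 sources, with the return-value handling omitted (consult the exact tree for details):

void qemu_coroutine_enter(Coroutine *co, void *opaque)
{
    Coroutine *self = qemu_coroutine_self();

    /* co->caller is non-NULL while the coroutine is already running;
     * entering it again (here, from the iothread's aio dispatch in
     * frame #3) trips this abort. */
    if (co->caller) {
        fprintf(stderr, "Co-routine re-entered recursively\n");
        abort();
    }

    co->caller = self;
    co->entry_arg = opaque;
    qemu_coroutine_switch(self, co, COROUTINE_ENTER);
}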

Comment 1 Pei Zhang 2015-09-22 10:45:39 UTC
More info:
Also tested with qemu-kvm-rhev-2.3.0-22.el7.x86_64 and qemu-kvm-rhev-2.3.0-23.el7.x86_64; both hit the same issue, so this bug may not be a regression.

Comment 3 Stefan Hajnoczi 2016-06-21 13:51:14 UTC
*** Bug 1250861 has been marked as a duplicate of this bug. ***

Comment 4 Stefan Hajnoczi 2016-06-21 13:53:04 UTC
*** Bug 1250798 has been marked as a duplicate of this bug. ***

Comment 5 Miroslav Rezanina 2016-06-23 08:45:01 UTC
Fix included in qemu-kvm-rhev-2.6.0-9.el7

Comment 7 FuXiangChun 2016-09-14 07:56:42 UTC
Verified this bug with qemu-kvm-rhev-2.6.0-24.el7, following the steps in comment 0.

The guest and the qemu process work correctly after migration.
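
For reference (not part of the verification output), the usual wrap-up after the migration completes is to cancel the ready mirror job on the src host, which leaves the NBD target holding a consistent copy, then stop the NBD server and resume the guest on the dst host:

on src host:
{ "execute": "block-job-cancel", "arguments": { "device": "drive-virtio-blk0" } }

on dst host:
{ "execute": "nbd-server-stop" }
{ "execute": "cont" }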

Comment 9 errata-xmlrpc 2016-11-07 20:39:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html