Bug 1440667

Summary: The guest exits abnormally with data-plane when "block-job-complete" is run after "drive-mirror" in QMP.
Product: Red Hat Enterprise Linux 7 Reporter: Yongxue Hong <yhong>
Component: qemu-kvm-rhev Assignee: Fam Zheng <famz>
Status: CLOSED ERRATA QA Contact: Qianqian Zhu <qizhu>
Severity: high Docs Contact:
Priority: high    
Version: 7.4 CC: aliang, chayang, coli, juzhang, knoel, michen, mrezanin, pbonzini, qizhu, qzhang, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: qemu-kvm-rhev-2.9.0-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-02 04:35:59 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yongxue Hong 2017-04-10 08:47:27 UTC
Description of problem:
With data-plane enabled, the guest exits abnormally when "block-job-complete" is issued after "drive-mirror" in QMP.

Version-Release number of selected component (if applicable):
host: 3.10.0-643.el7.ppc64le
guest: 3.10.0-643.el7.ppc64le
qemu-kvm: version 2.8.92 (qemu-kvm-rhev-2.9.0-0.el7.patchwork201703291116)

How reproducible:
100%

Steps to Reproduce:
1. Boot a guest with data-plane enabled, for example:
/usr/libexec/qemu-kvm \
-name rhel7_4-9343 \
-M pseries-rhel7.4.0 \
-m 8G \
-nodefaults \
-smp 4,sockets=4,cores=1,threads=1 \
-boot menu=on,order=cd \
-device VGA,id=vga0 \
-device nec-usb-xhci,id=xhci \
-device usb-tablet,id=usb-tablet0 \
-device usb-kbd,id=usb-kbd0 \
-object iothread,id=iothread0 \
-object iothread,id=iothread1 \
-device virtio-scsi-pci,id=scsi-pci-0 \
-drive file=/home/hyx/iso/RHEL-7.4-20170330.1-Server-ppc64le-dvd1.iso,if=none,media=cdrom,id=cd-0 \
-device scsi-cd,bus=scsi-pci-0.0,id=scsi-cd-0,drive=cd-0,channel=0,scsi-id=0,lun=0,bootindex=1 \
-drive file=/home/hyx/image/rhel-7_4-9330-20G.qcow2,format=qcow2,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-0 \
-device virtio-blk-pci,bus=pci.0,addr=0x03,drive=drive-0,id=virtio-blk-0,iothread=iothread0,bootindex=0 \
-drive file=/home/hyx/image/rhel-7_4-9330-30G.qcow2,format=qcow2,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-1 \
-device virtio-blk-pci,bus=pci.0,addr=0x04,drive=drive-1,id=virtio-blk-1,iothread=iothread1 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=70:e2:84:14:0e:00 \
-monitor stdio \
-serial unix:./sock3,server,nowait \
-qmp tcp:0:3003,server,nowait \
-vnc :3

2. Run "drive-mirror" in QMP, for example:
{ "execute": "drive-mirror", "arguments": { "device": "drive-1","target": "/home/hyx/image/rhel-7_4-9343-mirror-30G.qcow2","sync": "full","format": "qcow2" } }
3. Run "block-job-complete" in QMP, for example:
{ "execute": "block-job-complete", "arguments": { "device": "drive-1"} }
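
Steps 2 and 3 can also be scripted. The following is a minimal sketch, assuming the QMP socket from step 1 (TCP port 3003) is reachable from where the script runs. The helper names (qmp_command, read_until, run_mirror) are hypothetical and not part of QEMU. It negotiates capabilities first, and it waits for BLOCK_JOB_READY before sending "block-job-complete", which the manual steps leave implicit.

```python
import json
import socket


def qmp_command(name, **arguments):
    """Build one QMP command as a JSON line (hypothetical helper)."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return json.dumps(cmd) + "\n"


def read_until(reader, predicate):
    """Read JSON messages line by line until predicate(msg) is true."""
    for line in reader:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if "error" in msg:
            # A command was rejected; stop instead of waiting forever.
            raise RuntimeError(str(msg["error"]))
        if predicate(msg):
            return msg
    raise ConnectionError("QMP connection closed before the expected message")


def run_mirror(host, port, device, target):
    """Mirror `device` to `target`, then complete the job once it is ready."""
    with socket.create_connection((host, port)) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        writer = sock.makefile("w", encoding="utf-8")

        def send(line):
            writer.write(line)
            writer.flush()

        read_until(reader, lambda m: "QMP" in m)  # greeting banner
        send(qmp_command("qmp_capabilities"))
        read_until(reader, lambda m: "return" in m)
        send(qmp_command("drive-mirror", device=device, target=target,
                         sync="full", format="qcow2"))
        read_until(reader, lambda m: m.get("event") == "BLOCK_JOB_READY")
        send(qmp_command("block-job-complete", device=device))
        return read_until(
            reader, lambda m: m.get("event") == "BLOCK_JOB_COMPLETED")
```

For the setup in this report the call would look like run_mirror("localhost", 3003, "drive-1", "/home/hyx/image/rhel-7_4-9343-mirror-30G.qcow2"). On an affected build the connection is expected to drop when QEMU aborts, which surfaces here as ConnectionError.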

Actual results:
The guest exits and HMP shows:
qemu-kvm: block/io.c:164: bdrv_drain_recurse: Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()' failed.

Expected results:
The guest keeps running normally.

Additional info:
This also reproduces on x86_64.

Comment 2 Yongxue Hong 2017-04-10 10:52:37 UTC
Backtrace debug info:
qemu-kvm: block/io.c:164: bdrv_drain_recurse: Assertion `qemu_get_current_aio_context() == qemu_get_aio_context()' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x3fffb40feaa0 (LWP 20251)]
0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install libdb-5.3.21-20.el7.ppc64le
(gdb) bt
#0  0x00003fffb6f2edc8 in raise () from /lib64/libc.so.6
#1  0x00003fffb6f30f4c in abort () from /lib64/libc.so.6
#2  0x00003fffb6f24b44 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00003fffb6f24c34 in __assert_fail () from /lib64/libc.so.6
#4  0x000000004a88a264 in bdrv_drain_recurse (bs=0x4bad6c00) at block/io.c:164
#5  0x000000004a88abd8 in bdrv_drained_begin (bs=0x4bad6c00) at block/io.c:231
#6  0x000000004a82deac in bdrv_child_cb_drained_begin (child=<optimized out>) at block.c:719
#7  0x000000004a88ac74 in bdrv_parent_drained_begin (bs=0x4bacd000) at block/io.c:53
#8  bdrv_drained_begin (bs=0x4bacd000) at block/io.c:228
#9  0x000000004a88b6ac in bdrv_co_drain_bh_cb (opaque=0x3ffda5adfd80) at block/io.c:190
#10 0x000000004a928d58 in aio_bh_call (bh=0x4b7c7d40) at util/async.c:90
#11 aio_bh_poll (ctx=0x4b8c1b80) at util/async.c:118
#12 0x000000004a92d934 in aio_poll (ctx=0x4b8c1b80, blocking=<optimized out>) at util/aio-posix.c:682
#13 0x000000004a70cde8 in iothread_run (opaque=0x4ba209a0) at iothread.c:59
#14 0x00003fffb70e8728 in start_thread () from /lib64/libpthread.so.0
#15 0x00003fffb70113d0 in clone () from /lib64/libc.so.6
(gdb)

Comment 3 Fam Zheng 2017-04-12 07:21:11 UTC
Upstream fix for this in QEMU 2.9:

commit 19dd29e8a77cd820515de5289f566508e0ed4926
Author: Fam Zheng <famz>
Date:   Fri Apr 7 14:54:11 2017 +0800

    mirror: Fix aio context of mirror_top_bs
    
    It should be moved to the same context as source, before inserting to the
    graph.
    
    Reviewed-by: Eric Blake <eblake>
    Reviewed-by: Kevin Wolf <kwolf>
    Signed-off-by: Fam Zheng <famz>
    Signed-off-by: Kevin Wolf <kwolf>
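
The idea behind the fix can be illustrated with a toy model. This is a sketch in Python, not QEMU code: the classes and functions (AioContext, Node, drain, insert_filter) are hypothetical stand-ins, and the rule encoded in drain() is inferred from the assertion message and the backtrace in comment 2, where an iothread drains the source node and then reaches a parent node that is still in the main context.

```python
class AioContext:
    """Toy stand-in for an AioContext: one event loop and its home thread."""

    def __init__(self, name):
        self.name = name


MAIN = AioContext("main-loop")


class Node:
    """Toy stand-in for a block node; new nodes start in the main context."""

    def __init__(self, name, ctx=MAIN):
        self.name = name
        self.ctx = ctx
        self.parents = []


def drain(node, current_ctx):
    """Drain a node's parents, then the node, from the thread of current_ctx."""
    for parent in node.parents:
        drain(parent, current_ctx)
    # Assumed rule: a node may be polled from its own context's thread,
    # or otherwise only from the main loop.
    assert current_ctx is node.ctx or current_ctx is MAIN, (
        f"{node.name} is in {node.ctx.name}, drained from {current_ctx.name}")


def insert_filter(source, fixed):
    """Insert a mirror_top_bs-like filter node above source."""
    top = Node("mirror_top_bs")  # created in the main context
    if fixed:
        # The fix: move the new node to the source's context
        # before inserting it into the graph.
        top.ctx = source.ctx
    source.parents.append(top)
    return top


iothread = AioContext("iothread1")

fixed_source = Node("drive-1", iothread)
insert_filter(fixed_source, fixed=True)
drain(fixed_source, iothread)  # passes: both nodes share iothread1

broken_source = Node("drive-1", iothread)
insert_filter(broken_source, fixed=False)
try:
    drain(broken_source, iothread)
except AssertionError as exc:
    print("assertion failed:", exc)
```

In the unfixed case the filter node stays in the main context while the drain runs in the iothread, so the check fails in the same place the real assertion does in the backtrace.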

Comment 4 Qianqian Zhu 2017-04-26 06:31:15 UTC
Reproduced on qemu-kvm-rhev-2.8.0-5.el7.x86_64, and verified on qemu-kvm-rhev-2.9.0-1.el7.x86_64 with kernel-3.10.0-640.el7.x86_64.

Steps:
1. Launch guest with data-plane:
/usr/libexec/qemu-kvm -name rhel7_4-9343 -m 1G -smp 2 -object iothread,id=iothread0 -drive file=/home/kvm_autotest_root/images/rhel74-64-virtio.qcow2,format=qcow2,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive-0 -device virtio-blk-pci,drive=drive-0,id=virtio-blk-0,iothread=iothread0,bootindex=0 -monitor stdio -qmp tcp:0:5555,server,nowait -vnc :3

2. block mirror and reopen:
{ "execute": "drive-mirror", "arguments": { "device": "drive-0","target": "/home/mirror1.qcow2","sync": "full","format": "qcow2" } }
{"return": {}}
{"timestamp": {"seconds": 1493187904, "microseconds": 180627}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-0", "len": 3761504256, "offset": 3761504256, "speed": 0, "type": "mirror"}}
{ "execute": "block-job-complete", "arguments": { "device": "drive-0"}}
{"return": {}}
{"timestamp": {"seconds": 1493187916, "microseconds": 974757}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-0", "len": 3761504256, "offset": 3761504256, "speed": 0, "type": "mirror"}}
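
A transcript like the one above can be checked mechanically. The following is a minimal sketch, assuming the QMP output was captured one JSON message per line; the function name mirror_succeeded is hypothetical. It treats the job as successful when a BLOCK_JOB_READY event was seen and the BLOCK_JOB_COMPLETED event for the same device has no "error" field and reports offset equal to len.

```python
import json


def mirror_succeeded(lines, device):
    """Return True if the captured QMP lines show a clean mirror completion."""
    ready = False
    completed = False
    for line in lines:
        try:
            msg = json.loads(line)
        except ValueError:
            continue  # skip anything that is not JSON (e.g. shell noise)
        if not isinstance(msg, dict):
            continue
        data = msg.get("data", {})
        if data.get("device") != device or data.get("type") != "mirror":
            continue
        if msg.get("event") == "BLOCK_JOB_READY":
            ready = True
        elif msg.get("event") == "BLOCK_JOB_COMPLETED":
            completed = ("error" not in data
                         and "len" in data
                         and data.get("offset") == data.get("len"))
    return ready and completed
```

Applied to the six lines of step 2 with device "drive-0", the function returns True. A capture that ends before BLOCK_JOB_COMPLETED, as would be expected when QEMU hangs or aborts, yields False.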

Result:
qemu-kvm-rhev-2.8.0-5.el7.x86_64:
Both qemu and guest hang.
qemu-kvm-rhev-2.9.0-1.el7.x86_64:
Both qemu and guest work well.

Therefore moving to VERIFIED.

Comment 6 errata-xmlrpc 2017-08-02 04:35:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:2392