Bug 1525827

Summary: Qemu core dump after block mirror to nbd disk with data-plane
Product: Red Hat Enterprise Linux 7 Reporter: Qianqian Zhu <qizhu>
Component: qemu-kvm-rhev Assignee: Kevin Wolf <kwolf>
Status: CLOSED DUPLICATE QA Contact: Gu Nini <ngu>
Severity: high Docs Contact:
Priority: high    
Version: 7.5 CC: aliang, chayang, coli, eblake, juzhang, knoel, lolyu, stefanha, virt-maint, xfu, yhong
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2018-09-27 18:45:32 UTC Type: Bug

Description Qianqian Zhu 2017-12-14 06:41:32 UTC
Description of problem:
QEMU dumps core after a block mirror to an NBD disk when data-plane is enabled.
Without data-plane, the same operation succeeds.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-12.el7.x86_64
kernel-3.10.0-820.el7.x86_64

How reproducible:
90%

Steps to Reproduce:
1.  Source:
/usr/libexec/qemu-kvm \
-M q35,accel=kvm,kernel-irqchip=split \
-m 4G \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,iothread=iothread1 \
-object iothread,id=iothread1 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=writethrough,format=qcow2,file=/home/kvm_autotest_root/images/win2008-r2-64-virtio-scsi.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1 \
-device virtio-net-pci,mac=9a:89:8a:8b:8c:8d,id=idTqeiKU,netdev=idU17IIx,bus=pcie.0  \
-netdev tap,id=idU17IIx  \
-cpu 'SandyBridge',+kvm_pv_unhalt  \
-vnc :0   \
-enable-kvm  \
-qmp tcp::5555,server,nowait \
-monitor stdio

Destination:
/usr/libexec/qemu-kvm \
-M q35,accel=kvm,kernel-irqchip=split \
-m 4G \
-device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,iothread=iothread1 \
-object iothread,id=iothread1 \
-drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=writethrough,format=qcow2,file=/home/mirror.qcow2 \
-device scsi-hd,id=image1,drive=drive_image1 \
-device virtio-net-pci,mac=9a:89:8a:8b:8c:8d,id=idTqeiKU,netdev=idU17IIx,bus=pcie.0  \
-netdev tap,id=idU17IIx  \
-cpu 'SandyBridge',+kvm_pv_unhalt  \
-vnc :1  \
-enable-kvm  \
-qmp tcp::5566,server,nowait \
-monitor stdio \
--incoming tcp::1234

2. Destination:
{ "execute": "qmp_capabilities"}
{"return": {}}
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host":"10.66.8.133", "port": "9000" } } } }
{"return": {}}
{"execute":"nbd-server-add","arguments":{"device":"drive_image1", "writable": true}}
{"return": {}}

3. Source:
{ "execute": "drive-mirror", "arguments": { "device": "drive_image1", "target": "nbd://10.66.8.133:9000/drive_image1", "sync": "full", "format": "raw", "mode": "existing" } }
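
For anyone who wants to script steps 2 and 3 rather than paste the QMP by hand, here is a minimal Python sketch. It only rebuilds the commands quoted above and sends them over the QMP TCP sockets. The helper names (nbd_export_commands, drive_mirror_commands, run_qmp) are made up for illustration and are not part of QEMU or any test framework. It also assumes the QMP monitors are reachable on the ports from the command lines in step 1 (5566 for the destination, 5555 for the source) and that each QMP message arrives on its own line.

```python
import json
import socket


def nbd_export_commands(host, port, device):
    """Step 2 (destination): start an NBD server and export the drive writable."""
    return [
        {"execute": "qmp_capabilities"},
        {"execute": "nbd-server-start",
         "arguments": {"addr": {"type": "inet",
                                "data": {"host": host, "port": str(port)}}}},
        {"execute": "nbd-server-add",
         "arguments": {"device": device, "writable": True}},
    ]


def drive_mirror_commands(host, port, device):
    """Step 3 (source): mirror the drive to the NBD export on the destination."""
    target = "nbd://%s:%s/%s" % (host, port, device)
    return [
        {"execute": "qmp_capabilities"},
        {"execute": "drive-mirror",
         "arguments": {"device": device, "target": target,
                       "sync": "full", "format": "raw", "mode": "existing"}},
    ]


def run_qmp(qmp_host, qmp_port, commands):
    """Send commands to a QMP TCP monitor and collect the replies.

    Hypothetical helper: assumes one JSON message per line and skips
    asynchronous events (messages without a "return" or "error" key).
    """
    replies = []
    with socket.create_connection((qmp_host, qmp_port)) as sock:
        stream = sock.makefile("rw", encoding="utf-8")
        json.loads(stream.readline())  # consume the QMP greeting banner
        for cmd in commands:
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
            while True:
                line = stream.readline()
                if not line:
                    raise ConnectionError("QMP socket closed unexpectedly")
                msg = json.loads(line)
                if "return" in msg or "error" in msg:
                    replies.append(msg)
                    break
    return replies
```

With both guests started as in step 1, something like run_qmp("127.0.0.1", 5566, nbd_export_commands("10.66.8.133", 9000, "drive_image1")) followed by run_qmp("127.0.0.1", 5555, drive_mirror_commands("10.66.8.133", 9000, "drive_image1")) should issue the same sequence. The 127.0.0.1 address is an assumption; use whichever host the QMP sockets are actually bound on.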


Actual results:

Destination qemu prompt:
(qemu) qemu-kvm: Disconnect client, due to: Unable to read from socket: Connection reset by peer

Source qemu core dump:
qemu-kvm: util/aio-posix.c:520: run_poll_handlers: Assertion `ctx->poll_disable_cnt == 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe694a700 (LWP 18015)]
0x00007fffed9f01f7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install elfutils-libelf-0.168-5.el7.x86_64 elfutils-libs-0.168-5.el7.x86_64 glibc-2.17-194.el7.x86_64 glusterfs-api-3.12.2-1.el7rhgs.x86_64 glusterfs-libs-3.12.2-1.el7rhgs.x86_64 libcurl-7.29.0-35.el7.x86_64 libgcc-4.8.5-14.el7.x86_64 libibverbs-13-5.el7.x86_64 librdmacm-13-5.el7.x86_64 libstdc++-4.8.5-14.el7.x86_64 nss-softokn-freebl-3.28.3-4.el7.x86_64 nss-util-3.28.4-2.el7.x86_64 spice-server-0.12.8-1.el7.x86_64 systemd-libs-219-38.el7.x86_64
(gdb) bt full
#0  0x00007fffed9f01f7 in raise () at /lib64/libc.so.6
#1  0x00007fffed9f18e8 in abort () at /lib64/libc.so.6
#2  0x00007fffed9e9266 in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007fffed9e9312 in  () at /lib64/libc.so.6
#4  0x0000555555abd4e7 in aio_poll (max_ns=8000, ctx=0x555556d35ac0) at util/aio-posix.c:520
        progress = <optimized out>
        end_time = <optimized out>
        max_ns = 8000
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = <optimized out>
        start = 80632500985065
        __PRETTY_FUNCTION__ = "aio_poll"
#5  0x0000555555abd4e7 in aio_poll (blocking=true, ctx=0x555556d35ac0) at util/aio-posix.c:555
        max_ns = 8000
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = <optimized out>
        start = 80632500985065
        __PRETTY_FUNCTION__ = "aio_poll"
#6  0x0000555555abd4e7 in aio_poll (ctx=0x555556d35ac0, blocking=blocking@entry=true) at util/aio-posix.c:595
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = <optimized out>
        start = 80632500985065
        __PRETTY_FUNCTION__ = "aio_poll"
#7  0x00005555558b115e in iothread_run (opaque=0x555556ce7440) at iothread.c:59
        iothread = 0x555556ce7440
#8  0x00007fffedd85e25 in start_thread () at /lib64/libpthread.so.0
#9  0x00007fffedab334d in clone () at /lib64/libc.so.6


Expected results:
Block mirror succeeds.

Additional info:

Comment 2 Ademar Reis 2017-12-18 17:10:47 UTC
May be related to Bug 1503437

Comment 3 Gu Nini 2018-07-27 08:38:29 UTC
Reproduced the bug recently:

https://bugzilla.redhat.com/show_bug.cgi?id=1503437#c8