Bug 1539530 - Guest hung after block-job-complete for a drive-mirror image with NBD backend and iothread.
Summary: Guest hung after block-job-complete for a drive-mirror image with NBD backend...
Keywords:
Status: CLOSED DUPLICATE of bug 1634219
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: All
OS: Linux
high
high
Target Milestone: rc
: 8.0
Assignee: Sergio Lopez
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks: 1649160
TreeView+ depends on / blocked
 
Reported: 2018-01-29 06:37 UTC by Yongxue Hong
Modified: 2019-09-02 07:58 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-02 07:58:40 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Yongxue Hong 2018-01-29 06:37:20 UTC
Description of problem:
Guest hung after block-job-complete for a drive-mirror image with NBD backend and iothread.

Version-Release number of selected component (if applicable):
host: 4.14.0-33.el7a.ppc64le
guest: 4.14.0-33.el7a.ppc64le
qemu: qemu-kvm-rhev-2.10.0-18.el7.ppc64le

How reproducible:
100%

Steps to Reproduce:
1.Boot a guest with NBD backend and iothread.
e.g:
[root@c155f3-u23 commands]# cat guest-9332.sh
/usr/libexec/qemu-kvm \
-name 'guest' \
-machine pseries-rhel7.5.0 \
-m 16G \
-nodefaults \
-smp 8,cores=4,threads=2,sockets=1 \
-boot order=cdn,once=d,menu=off,strict=off  \
-device nec-usb-xhci,id=xhci0 \
-device usb-tablet,id=usb-tablet0 \
-device usb-kbd,id=usb-kbd0 \
-chardev socket,id=qmp_id_qmpmonitor,path=/var/tmp/qmp-cmd-monitor-yhong,server,nowait \
-mon chardev=qmp_id_qmpmonitor,mode=control  \
-chardev socket,id=serial_id_serial,path=/var/tmp/serial-yhong,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial \
-enable-kvm \
-object iothread,id=iothread0 \
-object iothread,id=iothread1 \
-device virtio-scsi-pci,id=scsi0,iothread=iothread0 \
-device virtio-scsi-pci,id=scsi1,iothread=iothread1 \
-device virtio-scsi-pci,id=scsi2 \
-drive file=nbd:10.0.1.8:10086,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_system \
-device scsi-hd,bus=scsi0.0,drive=drive_system,id=system,bootindex=0 \
-drive file=nbd:10.0.1.8:20001,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data1 \
-device scsi-hd,bus=scsi1.0,drive=drive_data1,id=data1 \
-drive file=nbd:10.0.1.8:20002,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data2 \
-device scsi-hd,bus=scsi2.0,drive=drive_data2,id=data2 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=40:f2:e9:5d:9c:03 \
-monitor stdio \

2.Do drive-mirror with drive_data1.
e.g:
{ 'execute': 'drive-mirror', 'arguments': { 'device': 'drive_data1', 'target': '/home/yhong/images/destination1.image', 'sync': 'full'} }
{"return": {}}
{"timestamp": {"seconds": 1517138778, "microseconds": 444429}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_data1", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}}

3.Do block-job-complete with drive_data1.
e.g:
{ "execute": "block-job-complete", "arguments": { "device": "drive_data1"} }
{"return": {}}
{"timestamp": {"seconds": 1517138811, "microseconds": 523421}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_data1", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}}

Actual results:
Guest hung.

Expected results:
Guest could run normally.

Additional info:
It is not reproduced with drive_data2 without iothread.

[root@c155f3-u23 ~]# nc -U /var/tmp/qmp-cmd-monitor-yhong 
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": "(qemu-kvm-rhev-2.10.0-18.el7)"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{ 'execute': 'drive-mirror', 'arguments': { 'device': 'drive_data2', 'target': '/home/yhong/images/destination2.image', 'sync': 'full'} }
{"return": {}}
{"timestamp": {"seconds": 1517139279, "microseconds": 147175}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_data2", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}}
{ "execute": "block-job-complete", "arguments": { "device": "drive_data2"} }
{"return": {}}
{"timestamp": {"seconds": 1517139287, "microseconds": 306040}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_data2", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}}

[root@c155f3-u23 commands]# sh guest-9332.sh
QEMU 2.10.0 monitor - type 'help' for more information
(qemu) VNC server running on ::1:5900

(qemu) 
(qemu) Formatting '/home/yhong/images/destination2.image', fmt=qcow2 size=5368709120 cluster_size=65536 lazy_refcounts=off refcount_bits=16

(qemu) info block
drive_system (#block117): nbd://10.0.1.8:10086 (qcow2)
    Attached to:      system
    Cache mode:       writeback, direct

drive_data1 (#block306): nbd://10.0.1.8:20001 (qcow2)
    Attached to:      data1
    Cache mode:       writeback, direct

drive_data2 (#block1229): /home/yhong/images/destination2.image (qcow2)
    Attached to:      data2
    Cache mode:       writeback, direct
(qemu) 

Guest run normally.

Comment 1 Yongxue Hong 2018-01-29 06:40:58 UTC
It is also reproduced with blockdev-snapshot-sync and block-stream.
e.g:
[root@c155f3-u23 images]# cat ../commands/guest-85851.sh
/usr/libexec/qemu-kvm \
-name 'guest' \
-machine pseries-rhel7.5.0 \
-m 16G \
-nodefaults \
-smp 8,cores=4,threads=2,sockets=1 \
-boot order=cdn,once=d,menu=off,strict=off  \
-device nec-usb-xhci,id=xhci0 \
-device usb-tablet,id=usb-tablet0 \
-device usb-kbd,id=usb-kbd0 \
-chardev socket,id=qmp_id_qmpmonitor,path=/var/tmp/qmp-cmd-monitor-yhong,server,nowait \
-mon chardev=qmp_id_qmpmonitor,mode=control  \
-chardev socket,id=serial_id_serial,path=/var/tmp/serial-yhong,server,nowait \
-device spapr-vty,reg=0x30000000,chardev=serial_id_serial \
-enable-kvm \
-object iothread,id=iothread0 \
-object iothread,id=iothread1 \
-object iothread,id=iothread2 \
-device virtio-scsi-pci,id=scsi0,iothread=iothread0 \
-device virtio-scsi-pci,id=scsi1,iothread=iothread1 \
-device virtio-scsi-pci,id=scsi2,iothread=iothread2 \
-drive file=nbd:10.0.1.8:10086,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_system \
-device scsi-hd,bus=scsi0.0,drive=drive_system,id=system,bootindex=0 \
-drive file=/home/yhong/images/data1-15G.qcow2,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data1 \
-device scsi-hd,bus=scsi1.0,drive=drive_data1,id=data1 \
-drive file=nbd:10.0.1.8:20001,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data2 \
-device scsi-hd,bus=scsi2.0,drive=drive_data2,id=data2 \
-netdev tap,id=hostnet0,script=/etc/qemu-ifup \
-device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=40:f2:e9:5d:9c:03 \
-monitor stdio \

[root@c155f3-u23 images]# nc -U /var/tmp/qmp-cmd-monitor-yhong 
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": "(qemu-kvm-rhev-2.10.0-17.el7)"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{ 'execute': 'blockdev-snapshot-sync', 'arguments': { 'device': 'drive_data2','snapshot-file': '/home/yhong/images/sn2', 'format': 'qcow2', 'mode': 'absolute-paths' } }
{"return": {}}
{ 'execute': 'block-stream', 'arguments': { 'device': 'drive_data2', 'speed':1000000000, 'on-error': 'report' } }
{"return": {}}
{"timestamp": {"seconds": 1516539970, "microseconds": 953874}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_data2", "len": 5368709120, "offset": 5368709120, "speed": 1000000000, "type": "stream"}}

guest hung

Comment 2 Yongxue Hong 2018-01-29 07:09:21 UTC
The operation of NBD server:
qemu-img create -f qcow2 -o preallocation=falloc rhel-alt-75-ppc64le-virtio-scsi-30G.qcow2 30G
qemu-nbd -f raw  rhel-alt-75-ppc64le-virtio-scsi-30G.qcow2 -p 10086 -t -e 10 &
for i in `seq 0 10`; do qemu-img create -f qcow2 -o preallocation=falloc disk-test$i.5G.qcow2 5G; done
for i in `seq 0 10`; do qemu-nbd -f raw  disk-test$i.5G.qcow2  -p 2000$i -t -e 10 & done

Comment 3 Qunfang Zhang 2018-01-29 07:18:47 UTC
Yongxue,

Does this bug also happen on Power8 and x86? 

Is this a regression? 

Thanks,
Qunfang

Comment 4 Yongxue Hong 2018-01-29 08:45:56 UTC
It is also reproduced on x86_64.

[root@ibm-x3850x5-07 yhong]# cat guest.sh 
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults  \
    -vga cirrus  \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device virtio-scsi-pci,id=scsi0,iothread=iothread0 \
    -device virtio-scsi-pci,id=scsi1,iothread=iothread1 \
    -device virtio-scsi-pci,id=scsi2 \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -device pvpanic,ioport=0x505,id=iduHVO78  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20180129-013808-Mj5o0JZF,path=/var/tmp/seabios,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20180129-013808-Mj5o0JZF,iobase=0x402 \
    -drive file=nbd:10.66.9.109:10086,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_system \
    -device scsi-hd,bus=scsi0.0,drive=drive_system,id=system,bootindex=0 \
    -drive file=nbd:10.66.9.109:20001,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data1 \
    -device scsi-hd,bus=scsi1.0,drive=drive_data1,id=data1 \
    -drive file=nbd:10.66.9.109:20002,format=qcow2,aio=native,if=none,cache=none,media=disk,werror=stop,rerror=stop,id=drive_data2 \
    -device scsi-hd,bus=scsi2.0,drive=drive_data2,id=data2 \
    -device virtio-net-pci,mac=9a:1a:1b:1c:1d:1e,id=idHuzuRL,vectors=4,netdev=idYunEv6,bus=pci.0  \
    -netdev tap,id=idYunEv6,vhost=on \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -m 8192  \
    -smp 16,maxcpus=16,cores=8,threads=1,sockets=2  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=d  \
    -enable-kvm \
    -monitor stdio

[root@ibm-x3850x5-07 ~]# nc -U /var/tmp/monitor-qmpmonitor1
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 10, "major": 2}, "package": "(qemu-kvm-rhev-2.10.0-18.el7)"}, "capabilities": []}}
{'execute':'qmp_capabilities'}
{"return": {}}
{ 'execute': 'drive-mirror', 'arguments': { 'device': 'drive_data1', 'target': '/home/yhong/images/destination1.image', 'sync': 'full'} }
{"return": {}}
{"timestamp": {"seconds": 1517215362, "microseconds": 511234}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_data1", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}}
{ "execute": "block-job-complete", "arguments": { "device": "drive_data1"} }
{"return": {}}
{"timestamp": {"seconds": 1517215375, "microseconds": 955169}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive_data1", "len": 5368709120, "offset": 5368709120, "speed": 0, "type": "mirror"}}

Guest hung.

host : 3.10.0-837.el7.x86_64
guest : 3.10.0-837.el7.x86_64
qemu : qemu-kvm-rhev-2.10.0-18.el7.x86_64

Comment 6 Qianqian Zhu 2018-01-29 09:00:24 UTC
Maybe related to Bug 1525827

Comment 7 Yongxue Hong 2018-01-29 09:07:44 UTC
GDB trace:

(gdb) bt
#0  0x00007f97c4a70f0f in ppoll () at /lib64/libc.so.6
#1  0x000055cc9204476b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x000055cc9204476b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x000055cc92046435 in aio_poll (ctx=0x55cc950eba40, blocking=blocking@entry=true)
    at util/aio-posix.c:629
#4  0x000055cc91fd39fa in nbd_client_close (bs=0x55cc954b8000) at block/nbd-client.c:62
#5  0x000055cc91fd39fa in nbd_client_close (bs=0x55cc954b8000) at block/nbd-client.c:383
#6  0x000055cc91fd237a in nbd_close (bs=<optimized out>) at block/nbd.c:492
#7  0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:3170
#8  0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:3354
#9  0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:4347
#10 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:3176
#11 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:3354
#12 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:4347
#13 0x000055cc91fc8fbe in mirror_exit (job=0x55cc9508b0e0, opaque=0x55cc971418d0) at block/mirror.c:602
#14 0x000055cc91f87769 in block_job_defer_to_main_loop_bh (opaque=0x55cc966938c0) at blockjob.c:965
#15 0x000055cc92043581 in aio_bh_poll (bh=0x55cc9706fc80) at util/async.c:90
#16 0x000055cc92043581 in aio_bh_poll (ctx=ctx@entry=0x55cc950eba40) at util/async.c:118
#17 0x000055cc920462f0 in aio_dispatch (ctx=0x55cc950eba40) at util/aio-posix.c:436
#18 0x000055cc9204345e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
---Type <return> to continue, or q <return> to quit---
#19 0x00007f97c65f38f9 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#20 0x000055cc9204558c in main_loop_wait () at util/main-loop.c:213
#21 0x000055cc9204558c in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
#22 0x000055cc9204558c in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
#23 0x000055cc91d258da in main () at vl.c:1937
#24 0x000055cc91d258da in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4828
(gdb) bt full
#0  0x00007f97c4a70f0f in ppoll () at /lib64/libc.so.6
#1  0x000055cc9204476b in qemu_poll_ns (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x000055cc9204476b in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:322
#3  0x000055cc92046435 in aio_poll (ctx=0x55cc950eba40, blocking=blocking@entry=true)
    at util/aio-posix.c:629
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = -1
        start = 0
        __PRETTY_FUNCTION__ = "aio_poll"
#4  0x000055cc91fd39fa in nbd_client_close (bs=0x55cc954b8000) at block/nbd-client.c:62
        waited_ = true
        busy_ = true
        bs_ = 0x55cc954b8000
        ctx_ = 0x55cc950ebcc0
        client = 0x55cc9514c1c0
        client = <optimized out>
        request = {handle = 0, from = 0, len = 0, flags = 0, type = 2}
#5  0x000055cc91fd39fa in nbd_client_close (bs=0x55cc954b8000) at block/nbd-client.c:383
---Type <return> to continue, or q <return> to quit---
        client = <optimized out>
        request = {handle = 0, from = 0, len = 0, flags = 0, type = 2}
#6  0x000055cc91fd237a in nbd_close (bs=<optimized out>) at block/nbd.c:492
        s = 0x55cc9514c1c0
#7  0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:3170
        child = <optimized out>
        next = <optimized out>
        ban = <optimized out>
        ban_next = <optimized out>
#8  0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:3354
#9  0x000055cc91f848a6 in bdrv_unref (bs=0x55cc954b8000) at block.c:4347
#10 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:3176
        child = <optimized out>
        next = <optimized out>
        ban = <optimized out>
        ban_next = <optimized out>
#11 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:3354
#12 0x000055cc91f848d3 in bdrv_unref (bs=0x55cc9513a800) at block.c:4347
#13 0x000055cc91fc8fbe in mirror_exit (job=0x55cc9508b0e0, opaque=0x55cc971418d0) at block/mirror.c:602
        data = 0x55cc971418d0
        replace_aio_context = 0x0
        src = 0x55cc9513a800
        target_bs = 0x55cc96e00000
---Type <return> to continue, or q <return> to quit---
        mirror_top_bs = 0x55cc96e06800
        local_err = 0x0
#14 0x000055cc91f87769 in block_job_defer_to_main_loop_bh (opaque=0x55cc966938c0) at blockjob.c:965
        data = 0x55cc966938c0
        aio_context = 0x55cc950ebcc0
#15 0x000055cc92043581 in aio_bh_poll (bh=0x55cc9706fc80) at util/async.c:90
        bh = <optimized out>
        bhp = <optimized out>
        next = 0x55cc9706f290
        ret = 1
        deleted = false
#16 0x000055cc92043581 in aio_bh_poll (ctx=ctx@entry=0x55cc950eba40) at util/async.c:118
        bh = <optimized out>
        bhp = <optimized out>
        next = 0x55cc9706f290
        ret = 1
        deleted = false
#17 0x000055cc920462f0 in aio_dispatch (ctx=0x55cc950eba40) at util/aio-posix.c:436
#18 0x000055cc9204345e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
        ctx = <optimized out>
#19 0x00007f97c65f38f9 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#20 0x000055cc9204558c in main_loop_wait () at util/main-loop.c:213
---Type <return> to continue, or q <return> to quit---
        context = 0x55cc950baa50
        pfds = <optimized out>
        context = 0x55cc950baa50
        ret = 1
        spin_counter = 0
        ret = 1
        timeout = 4294967295
        timeout_ns = <optimized out>
#21 0x000055cc9204558c in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261
        context = 0x55cc950baa50
        ret = 1
        spin_counter = 0
        ret = 1
        timeout = 4294967295
        timeout_ns = <optimized out>
#22 0x000055cc9204558c in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:515
        ret = 1
        timeout = 4294967295
        timeout_ns = <optimized out>
#23 0x000055cc91d258da in main () at vl.c:1937
        i = <optimized out>
        snapshot = <optimized out>
        linux_boot = <optimized out>
---Type <return> to continue, or q <return> to quit---
        initrd_filename = <optimized out>
        kernel_filename = <optimized out>
        kernel_cmdline = <optimized out>
        boot_order = <optimized out>
        boot_once = 0x55cc95026aa8 "d"
        cyls = <optimized out>
        heads = <optimized out>
        secs = <optimized out>
        translation = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        hda_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = <optimized out>
        olist = <optimized out>
        optind = 67
        optarg = 0x7ffc4d54c4d5 "stdio"
        loadvm = <optimized out>
        machine_class = 0x0
        cpu_model = <optimized out>
        vga_model = 0x7ffc4d54be62 "cirrus"
        qtest_chrdev = <optimized out>
        qtest_log = <optimized out>
---Type <return> to continue, or q <return> to quit---
        pid_file = <optimized out>
        incoming = <optimized out>
        defconfig = <optimized out>
        userconfig = <optimized out>
        nographic = <optimized out>
        display_type = <optimized out>
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = <optimized out>
        vmstate_dump_file = <optimized out>
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffc4d54b050}
        __func__ = "main"
        __FUNCTION__ = "main"
#24 0x000055cc91d258da in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at vl.c:4828
        i = <optimized out>
        snapshot = <optimized out>
---Type <return> to continue, or q <return> to quit---
        linux_boot = <optimized out>
        initrd_filename = <optimized out>
        kernel_filename = <optimized out>
        kernel_cmdline = <optimized out>
        boot_order = <optimized out>
        boot_once = 0x55cc95026aa8 "d"
        cyls = <optimized out>
        heads = <optimized out>
        secs = <optimized out>
        translation = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        hda_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = <optimized out>
        olist = <optimized out>
        optind = 67
        optarg = 0x7ffc4d54c4d5 "stdio"
        loadvm = <optimized out>
        machine_class = 0x0
        cpu_model = <optimized out>
        vga_model = 0x7ffc4d54be62 "cirrus"
        qtest_chrdev = <optimized out>
---Type <return> to continue, or q <return> to quit---
        qtest_log = <optimized out>
        pid_file = <optimized out>
        incoming = <optimized out>
        defconfig = <optimized out>
        userconfig = <optimized out>
        nographic = <optimized out>
        display_type = <optimized out>
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = <optimized out>
        vmstate_dump_file = <optimized out>
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffc4d54b050}
        __func__ = "main"
        __FUNCTION__ = "main"

Comment 10 aihua liang 2018-12-18 08:33:38 UTC
Test on both qemu-kvm-rhev-2.12.0-20.el7.x86_64 and qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64, also hit this issue.
 
Test env:
   kernel version:3.10.0-957.1.2.el7.x86_64
   qemu-kvm-rhev version:qemu-kvm-rhev-2.12.0-20.el7.x86_64
                        /qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64

Test steps:
  1. Host1: Start ndb server:
      qemu-nbd rhel76-64-virtio-scsi.qcow2 -f raw -p 9000 -t
   
  2. Host2: Start guest with data plane enabled and nbd backend:
       /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20181107-005924-PkIxnG9p,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20181107-005924-PkIxnG9p,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idkp9HYI  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20181107-005924-PkIxnG9p,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20181107-005924-PkIxnG9p,path=/var/tmp/seabios-20181107-005924-PkIxnG9p,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20181107-005924-PkIxnG9p,iobase=0x402 \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.7,multifunction=on,bus=pci.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0,firstport=0,bus=pci.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.2,firstport=2,bus=pci.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.4,firstport=4,bus=pci.0 \
    -device virtio-net-pci,mac=9a:44:45:46:47:48,id=iddDGLIi,vectors=4,netdev=idDdrbRp,bus=pci.0,addr=0x7  \
    -netdev tap,id=idDdrbRp,vhost=on \
    -m 2048  \
    -smp 10,maxcpus=10,cores=5,threads=1,sockets=2  \
    -cpu host \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
     -rtc base=localtime,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -device virtio-scsi-pci,id=scsi0,iothread=iothread0 \
    -device virtio-scsi-pci,id=scsi1,iothread=iothread1 \
    -device virtio-scsi-pci,id=scsi2 \
    -drive if=none,id=drive_image1,aio=threads,cache=none,format=qcow2,werror=stop,rerror=stop,file=nbd:10.66.144.42:9000 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \

     
  3. Host2: Do block stream
      { 'execute': 'drive-mirror', 'arguments': { 'device': 'drive_image1', 'target': '/home/destination1.image', 'sync': 'full'} }
 
  --->{"timestamp": {"seconds": 1545120961, "microseconds": 574056}, "event": "BLOCK_JOB_READY", "data": {"device": "drive_image1", "len": 4769185792, "offset": 4769185792, "speed": 0, "type": "mirror"}}

  
  4. Host2: After block job reach "ready" status, complete the block job.
     { "execute": "block-job-complete", "arguments": { "device": "drive_image1"} }

Actual Result:
  After step4, guest hang.

Pstack info:
 # pstack 11057
Thread 16 (Thread 0x7f8e4cb50700 (LWP 11058)):
#0  0x00007f8e5431b1c9 in syscall () at /lib64/libc.so.6
#1  0x0000559155d62c10 in qemu_event_wait ()
#2  0x0000559155d7313e in call_rcu_thread ()
#3  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#4  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 15 (Thread 0x7f8e4c34f700 (LWP 11059)):
#0  0x00007f8e543162cf in ppoll () at /lib64/libc.so.6
#1  0x0000559155d5ec0b in qemu_poll_ns ()
#2  0x0000559155d60987 in aio_poll ()
#3  0x0000559155b3049e in iothread_run ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 14 (Thread 0x7f8e4bb4e700 (LWP 11060)):
#0  0x00007f8e543162cf in ppoll () at /lib64/libc.so.6
#1  0x0000559155d5ec0b in qemu_poll_ns ()
#2  0x0000559155d60987 in aio_poll ()
#3  0x0000559155b3049e in iothread_run ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 13 (Thread 0x7f8e4b24c700 (LWP 11077)):
#0  0x00007f8e543162cf in ppoll () at /lib64/libc.so.6
#1  0x0000559155d5ec0b in qemu_poll_ns ()
#2  0x0000559155d60987 in aio_poll ()
#3  0x0000559155b3049e in iothread_run ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 12 (Thread 0x7f8e4aa4b700 (LWP 11078)):
#0  0x00007f8e545fe4ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f8e545f9dcb in _L_lock_883 () at /lib64/libpthread.so.0
#2  0x00007f8e545f9c98 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x0000559155d62539 in qemu_mutex_lock_impl ()
#4  0x0000559155a455f8 in qemu_mutex_lock_iothread ()
#5  0x0000559155a0866d in prepare_mmio_access.isra.24 ()
#6  0x0000559155a0cd08 in flatview_read_continue ()
#7  0x0000559155a0ceec in flatview_read ()
#8  0x0000559155a0d00f in address_space_read_full ()
#9  0x0000559155a683aa in kvm_cpu_exec ()
#10 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#11 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#12 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 11 (Thread 0x7f8e4a24a700 (LWP 11079)):
#0  0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1  0x0000559155a68165 in kvm_vcpu_ioctl ()
#2  0x0000559155a68233 in kvm_cpu_exec ()
#3  0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 10 (Thread 0x7f8e49a49700 (LWP 11080)):
#0  0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1  0x0000559155a68165 in kvm_vcpu_ioctl ()
#2  0x0000559155a68233 in kvm_cpu_exec ()
#3  0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 9 (Thread 0x7f8e49248700 (LWP 11081)):
#0  0x00007f8e545fe4ed in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007f8e545f9dcb in _L_lock_883 () at /lib64/libpthread.so.0
#2  0x00007f8e545f9c98 in pthread_mutex_lock () at /lib64/libpthread.so.0
#3  0x0000559155d62539 in qemu_mutex_lock_impl ()
#4  0x0000559155a455f8 in qemu_mutex_lock_iothread ()
#5  0x0000559155a0866d in prepare_mmio_access.isra.24 ()
#6  0x0000559155a0cd08 in flatview_read_continue ()
#7  0x0000559155a0ceec in flatview_read ()
#8  0x0000559155a0d00f in address_space_read_full ()
#9  0x0000559155a683aa in kvm_cpu_exec ()
#10 0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#11 0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#12 0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 8 (Thread 0x7f8e48a47700 (LWP 11082)):
#0  0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1  0x0000559155a68165 in kvm_vcpu_ioctl ()
#2  0x0000559155a68233 in kvm_cpu_exec ()
#3  0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 7 (Thread 0x7f8e48246700 (LWP 11083)):
#0  0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1  0x0000559155a68165 in kvm_vcpu_ioctl ()
#2  0x0000559155a68233 in kvm_cpu_exec ()
#3  0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 6 (Thread 0x7f8e47a45700 (LWP 11084)):
#0  0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1  0x0000559155a68165 in kvm_vcpu_ioctl ()
#2  0x0000559155a68233 in kvm_cpu_exec ()
#3  0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 5 (Thread 0x7f8e47244700 (LWP 11085)):
#0  0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1  0x0000559155a68165 in kvm_vcpu_ioctl ()
#2  0x0000559155a68233 in kvm_cpu_exec ()
#3  0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 4 (Thread 0x7f8e46a43700 (LWP 11086)):
#0  0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1  0x0000559155a68165 in kvm_vcpu_ioctl ()
#2  0x0000559155a68233 in kvm_cpu_exec ()
#3  0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 3 (Thread 0x7f8e46242700 (LWP 11087)):
#0  0x00007f8e543178d7 in ioctl () at /lib64/libc.so.6
#1  0x0000559155a68165 in kvm_vcpu_ioctl ()
#2  0x0000559155a68233 in kvm_cpu_exec ()
#3  0x0000559155a457a6 in qemu_kvm_cpu_thread_fn ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 2 (Thread 0x7f8dc3fff700 (LWP 11091)):
#0  0x00007f8e545fb965 in pthread_cond_wait@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
#1  0x0000559155d627e9 in qemu_cond_wait_impl ()
#2  0x0000559155c7d28f in vnc_worker_thread_loop ()
#3  0x0000559155c7d858 in vnc_worker_thread ()
#4  0x00007f8e545f7dd5 in start_thread () at /lib64/libpthread.so.0
#5  0x00007f8e54320ead in clone () at /lib64/libc.so.6
Thread 1 (Thread 0x7f8e6da08dc0 (LWP 11057)):
#0  0x00007f8e543162cf in ppoll () at /lib64/libc.so.6
#1  0x0000559155d5ec0b in qemu_poll_ns ()
#2  0x0000559155d60987 in aio_poll ()
#3  0x0000559155ce3a6a in nbd_client_close ()
#4  0x0000559155ce139a in nbd_close ()
#5  0x0000559155c8a3f2 in bdrv_unref ()
#6  0x0000559155c8a41f in bdrv_unref ()
#7  0x0000559155cd3f1d in mirror_exit_common ()
#8  0x0000559155c8f952 in job_do_finalize ()
#9  0x0000559155c8fc10 in job_exit ()
#10 0x0000559155d5d771 in aio_bh_poll ()
#11 0x0000559155d60820 in aio_dispatch ()
#12 0x0000559155d5d64e in aio_ctx_dispatch ()
#13 0x00007f8e6d10d049 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#14 0x0000559155d5fb17 in main_loop_wait ()
#15 0x0000559155a01de7 in main ()

Comment 12 Sergio Lopez 2019-09-02 07:58:40 UTC

*** This bug has been marked as a duplicate of bug 1634219 ***


Note You need to log in before you can comment on or make changes to this bug.