Bug 1748253

Summary: QEMU crashes (core dump) when using the integrated NDB server with data-plane
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: aihua liang <aliang>
Component: qemu-kvmAssignee: Sergio Lopez <slopezpa>
Status: CLOSED ERRATA QA Contact: aihua liang <aliang>
Severity: high Docs Contact:
Priority: high    
Version: 8.1CC: coli, eblake, jferlan, jinzhao, jsnow, juzhang, mtessun, qzhang, virt-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-11-06 07:19:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1741186    

Description aihua liang 2019-09-03 08:34:28 UTC
Description of problem:
  Both src and dst qemu coredump when do storage vm migration on non-shared storage

Version-Release number of selected component (if applicable):
 kernel version: 4.18.0-131.el8.x86_64
 qemu-kvm version: qemu-kvm-4.1.0-5.module+el8.1.0+4076+b5e41ebc.x86_64

How reproducible:
 100%

Steps to Reproduce:
1.Start src guest with qemu cmds:
   /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmp1,path=/var/tmp/monitor-qmp1-20190522-203214-pO8ikKhP,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190522-203214-pO8ikKhP,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idWcUIuL  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190522-203214-pO8ikKhP,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190522-203214-pO8ikKhP,path=/var/tmp/seabios-20190522-203214-pO8ikKhP,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190522-203214-pO8ikKhP,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -blockdev node-name=file_node,driver=file,filename=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,file=file_node \
    -device virtio-scsi-pci,iothread=iothread0,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pci.0,addr=0x6,iothread=iothread1 \
    -blockdev node-name=file_data,driver=file,filename=/home/data1.qcow2 \
    -blockdev node-name=drive_data1,driver=qcow2,file=file_data \
    -device scsi-hd,drive=drive_data1,id=data1,bus=virtio_scsi_pci1.0,scsi-id=0,lun=1,channel=0,werror=stop,rerror=stop \
    -device virtio-net-pci,mac=9a:f4:f5:f6:f7:f8,id=idJqoo3m,vectors=4,netdev=idiujahB,bus=pci.0,addr=0x5  \
    -netdev tap,id=idiujahB,vhost=on \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \

2. Create an empty data disk and start dst guest with cmds:
    #qemu-img create -f qcow2 data2.qcow2 2G
    /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmp1,path=/var/tmp/monitor-qmp1-20190522-203214-pO8ikKhQ,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190522-203214-pO8ikKhP,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idWcUIuL  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190522-203214-pO8ikKhP,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190522-203214-pO8ikKhP,path=/var/tmp/seabios-20190522-203214-pO8ikKhP,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190522-203214-pO8ikKhP,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -blockdev node-name=file_node,driver=file,filename=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,file=file_node \
    -device virtio-scsi-pci,iothread=iothread0,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pci.0,addr=0x6,iothread=iothread1 \
    -blockdev node-name=file_data,driver=file,filename=/home/data2.qcow2 \
    -blockdev node-name=drive_data1,driver=qcow2,file=file_data \
    -device scsi-hd,drive=drive_data1,id=data1,bus=virtio_scsi_pci1.0,scsi-id=0,lun=1,channel=0,werror=stop,rerror=stop \
    -device virtio-net-pci,mac=9a:f4:f5:f6:f7:f8,id=idJqoo3m,vectors=4,netdev=idiujahB,bus=pci.0,addr=0x5  \
    -netdev tap,id=idiujahB,vhost=on \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \
    -incoming tcp:0:5000 \

3. Set migration capabilities in both src and dst.
    {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true},{"capability":"pause-before-switchover","state":true}]}}

4. In dst, start ndb server, and expose data disk.
    { "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.73.73.83", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments":{ "device": "drive_data1", "writable": true } }
{"return": {}}

5. In src, Add bitmap to data disk and add a new file
     { "execute": "block-dirty-bitmap-add", "arguments": {"node": "drive_data1", "name": "bitmap0"}}
    (guest)#dd if=/dev/urandom of=test bs=1M coutn=1000

6. In src, add the target device and do block mirror.
     {"execute":"blockdev-add","arguments":{"driver":"nbd","node-name":"mirror","server":{"type":"inet","host":"10.73.73.83","port":"3333"},"export":"drive_data1"}}
{"return": {}}
     {"execute": "blockdev-mirror", "arguments": { "device": "drive_data1","target": "mirror", "sync": "full", "job-id":"j1"}}
{"timestamp": {"seconds": 1567498785, "microseconds": 660915}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1567498785, "microseconds": 661001}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1567498786, "microseconds": 97466}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 99649}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 99698}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 99735}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 100230}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 101884}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 101930}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 104763}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 106887}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 109073}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 111419}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 113681}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 113839}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 113886}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 113934}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 113952}, "event": "BLOCK_JOB_ERROR", "data": {"device": "j1", "operation": "write", "action": "report"}}
{"timestamp": {"seconds": 1567498786, "microseconds": 114124}, "event": "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "j1"}}

Actual results:
 After step6, dst vm coredump, and src vm hang at first, then some minutes later, src vm also coredump.
 
In dst:
    (qemu) qemu-kvm: io/channel.c:411: qio_channel_restart_read: Assertion `qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)' failed.
bug_dst.txt: line 38: 17414 Aborted                 (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -machine pc -nodefaults ...
    (gdb) bt full
#0  0x00007fe16d02a8df in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = 
            {__val = {268444224, 94579003460096, 94579001339488, 140606173531095, 4222451712, 94579003460096, 94579003460197, 94579003460096, 94579003460096, 94579003460239, 94579003460396, 94579003460096, 94579003460396, 0, 0, 0}}
        pid = <optimized out>
        tid = <optimized out>
#1  0x00007fe16d014cf5 in __GI_abort () at abort.c:79
        save_stage = 1
        act = 
          {__sigaction_handler = {sa_handler = 0x5604e47d1da0, sa_sigaction = 0x5604e47d1da0}, sa_mask = {__val = {0, 94579003460096, 2, 0, 0, 0, 140736250749496, 21474836480, 140736250749344, 140606174654256, 140606174638848, 0, 4755299632365244928, 140606174623802, 0, 140606174638848}}, sa_flags = -510644346, sa_restorer = 0x5604e1902fc0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007fe16d014bc9 in __assert_fail_base
    (fmt=0x7fe16d17b300 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5604e1902fc0 "qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)", file=0x5604e1902f86 "io/channel.c", line=411, function=<optimized out>) at assert.c:92
        str = 0x5604e47d1da0 "`\307\216\344\004V"
        total = 4096
#3  0x00007fe16d022e96 in __GI___assert_fail
    (assertion=assertion@entry=0x5604e1902fc0 "qemu_get_current_aio_context() == qemu_coroutine_get_aio_context(co)", file=file@entry=0x5604e1902f86 "io/channel.c", line=line@entry=411, function=function@entry=0x5604e19030d0 <__PRETTY_FUNCTION__.18094> "qio_channel_restart_read")
    at assert.c:101
#4  0x00005604e170e8c0 in qio_channel_restart_read (opaque=<optimized out>) at io/channel.c:410
        __PRETTY_FUNCTION__ = "qio_channel_restart_read"
        ioc = <optimized out>
        co = 0x5604e3e78050
        __PRETTY_FUNCTION__ = "qio_channel_restart_read"
#5  0x00005604e170e8c0 in qio_channel_restart_read (opaque=<optimized out>) at io/channel.c:404
        ioc = <optimized out>
        co = 0x5604e3e78050
        __PRETTY_FUNCTION__ = "qio_channel_restart_read"
#6  0x00005604e17626c2 in aio_dispatch_handlers (ctx=ctx@entry=0x5604e3beb6d0) at util/aio-posix.c:429
        revents = <optimized out>
        node = 0x7fe15c0052e0
        tmp = 0x5604e3c72e60
        progress = false
--Type <RET> for more, q to quit, c to continue without paging--
#7  0x00005604e176306c in aio_dispatch (ctx=0x5604e3beb6d0) at util/aio-posix.c:460
#8  0x00005604e175fb02 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
        ctx = <optimized out>
        __PRETTY_FUNCTION__ = "aio_ctx_dispatch"
#9  0x00007fe171ea667d in g_main_dispatch (context=0x5604e3c74250) at gmain.c:3176
        dispatch = 0x5604e175faf0 <aio_ctx_dispatch>
        prev_source = 0x0
        was_in_call = 0
        user_data = 0x0
        callback = 0x0
        cb_funcs = 0x0
        cb_data = 0x0
        need_destroy = <optimized out>
        source = 0x5604e3beb6d0
        current = 0x5604e3c6a6c0
        i = 0
        __func__ = "g_main_dispatch"
#10 0x00007fe171ea667d in g_main_context_dispatch (context=context@entry=0x5604e3c74250) at gmain.c:3829
#11 0x00005604e1762118 in glib_pollfds_poll () at util/main-loop.c:218
        context = 0x5604e3c74250
        pfds = <optimized out>
        context = 0x5604e3c74250
        ret = 1
        mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x5604e3c73410}
        ret = <optimized out>
        timeout_ns = <optimized out>
#12 0x00005604e1762118 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:241
        context = 0x5604e3c74250
        ret = 1
        mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x5604e3c73410}
        ret = <optimized out>
        timeout_ns = <optimized out>
#13 0x00005604e1762118 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
        mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x5604e3c73410}
        ret = <optimized out>
        timeout_ns = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
#14 0x00005604e154b169 in main_loop () at vl.c:1809
#15 0x00005604e13fafd3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4506
        i = <optimized out>
        snapshot = 0
        linux_boot = <optimized out>
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = <optimized out>
        boot_order = 0x5604e3bf3940 "cdn"
        boot_once = <optimized out>
        ds = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = 0x0
        olist = <optimized out>
        optind = 71
        optarg = 0x7fffb63bb4f1 "tcp:0:5000"
        loadvm = 0x0
        machine_class = 0x5604e3c2e300
        cpu_option = 0x7fffb63bb420 "Skylake-Client,+kvm_pv_unhalt"
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        incoming = 0x7fffb63bb4f1 "tcp:0:5000"
        userconfig = <optimized out>
        nographic = false
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
--Type <RET> for more, q to quit, c to continue without paging--
        list_data_dirs = false
        dir = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7fffb63ba580}
        __func__ = "main"
  
In src:
   (qemu) bug.txt: line 37: 17308 Segmentation fault      (core dumped) /usr/libexec/qemu-kvm -name 'avocado-vt-vm1' -machine pc -nodefaults

    (gdb) bt full
#0  0x000055fb108dbb0c in qemu_aio_coroutine_enter (ctx=0x55fb12298ec0, co=0x0) at util/qemu-coroutine.c:107
        pending = {sqh_first = 0x55fb12298ec0, sqh_last = 0x55fb122ce750}
        from = <optimized out>
        __func__ = "qemu_aio_coroutine_enter"
        __PRETTY_FUNCTION__ = "qemu_aio_coroutine_enter"
#1  0x000055fb1084ee4c in nbd_client_attach_aio_context_bh (opaque=0x55fb12e921b0) at block/nbd.c:107
        bs = 0x55fb12e921b0
        s = <optimized out>
#2  0x000055fb108c441f in aio_wait_bh (opaque=0x7ffd95ab25d0) at util/aio-wait.c:55
        data = 0x7ffd95ab25d0
#3  0x000055fb108c3c26 in aio_bh_call (bh=0x55fb12e18a20) at util/async.c:117
        bh = 0x55fb12e18a20
        bhp = <optimized out>
        next = 0x55fb133eb440
        ret = 1
        deleted = false
#4  0x000055fb108c3c26 in aio_bh_poll (ctx=ctx@entry=0x55fb12298ec0) at util/async.c:117
        bh = 0x55fb12e18a20
        bhp = <optimized out>
        next = 0x55fb133eb440
        ret = 1
        deleted = false
#5  0x000055fb108c7354 in aio_poll (ctx=ctx@entry=0x55fb12298ec0, blocking=blocking@entry=true) at util/aio-posix.c:728
        node = <optimized out>
        i = <optimized out>
        ret = 0
        progress = <optimized out>
        timeout = 0
        start = <optimized out>
        __PRETTY_FUNCTION__ = "aio_poll"
#6  0x000055fb108c454d in aio_wait_bh_oneshot (ctx=0x55fb12298ec0, cb=<optimized out>, opaque=<optimized out>) at util/aio-wait.c:71
        waited_ = <optimized out>
        wait_ = 0x55fb11159abc <global_aio_wait>
        ctx_ = 0x55fb12298ec0
        data = {done = false, cb = 0x55fb1084ee30 <nbd_client_attach_aio_context_bh>, opaque = 0x55fb12e921b0}
        __PRETTY_FUNCTION__ = "aio_wait_bh_oneshot"
--Type <RET> for more, q to quit, c to continue without paging--
#7  0x000055fb108006e8 in bdrv_attach_aio_context (new_context=0x55fb12298ec0, bs=0x55fb12e921b0) at block.c:5898
        ban = <optimized out>
        ban_tmp = <optimized out>
        __PRETTY_FUNCTION__ = "bdrv_attach_aio_context"
        old_context = 0x55fb122aa7d0
        child = <optimized out>
        __func__ = "bdrv_set_aio_context_ignore"
        __PRETTY_FUNCTION__ = "bdrv_set_aio_context_ignore"
#8  0x000055fb108006e8 in bdrv_set_aio_context_ignore
    (bs=bs@entry=0x55fb12e921b0, new_context=new_context@entry=0x55fb12298ec0, ignore=ignore@entry=0x7ffd95ab2640) at block.c:5963
        old_context = 0x55fb122aa7d0
        child = <optimized out>
        __func__ = "bdrv_set_aio_context_ignore"
        __PRETTY_FUNCTION__ = "bdrv_set_aio_context_ignore"
#9  0x000055fb10800b33 in bdrv_child_try_set_aio_context
    (bs=bs@entry=0x55fb12e921b0, ctx=0x55fb12298ec0, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x0) at block.c:6058
        ignore = 0x0
        ret = true
#10 0x000055fb10800c66 in bdrv_try_set_aio_context (errp=0x0, ctx=<optimized out>, bs=0x55fb12e921b0) at block.c:6067
        tighten_restrictions = false
        ret = <optimized out>
        old_bs = 0x55fb12e921b0
        perm = 0
        shared_perm = 31
        __PRETTY_FUNCTION__ = "bdrv_replace_child"
#11 0x000055fb10800c66 in bdrv_replace_child (child=child@entry=0x55fb124ef800, new_bs=new_bs@entry=0x0) at block.c:2338
        tighten_restrictions = false
        ret = <optimized out>
        old_bs = 0x55fb12e921b0
        perm = 0
        shared_perm = 31
        __PRETTY_FUNCTION__ = "bdrv_replace_child"
#12 0x000055fb10800ce7 in bdrv_detach_child (child=0x55fb124ef800) at block.c:2456
#13 0x000055fb1080159d in bdrv_root_unref_child (child=<optimized out>) at block.c:2467
        child_bs = 0x55fb12e921b0
#14 0x000055fb10806b4c in block_job_remove_all_bdrv (job=job@entry=0x55fb12f12f90) at blockjob.c:193
--Type <RET> for more, q to quit, c to continue without paging--
        c = 0x55fb124ef800
        l = 0x55fb12290aa0 = {0x55fb124ef800, 0x55fb12842c00, 0x55fb12a4b590}
#15 0x000055fb1083fccb in mirror_exit_common (job=0x55fb12f12f90) at block/mirror.c:714
        s = 0x55fb12f12f90
        bjob = 0x55fb12f12f90
        bs_opaque = 0x55fb12e3add0
        replace_aio_context = <optimized out>
        src = 0x55fb122d4a30
        target_bs = 0x55fb12e921b0
        mirror_top_bs = 0x55fb12d7d670
        local_err = 0x0
        abort = <optimized out>
        ret = 0
        __PRETTY_FUNCTION__ = "mirror_exit_common"
#16 0x000055fb108409bd in mirror_abort (job=<optimized out>) at block/mirror.c:741
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "mirror_abort"
#17 0x000055fb1080805b in job_abort (job=<optimized out>) at job.c:659
        __PRETTY_FUNCTION__ = "job_finalize_single"
        __PRETTY_FUNCTION__ = "job_finalize_single"
#18 0x000055fb1080805b in job_finalize_single (job=0x55fb12f12f90) at job.c:680
        __PRETTY_FUNCTION__ = "job_finalize_single"
        __PRETTY_FUNCTION__ = "job_finalize_single"
#19 0x000055fb1080805b in job_finalize_single (job=0x55fb12f12f90) at job.c:670
        __PRETTY_FUNCTION__ = "job_finalize_single"
#20 0x000055fb10808aaa in job_completed_txn_abort (job=<optimized out>) at job.c:758
        outer_ctx = <optimized out>
        ctx = 0x55fb122aa7d0
        txn = 0x55fb124ee320
        other_job = 0x55fb12f12f90
        __PRETTY_FUNCTION__ = "job_completed_txn_abort"
#21 0x000055fb10808df4 in job_exit (opaque=0x55fb12f12f90) at job.c:873
        job = 0x55fb12f12f90
        ctx = 0x55fb122aa7d0
#22 0x000055fb108c3c26 in aio_bh_call (bh=0x55fb133eb440) at util/async.c:117
        bh = 0x55fb133eb440
--Type <RET> for more, q to quit, c to continue without paging--
        bhp = <optimized out>
        next = 0x55fb135da020
        ret = 1
        deleted = false
#23 0x000055fb108c3c26 in aio_bh_poll (ctx=ctx@entry=0x55fb12298ec0) at util/async.c:117
        bh = 0x55fb133eb440
        bhp = <optimized out>
        next = 0x55fb135da020
        ret = 1
        deleted = false
#24 0x000055fb108c7064 in aio_dispatch (ctx=0x55fb12298ec0) at util/aio-posix.c:459
#25 0x000055fb108c3b02 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
        ctx = <optimized out>
        __PRETTY_FUNCTION__ = "aio_ctx_dispatch"
#26 0x00007efe1fac467d in g_main_dispatch (context=0x55fb1229a250) at gmain.c:3176
        dispatch = 0x55fb108c3af0 <aio_ctx_dispatch>
        prev_source = 0x0
        was_in_call = 0
        user_data = 0x0
        callback = 0x0
        cb_funcs = 0x0
        cb_data = 0x0
        need_destroy = <optimized out>
        source = 0x55fb12298ec0
        current = 0x55fb12290a40
        i = 0
        __func__ = "g_main_dispatch"
#27 0x00007efe1fac467d in g_main_context_dispatch (context=context@entry=0x55fb1229a250) at gmain.c:3829
#28 0x000055fb108c6118 in glib_pollfds_poll () at util/main-loop.c:218
        context = 0x55fb1229a250
        pfds = <optimized out>
        context = 0x55fb1229a250
        ret = 1
        mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x55fb12299410}
        ret = <optimized out>
        timeout_ns = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
#29 0x000055fb108c6118 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:241
        context = 0x55fb1229a250
        ret = 1
        mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x55fb12299410}
        ret = <optimized out>
        timeout_ns = <optimized out>
#30 0x000055fb108c6118 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
        mlpoll = {state = 0, timeout = 4294967295, pollfds = 0x55fb12299410}
        ret = <optimized out>
        timeout_ns = <optimized out>
#31 0x000055fb106af169 in main_loop () at vl.c:1809
#32 0x000055fb1055efd3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4506
        i = <optimized out>
        snapshot = 0
        linux_boot = <optimized out>
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = <optimized out>
        boot_order = 0x55fb12219940 "cdn"
        boot_once = <optimized out>
        ds = <optimized out>
        opts = <optimized out>
        machine_opts = <optimized out>
        icount_opts = <optimized out>
        accel_opts = 0x0
        olist = <optimized out>
        optind = 69
        optarg = 0x7ffd95ab44e3 "stdio"
        loadvm = 0x0
        machine_class = 0x55fb12254300
        cpu_option = 0x7ffd95ab4422 "Skylake-Client,+kvm_pv_unhalt"
        vga_model = 0x0
        qtest_chrdev = 0x0
        qtest_log = 0x0
        incoming = 0x0
        userconfig = <optimized out>
--Type <RET> for more, q to quit, c to continue without paging--
        nographic = false
        display_remote = <optimized out>
        log_mask = <optimized out>
        log_file = <optimized out>
        trace_file = <optimized out>
        maxram_size = <optimized out>
        ram_slots = 0
        vmstate_dump_file = 0x0
        main_loop_err = 0x0
        err = 0x0
        list_data_dirs = false
        dir = <optimized out>
        bdo_queue = {sqh_first = 0x0, sqh_last = 0x7ffd95ab2a40}
        __func__ = "main"

Expected results:
 Storage vm migration on non-shared storage can also be executed successfully.

Additional info:
 Will add the coredump file later.

Comment 2 aihua liang 2019-09-03 10:20:47 UTC
Hi, John

  I hit this bug when handle needinfo of https://bugzilla.redhat.com/show_bug.cgi?id=1652424, can you help to check it?

Thanks,
aliang

Comment 4 aihua liang 2019-09-04 03:24:17 UTC
Only virtio_scsi + data plane hit this issue.

Comment 5 aihua liang 2019-09-04 03:36:14 UTC
*** Bug 1717329 has been marked as a duplicate of this bug. ***

Comment 6 aihua liang 2019-09-04 06:57:20 UTC
Refer to 1717329, cancel the needinfo to jsnow.

Comment 7 John Ferlan 2019-09-11 18:27:05 UTC
Patch posted upstream: https://lists.gnu.org/archive/html/qemu-block/2019-09/msg00481.html

Comment 8 Sergio Lopez 2019-09-12 11:16:24 UTC
The actual issue is that enabling the integrated NDB server, if configured with a BlockBackend that uses an iothread (data-plane), crashes after negotiating with the client. The root cause is that we weren't setting the QIOChannel AioContext to the same one as the export's.

Comment 17 aihua liang 2019-09-29 03:36:35 UTC
Test on qemu-kvm-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64, the issue has been resolved, set bug's status to "Verified".

Test Steps:
 1.Start src guest with qemu cmds:
   /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmp1,path=/var/tmp/monitor-qmp1-20190522-203214-pO8ikKhP,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190522-203214-pO8ikKhP,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idWcUIuL  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190522-203214-pO8ikKhP,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190522-203214-pO8ikKhP,path=/var/tmp/seabios-20190522-203214-pO8ikKhP,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190522-203214-pO8ikKhP,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -blockdev node-name=file_node,driver=file,filename=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,file=file_node \
    -device virtio-scsi-pci,iothread=iothread0,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pci.0,addr=0x6,iothread=iothread1 \
    -blockdev node-name=file_data,driver=file,filename=/home/data1.qcow2 \
    -blockdev node-name=drive_data1,driver=qcow2,file=file_data \
    -device scsi-hd,drive=drive_data1,id=data1,bus=virtio_scsi_pci1.0,scsi-id=0,lun=1,channel=0,werror=stop,rerror=stop \
    -device virtio-net-pci,mac=9a:f4:f5:f6:f7:f8,id=idJqoo3m,vectors=4,netdev=idiujahB,bus=pci.0,addr=0x5  \
    -netdev tap,id=idiujahB,vhost=on \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \

2. Create an empty data disk and start dst guest with cmds:
    #qemu-img create -f qcow2 data2.qcow2 2G
    /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine pc  \
    -nodefaults \
    -device VGA,bus=pci.0,addr=0x2  \
    -chardev socket,id=qmp_id_qmp1,path=/var/tmp/monitor-qmp1-20190522-203214-pO8ikKhQ,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20190522-203214-pO8ikKhP,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=idWcUIuL  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/serial-serial0-20190522-203214-pO8ikKhP,server,nowait \
    -device isa-serial,chardev=serial_id_serial0  \
    -chardev socket,id=seabioslog_id_20190522-203214-pO8ikKhP,path=/var/tmp/seabios-20190522-203214-pO8ikKhP,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20190522-203214-pO8ikKhP,iobase=0x402 \
    -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x3 \
    -object iothread,id=iothread0 \
    -object iothread,id=iothread1 \
    -blockdev node-name=file_node,driver=file,filename=/home/kvm_autotest_root/images/rhel810-64-virtio-scsi.qcow2 \
    -blockdev node-name=drive_image1,driver=qcow2,file=file_node \
    -device virtio-scsi-pci,iothread=iothread0,id=virtio_scsi_pci0,bus=pci.0,addr=0x4 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci1,bus=pci.0,addr=0x6,iothread=iothread1 \
    -blockdev node-name=file_data,driver=file,filename=/home/data2.qcow2 \
    -blockdev node-name=drive_data1,driver=qcow2,file=file_data \
    -device scsi-hd,drive=drive_data1,id=data1,bus=virtio_scsi_pci1.0,scsi-id=0,lun=1,channel=0,werror=stop,rerror=stop \
    -device virtio-net-pci,mac=9a:f4:f5:f6:f7:f8,id=idJqoo3m,vectors=4,netdev=idiujahB,bus=pci.0,addr=0x5  \
    -netdev tap,id=idiujahB,vhost=on \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot menu=off,strict=off,order=cdn,once=c \
    -enable-kvm \
    -monitor stdio \
    -incoming tcp:0:5000 \

3. Set migration capabilities in both src and dst.
    {"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"events","state":true},{"capability":"dirty-bitmaps","state":true},{"capability":"pause-before-switchover","state":true}]}}

4. In dst, start ndb server, and expose data disk.
    { "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.73.73.83", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments":{ "device": "drive_data1", "writable": true } }
{"return": {}}

5. In src, Add bitmap to data disk and add a new file
     { "execute": "block-dirty-bitmap-add", "arguments": {"node": "drive_data1", "name": "bitmap0"}}
    (guest)#dd if=/dev/urandom of=test bs=1M coutn=1000

6. In src, add the target device and do block mirror.
     {"execute":"blockdev-add","arguments":{"driver":"nbd","node-name":"mirror","server":{"type":"inet","host":"10.73.73.83","port":"3333"},"export":"drive_data1"}}
{"return": {}}
     {"execute": "blockdev-mirror", "arguments": { "device": "drive_data1","target": "mirror", "sync": "full", "job-id":"j1"}}
{"timestamp": {"seconds": 1569727762, "microseconds": 131671}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "j1"}}
{"timestamp": {"seconds": 1569727762, "microseconds": 131708}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "j1"}}
{"return": {}}
{"timestamp": {"seconds": 1569727780, "microseconds": 13812}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "j1"}}
{"timestamp": {"seconds": 1569727780, "microseconds": 13867}, "event": "BLOCK_JOB_READY", "data": {"device": "j1", "len": 2147614720, "offset": 2147614720, "speed": 0, "type": "mirror"}}

7. Migrate from src to dst
   { "execute": "migrate", "arguments": { "uri": "tcp:10.73.224.68:5000"}}
   {"timestamp": {"seconds": 1569727797, "microseconds": 875204}, "event": "MIGRATION", "data": {"status": "setup"}}
{"return": {}}
{"timestamp": {"seconds": 1569727797, "microseconds": 895189}, "event": "MIGRATION_PASS", "data": {"pass": 1}}
{"timestamp": {"seconds": 1569727797, "microseconds": 895245}, "event": "MIGRATION", "data": {"status": "active"}}
{"timestamp": {"seconds": 1569727859, "microseconds": 370179}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
{"timestamp": {"seconds": 1569727862, "microseconds": 279190}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
{"timestamp": {"seconds": 1569727862, "microseconds": 579658}, "event": "MIGRATION_PASS", "data": {"pass": 4}}
{"timestamp": {"seconds": 1569727862, "microseconds": 593908}, "event": "STOP"}
{"timestamp": {"seconds": 1569727862, "microseconds": 595872}, "event": "MIGRATION", "data": {"status": "pre-switchover"}}

8. Cancel block job.
 {"execute":"block-job-cancel","arguments":{"device":"j1"}}
{"return": {}}
{"timestamp": {"seconds": 1569727876, "microseconds": 671310}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "j1"}}
{"timestamp": {"seconds": 1569727876, "microseconds": 671348}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "j1"}}
{"timestamp": {"seconds": 1569727876, "microseconds": 671421}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "j1", "len": 2147942400, "offset": 2147942400, "speed": 0, "type": "mirror"}}
{"timestamp": {"seconds": 1569727876, "microseconds": 671446}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "j1"}}
{"timestamp": {"seconds": 1569727876, "microseconds": 671477}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "j1"}}

9. Continue migrate
   {"execute":"migrate-continue","arguments":{"state":"pre-switchover"}}
{"return": {}}
{"timestamp": {"seconds": 1569727888, "microseconds": 547919}, "event": "MIGRATION", "data": {"status": "device"}}
{"timestamp": {"seconds": 1569727888, "microseconds": 548552}, "event": "MIGRATION_PASS", "data": {"pass": 5}}
{"timestamp": {"seconds": 1569727888, "microseconds": 553978}, "event": "MIGRATION", "data": {"status": "completed"}}

10. Info vm status in src and guest.
   (src qemu)info status
    VM status: paused (postmigrate)

   (dst qemu)info status
    VM status: running

Comment 19 errata-xmlrpc 2019-11-06 07:19:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723