Description of problem:
Source QEMU hangs when doing storage VM migration with dataplane enabled.

Version-Release number of selected component (if applicable):
kernel version: 4.18.0-147.el8.x86_64
qemu-kvm version: qemu-kvm-4.2.0-0.module+el8.2.0+4714+8670762e.x86_64

How reproducible:
100%

Steps to Reproduce:
1. On the destination, create an empty disk, start the guest with it, and export it over NBD:

# qemu-img create -f qcow2 /home/aliang/mirror.qcow2

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 7168 \
    -smp 4,maxcpus=4,cores=2,threads=1,dies=1,sockets=2 \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20191118-011823-gEG3j1mt,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20191118-011823-gEG3j1mt,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id4p8G4l \
    -chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0-20191118-011823-gEG3j1mt,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20191118-011823-gEG3j1mt,path=/var/tmp/seabios-20191118-011823-gEG3j1mt,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20191118-011823-gEG3j1mt,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/aliang/mirror.qcow2 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,bus=pcie.0-root-port-3,addr=0x0,id=scsi0,iothread=iothread0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:4f:f4:e5:bd:67,id=idkQvhgf,netdev=idnMcj5J,bus=pcie.0-root-port-4,addr=0x0 \
    -netdev tap,id=idnMcj5J,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :1 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -incoming tcp:0:5000

Then export the disk via QMP:

{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet", "data": { "host": "10.73.224.68", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments": { "device": "drive_image1", "writable": true } }
{"return": {}}

2. On the source, start the guest with:

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 7168 \
    -smp 4,maxcpus=4,cores=2,threads=1,dies=1,sockets=2 \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20191118-011823-gEG3j1ms,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20191118-011823-gEG3j1mt,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id4p8G4l \
    -chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0-20191118-011823-gEG3j1mt,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20191118-011823-gEG3j1mt,path=/var/tmp/seabios-20191118-011823-gEG3j1mt,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20191118-011823-gEG3j1mt,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel820-64-virtio.qcow2 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0,iothread=iothread0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:4f:f4:e5:bd:67,id=idkQvhgf,netdev=idnMcj5J,bus=pcie.0-root-port-4,addr=0x0 \
    -netdev tap,id=idnMcj5J,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio

3. Mirror from the source to the destination:

{ "execute": "drive-mirror", "arguments": { "device": "drive_image1", "target": "nbd://10.73.224.68:3333/drive_image1", "sync": "full", "format": "raw", "mode": "existing" } }

Actual results:
After step 3, the source QEMU hangs. gdb backtrace:

(gdb) bt
#0  0x00007f4b71412306 in __GI_ppoll (fds=0x559b919c39b0, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000559b9082a909 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x0000559b9082a909 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336
#3  0x0000559b9082c8c4 in aio_poll (ctx=0x559b9199a570, blocking=blocking@entry=true) at util/aio-posix.c:669
#4  0x0000559b907a815f in bdrv_drained_end (bs=bs@entry=0x559b91cbb510) at block/io.c:497
#5  0x0000559b90761a8b in bdrv_set_aio_context_ignore (bs=0x559b91cbb510, new_context=new_context@entry=0x559b919b01e0, ignore=ignore@entry=0x7ffccf8f9f60) at block.c:6019
#6  0x0000559b90761adc in bdrv_set_aio_context_ignore (bs=bs@entry=0x559b91b08200, new_context=new_context@entry=0x559b919b01e0, ignore=ignore@entry=0x7ffccf8f9f60) at block.c:5989
#7  0x0000559b90761e53 in bdrv_child_try_set_aio_context (bs=bs@entry=0x559b91b08200, ctx=ctx@entry=0x559b919b01e0, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x7ffccf8fa048) at block.c:6102
#8  0x0000559b9076346e in bdrv_try_set_aio_context (bs=bs@entry=0x559b91b08200, ctx=ctx@entry=0x559b919b01e0, errp=errp@entry=0x7ffccf8fa048) at block.c:6111
#9  0x0000559b90604f8e in qmp_drive_mirror (arg=arg@entry=0x7ffccf8fa050, errp=errp@entry=0x7ffccf8fa048) at blockdev.c:3996
#10 0x0000559b9071e6d9 in qmp_marshal_drive_mirror (args=<optimized out>, ret=<optimized out>, errp=0x7ffccf8fa148) at qapi/qapi-commands-block-core.c:619
#11 0x0000559b907e198c in do_qmp_dispatch (errp=0x7ffccf8fa140, allow_oob=<optimized out>, request=<optimized out>, cmds=0x559b910cdcc0 <qmp_commands>) at qapi/qmp-dispatch.c:132
#12 0x0000559b907e198c in qmp_dispatch (cmds=0x559b910cdcc0 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
#13 0x0000559b90700141 in monitor_qmp_dispatch (mon=0x559b919bb340, req=<optimized out>) at monitor/qmp.c:120
#14 0x0000559b9070078a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:209
#15 0x0000559b90829366 in aio_bh_call (bh=0x559b91911c60) at util/async.c:117
#16 0x0000559b90829366 in aio_bh_poll (ctx=ctx@entry=0x559b91910840) at util/async.c:117
#17 0x0000559b9082c754 in aio_dispatch (ctx=0x559b91910840) at util/aio-posix.c:459
#18 0x0000559b90829242 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
#19 0x00007f4b75bda67d in g_main_dispatch (context=0x559b9199b9c0) at gmain.c:3176
#20 0x00007f4b75bda67d in g_main_context_dispatch (context=context@entry=0x559b9199b9c0) at gmain.c:3829
#21 0x0000559b9082b808 in glib_pollfds_poll () at util/main-loop.c:219
#22 0x0000559b9082b808 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#23 0x0000559b9082b808 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
#24 0x0000559b9060d201 in main_loop () at vl.c:1828
#25 0x0000559b904b9b82 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504

Expected results:
With dataplane enabled, storage VM migration completes successfully.

Additional info:
Both virtio_blk+dataplane+NBD and virtio_scsi+dataplane+NBD hit this issue.
With dataplane disabled, it works OK.
Mirroring to an image on a local filesystem works OK.

Note: This blocks all test cases in storage_vm_migration with dataplane enabled, and it is a basic configuration that upper layers will hit, so the priority is set to "High".
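For reference, the QMP exchanges in the steps above can be scripted. The sketch below is illustrative only: the helper names (`qmp_command`, `qmp_run`) are not part of the original report, and the socket path in the final comment is just the source monitor socket from step 2. The read timeout turns a hung monitor, as seen after step 3, into a `socket.timeout` exception instead of an indefinite block. (A real client would also have to skip interleaved QMP events; this sketch assumes one reply line per command.)

```python
import json
import socket

def qmp_command(execute, **arguments):
    """Build a QMP command object like the ones issued in the steps above."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    return cmd

def qmp_run(sock_path, commands, timeout=30.0):
    """Connect to a QMP Unix socket, negotiate capabilities, and issue each
    command, collecting one reply line per command."""
    replies = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)  # a hung monitor raises socket.timeout
        s.connect(sock_path)
        f = s.makefile("rw", buffering=1)
        json.loads(f.readline())  # greeting banner
        f.write(json.dumps(qmp_command("qmp_capabilities")) + "\n")
        json.loads(f.readline())  # {"return": {}}
        for cmd in commands:
            f.write(json.dumps(cmd) + "\n")
            replies.append(json.loads(f.readline()))
    return replies

# The drive-mirror command from step 3, built with the helper:
mirror = qmp_command(
    "drive-mirror",
    device="drive_image1",
    target="nbd://10.73.224.68:3333/drive_image1",
    sync="full",
    format="raw",
    mode="existing",
)
# e.g.:
# qmp_run("/var/tmp/monitor-qmpmonitor1-20191118-011823-gEG3j1ms", [mirror])
```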
Tested on qemu-kvm-4.1.0-16.module+el8.1.1+4917+752cfd65.x86_64: the issue does not reproduce there, so this is marked as a regression.
Tested on qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc.x86_64 with -blockdev: the issue also reproduces there.
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review, and change the sub-component if necessary, the next time you review this BZ. Thanks.
Tested on qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e with -drive: it works OK.
Tested on qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e with -blockdev: it works OK.
No regression on qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904; setting the bug's status to "Verified".
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017