Bug 1773517 - Src QEMU hangs during storage VM migration with dataplane enabled
Summary: Src QEMU hangs during storage VM migration with dataplane enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Sergio Lopez
QA Contact: aihua liang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-18 10:35 UTC by aihua liang
Modified: 2020-05-05 09:51 UTC (History)
CC List: 11 users

Fixed In Version: qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:50:55 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:2017 0 None None None 2020-05-05 09:51:38 UTC

Description aihua liang 2019-11-18 10:35:01 UTC
Description of problem:
 Src QEMU hangs during storage VM migration with dataplane enabled

Version-Release number of selected component (if applicable):
 kernel version: 4.18.0-147.el8.x86_64
 qemu-kvm version: qemu-kvm-4.2.0-0.module+el8.2.0+4714+8670762e.x86_64
 
How reproducible:
 100%

Steps to Reproduce:
 1. Create an empty disk on dst (the size must match the source image's virtual size; 20G is assumed here), start the guest with it, and expose it over NBD.
     #qemu-img create -f qcow2 /home/aliang/mirror.qcow2 20G
     /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20191118-011823-gEG3j1mt,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20191118-011823-gEG3j1mt,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id4p8G4l \
    -chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0-20191118-011823-gEG3j1mt,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20191118-011823-gEG3j1mt,path=/var/tmp/seabios-20191118-011823-gEG3j1mt,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20191118-011823-gEG3j1mt,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/aliang/mirror.qcow2 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,bus=pcie.0-root-port-3,addr=0x0,id=scsi0,iothread=iothread0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:4f:f4:e5:bd:67,id=idkQvhgf,netdev=idnMcj5J,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idnMcj5J,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :1  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -incoming tcp:0:5000

    { "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "10.73.224.68", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments": { "device": "drive_image1","writable": true } }
{"return": {}}

  2. On src, start the guest with the following QEMU command:
      /usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 7168  \
    -smp 4,maxcpus=4,cores=2,threads=1,dies=1,sockets=2  \
    -cpu 'Skylake-Client',+kvm_pv_unhalt  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20191118-011823-gEG3j1ms,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20191118-011823-gEG3j1mt,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id4p8G4l \
    -chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0-20191118-011823-gEG3j1mt,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0  \
    -chardev socket,id=seabioslog_id_20191118-011823-gEG3j1mt,path=/var/tmp/seabios-20191118-011823-gEG3j1mt,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20191118-011823-gEG3j1mt,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel820-64-virtio.qcow2 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0,iothread=iothread0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:4f:f4:e5:bd:67,id=idkQvhgf,netdev=idnMcj5J,bus=pcie.0-root-port-4,addr=0x0  \
    -netdev tap,id=idnMcj5J,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :0  \
    -rtc base=utc,clock=host,driftfix=slew  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio

  3. Mirror the system disk from src to dst over NBD.
     { "execute": "drive-mirror", "arguments": { "device": "drive_image1", "target": "nbd://10.73.224.68:3333/drive_image1", "sync": "full", "format": "raw", "mode": "existing" } }

Actual results:
  After step 3, the src QEMU hangs.
  gdb info:
  (gdb) bt
#0  0x00007f4b71412306 in __GI_ppoll (fds=0x559b919c39b0, nfds=1, timeout=<optimized out>, 
    timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000559b9082a909 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  0x0000559b9082a909 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>)
    at util/qemu-timer.c:336
#3  0x0000559b9082c8c4 in aio_poll (ctx=0x559b9199a570, blocking=blocking@entry=true) at util/aio-posix.c:669
#4  0x0000559b907a815f in bdrv_drained_end (bs=bs@entry=0x559b91cbb510) at block/io.c:497
#5  0x0000559b90761a8b in bdrv_set_aio_context_ignore
    (bs=0x559b91cbb510, new_context=new_context@entry=0x559b919b01e0, ignore=ignore@entry=0x7ffccf8f9f60) at block.c:6019
#6  0x0000559b90761adc in bdrv_set_aio_context_ignore
    (bs=bs@entry=0x559b91b08200, new_context=new_context@entry=0x559b919b01e0, ignore=ignore@entry=0x7ffccf8f9f60)
    at block.c:5989
#7  0x0000559b90761e53 in bdrv_child_try_set_aio_context
    (bs=bs@entry=0x559b91b08200, ctx=ctx@entry=0x559b919b01e0, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x7ffccf8fa048) at block.c:6102
#8  0x0000559b9076346e in bdrv_try_set_aio_context
    (bs=bs@entry=0x559b91b08200, ctx=ctx@entry=0x559b919b01e0, errp=errp@entry=0x7ffccf8fa048) at block.c:6111
#9  0x0000559b90604f8e in qmp_drive_mirror (arg=arg@entry=0x7ffccf8fa050, errp=errp@entry=0x7ffccf8fa048) at blockdev.c:3996
#10 0x0000559b9071e6d9 in qmp_marshal_drive_mirror (args=<optimized out>, ret=<optimized out>, errp=0x7ffccf8fa148)
    at qapi/qapi-commands-block-core.c:619
#11 0x0000559b907e198c in do_qmp_dispatch
    (errp=0x7ffccf8fa140, allow_oob=<optimized out>, request=<optimized out>, cmds=0x559b910cdcc0 <qmp_commands>)
    at qapi/qmp-dispatch.c:132
#12 0x0000559b907e198c in qmp_dispatch
    (cmds=0x559b910cdcc0 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
#13 0x0000559b90700141 in monitor_qmp_dispatch (mon=0x559b919bb340, req=<optimized out>) at monitor/qmp.c:120
#14 0x0000559b9070078a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:209
#15 0x0000559b90829366 in aio_bh_call (bh=0x559b91911c60) at util/async.c:117
#16 0x0000559b90829366 in aio_bh_poll (ctx=ctx@entry=0x559b91910840) at util/async.c:117
#17 0x0000559b9082c754 in aio_dispatch (ctx=0x559b91910840) at util/aio-posix.c:459
#18 0x0000559b90829242 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>)
    at util/async.c:260
#19 0x00007f4b75bda67d in g_main_dispatch (context=0x559b9199b9c0) at gmain.c:3176
#20 0x00007f4b75bda67d in g_main_context_dispatch (context=context@entry=0x559b9199b9c0) at gmain.c:3829
#21 0x0000559b9082b808 in glib_pollfds_poll () at util/main-loop.c:219
#22 0x0000559b9082b808 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#23 0x0000559b9082b808 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
#24 0x0000559b9060d201 in main_loop () at vl.c:1828
#25 0x0000559b904b9b82 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504
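
(For hangs like this it also helps to dump all threads, to see whether the iothread is blocked as well; a generic debugging step, not output captured for this report:)
  # gdb -p $(pidof qemu-kvm)
  (gdb) thread apply all bt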

Expected results:
 With dataplane enabled, storage VM migration completes successfully.

Additional info:
 Both virtio_blk+dataplane+NBD and virtio_scsi+dataplane+NBD hit this issue.
 With dataplane disabled, it works OK (see the sketch after this list).
 Mirroring to an image on a local filesystem also works OK.
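 ("Dataplane disabled" means attaching the controller without an iothread; a sketch of the corresponding src device line:)
     -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0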

Note:
 It blocks all test cases in storage_vm_migration with dataplane enabled,
 and it is a basic configuration that upper layers will hit,
 so its priority is set to "High".

Comment 4 aihua liang 2019-12-03 06:33:04 UTC
Tested on qemu-kvm-4.1.0-16.module+el8.1.1+4917+752cfd65.x86_64: the issue does not reproduce there, so this is marked as a regression.

Comment 5 aihua liang 2019-12-16 05:59:25 UTC
Tested on qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc.x86_64 with -blockdev: the issue also reproduces.
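
(For reference, a sketch of the src system disk expressed with -blockdev instead of -drive; node names are illustrative:)
     -blockdev driver=file,filename=/home/kvm_autotest_root/images/rhel820-64-virtio.qcow2,node-name=file_image1,cache.direct=on,aio=threads \
     -blockdev driver=qcow2,file=file_image1,node-name=drive_image1 \
     -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0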

Comment 6 Ademar Reis 2020-02-05 23:08:39 UTC
QEMU has recently been split into sub-components, and as a one-time operation to avoid breaking tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Comment 12 aihua liang 2020-02-19 07:59:43 UTC
Tested on qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e with -drive: works OK.

Comment 13 aihua liang 2020-02-19 08:14:55 UTC
Tested on qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e with -blockdev: works OK.

Comment 14 aihua liang 2020-02-20 10:23:09 UTC
No regression found on qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904, so setting the bug's status to "Verified".

Comment 16 errata-xmlrpc 2020-05-05 09:50:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

