Bug 1746217
Summary: | Src qemu hang when do storage vm migration during guest installation | |
---|---|---|---
Product: | Red Hat Enterprise Linux Advanced Virtualization | Reporter: | aihua liang <aliang>
Component: | qemu-kvm | Assignee: | Sergio Lopez <slopezpa>
qemu-kvm sub component: | Storage | QA Contact: | aihua liang <aliang>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | unspecified | |
Priority: | medium | CC: | coli, jferlan, jinzhao, juzhang, knoel, ngu, qzhang, slopezpa, virt-maint
Version: | 8.1 | Keywords: | Regression
Target Milestone: | rc | |
Target Release: | 8.1 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-05-05 09:49:40 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1758964 | |
Description
aihua liang
2019-08-28 01:55:07 UTC
The reproduce rate of this bug is not 100%, so its priority is set to "medium".

If IOThreads weren't configured, is this still reproducible? IOW: trying to determine whether this is an IOThreads or an NBD type of problem.

FWIW: indicating "backend:gluster" while using file=/mnt/nfs/install.qcow in src is confusing.

(In reply to John Ferlan from comment #4)
> If IOThreads weren't configured - is this still reproducible?

The issue does not exist when IOThreads are not configured. When IOThreads are configured, both virtio_blk and virtio_scsi have this issue.

> IOW: Trying to determine IOThreads or NBD type problem.

It works OK when just doing a drive mirror to localfs (IOThreads configured) via the command:

```
{ "execute": "drive-mirror", "arguments": { "device": "drive_image1", "target": "/home/rhel810-64-virtio.qcow2", "sync": "full", "format": "qcow2", "mode": "existing" } }
```

> FWIW: indicating "backend:gluster" and usage of file=/mnt/nfs/install.qcow
> in src is confusing

"backend:gluster(mounted)" means the backend is gluster; I mounted it via:

```
mount.glusterfs intel-5405-32-2.englab.nay.redhat.com:/aliang /mnt/nfs
```

then created the system disk and data disk images on it. Sorry for the confusion.

Looks like bdrv_try_set_aio_context() is called with the wrong context acquired. We have a patch upstream addressing this issue, but it has not been merged yet: https://lists.gnu.org/archive/html/qemu-block/2019-09/msg00643.html

Update for the upstream posting: https://lists.nongnu.org/archive/html/qemu-devel/2019-11/msg01657.html

Hi, Sergio

I hit this issue in my RHEL 8.2.0 test, and its reproduce rate is 100%. The gdb info looks a little different from that in the description; can you help to check whether they are the same issue? Thanks.

```
(gdb) bt
#0  0x00007f4b71412306 in __GI_ppoll (fds=0x559b919c39b0, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0x0000559b9082a909 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  0x0000559b9082a909 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336
#3  0x0000559b9082c8c4 in aio_poll (ctx=0x559b9199a570, blocking=blocking@entry=true) at util/aio-posix.c:669
#4  0x0000559b907a815f in bdrv_drained_end (bs=bs@entry=0x559b91cbb510) at block/io.c:497
#5  0x0000559b90761a8b in bdrv_set_aio_context_ignore (bs=0x559b91cbb510, new_context=new_context@entry=0x559b919b01e0, ignore=ignore@entry=0x7ffccf8f9f60) at block.c:6019
#6  0x0000559b90761adc in bdrv_set_aio_context_ignore (bs=bs@entry=0x559b91b08200, new_context=new_context@entry=0x559b919b01e0, ignore=ignore@entry=0x7ffccf8f9f60) at block.c:5989
#7  0x0000559b90761e53 in bdrv_child_try_set_aio_context (bs=bs@entry=0x559b91b08200, ctx=ctx@entry=0x559b919b01e0, ignore_child=ignore_child@entry=0x0, errp=errp@entry=0x7ffccf8fa048) at block.c:6102
#8  0x0000559b9076346e in bdrv_try_set_aio_context (bs=bs@entry=0x559b91b08200, ctx=ctx@entry=0x559b919b01e0, errp=errp@entry=0x7ffccf8fa048) at block.c:6111
#9  0x0000559b90604f8e in qmp_drive_mirror (arg=arg@entry=0x7ffccf8fa050, errp=errp@entry=0x7ffccf8fa048) at blockdev.c:3996
#10 0x0000559b9071e6d9 in qmp_marshal_drive_mirror (args=<optimized out>, ret=<optimized out>, errp=0x7ffccf8fa148) at qapi/qapi-commands-block-core.c:619
#11 0x0000559b907e198c in do_qmp_dispatch (errp=0x7ffccf8fa140, allow_oob=<optimized out>, request=<optimized out>, cmds=0x559b910cdcc0 <qmp_commands>) at qapi/qmp-dispatch.c:132
#12 0x0000559b907e198c in qmp_dispatch (cmds=0x559b910cdcc0 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175
#13 0x0000559b90700141 in monitor_qmp_dispatch (mon=0x559b919bb340, req=<optimized out>) at monitor/qmp.c:120
#14 0x0000559b9070078a in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:209
#15 0x0000559b90829366 in aio_bh_call (bh=0x559b91911c60) at util/async.c:117
#16 0x0000559b90829366 in aio_bh_poll (ctx=ctx@entry=0x559b91910840) at util/async.c:117
#17 0x0000559b9082c754 in aio_dispatch (ctx=0x559b91910840) at util/aio-posix.c:459
#18 0x0000559b90829242 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260
#19 0x00007f4b75bda67d in g_main_dispatch (context=0x559b9199b9c0) at gmain.c:3176
#20 0x00007f4b75bda67d in g_main_context_dispatch (context=context@entry=0x559b9199b9c0) at gmain.c:3829
#21 0x0000559b9082b808 in glib_pollfds_poll () at util/main-loop.c:219
#22 0x0000559b9082b808 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242
#23 0x0000559b9082b808 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
#24 0x0000559b9060d201 in main_loop () at vl.c:1828
#25 0x0000559b904b9b82 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504
```

Test Env:
kernel version: 4.18.0-147.el8.x86_64
qemu-kvm version: qemu-kvm-4.2.0-0.module+el8.2.0+4714+8670762e.x86_64
Reproduce Rate: 100%
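As an aside, a per-thread backtrace like the one above can be captured from the hung process along these lines; this is a minimal sketch, assuming the -name value used in the reproduction steps below, and is not part of the original report:

```
# Minimal sketch, assuming the hung QEMU was started with -name 'avocado-vt-vm1'
# as in the steps below; the pgrep pattern is illustrative only.
pid=$(pgrep -f "qemu-kvm.*avocado-vt-vm1")

# Dump backtraces for all threads; in this bug the main loop thread is
# blocked in ppoll() under aio_poll()/bdrv_drained_end(), as shown above.
gdb -p "$pid" -batch -ex 'set pagination off' -ex 'thread apply all bt'
```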
Test steps:

1. Create an empty disk on the dst host, start the guest with it, and expose it over NBD:

```
# Image size argument assumed (e.g. 20G, matching the source image);
# it was omitted in the original report.
qemu-img create -f qcow2 /home/aliang/mirror.qcow2 20G

/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 7168 \
    -smp 4,maxcpus=4,cores=2,threads=1,dies=1,sockets=2 \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20191118-011823-gEG3j1mt,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20191118-011823-gEG3j1mt,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id4p8G4l \
    -chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0-20191118-011823-gEG3j1mt,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20191118-011823-gEG3j1mt,path=/var/tmp/seabios-20191118-011823-gEG3j1mt,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20191118-011823-gEG3j1mt,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/aliang/mirror.qcow2 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,bus=pcie.0-root-port-3,addr=0x0,id=scsi0,iothread=iothread0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:4f:f4:e5:bd:67,id=idkQvhgf,netdev=idnMcj5J,bus=pcie.0-root-port-4,addr=0x0 \
    -netdev tap,id=idnMcj5J,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :1 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio \
    -incoming tcp:0:5000
```

Then start the NBD server on dst and export the empty disk:

```
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet", "data": { "host": "10.73.224.68", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments": { "device": "drive_image1", "writable": true } }
{"return": {}}
```
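Before step 3, the export can be sanity-checked from the src host. This check is not in the original report; the host and port simply match the nbd-server-start arguments above:

```
# Not part of the original report: a quick check from src that the dst
# NBD export is reachable before starting the mirror.
qemu-img info nbd://10.73.224.68:3333/drive_image1
# Expected output: file format raw, virtual size matching mirror.qcow2.
```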
2. On src, start the guest with:

```
/usr/libexec/qemu-kvm \
    -name 'avocado-vt-vm1' \
    -machine q35 \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x1 \
    -m 7168 \
    -smp 4,maxcpus=4,cores=2,threads=1,dies=1,sockets=2 \
    -cpu 'Skylake-Client',+kvm_pv_unhalt \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20191118-011823-gEG3j1ms,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20191118-011823-gEG3j1mt,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control \
    -device pvpanic,ioport=0x505,id=id4p8G4l \
    -chardev socket,server,id=chardev_serial0,path=/var/tmp/serial-serial0-20191118-011823-gEG3j1mt,nowait \
    -device isa-serial,id=serial0,chardev=chardev_serial0 \
    -chardev socket,id=seabioslog_id_20191118-011823-gEG3j1mt,path=/var/tmp/seabios-20191118-011823-gEG3j1mt,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20191118-011823-gEG3j1mt,iobase=0x402 \
    -device pcie-root-port,id=pcie.0-root-port-2,slot=2,chassis=2,addr=0x2,bus=pcie.0 \
    -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2,addr=0x0 \
    -object iothread,id=iothread0 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=/home/kvm_autotest_root/images/rhel820-64-virtio.qcow2 \
    -device pcie-root-port,id=pcie.0-root-port-3,slot=3,chassis=3,addr=0x3,bus=pcie.0 \
    -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-3,addr=0x0,iothread=iothread0 \
    -device scsi-hd,id=image1,drive=drive_image1,bootindex=0,bus=scsi0.0 \
    -device pcie-root-port,id=pcie.0-root-port-4,slot=4,chassis=4,addr=0x4,bus=pcie.0 \
    -device virtio-net-pci,mac=9a:4f:f4:e5:bd:67,id=idkQvhgf,netdev=idnMcj5J,bus=pcie.0-root-port-4,addr=0x0 \
    -netdev tap,id=idnMcj5J,vhost=on \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=utc,clock=host,driftfix=slew \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -device pcie-root-port,id=pcie_extra_root_port_0,slot=5,chassis=5,addr=0x5,bus=pcie.0 \
    -monitor stdio
```

3. Mirror from src to dst:

```
{ "execute": "drive-mirror", "arguments": { "device": "drive_image1", "target": "nbd://10.73.224.68:3333/drive_image1", "sync": "full", "format": "raw", "mode": "existing" } }
```

After step 3, the src QEMU hangs.

Additional info:
Both virtio_blk+dataplane+NBD and virtio_scsi+dataplane+NBD hit this issue.
With dataplane disabled, it works OK.
When mirroring to an image on localfs, it works OK.
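For context, here is a minimal sketch of the QMP sequence that would normally complete a storage VM migration once the mirror converges; in this bug the hang occurs before the BLOCK_JOB_READY event is ever seen. The ordering shown is one common variant, not taken verbatim from this report; the host and port come from the dst command line above (-incoming tcp:0:5000):

```
# Hypothetical continuation once drive-mirror converges (not reached here).
# On src, after the BLOCK_JOB_READY event for drive_image1 arrives:
{ "execute": "migrate", "arguments": { "uri": "tcp:10.73.224.68:5000" } }
# When migration completes, end the mirror; cancelling a ready mirror
# leaves the target as a consistent copy:
{ "execute": "block-job-cancel", "arguments": { "device": "drive_image1" } }
# On dst, stop the NBD server once the mirror job has ended:
{ "execute": "nbd-server-stop" }
```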
Hi,

Looking at the backtrace, it looks like a slightly different issue that should be fixed by the same patch series.

Thanks,
Sergio.

(In reply to Sergio Lopez from comment #10)
> Looking at the backtrace, it looks like a slightly different issue that
> should be fixed by the same patch series.

Hi, Sergio

The new issue blocks all my storage_vm_migration tests, so it has high priority. I filed a new bug, bz#1773517, to track the new issue, and am leaving the original one to cover the different test scenario. As the new issue can be fixed by the same patch series, Sergio, can you help to update the patch info in bz#1773517?

Thanks, aliang

QEMU has recently been split into sub-components, and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks.

Verified on qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e: the issue has been resolved, so the bug's status is set to "Verified".

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017
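As a quick check when verifying on a host, the installed package can be compared against the Fixed In Version above; this sketch is not part of the original report:

```
# Not from the report: confirm the host carries the fixed build.
rpm -q qemu-kvm
# expected: qemu-kvm-4.2.0-10.module+el8.2.0+5740+c3dff59e or newer
```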