# Bug 1486594
| Summary: | qemu-kvm core dumped when hot-unplugging a virtio-blk/virtio-scsi device which is in use | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Linux 7 | Reporter: | yilzhang |
| Component: | qemu-kvm-rhev | Assignee: | Ademar Reis <areis> |
| Status: | CLOSED WORKSFORME | QA Contact: | CongLi <coli> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.5 | CC: | aliang, chayang, coli, juzhang, knoel, lijin, pbonzini, qzhang, stefanha, virt-maint, yujma |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-07-19 09:20:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1558351 | | |
## Description (yilzhang, 2017-08-30 08:27:00 UTC)
This can also be reproduced on x86 and P8.

P8 versions:

* Host kernel: 3.10.0-693.el7.ppc64le
* qemu version: qemu-kvm-rhev-2.9.0-16.el7_4.3.ppc64le
* Guest kernel: 3.10.0-693.el7.ppc64le

x86 versions:

* Host kernel: 3.10.0-693.el7.x86_64
* qemu version: qemu-kvm-rhev-2.9.0-16.el7_4.3.x86_64
* Guest kernel: 3.10.0-648.el7.x86_64

While testing this scenario, I also hit a second core dump with a different call trace (the only difference: the qemu command line used differs from the one above):

```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/qemu-kvm -smp 8,sockets=2,cores=4,threads=1 -m 8192 -serial unix:/'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000046ce7bc8 in bdrv_inc_in_flight (bs=0x0) at block/io.c:508
508         atomic_inc(&bs->in_flight);
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.ppc64le cyrus-sasl-lib-2.1.26-21.el7.ppc64le cyrus-sasl-plain-2.1.26-21.el7.ppc64le elfutils-libelf-0.168-8.el7.ppc64le elfutils-libs-0.168-8.el7.ppc64le glib2-2.50.3-3.el7.ppc64le glibc-2.17-196.el7.ppc64le gmp-6.0.0-15.el7.ppc64le gnutls-3.3.26-9.el7.ppc64le gperftools-libs-2.4-8.el7.ppc64le keyutils-libs-1.5.8-3.el7.ppc64le krb5-libs-1.15.1-8.el7.ppc64le libaio-0.3.109-13.el7.ppc64le libattr-2.4.46-12.el7.ppc64le libcap-2.22-9.el7.ppc64le libcom_err-1.42.9-10.el7.ppc64le libcurl-7.29.0-42.el7.ppc64le libdb-5.3.21-20.el7.ppc64le libfdt-1.4.3-1.el7.ppc64le libffi-3.0.13-18.el7.ppc64le libgcc-4.8.5-16.el7.ppc64le libgcrypt-1.5.3-14.el7.ppc64le libgpg-error-1.12-3.el7.ppc64le libibverbs-13-7.el7.ppc64le libidn-1.28-4.el7.ppc64le libiscsi-1.9.0-7.el7.ppc64le libnl3-3.2.28-4.el7.ppc64le libpng-1.5.13-7.el7_2.ppc64le librdmacm-13-7.el7.ppc64le libseccomp-2.3.1-3.el7.ppc64le libselinux-2.5-11.el7.ppc64le libssh2-1.4.3-10.el7_2.1.ppc64le libstdc++-4.8.5-16.el7.ppc64le libtasn1-4.10-1.el7.ppc64le libusbx-1.0.20-1.el7.ppc64le lzo-2.06-8.el7.ppc64le nettle-2.7.1-8.el7.ppc64le nspr-4.13.1-1.0.el7_3.ppc64le nss-3.28.4-12.el7_4.ppc64le nss-softokn-freebl-3.28.3-8.el7_4.ppc64le nss-util-3.28.4-3.el7.ppc64le numactl-libs-2.0.9-6.el7_2.ppc64le openldap-2.4.44-5.el7.ppc64le openssl-libs-1.0.2k-8.el7.ppc64le p11-kit-0.23.5-3.el7.ppc64le pcre-8.32-17.el7.ppc64le pixman-0.34.0-1.el7.ppc64le snappy-1.1.0-3.el7.ppc64le systemd-libs-219-42.el7.ppc64le xz-libs-5.2.2-1.el7.ppc64le zlib-1.2.7-17.el7.ppc64le
(gdb) bt
#0  0x0000000046ce7bc8 in bdrv_inc_in_flight (bs=0x0) at block/io.c:508
#1  0x0000000046cd8838 in blk_aio_prwv (blk=0x7b3814a0, offset=<optimized out>, bytes=<optimized out>, qiov=<optimized out>, co_entry=<optimized out>, flags=<optimized out>, cb=<optimized out>, opaque=<optimized out>) at block/block-backend.c:1145
#2  0x0000000046a1db54 in virtio_blk_handle_flush (mrb=0x3fffafd1de60, req=0x7b242840) at /usr/src/debug/qemu-2.9.0/hw/block/virtio-blk.c:457
#3  virtio_blk_handle_request (req=0x7b242840, mrb=0x3fffafd1de60) at /usr/src/debug/qemu-2.9.0/hw/block/virtio-blk.c:566
#4  0x0000000046a1e060 in virtio_blk_handle_vq (s=0x7d228510, vq=0x7de60000) at /usr/src/debug/qemu-2.9.0/hw/block/virtio-blk.c:609
#5  0x0000000046a1e3c0 in virtio_blk_data_plane_handle_output (vdev=<optimized out>, vq=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/block/dataplane/virtio-blk.c:158
#6  0x0000000046a4c19c in virtio_queue_notify_aio_vq (vq=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:1510
#7  0x0000000046d890bc in aio_dispatch_handlers (ctx=0x7b2c17c0) at util/aio-posix.c:399
#8  0x0000000046d89f54 in aio_poll (ctx=0x7b2c17c0, blocking=<optimized out>) at util/aio-posix.c:685
#9  0x0000000046b70548 in iothread_run (opaque=0x7b410840) at iothread.c:59
#10 0x00003fffb1aa8af4 in start_thread () from /lib64/libpthread.so.0
#11 0x00003fffb19d4ef4 in clone () from /lib64/libc.so.6
(gdb) bt full
#0  0x0000000046ce7bc8 in bdrv_inc_in_flight (bs=0x0) at block/io.c:508
No locals.
#1  0x0000000046cd8838 in blk_aio_prwv (blk=0x7b3814a0, offset=<optimized out>, bytes=<optimized out>, qiov=<optimized out>, co_entry=<optimized out>, flags=<optimized out>, cb=<optimized out>, opaque=<optimized out>) at block/block-backend.c:1145
        acb = <optimized out>
        co = <optimized out>
#2  0x0000000046a1db54 in virtio_blk_handle_flush (mrb=0x3fffafd1de60, req=0x7b242840) at /usr/src/debug/qemu-2.9.0/hw/block/virtio-blk.c:457
No locals.
#3  virtio_blk_handle_request (req=0x7b242840, mrb=0x3fffafd1de60) at /usr/src/debug/qemu-2.9.0/hw/block/virtio-blk.c:566
        type = <optimized out>
        in_iov = 0x7b2428f8
        iov = 0x7b242918
        in_num = 0
        out_num = 0
        s = <optimized out>
        vdev = 0x7d228510
        __func__ = "virtio_blk_handle_request"
        __PRETTY_FUNCTION__ = "virtio_blk_handle_request"
#4  0x0000000046a1e060 in virtio_blk_handle_vq (s=0x7d228510, vq=0x7de60000) at /usr/src/debug/qemu-2.9.0/hw/block/virtio-blk.c:609
        req = 0x7b242840
        mrb = {reqs = {0x0 <repeats 32 times>}, num_reqs = 0, is_write = false}
        progress = true
#5  0x0000000046a1e3c0 in virtio_blk_data_plane_handle_output (vdev=<optimized out>, vq=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/block/dataplane/virtio-blk.c:158
        s = <optimized out>
#6  0x0000000046a4c19c in virtio_queue_notify_aio_vq (vq=<optimized out>) at /usr/src/debug/qemu-2.9.0/hw/virtio/virtio.c:1510
        vdev = <optimized out>
#7  0x0000000046d890bc in aio_dispatch_handlers (ctx=0x7b2c17c0) at util/aio-posix.c:399
        revents = <optimized out>
        node = <optimized out>
        tmp = 0x7b436180
        progress = <optimized out>
#8  0x0000000046d89f54 in aio_poll (ctx=0x7b2c17c0, blocking=<optimized out>) at util/aio-posix.c:685
        node = <optimized out>
        i = <optimized out>
        ret = <optimized out>
        progress = false
        timeout = <optimized out>
        start = 108918899346437
        __PRETTY_FUNCTION__ = "aio_poll"
#9  0x0000000046b70548 in iothread_run (opaque=0x7b410840) at iothread.c:59
        iothread = 0x7b410840
#10 0x00003fffb1aa8af4 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#11 0x00003fffb19d4ef4 in clone () from /lib64/libc.so.6
No symbol table info available.
(gdb)
```

---

qemu-kvm may also core-dump when hot-unplugging a virtio-scsi device with data plane while the device is in use, so I will change this bug's title.

Back trace of the core dump:

```
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/libexec/qemu-kvm -smp 8,sockets=2,cores=4,threads=1 -m 8192 -serial unix:/'.
Program terminated with signal 11, Segmentation fault.
#0  0x00003fff9059b530 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00003fff9059b530 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x000000004ad9cd38 in qemu_mutex_lock (mutex=<optimized out>) at util/qemu-thread-posix.c:60
#2  0x000000004ad95e6c in aio_context_acquire (ctx=<error reading variable: value has been optimized out>) at util/async.c:489
#3  0x000000004abf9750 in scsi_dma_complete (opaque=0x6fcb8800, ret=<optimized out>) at hw/scsi/scsi-disk.c:295
#4  0x000000004ab85874 in dma_complete (ret=<optimized out>, dbs=0x6dd215f0) at dma-helpers.c:116
#5  dma_blk_cb (opaque=0x6dd215f0, ret=<optimized out>) at dma-helpers.c:138
#6  0x000000004ace5740 in blk_aio_complete (acb=0x6dc219a0) at block/block-backend.c:1125
#7  0x000000004adb52b8 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:79
#8  0x00003fff903f2b9c in makecontext () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
```

---

Paolo, does this ring a bell? I mean, the immediate cause of the trace in comment 3 is clear (by the way, did we leave blk_aio_*() on empty BBs completely unfixed in the end?), but none of these functions should even be called after the device is unplugged. Somehow the requests aren't correctly drained, it seems.

---

Nope, doesn't ring a bell... I'll try reproducing with both upstream and RHEL QEMU; maybe we can bisect.

---

(In reply to Paolo Bonzini from comment #6)
> Nope, doesn't ring a bell... I'll try reproducing with both upstream and
> RHEL QEMU, maybe we can bisect.

Paolo, can you please take a look?

---

Yilin, can you please test again with 2.10? Thanks!

---

(In reply to Paolo Bonzini from comment #8)
> Yilin, can you please test again with 2.10? Thanks!

qemu-kvm 2.10 still has this issue (qemu will crash with a core dump).
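For readers unfamiliar with the first crash site: the backtrace bottoms out in `bdrv_inc_in_flight(bs=0x0)` because, once the drive has been deleted, the BlockBackend has no medium and `blk_bs()` returns NULL, yet the dataplane iothread still pushes a flush request through `blk_aio_prwv()`. The following is only a toy sketch of that failure mode and of the shape of a guard, using hypothetical `toy_`-prefixed stand-ins rather than QEMU's real types (the real fix direction would complete the AIO with an error such as -ENOMEDIUM instead of returning a plain error code):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-ins for QEMU's BlockDriverState/BlockBackend (hypothetical). */
typedef struct ToyBlockDriverState {
    atomic_uint in_flight;
} ToyBlockDriverState;

typedef struct ToyBlockBackend {
    ToyBlockDriverState *bs;    /* becomes NULL once the drive is deleted */
} ToyBlockBackend;

static ToyBlockDriverState *toy_blk_bs(ToyBlockBackend *blk)
{
    return blk->bs;
}

/* Equivalent of block/io.c:508 -- crashes exactly like the trace
 * when called with bs == NULL. */
static void toy_bdrv_inc_in_flight(ToyBlockDriverState *bs)
{
    atomic_fetch_add(&bs->in_flight, 1u);
}

/* Guarded submission: reject I/O on an empty backend instead of
 * dereferencing NULL.  Returns 0 on success, -1 for "no medium". */
static int toy_blk_aio_submit(ToyBlockBackend *blk)
{
    ToyBlockDriverState *bs = toy_blk_bs(blk);
    if (bs == NULL) {
        return -1;
    }
    toy_bdrv_inc_in_flight(bs);
    return 0;
}
```

The unguarded path is exactly what the `bdrv_inc_in_flight (bs=0x0)` frame shows; the guarded path merely illustrates where a NULL check would have to sit.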
Testing was conducted on a Power8 host:

* Host kernel: 3.10.0-829.el7.ppc64le
* Guest kernel: 3.10.0-827.el7.ppc64le
* qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-17.el7

---

*** Bug 1516663 has been marked as a duplicate of this bug. ***

---

Could trigger a qemu core dump regardless of whether data plane is enabled.

---

Did not hit this issue; the guest works well.

RHEL-AV 8.1:

* Host kernel: 4.18.0-112.el8.x86_64
* qemu version: qemu-kvm-4.0.0-5.module+el8.1.0+3622+5812d9bf

RHEL 8.1:

* Host kernel: 4.18.0-112.el8.x86_64
* qemu version: qemu-kvm-2.12.0-81.module+el8.1.0+3619+dfe1ae01

---

Did not hit this issue; the guest works well.

RHEL 7.7:

* Host kernel: 3.10.0-1060.el7.x86_64
* qemu version: qemu-kvm-rhev-2.12.0-33.el7

---

(In reply to yilzhang from comment #0)
> Steps to Reproduce:
> 1. Start guest with data plane enabled
>
> 2. Hotplug one virtio-blk device with dataplane
> [host]# qemu-img create -f qcow2 /home/add-disk.qcow2 2T
> (qemu) __com.redhat_drive_add file=/home/add-disk.qcow2,id=data_disk,format=qcow2,werror=stop,rerror=stop,cache=none,aio=native
> (qemu) device_add virtio-blk-pci,drive=data_disk,id=data,iothread=iothread0
>
> 3. Inside guest, write to this data disk
> Guest ~]# dd if=/dev/zero of=/dev/vda bs=1M count=4444 oflag=sync status=progress
>
> 4. While dd is in progress, hot-unplug this device
> (qemu) drive_del data_disk
>        ^^^^^^^

As confirmed with a developer earlier, running drive_del directly on a drive still attached to a virtio-blk-pci device is not supported. So I will close this bug; please feel free to reopen it if anything is wrong. Thanks.
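A side note on the remark earlier in the thread that the requests apparently "aren't correctly drained": QEMU's rule is that a device may only be torn down once its BlockBackend has no requests in flight, which is what `bdrv_drain()` enforces by iterating the event loop until the in-flight counter reaches zero. The following is a much-simplified, hypothetical model of that counter-and-wait pattern (`toy_`-prefixed names are invented here; the real code runs `aio_poll()` rather than `sched_yield()` while waiting):

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy model of QEMU's in-flight request accounting (hypothetical names). */
static atomic_int toy_in_flight;

static void toy_req_start(void)
{
    atomic_fetch_add(&toy_in_flight, 1);
}

static void toy_req_done(void)
{
    atomic_fetch_sub(&toy_in_flight, 1);
}

/* Stands in for the iothread completing an outstanding request. */
static void *toy_iothread(void *arg)
{
    (void)arg;
    toy_req_done();
    return NULL;
}

/* Toy drain: yield until nothing is in flight.  Only after this
 * returns is it safe to tear the toy "device" down. */
static void toy_drain(void)
{
    while (atomic_load(&toy_in_flight) > 0) {
        sched_yield();
    }
}
```

The crashes in this bug correspond to skipping the drain step: the unplug proceeds while the counter is still nonzero, so completion callbacks later run against freed or emptied state.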