Bug 1764120
Summary: | [Data plane]virtio_scsi_ctx_check: Assertion `blk_get_aio_context(d->conf.blk) == s->ctx' failed when unplug a device that running block stream on it | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | aihua liang <aliang> |
Component: | qemu-kvm-rhev | Assignee: | Sergio Lopez <slopezpa> |
Status: | CLOSED ERRATA | QA Contact: | aihua liang <aliang> |
Severity: | unspecified | Docs Contact: | |
Priority: | medium | ||
Version: | 7.8 | CC: | coli, jinzhao, juzhang, lmiksik, mjankula, ngu, qzhang, redhat-bugzilla, virt-maint |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | qemu-kvm-rhev-2.12.0-39.el7 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-03-31 14:34:56 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
aihua liang
2019-10-22 09:42:19 UTC
Sergio - this is I believe similar to bz 1683937 (comment 5 perhaps) Thanks a lot for the access to the machine and the detailed reproducer. With them I've been able to see the error myself, and dig a bit on it. Turns out this is an issue fixed upstream by this commit: commit 9c5aad84da1c37429d06c193f23a8df6445ed29e Author: Zhengui Li <lizhengui> Date: Mon Jul 22 23:05:20 2019 +0200 virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi disk commit a6f230c move blockbackend back to main AioContext on unplug. It set the AioContext of SCSIDevice to the main AioContex, but s->ctx is still the iothread AioContex(if the scsi controller is configure with iothread). So if there are having in-flight requests during unplug, a failing assertion happend. The bt is below: (gdb) bt #0 0x0000ffff86aacbd0 in raise () from /lib64/libc.so.6 #1 0x0000ffff86aadf7c in abort () from /lib64/libc.so.6 #2 0x0000ffff86aa6124 in __assert_fail_base () from /lib64/libc.so.6 #3 0x0000ffff86aa61a4 in __assert_fail () from /lib64/libc.so.6 #4 0x0000000000529118 in virtio_scsi_ctx_check (d=<optimized out>, s=<optimized out>, s=<optimized out>) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:246 #5 0x0000000000529ec4 in virtio_scsi_handle_cmd_req_prepare (s=0x2779ec00, req=0xffff740397d0) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:559 #6 0x000000000052a228 in virtio_scsi_handle_cmd_vq (s=0x2779ec00, vq=0xffff7c6d7110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:603 #7 0x000000000052afa8 in virtio_scsi_data_plane_handle_cmd (vdev=<optimized out>, vq=0xffff7c6d7110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi-dataplane.c:59 #8 0x000000000054d94c in virtio_queue_host_notifier_aio_poll (opaque=<optimized out>) at /home/qemu-4.0.0/hw/virtio/virtio.c:2452 assert(blk_get_aio_context(d->conf.blk) == s->ctx) failed. To avoid assertion failed, moving the "if" after qdev_simple_device_unplug_cb. In addition, to avoid another qemu crash below, add aio_disable_external before qdev_simple_device_unplug_cb, which disable the further processing of external clients when doing qdev_simple_device_unplug_cb. (gdb) bt #0 scsi_req_unref (req=0xffff6802c6f0) at hw/scsi/scsi-bus.c:1283 #1 0x00000000005294a4 in virtio_scsi_handle_cmd_req_submit (req=<optimized out>, s=<optimized out>) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:589 #2 0x000000000052a2a8 in virtio_scsi_handle_cmd_vq (s=s@entry=0x9c90e90, vq=vq@entry=0xffff7c05f110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi.c:625 #3 0x000000000052afd8 in virtio_scsi_data_plane_handle_cmd (vdev=<optimized out>, vq=0xffff7c05f110) at /home/qemu-4.0.0/hw/scsi/virtio-scsi-dataplane.c:60 #4 0x000000000054d97c in virtio_queue_host_notifier_aio_poll (opaque=<optimized out>) at /home/qemu-4.0.0/hw/virtio/virtio.c:2447 #5 0x00000000009b204c in run_poll_handlers_once (ctx=ctx@entry=0x6efea40, timeout=timeout@entry=0xffff7d7f7308) at util/aio-posix.c:521 #6 0x00000000009b2b64 in run_poll_handlers (ctx=ctx@entry=0x6efea40, max_ns=max_ns@entry=4000, timeout=timeout@entry=0xffff7d7f7308) at util/aio-posix.c:559 #7 0x00000000009b2ca0 in try_poll_mode (ctx=ctx@entry=0x6efea40, timeout=0xffff7d7f7308, timeout@entry=0xffff7d7f7348) at util/aio-posix.c:594 #8 0x00000000009b31b8 in aio_poll (ctx=0x6efea40, blocking=blocking@entry=true) at util/aio-posix.c:636 #9 0x00000000006973cc in iothread_run (opaque=0x6ebd800) at iothread.c:75 #10 0x00000000009b592c in qemu_thread_start (args=0x6efef60) at util/qemu-thread-posix.c:502 #11 0x0000ffff8057f8bc in start_thread () from /lib64/libpthread.so.0 #12 0x0000ffff804e5f8c in thread_start () from /lib64/libc.so.6 (gdb) p bus $1 = (SCSIBus *) 0x0 Signed-off-by: Zhengui li <lizhengui> Message-Id: <1563696502-7972-1-git-send-email-lizhengui> Signed-off-by: Paolo Bonzini <pbonzini> Message-Id: <1563829520-17525-1-git-send-email-pbonzini> Signed-off-by: Paolo Bonzini <pbonzini> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index d0bdbff090..8b9e5e2b49 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -832,6 +832,7 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev, VirtIODevice *vdev = VIRTIO_DEVICE(hotplug_dev); VirtIOSCSI *s = VIRTIO_SCSI(vdev); SCSIDevice *sd = SCSI_DEVICE(dev); + AioContext *ctx = s->ctx ?: qemu_get_aio_context(); if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) { virtio_scsi_acquire(s); @@ -841,14 +842,16 @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev, virtio_scsi_release(s); } + aio_disable_external(ctx); + qdev_simple_device_unplug_cb(hotplug_dev, dev, errp); + aio_enable_external(ctx); + if (s->ctx) { virtio_scsi_acquire(s); /* If other users keep the BlockBackend in the iothread, that's ok */ blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL); virtio_scsi_release(s); } - - qdev_simple_device_unplug_cb(hotplug_dev, dev, errp); } static struct SCSIBusInfo virtio_scsi_scsi_info = { Sadly, we can't do a simple backport of this fix, because the hotplug mechanism has changed significantly from 2.12 to 4.2. In fact, in 2.12, the call to qdev_simple_device_unplug_cb() will cause the underlying device to be immediately released, so the attempt to access sd->conf.blk below will lead to a crash. I'm going to look for an alternative solution. Verified on qemu-kvm-rhev-2.12.0-39.el7.x86_64, it works ok, set bug's status to "Verified". Run test case for 20 times, all pass. cmd: python ConfigTest.py --guestname=RHEL.7.8 --platform=x86_64 --testcase=block_stream.with_hot_unplug.with_data_plane --nrepeat=20 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:1216 |