Bug 1506531
Summary: [data-plane] Qemu-kvm core dumped when hot-unplugging a block device with data-plane while the drive-mirror job is running

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 7 |
| Component | qemu-kvm-rhev |
| Version | 7.5 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | yilzhang |
| Assignee | Jeff Cody <jcody> |
| QA Contact | Qianqian Zhu <qizhu> |
| CC | aliang, chayang, coli, juzhang, knoel, lmiksik, michen, mrezanin, qizhu, qzhang, stefanha, virt-maint, xfu, yilzhang |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | All |
| OS | Linux |
| Type | Bug |
| Fixed In Version | qemu-kvm-rhev-2.10.0-12.el7 |
| Last Closed | 2018-04-11 00:44:15 UTC |
| Bug Depends On | 1508708 |
Description (yilzhang, 2017-10-26 09:39:22 UTC)
If data-plane is not used, qemu-kvm works well and the hot-unplug in step 4 succeeds.

---

Yilin, how about x86? Can you reproduce? Thanks.

---

Will test it on x86, please stay tuned.

---

(In reply to yilzhang from comment #4)
> Will test it on x86, please stay tuned.

If it's confirmed to be a power-only BZ, then we'll reassign it. See also Bug 1503437, which might be related.

---

This bug can be reproduced on x86:

    Host kernel:   3.10.0-747.el7.x86_64
    qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-3.el7
    Guest kernel:  3.10.0-747.el7.x86_64

---

Jeff, some ideas to debug this:

1. blockjob.c:block_job_create() calls blk_add_aio_context_notifier() so that the job can respond to AioContext changes. This test case is exactly the type of scenario that this code was written for, since device_del destroys the device that is using an IOThread and moves the BlockDriverState back into the global AioContext.

2. bdrv_drained_begin() -> bdrv_co_yield_to_drain() -> qemu_coroutine_yield() should be called with job->busy = true (I haven't checked whether this is the case). The assertion failure is complaining about a spurious qemu_coroutine_enter(), so the first step is tracking down who is entering this coroutine.

---

I suspect this has the same root cause as BZ #1508708.

---

This is fixed in upstream commit:

    commit 481cad48e5e655746893e001af31c161f4587a02
    Author: Manos Pitsidianakis <el13635.gr>
    Date:   Sat Sep 23 14:14:09 2017 +0300

        block: add bdrv_co_drain_end callback

        BlockDriverState has a bdrv_co_drain() callback but no equivalent for
        the end of the drain. The throttle driver (block/throttle.c) needs a
        way to mark the end of the drain in order to toggle io_limits_disabled
        correctly, thus bdrv_co_drain_end is needed.

We should probably backport the whole series 78b62d3..b867eaa.

---

In addition to the commit in Comment #10, this is also dependent on the patches I submitted upstream for BZ #1508708.

---

Created attachment 1356114 [details]
Reproducer test script
Test script to reproduce the BZ.
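The upstream fix referenced in the comments adds a bdrv_co_drain_end callback so that work done at the start of a drained section can be undone at its end. A toy Python model of that pairing, loosely mirroring how the throttle driver toggles its io_limits_disabled counter (an illustrative sketch only, not QEMU's actual code):

```python
class ThrottleNode:
    """Toy model of a throttle driver's drained-section handling.

    Before the fix, only a begin-of-drain callback existed, so the
    driver had no hook to re-enable its I/O limits; bdrv_co_drain_end
    provides the balancing call modeled by co_drain_end() below.
    """

    def __init__(self):
        self.io_limits_disabled = 0   # counter: >0 means throttling is off

    def co_drain_begin(self):
        # Disable throttling so pending requests can complete quickly.
        self.io_limits_disabled += 1

    def co_drain_end(self):
        # Balancing call: re-enable throttling once the drain is over.
        assert self.io_limits_disabled > 0
        self.io_limits_disabled -= 1


node = ThrottleNode()
node.co_drain_begin()   # e.g. device_del draining the BlockDriverState
node.co_drain_end()     # the new callback restores the limits
assert node.io_limits_disabled == 0
```

The point of the model is that begin/end must stay balanced; the counter going back to zero is what "toggle io_limits_disabled correctly" in the commit message refers to.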
Hit the same issue when hot-unplugging a block device with a running block-stream job, on qemu-kvm-rhev-2.10.0-11.el7.x86_64.

---

Fix included in qemu-kvm-rhev-2.10.0-12.el7.

---

Reproduced with qemu-kvm-rhev-2.10.0-11.el7.x86_64:

Steps:

1. Launch guest:

    # /usr/libexec/qemu-kvm -M q35,accel=kvm,kernel-irqchip=split -m 4G \
        -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie.0,iothread=iothread1 \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=writethrough,format=qcow2,file=/home/kvm_autotest_root/images/win2012-64r2-virtio-scsi.qcow2 \
        -device scsi-hd,id=image1,drive=drive_image1 \
        -device virtio-net-pci,mac=9a:89:8a:8b:8c:8d,id=idTqeiKU,netdev=idU17IIx,bus=pcie.0 \
        -object iothread,id=iothread1 \
        -netdev tap,id=idU17IIx \
        -cpu 'SandyBridge',+kvm_pv_unhalt \
        -vnc :0 -enable-kvm \
        -qmp tcp::5555,server,nowait -monitor stdio \
        -object iothread,id=iothread2 -object iothread,id=iothread3 \
        -drive id=drive_image2,if=none,snapshot=off,aio=threads,cache=writethrough,format=qcow2,file=/home/data1.qcow2

2. Start a block mirror job:

    { "execute": "drive-mirror", "arguments": { "device": "drive_image1", "target": "destination-1.image", "sync": "full" } }

3. Hot-plug a second disk:

    { "execute": "device_add", "arguments": { "driver": "scsi-hd", "drive": "drive_image2", "bus": "virtio_scsi_pci0.0", "id": "ddisk_1" } }

4. Hot-unplug it:

    { "execute": "device_del", "arguments": { "id": "ddisk_1" } }

5. Quit qemu:

    (qemu) quit

Result:

    (qemu) quit
    qemu-kvm: block/io.c:250: bdrv_co_yield_to_drain: Assertion `data.done' failed.
    Aborted (core dumped)

---

Verified on qemu-kvm-rhev-2.10.0-12.el7.x86_64:

Steps same as comment 16.

Result: hot-plug and hot-unplug succeed, and qemu works well.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104
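The QMP sequence in the reproduction steps can also be driven programmatically over the QMP socket opened by `-qmp tcp::5555,server,nowait`. A minimal Python sketch, assuming a QEMU instance launched as in step 1; the reply handling is deliberately simplified (a real QMP client must also cope with asynchronous events interleaved with command replies):

```python
import json
import socket


def qmp_cmd(name, **arguments):
    """Build a QMP command object as a Python dict."""
    cmd = {"execute": name}
    if arguments:
        cmd["arguments"] = arguments
    return cmd


def qmp_session(host, port, commands):
    """Connect to a QMP socket, negotiate capabilities, run commands.

    Returns one parsed JSON reply per command (simplified: assumes
    exactly one reply line per command, no interleaved events).
    """
    replies = []
    with socket.create_connection((host, port)) as sock:
        f = sock.makefile("rw")
        f.readline()  # discard the QMP greeting banner
        for cmd in [qmp_cmd("qmp_capabilities")] + commands:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            replies.append(json.loads(f.readline()))
    return replies


# The sequence from the reproduction steps: mirror, hot-plug, hot-unplug.
REPRO_COMMANDS = [
    qmp_cmd("drive-mirror", device="drive_image1",
            target="destination-1.image", sync="full"),
    qmp_cmd("device_add", driver="scsi-hd", drive="drive_image2",
            bus="virtio_scsi_pci0.0", id="ddisk_1"),
    qmp_cmd("device_del", id="ddisk_1"),
]

# To run against a live guest started as in step 1:
#   qmp_session("localhost", 5555, REPRO_COMMANDS)
```

This mirrors what the attached reproducer script presumably automates; on the broken builds the crash fires after the device_del, once the drained section on the mirrored drive goes wrong.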