Description of problem:
Hotplug a disk, then execute the block_resize command via QMP. There is no response to the command and the disk is not extended.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 8.4 Beta (Ootpa)
4.18.0-252.el8.dt4.x86_64
qemu-kvm-common-5.2.0-0.module+el8.4.0+8855+a9e237a9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create an image:
qemu-img create -f qcow2 /home/kvm_autotest_root/images/storage0.qcow2 10G

2. Hotplug one disk:
{"execute":"qmp_capabilities"}
{"execute": "device_add", "arguments": {"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pci.0", "addr": "0x6", "iothread": "iothread1"}, "id": "ObvkQjyd"}
{"execute": "blockdev-add", "arguments": {"node-name": "file_stg0", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/storage0.qcow2", "cache": {"direct": true, "no-flush": false}}, "id": "Jaw90cpP"}
{"execute": "blockdev-add", "arguments": {"node-name": "drive_stg0", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg0"}, "id": "Jn0EUb6J"}
{"execute": "device_add", "arguments": {"driver": "scsi-hd", "id": "stg0", "drive": "drive_stg0", "write-cache": "on", "serial": "TARGET_DISK0", "bus": "virtio_scsi_pci0.0"}, "id": "oiyRS0nF"}

3. Check in the guest:
lsblk
The disk exists in the guest and its size is 10G.

4. Execute the block_resize command:
{"execute": "block_resize", "arguments": {"node-name": "drive_stg0", "size": 16106127360}, "id": "ZFnaEVuL"}

Actual results:
No response.

Expected results:
A response like the following, and the guest shows a size of 15G:
{"return": {}, "id": "ZFnaEVuL"}

Additional info:
Automation reproducer:
python ConfigTest.py --testcase=block_hotplug.block_scsi.fmt_qcow2.default.with_plug.with_block_resize.one_pci --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.4.0 --driveformat=virtio_blk --nicmodel=virtio_net --imageformat=qcow2 --machines=i440fx --customsparams="vm_mem_limit = 12G\nimage_aio=threads" --clone=no

No issue found on:
Red Hat Enterprise Linux release 8.4 Beta (Ootpa)
4.18.0-250.el8.dt4.x86_64
qemu-kvm-common-4.2.0-37.module+el8.4.0+8837+c89bcfe6.x86_64
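For reference, the resize step above can be driven from a small script over the QMP socket. This is only a minimal sketch, assuming the monitor is exposed as a UNIX socket at /tmp/qmp.sock (the socket path and the qmp_command helper are assumptions for illustration; the node-name and size come from the steps above):

#!/usr/bin/env python3
# Minimal QMP client sketch for the block_resize step above.
# Assumption: the QMP monitor is a UNIX socket at /tmp/qmp.sock
# (e.g. started with -qmp unix:/tmp/qmp.sock,server,nowait).
import json
import socket

SOCK_PATH = "/tmp/qmp.sock"  # hypothetical path, adjust to your setup

def qmp_command(sock_file, execute, arguments=None, cmd_id=None):
    """Send one QMP command and return the matching response."""
    cmd = {"execute": execute}
    if arguments:
        cmd["arguments"] = arguments
    if cmd_id:
        cmd["id"] = cmd_id
    sock_file.write(json.dumps(cmd) + "\r\n")
    sock_file.flush()
    # Skip asynchronous events until a return/error message arrives.
    while True:
        resp = json.loads(sock_file.readline())
        if "return" in resp or "error" in resp:
            return resp

sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
sock.connect(SOCK_PATH)
f = sock.makefile("rw")

json.loads(f.readline())             # read the QMP greeting
qmp_command(f, "qmp_capabilities")   # negotiate capabilities

# On a fixed build this prints {"return": {}, "id": "ZFnaEVuL"};
# on the affected build the readline() inside qmp_command never returns.
print(qmp_command(f, "block_resize",
                  {"node-name": "drive_stg0", "size": 16106127360},
                  cmd_id="ZFnaEVuL"))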
I could reproduce this on current upstream master. It looks like a deadlock in bdrv_co_yield_to_drain(), where we schedule the drain operation in the iothread while holding its AioContext lock:

(gdb) p qmp_dispatcher_co_busy
$1 = true
(gdb) p qmp_dispatcher_co
$2 = (Coroutine *) 0x562bc51b3760
(gdb) source scripts/qemu-gdb.py
(gdb) qemu coroutine 0x562bc51b3760
#0  0x0000562bc45ee076 in qemu_coroutine_switch (from_=0x562bc51b3760, to_=0x7fa03d95aec0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302
#1  0x0000562bc45fa5dc in qemu_coroutine_yield () at ../util/qemu-coroutine.c:193
#2  0x0000562bc44923f8 in bdrv_co_yield_to_drain (bs=0x562bc64b7200, begin=true, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true, drained_end_counter=0x0) at ../block/io.c:374
#3  0x0000562bc449252a in bdrv_do_drained_begin (bs=0x562bc64b7200, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at ../block/io.c:401
#4  0x0000562bc4492757 in bdrv_drained_begin (bs=0x562bc64b7200) at ../block/io.c:434
#5  0x0000562bc44d64a6 in blk_drain (blk=0x562bc66e6400) at ../block/block-backend.c:1718
#6  0x0000562bc44d4756 in blk_remove_bs (blk=0x562bc66e6400) at ../block/block-backend.c:828
#7  0x0000562bc44d3c1a in blk_delete (blk=0x562bc66e6400) at ../block/block-backend.c:447
#8  0x0000562bc44d3ebe in blk_unref (blk=0x562bc66e6400) at ../block/block-backend.c:502
#9  0x0000562bc446233a in qmp_block_resize (has_device=false, device=0x0, has_node_name=true, node_name=0x562bc5cedcd0 "drive_stg0", size=16106127360, errp=0x7fa03cff7cf0) at ../blockdev.c:2492
#10 0x0000562bc45b1d31 in qmp_marshal_block_resize (args=0x7fa0240078a0, ret=0x7fa03cff7d88, errp=0x7fa03cff7d80) at qapi/qapi-commands-block-core.c:222
#11 0x0000562bc4609ba6 in qmp_dispatch (cmds=0x562bc4e13aa0 <qmp_commands>, request=0x7fa024005b20, allow_oob=false, cur_mon=0x562bc546ab40) at ../qapi/qmp-dispatch.c:188
#12 0x0000562bc44417a1 in monitor_qmp_dispatch (mon=0x562bc546ab40, req=0x7fa024005b20) at ../monitor/qmp.c:145
#13 0x0000562bc4441c47 in monitor_qmp_dispatcher_co (data=0x0) at ../monitor/qmp.c:282
#14 0x0000562bc45edd83 in coroutine_trampoline (i0=-988072096, i1=22059) at ../util/coroutine-ucontext.c:173
(gdb) thread apply all bt
[...]
Thread 3 (Thread 0x7fa03cef6640 (LWP 26394) "qemu-system-x86"):
#0  0x00007fa04a637ea0 in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007fa04a6307f1 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2  0x0000562bc460dc09 in qemu_mutex_lock_impl (mutex=0x562bc544d060, file=0x562bc49f3311 "../util/async.c", line=645) at ../util/qemu-thread-posix.c:79
#3  0x0000562bc45df9d8 in aio_context_acquire (ctx=0x562bc544d000) at ../util/async.c:645
#4  0x0000562bc43dc445 in virtio_scsi_acquire (s=0x562bc53cedd0) at /home/kwolf/source/qemu/include/hw/virtio/virtio-scsi.h:136
#5  0x0000562bc43dc65a in virtio_scsi_data_plane_handle_cmd (vdev=0x562bc53cedd0, vq=0x562bc621b5c0) at ../hw/scsi/virtio-scsi-dataplane.c:58
#6  0x0000562bc43c25bd in virtio_queue_notify_aio_vq (vq=0x562bc621b5c0) at ../hw/virtio/virtio.c:2326
#7  0x0000562bc43c5611 in virtio_queue_host_notifier_aio_poll (opaque=0x562bc621b638) at ../hw/virtio/virtio.c:3550
#8  0x0000562bc45e9049 in run_poll_handlers_once (ctx=0x562bc544d000, now=5277040110206, timeout=0x7fa03cef13c8) at ../util/aio-posix.c:399
#9  0x0000562bc45e9336 in run_poll_handlers (ctx=0x562bc544d000, max_ns=4000, timeout=0x7fa03cef13c8) at ../util/aio-posix.c:493
#10 0x0000562bc45e9514 in try_poll_mode (ctx=0x562bc544d000, timeout=0x7fa03cef13c8) at ../util/aio-posix.c:536
#11 0x0000562bc45e9625 in aio_poll (ctx=0x562bc544d000, blocking=true) at ../util/aio-posix.c:577
#12 0x0000562bc446f821 in iothread_run (opaque=0x562bc544cc00) at ../iothread.c:73
#13 0x0000562bc460e852 in qemu_thread_start (args=0x562bc544c8e0) at ../util/qemu-thread-posix.c:521
#14 0x00007fa04a62e3f9 in start_thread () at /lib64/libpthread.so.0
#15 0x00007fa04a55b903 in clone () at /lib64/libc.so.6
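To illustrate the shape of the hang shown by the two backtraces (this is an analogy, not QEMU code): the QMP dispatcher holds the iothread's AioContext lock while waiting for work that only the iothread can complete, and the iothread is blocked trying to acquire that same lock. A minimal Python sketch of the pattern, where aio_lock stands in for the AioContext lock and the worker thread for the iothread:

# Analogy only, not QEMU code.
import threading

aio_lock = threading.Lock()      # stands in for the AioContext lock
drain_done = threading.Event()   # stands in for drained_begin completing

def iothread():
    # Like virtio_scsi_data_plane_handle_cmd() -> aio_context_acquire():
    # the iothread must take the lock before it can make progress.
    with aio_lock:
        drain_done.set()

worker = threading.Thread(target=iothread, daemon=True)

# Like qmp_block_resize(): take the AioContext lock first...
aio_lock.acquire()
worker.start()
# ...then wait for work that only the iothread can complete -> deadlock.
# (A timeout is used here so the demo terminates; QEMU waits forever.)
if not drain_done.wait(timeout=2):
    print("deadlock: iothread cannot acquire the lock we are holding")
aio_lock.release()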
Fix posted upstream: https://lists.gnu.org/archive/html/qemu-block/2020-12/msg00137.html
Hit the same issue on {'kvm_version': '4.18.0-262.el8.x86_64', 'qemu_version': 'qemu-kvm-core-5.2.0-2.module+el8.4.0+9186+ec44380f.x86_64'}:

python ConfigTest.py --testcase=block_hotplug.block_scsi.fmt_qcow2.default.with_plug.with_block_resize.one_pci --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.7.9 --driveformat=virtio_blk --nicmodel=virtio_net --imageformat=qcow2 --machines=q35 --customsparams="vm_mem_limit = 12G\nimage_aio=native" --clone=no
*** Bug 1948296 has been marked as a duplicate of this bug. ***
This bug affects a basic disk resize flow in oVirt. Severity should be raised to 'urgent' and it should be fixed in RHEL-8.4 also.
Adding more info on the RHV side from bug 1948296:

This breaks basic functionality in RHV in a very bad way.

Steps to Reproduce:
1. Start a VM with a qcow2 disk using virtio or virtio-scsi and an iothread
2. Try to resize one of the qcow2 disks in RHV

Actual results:
- The libvirt blockResize call does not return.
- The vdsm log shows a warning about a blocked worker every 60 seconds.
- The VM cannot be stopped.
- The qemu process cannot be terminated with a plain kill; the only way to recover is to kill it with kill -9.

The only way to avoid this bug is to disable the iothread, which is enabled by default in RHV.
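For completeness, the RHV-side operation that hangs boils down to a libvirt blockResize call. A minimal sketch using libvirt-python, assuming a domain named 'vm1' and a disk target 'sda' (both placeholders, not from this report):

# Sketch of the libvirt call that never returns on an affected host.
# Assumptions: libvirt-python is installed; 'vm1' and 'sda' are
# hypothetical names standing in for the actual VM and disk target.
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("vm1")

# Resize the disk to 15 GiB. On an affected qemu-kvm build with an
# iothread attached, this call blocks forever instead of returning.
new_size = 15 * 1024 * 1024 * 1024
dom.blockResize("sda", new_size, libvirt.VIR_DOMAIN_BLOCK_RESIZE_BYTES)

conn.close()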
Used qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0 to execute the storage VM migration test plan, and the test passed.
Passed relevant case testing:

python ConfigTest.py --testcase=block_hotplug..with_plug.with_block_resize.one_pci --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.4.0 --driveformat=virtio_blk,virtio_scsi --nicmodel=virtio_net --imageformat=qcow2 --machines=i440fx,q35 --clone=no

Regression tests for blk+i440fx and q35+scsi:

python ConfigTest.py --category=virtual_block_device --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.4.0 --driveformat=virtio_blk --nicmodel=virtio_net --imageformat=qcow2 --machines=i440fx --customsparams="vm_mem_limit = 12G\nimage_aio=threads"

python ConfigTest.py --category=virtual_block_device --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.4.0 --driveformat=virtio_scsi --nicmodel=virtio_net --imageformat=qcow2 --machines=q35 --customsparams="vm_mem_limit = 12G\nimage_aio=native"
This bug is marked VERIFIED based on the test results in comments 14 and 15.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098