Bug 1903511 - no response on QMP command 'block_resize'
Summary: no response on QMP command 'block_resize'
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: 8.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: 8.4
Assignee: Kevin Wolf
QA Contact: qing.wang
URL:
Whiteboard:
Duplicates: 1948296
Depends On:
Blocks: 1948532
 
Reported: 2020-12-02 08:36 UTC by qing.wang
Modified: 2021-05-25 06:45 UTC
CC List: 14 users

Fixed In Version: qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 06:45:10 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description qing.wang 2020-12-02 08:36:12 UTC
Description of problem:
After hot-plugging a disk and executing the block_resize command via QMP, there is no response to the command and the disk is not extended.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 8.4 Beta (Ootpa)
4.18.0-252.el8.dt4.x86_64
qemu-kvm-common-5.2.0-0.module+el8.4.0+8855+a9e237a9.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Create an image
qemu-img create -f qcow2 /home/kvm_autotest_root/images/storage0.qcow2 10G

2. Hot-plug a disk
{"execute":"qmp_capabilities"}
  {"execute": "device_add", "arguments": {"id": "virtio_scsi_pci0", "driver": "virtio-scsi-pci", "bus": "pci.0", "addr": "0x6", "iothread": "iothread1"}, "id": "ObvkQjyd"}
  {"execute": "blockdev-add", "arguments": {"node-name": "file_stg0", "driver": "file", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/home/kvm_autotest_root/images/storage0.qcow2", "cache": {"direct": true, "no-flush": false}}, "id": "Jaw90cpP"}
  {"execute": "blockdev-add", "arguments": {"node-name": "drive_stg0", "driver": "qcow2", "read-only": false, "cache": {"direct": true, "no-flush": false}, "file": "file_stg0"}, "id": "Jn0EUb6J"}
  {"execute": "device_add", "arguments": {"driver": "scsi-hd", "id": "stg0", "drive": "drive_stg0", "write-cache": "on", "serial": "TARGET_DISK0", "bus": "virtio_scsi_pci0.0"}, "id": "oiyRS0nF"}

3. Check in the guest
lsblk
The disk is present in the guest with a size of 10G.

4. Execute the block_resize command
{"execute": "block_resize", "arguments": {"node-name": "drive_stg0", "size": 16106127360}, "id": "ZFnaEVuL"}

Actual results:
No response; the disk is not extended.

Expected results:
A response like the following is returned, and the guest shows the disk size as 15G:
{"return": {}, "id": "ZFnaEVuL"}

Additional info:

Automation reproducer:
python ConfigTest.py --testcase=block_hotplug.block_scsi.fmt_qcow2.default.with_plug.with_block_resize.one_pci  --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.4.0 --driveformat=virtio_blk --nicmodel=virtio_net --imageformat=qcow2 --machines=i440fx --customsparams="vm_mem_limit = 12G\nimage_aio=threads" --clone=no

No issue was found on:
Red Hat Enterprise Linux release 8.4 Beta (Ootpa)
4.18.0-250.el8.dt4.x86_64
qemu-kvm-common-4.2.0-37.module+el8.4.0+8837+c89bcfe6.x86_64

Comment 2 Kevin Wolf 2020-12-03 10:44:23 UTC
I could reproduce this on current upstream master.

It looks like a deadlock in bdrv_co_yield_to_drain(), where we schedule the drain operation in the iothread while holding its AioContext lock:

(gdb) p qmp_dispatcher_co_busy 
$1 = true
(gdb) p qmp_dispatcher_co
$2 = (Coroutine *) 0x562bc51b3760
(gdb) source scripts/qemu-gdb.py 
(gdb) qemu coroutine 0x562bc51b3760
#0  0x0000562bc45ee076 in qemu_coroutine_switch (from_=0x562bc51b3760, to_=0x7fa03d95aec0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302
#1  0x0000562bc45fa5dc in qemu_coroutine_yield () at ../util/qemu-coroutine.c:193
#2  0x0000562bc44923f8 in bdrv_co_yield_to_drain (bs=0x562bc64b7200, begin=true, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true, drained_end_counter=0x0)
    at ../block/io.c:374
#3  0x0000562bc449252a in bdrv_do_drained_begin (bs=0x562bc64b7200, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at ../block/io.c:401
#4  0x0000562bc4492757 in bdrv_drained_begin (bs=0x562bc64b7200) at ../block/io.c:434
#5  0x0000562bc44d64a6 in blk_drain (blk=0x562bc66e6400) at ../block/block-backend.c:1718
#6  0x0000562bc44d4756 in blk_remove_bs (blk=0x562bc66e6400) at ../block/block-backend.c:828
#7  0x0000562bc44d3c1a in blk_delete (blk=0x562bc66e6400) at ../block/block-backend.c:447
#8  0x0000562bc44d3ebe in blk_unref (blk=0x562bc66e6400) at ../block/block-backend.c:502
#9  0x0000562bc446233a in qmp_block_resize (has_device=false, device=0x0, has_node_name=true, node_name=0x562bc5cedcd0 "drive_stg0", size=16106127360, errp=0x7fa03cff7cf0)
    at ../blockdev.c:2492
#10 0x0000562bc45b1d31 in qmp_marshal_block_resize (args=0x7fa0240078a0, ret=0x7fa03cff7d88, errp=0x7fa03cff7d80) at qapi/qapi-commands-block-core.c:222
#11 0x0000562bc4609ba6 in qmp_dispatch (cmds=0x562bc4e13aa0 <qmp_commands>, request=0x7fa024005b20, allow_oob=false, cur_mon=0x562bc546ab40) at ../qapi/qmp-dispatch.c:188
#12 0x0000562bc44417a1 in monitor_qmp_dispatch (mon=0x562bc546ab40, req=0x7fa024005b20) at ../monitor/qmp.c:145
#13 0x0000562bc4441c47 in monitor_qmp_dispatcher_co (data=0x0) at ../monitor/qmp.c:282
#14 0x0000562bc45edd83 in coroutine_trampoline (i0=-988072096, i1=22059) at ../util/coroutine-ucontext.c:173

(gdb) thread apply all bt
[...]
Thread 3 (Thread 0x7fa03cef6640 (LWP 26394) "qemu-system-x86"):
#0  0x00007fa04a637ea0 in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x00007fa04a6307f1 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2  0x0000562bc460dc09 in qemu_mutex_lock_impl (mutex=0x562bc544d060, file=0x562bc49f3311 "../util/async.c", line=645) at ../util/qemu-thread-posix.c:79
#3  0x0000562bc45df9d8 in aio_context_acquire (ctx=0x562bc544d000) at ../util/async.c:645
#4  0x0000562bc43dc445 in virtio_scsi_acquire (s=0x562bc53cedd0) at /home/kwolf/source/qemu/include/hw/virtio/virtio-scsi.h:136
#5  0x0000562bc43dc65a in virtio_scsi_data_plane_handle_cmd (vdev=0x562bc53cedd0, vq=0x562bc621b5c0) at ../hw/scsi/virtio-scsi-dataplane.c:58
#6  0x0000562bc43c25bd in virtio_queue_notify_aio_vq (vq=0x562bc621b5c0) at ../hw/virtio/virtio.c:2326
#7  0x0000562bc43c5611 in virtio_queue_host_notifier_aio_poll (opaque=0x562bc621b638) at ../hw/virtio/virtio.c:3550
#8  0x0000562bc45e9049 in run_poll_handlers_once (ctx=0x562bc544d000, now=5277040110206, timeout=0x7fa03cef13c8) at ../util/aio-posix.c:399
#9  0x0000562bc45e9336 in run_poll_handlers (ctx=0x562bc544d000, max_ns=4000, timeout=0x7fa03cef13c8) at ../util/aio-posix.c:493
#10 0x0000562bc45e9514 in try_poll_mode (ctx=0x562bc544d000, timeout=0x7fa03cef13c8) at ../util/aio-posix.c:536
#11 0x0000562bc45e9625 in aio_poll (ctx=0x562bc544d000, blocking=true) at ../util/aio-posix.c:577
#12 0x0000562bc446f821 in iothread_run (opaque=0x562bc544cc00) at ../iothread.c:73
#13 0x0000562bc460e852 in qemu_thread_start (args=0x562bc544c8e0) at ../util/qemu-thread-posix.c:521
#14 0x00007fa04a62e3f9 in start_thread () at /lib64/libpthread.so.0
#15 0x00007fa04a55b903 in clone () at /lib64/libc.so.6
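
To make the interaction between these two stacks easier to follow, here is a minimal Python threading analogue of the pattern (illustrative only, not QEMU code; all names are stand-ins): the QMP dispatcher runs block_resize with the iothread's AioContext lock held and then waits for the drain to complete in the iothread, while the iothread's poll handler is itself blocked trying to acquire that same lock.

# Illustrative analogue of the deadlock, not QEMU code.
import threading

aio_context_lock = threading.Lock()   # stands in for the AioContext lock
lock_held = threading.Event()         # ordering aid for the demo only
drain_done = threading.Event()        # stands in for drain completion

def iothread_poll_handler():
    lock_held.wait()                  # demo ordering: the dispatcher goes first
    # virtio_scsi_acquire(): blocks because the dispatcher holds the lock.
    with aio_context_lock:
        drain_done.set()              # would allow the drain to finish

def qmp_block_resize():
    # The QMP dispatcher runs the command with the AioContext lock held...
    with aio_context_lock:
        lock_held.set()
        # ...and blk_drain then waits for the iothread to finish its in-flight
        # work, which it never can because it is stuck on the same lock.
        finished = drain_done.wait(timeout=2)   # timeout only so the demo ends
        print("drain finished:", finished)      # prints False -> deadlock

t1 = threading.Thread(target=qmp_block_resize)
t2 = threading.Thread(target=iothread_poll_handler)
t1.start(); t2.start()
t1.join(); t2.join()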

Comment 3 Kevin Wolf 2020-12-04 13:30:47 UTC
Fix posted upstream:

https://lists.gnu.org/archive/html/qemu-block/2020-12/msg00137.html

Comment 4 qing.wang 2021-01-12 08:47:09 UTC
Hit the same issue on {'kvm_version': '4.18.0-262.el8.x86_64', 'qemu_version': 'qemu-kvm-core-5.2.0-2.module+el8.4.0+9186+ec44380f.x86_64'}:

python ConfigTest.py --testcase=block_hotplug.block_scsi.fmt_qcow2.default.with_plug.with_block_resize.one_pci  --iothread_scheme=roundrobin --nr_iothreads=2  --platform=x86_64 --guestname=RHEL.7.9 --driveformat=virtio_blk --nicmodel=virtio_net --imageformat=qcow2 --machines=q35 --customsparams="vm_mem_limit = 12G\nimage_aio=native" --clone=no

Comment 5 CongLi 2021-04-12 00:57:07 UTC
*** Bug 1948296 has been marked as a duplicate of this bug. ***

Comment 6 Eyal Shenitzky 2021-04-12 05:24:57 UTC
This bug affects a basic disk resize flow in oVirt.
Severity should be raised to 'urgent', and it should also be fixed in RHEL-8.4.

Comment 7 Nir Soffer 2021-04-12 11:05:32 UTC
Adding more info on the RHV side from bug 1948296:

This breaks basic functionality in RHV in a very bad way.

Steps to Reproduce:
1. Start vm with a qcow2 disk using virtio or virtio-scsi and iothread
2. Try to resize one of the qcow2 disks in RHV

Actual results:
The libvirt blockResize call does not return (see the sketch at the end of this comment).
The vdsm log shows a warning about a blocked worker every 60 seconds.
The VM cannot be stopped.
The qemu process cannot be terminated with a plain kill.
The only way to recover is kill -9.

The only way to avoid this bug is to disable iothread, which is
enabled by default in RHV.
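
For reference, the resize path as seen from the libvirt side boils down to something like the following sketch (domain and disk names are illustrative; requires libvirt-python). On affected qemu builds with an iothread, the blockResize call never returns:

# Sketch of the RHV/libvirt resize call (illustrative names, libvirt-python).
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("example-vm")            # assumption: domain name

new_size_gib = 15
# VIR_DOMAIN_BLOCK_RESIZE_BYTES makes the size argument bytes instead of KiB.
dom.blockResize("sda",                           # assumption: target disk
                new_size_gib * 1024 ** 3,
                libvirt.VIR_DOMAIN_BLOCK_RESIZE_BYTES)
conn.close()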

Comment 14 leidwang@redhat.com 2021-04-14 09:47:50 UTC
Used qemu-kvm-5.2.0-15.module+el8.4.0+10650+50781ca0 to execute the storage VM migration test plan; the tests passed.

Comment 15 qing.wang 2021-04-15 08:23:31 UTC
Passed relevant case testing:
python ConfigTest.py --testcase=block_hotplug..with_plug.with_block_resize.one_pci  --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.4.0 --driveformat=virtio_blk,virtio_scsi --nicmodel=virtio_net --imageformat=qcow2 --machines=i440fx,q35 --clone=no


Regression tests for virtio_blk + i440fx and virtio_scsi + q35:
python ConfigTest.py --category=virtual_block_device  --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.4.0 --driveformat=virtio_blk --nicmodel=virtio_net --imageformat=qcow2 --machines=i440fx --customsparams="vm_mem_limit = 12G\nimage_aio=threads" 

python ConfigTest.py --category=virtual_block_device  --iothread_scheme=roundrobin --nr_iothreads=2 --platform=x86_64 --guestname=RHEL.8.4.0 --driveformat=virtio_scsi --nicmodel=virtio_net --imageformat=qcow2 --machines=q35 --customsparams="vm_mem_limit = 12G\nimage_aio=native"

Comment 18 qing.wang 2021-04-16 00:57:40 UTC
This bug is marked as verified according to the test results in comments 14 and 15.

Comment 20 errata-xmlrpc 2021-05-25 06:45:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098

