Created attachment 1771193 [details]
logs and qemu core dump

Description of problem:

The block-resize job for a qcow2 disk never finishes. The only way to complete the job is to kill -9 qemu.

It seems that the iothread is deadlocked:

Thread 3 (Thread 0x7fe9fec25700 (LWP 70060)):
#0  0x00007fea0b8e565d in __lll_lock_wait () from target:/lib64/libpthread.so.0
#1  0x00007fea0b8dea44 in pthread_mutex_lock () from target:/lib64/libpthread.so.0
#2  0x00005653cd1da40d in qemu_mutex_lock_impl (mutex=0x5653cfa5e420, file=0x5653cd5352e0 "../util/async.c", line=645) at ../util/qemu-thread-posix.c:79
#3  0x00005653ccfeaf70 in virtio_scsi_acquire (s=0x5653d1333510) at /usr/src/debug/qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64/include/hw/virtio/virtio-scsi.h:136
#4  virtio_scsi_data_plane_handle_event (vdev=<optimized out>, vq=0x7fe9fc55b0a8) at ../hw/scsi/virtio-scsi-dataplane.c:84
#5  0x00005653cd030a6e in virtio_queue_notify_aio_vq (vq=<optimized out>) at ../hw/virtio/virtio.c:2326
#6  0x00005653cd1d5a9f in run_poll_handlers_once (timeout=<synthetic pointer>, now=4854163337034, ctx=0x5653cfa5e3c0) at ../util/aio-posix.c:397
#7  run_poll_handlers (timeout=<synthetic pointer>, max_ns=4000, ctx=0x5653cfa5e3c0) at ../util/aio-posix.c:493
#8  try_poll_mode (timeout=<synthetic pointer>, ctx=0x5653cfa5e3c0) at ../util/aio-posix.c:536
#9  aio_poll (ctx=0x5653cfa5e3c0, blocking=blocking@entry=true) at ../util/aio-posix.c:577
#10 0x00005653cd0c8eca in iothread_run (opaque=0x5653cf985f20) at ../iothread.c:73

Version-Release number of selected component (if applicable):
qemu-kvm-5.2.0-14.module+el8.4.0+10425+ad586fa5.x86_64
libvirt-daemon-7.0.0-13.module+el8.4.0+10604+5608c2b4.x86_64

How reproducible:
Always (4/4)

Steps to Reproduce:
1. Start a VM with a qcow2 disk using virtio or virtio-scsi and an iothread
2. Try to resize one of the qcow2 disks in RHV

Actual results:
- The libvirt blockResize call does not return.
- The vdsm log shows a warning about a blocked worker every 60 seconds.
- The VM cannot be stopped.
- The qemu process cannot be terminated (kill).
- The only way to recover is to kill it with kill -9.

Expected results:
Disk is resized quickly.

Additional info:
Workaround: disable the iothread.

Vdsm log:

2021-04-11 18:29:40,871+0300 INFO (jsonrpc/7) [api.virt] START diskSizeExtend(driveSpecs={'poolID': 'fb7797bd-3cca-4baa-bcb3-35e8851015f7', 'volumeID': '31fdb830-422f-4f64-8d24-ae54bfcaca27', 'imageID': 'b48e41d8-0b37-4511-bf81-bfadb27f01ee', 'domainID': 'a682d0d8-b135-43ca-ab3c-48155cebd40b'}, newSize='3221225472') from=::ffff:192.168.122.11,52826, flow_id=098c702f-3008-4c0a-b4c6-956a4c6d5dbf, vmId=46ecb982-50c6-406a-bc5c-16bf9b842ea4 (api:48)

Vdsm calls libvirt like this:

    flags = libvirt.VIR_DOMAIN_BLOCK_RESIZE_BYTES
    self.log.debug("Calling blockResise drive=%s size=%s flags=%s",
                   drive.name, newSizeBytes, flags)
    self._dom.blockResize(drive.name, newSizeBytes, flags=flags)

(A standalone sketch of this call is included after the libvirt log below.)

Libvirt log:

2021-04-11 15:29:40.895+0000: 34970: debug : virDomainBlockResize:6057 : dom=0x7fc0c800b460, (VM: name=resize-disk, uuid=46ecb982-50c6-406a-bc5c-16bf9b842ea4), disk=vdb, size=3221225472, flags=0x1
2021-04-11 15:29:40.895+0000: 34970: info : qemuMonitorSend:950 : QEMU_MONITOR_SEND_MSG: mon=0x7fc11804a1b0 msg={"execute":"block_resize","arguments":{"node-name":"libvirt-2-format","size":3221225472},"id":"libvirt-435"} fd=-1

No reply is seen for id libvirt-435. Then there are lots of errors like:

2021-04-11 15:30:18.532+0000: 34971: error : qemuDomainObjBeginJobInternal:975 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockResize)
...
2021-04-11 15:31:53.614+0000: 34967: warning : qemuDomainObjBeginJobInternal:953 : Cannot start job (query, none, none) for domain resize-disk; current job is (modify, none, none) owned by (34970 remoteDispatchDomainBlockResize, 0 <null>, 0 <null> (flags=0x0)) for (132s, 0s, 0s)
...
2021-04-11 15:44:24.066+0000: 34968: error : qemuDomainObjBeginJobInternal:975 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockResize)
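
For reference, a minimal standalone reproducer sketch (not part of vdsm) that mirrors the blockResize call quoted above. It assumes a running domain named "resize-disk" with the qcow2 disk attached as "vdb" and an iothread enabled; the domain name, device name and size are taken from the logs above:

    # Minimal reproducer sketch, assuming a running domain "resize-disk"
    # with a qcow2 disk attached as "vdb" and an iothread enabled
    # (domain name, device name and size are taken from the logs above).
    import libvirt

    NEW_SIZE_BYTES = 3221225472  # same newSize as in the vdsm log (3 GiB)

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("resize-disk")

    # With the bug present this call never returns: qemu never replies to the
    # block_resize QMP command because the iothread is stuck in
    # virtio_scsi_acquire (see the backtrace above).
    dom.blockResize("vdb", NEW_SIZE_BYTES,
                    flags=libvirt.VIR_DOMAIN_BLOCK_RESIZE_BYTES)

    conn.close()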
*** This bug has been marked as a duplicate of bug 1903511 ***
(In reply to CongLi from comment #1)

Thanks CongLi,

But note that you converted a bug with:
- Severity: Urgent
- Dependent Products: Red Hat Enterprise Virtualization Manager

To a bug with:
- Severity: High
- Dependent Products: (none)

This bug breaks basic functionality in RHV in a very bad way,
and we hope to get it fixed in 8.4.
(In reply to Nir Soffer from comment #2)
> (In reply to CongLi from comment #1)
> Thanks CongLi,
>
> But note that you converted a bug with:
> - Severity: Urgent
> - Dependent Products: Red Hat Enterprise Virtualization Manager
>
> To a bug with:
> - Severity: High
> - Dependent Products: (none)
>
> This bug breaks basic functionality in RHV in a very bad way,
> and we hope to get it fixed in 8.4.

Thanks Nir, I will take care of this info next time when closing a bug, and will update the corresponding bug.