Description of problem:

A VM remains in a paused state when trying to write data to a resized disk residing on an iSCSI storage domain. The same test passed up to rhv-4.4.8-4 but fails on the latest rhv-4.4.8-5 (as well as on the 4.4.7-2 async). It does not happen with virtio-scsi, only with virtio disks.

Steps to reproduce:
1. Create a VM
2. Attach a 1g raw preallocated disk
3. Resize the disk to 2g from the RHV UI
4. Wait until the resize is completed
5. Check that the disk was resized in the guest
6. Write 2g of data to the disk

The VM pauses after starting the write.

More details:

Run 1:

vdsm log:

2021-08-18 23:20:38,945+0300 INFO (jsonrpc/7) [api.virt] START diskSizeExtend(driveSpecs={'poolID': 'bbe89679-f676-489c-9c54-fbfe3366dc08', 'volumeID': 'ba4364d4-bcff-4428-8ecb-006a9e48eaea', 'imageID': '4226aae1-108f-4150-801f-f9e73018855b', 'domainID': '0b9240a5-862f-4573-9ba0-34c8f853269c'}, newSize='2147483648') from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, vmId=8f5db04a-41cd-4175-a8f3-1dc8be44bb04 (api:48)
2021-08-18 23:20:38,946+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='530dd786-2452-4aea-ad62-4e0c2343b21e') moving from state preparing -> state preparing (task:624)
2021-08-18 23:20:38,947+0300 INFO (jsonrpc/7) [vdsm.api] START refreshVolume(sdUUID='0b9240a5-862f-4573-9ba0-34c8f853269c', spUUID='bbe89679-f676-489c-9c54-fbfe3366dc08', imgUUID='4226aae1-108f-4150-801f-f9e73018855b', volUUID='ba4364d4-bcff-4428-8ecb-006a9e48eaea') from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, task_id=530dd786-2452-4aea-ad62-4e0c2343b21e (api:48)
2021-08-18 23:20:38,947+0300 INFO (jsonrpc/7) [storage.LVM] Refreshing LVs (vg=0b9240a5-862f-4573-9ba0-34c8f853269c, lvs=('ba4364d4-bcff-4428-8ecb-006a9e48eaea',)) (lvm:1816)
2021-08-18 23:20:39,076+0300 INFO (jsonrpc/7) [vdsm.api] FINISH refreshVolume return=None from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, task_id=530dd786-2452-4aea-ad62-4e0c2343b21e (api:54)
2021-08-18 23:20:39,076+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='530dd786-2452-4aea-ad62-4e0c2343b21e') finished: None (task:1202)
2021-08-18 23:20:39,076+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='530dd786-2452-4aea-ad62-4e0c2343b21e') moving from state finished -> state finished (task:624)
2021-08-18 23:20:39,076+0300 DEBUG (jsonrpc/7) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738)
2021-08-18 23:20:39,076+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='530dd786-2452-4aea-ad62-4e0c2343b21e') ref 0 aborting False (task:1000)
2021-08-18 23:20:39,077+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='bd720612-8fcd-49a5-8df3-051cf0b5eb7c') moving from state finished -> state preparing (task:624)
2021-08-18 23:20:39,077+0300 INFO (jsonrpc/7) [vdsm.api] START getVolumeSize(sdUUID='0b9240a5-862f-4573-9ba0-34c8f853269c', spUUID='bbe89679-f676-489c-9c54-fbfe3366dc08', imgUUID='4226aae1-108f-4150-801f-f9e73018855b', volUUID='ba4364d4-bcff-4428-8ecb-006a9e48eaea') from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, task_id=bd720612-8fcd-49a5-8df3-051cf0b5eb7c (api:48)
2021-08-18 23:20:39,077+0300 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '2147483648', 'truesize': '2147483648'} from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, task_id=bd720612-8fcd-49a5-8df3-051cf0b5eb7c (api:54)
2021-08-18 23:20:39,078+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='bd720612-8fcd-49a5-8df3-051cf0b5eb7c') finished: {'apparentsize': '2147483648', 'truesize': '2147483648'} (task:1202)
2021-08-18 23:20:39,078+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='bd720612-8fcd-49a5-8df3-051cf0b5eb7c') moving from state finished -> state finished (task:624)
2021-08-18 23:20:39,078+0300 DEBUG (jsonrpc/7) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738)
2021-08-18 23:20:39,078+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='bd720612-8fcd-49a5-8df3-051cf0b5eb7c') ref 0 aborting False (task:1000)
2021-08-18 23:20:39,089+0300 INFO (jsonrpc/7) [api.virt] FINISH diskSizeExtend return={'status': {'code': 0, 'message': 'Done'}, 'size': '2147483648'} from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, vmId=8f5db04a-41cd-4175-a8f3-1dc8be44bb04 (api:54)

In the guest:

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0    2G  0 disk
sr0     11:0    1 1024M  0 rom
vda    252:0    0   10G  0 disk
├─vda1 252:1    0    1M  0 part
├─vda2 252:2    0  100M  0 part /boot/efi
└─vda3 252:3    0  9.9G  0 part /
vdb    252:16   0    2G  0 disk

# dd if=/dev/zero bs=1M count=2048 of=/dev/vdb conv=fsync

After the write was started, we see many "abnormal vm stop device" events:

2021-08-18 23:20:57,294+0300 INFO (libvirt/events) [virt.vm] (vmId='8f5db04a-41cd-4175-a8f3-1dc8be44bb04') abnormal vm stop device ua-4226aae1-108f-4150-801f-f9e73018855b error (vm:5188)
2021-08-18 23:20:57,294+0300 INFO (libvirt/events) [virt.vm] (vmId='8f5db04a-41cd-4175-a8f3-1dc8be44bb04') CPU stopped: onIOError (vm:6023)
2021-08-18 23:20:57,304+0300 INFO (libvirt/events) [virt.vm] (vmId='8f5db04a-41cd-4175-a8f3-1dc8be44bb04') CPU stopped: onSuspend (vm:6023)
...

Libvirt log:

1. Successful resize

2021-08-18 20:20:39.079+0000: 1537: debug : virDomainBlockResize:6057 : dom=0x7f1c00002380, (VM: name=resize, uuid=8f5db04a-41cd-4175-a8f3-1dc8be44bb04), disk=vdc, size=2147483648, flags=0x1
2021-08-18 20:20:39.080+0000: 9003: info : qemuMonitorIOWrite:437 : QEMU_MONITOR_IO_WRITE: mon=0x7f1bfc04d0a0 buf={"execute":"block_resize","arguments":{"node-name":"libvirt-6-format","size":2147483648},"id":"libvirt-4187"} len=111 ret=111 errno=0
2021-08-18 20:20:39.085+0000: 9003: info : qemuMonitorJSONIOProcessLine:242 : QEMU_MONITOR_RECV_REPLY: mon=0x7f1bfc04d0a0 reply={"return": {}, "id": "libvirt-4187"}

2.
IOError

2021-08-18 20:20:57.290+0000: 9003: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"timestamp": {"seconds": 1629318057, "microseconds": 290744}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-6-format", "reason": "Invalid argument", "operation": "write", "action": "stop"}}]
2021-08-18 20:20:57.291+0000: 9003: info : qemuMonitorJSONIOProcessLine:237 : QEMU_MONITOR_RECV_EVENT: mon=0x7f1bfc04d0a0 event={"timestamp": {"seconds": 1629318057, "microseconds": 290744}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-6-format", "reason": "Invalid argument", "operation": "write", "action": "stop"}}

Run 2:

In the guest before the resize:

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0     11:0    1 1024M  0 rom
vda    252:0    0   10G  0 disk
├─vda1 252:1    0    1M  0 part
├─vda2 252:2    0  100M  0 part /boot/efi
└─vda3 252:3    0  9.9G  0 part /
vdb    252:16   0    1G  0 disk

Testing writing before the resize:

[root@vm-13-101 ~]# dd if=/dev/zero bs=1M count=1024 of=/dev/vdb conv=fsync
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.65022 s, 405 MB/s

Successful resize, vdsm.log:

2021-08-18 23:48:30,108+0300 DEBUG (jsonrpc/2) [storage.TaskManager.Task] (Task='ce1276cb-f421-4378-88cb-07657477309d') moving from state finished -> state finished (task:624)
2021-08-18 23:48:30,108+0300 DEBUG (jsonrpc/2) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738)
2021-08-18 23:48:30,108+0300 DEBUG (jsonrpc/2) [storage.TaskManager.Task] (Task='ce1276cb-f421-4378-88cb-07657477309d') ref 0 aborting False (task:1000)
2021-08-18 23:48:30,116+0300 DEBUG (check/loop) [storage.asyncevent] Process <subprocess.Popen object at 0x7fcdc44d7208> terminated (count=1) (asyncevent:660)
2021-08-18 23:48:30,116+0300 DEBUG (check/loop) [storage.check] FINISH check '/dev/3a3e5451-7d1c-4fc0-842a-dbea5bf36158/metadata' (rc=0, elapsed=0.04) (check:359)
2021-08-18 23:48:30,133+0300 INFO (jsonrpc/5) [api.virt] START diskSizeExtend(driveSpecs={'poolID': 'bbe89679-f676-489c-9c54-fbfe3366dc08', 'volumeID': '30e5f051-478a-4af8-add9-56327d59115b', 'imageID': 'ace9789b-e320-4e2f-a2bf-85c47fc87f36', 'domainID': '0b9240a5-862f-4573-9ba0-34c8f853269c'}, newSize='2147483648') from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, vmId=e3c192b0-46e7-4c9d-bd40-1d7dcc574801 (api:48)
2021-08-18 23:48:30,134+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='7999cf22-2755-4ecd-8cf0-7654faf2defa') moving from state preparing -> state preparing (task:624)
2021-08-18 23:48:30,134+0300 INFO (jsonrpc/5) [vdsm.api] START refreshVolume(sdUUID='0b9240a5-862f-4573-9ba0-34c8f853269c', spUUID='bbe89679-f676-489c-9c54-fbfe3366dc08', imgUUID='ace9789b-e320-4e2f-a2bf-85c47fc87f36', volUUID='30e5f051-478a-4af8-add9-56327d59115b') from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, task_id=7999cf22-2755-4ecd-8cf0-7654faf2defa (api:48)
2021-08-18 23:48:30,135+0300 INFO (jsonrpc/5) [storage.LVM] Refreshing LVs (vg=0b9240a5-862f-4573-9ba0-34c8f853269c, lvs=('30e5f051-478a-4af8-add9-56327d59115b',)) (lvm:1816)
2021-08-18 23:48:30,255+0300 INFO (jsonrpc/5) [vdsm.api] FINISH refreshVolume return=None from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, task_id=7999cf22-2755-4ecd-8cf0-7654faf2defa (api:54)
2021-08-18 23:48:30,255+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='7999cf22-2755-4ecd-8cf0-7654faf2defa') finished: None (task:1202)
2021-08-18 23:48:30,256+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='7999cf22-2755-4ecd-8cf0-7654faf2defa') moving from state finished -> state finished (task:624)
2021-08-18 23:48:30,256+0300 DEBUG (jsonrpc/5) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738)
2021-08-18 23:48:30,256+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='7999cf22-2755-4ecd-8cf0-7654faf2defa') ref 0 aborting False (task:1000)
2021-08-18 23:48:30,256+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='21be0a49-175d-4ffe-a672-2f1489e1c083') moving from state finished -> state preparing (task:624)
2021-08-18 23:48:30,257+0300 INFO (jsonrpc/5) [vdsm.api] START getVolumeSize(sdUUID='0b9240a5-862f-4573-9ba0-34c8f853269c', spUUID='bbe89679-f676-489c-9c54-fbfe3366dc08', imgUUID='ace9789b-e320-4e2f-a2bf-85c47fc87f36', volUUID='30e5f051-478a-4af8-add9-56327d59115b') from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, task_id=21be0a49-175d-4ffe-a672-2f1489e1c083 (api:48)
2021-08-18 23:48:30,257+0300 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '2147483648', 'truesize': '2147483648'} from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, task_id=21be0a49-175d-4ffe-a672-2f1489e1c083 (api:54)
2021-08-18 23:48:30,257+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='21be0a49-175d-4ffe-a672-2f1489e1c083') finished: {'apparentsize': '2147483648', 'truesize': '2147483648'} (task:1202)
2021-08-18 23:48:30,257+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='21be0a49-175d-4ffe-a672-2f1489e1c083') moving from state finished -> state finished (task:624)
2021-08-18 23:48:30,257+0300 DEBUG (jsonrpc/5) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738)
2021-08-18 23:48:30,257+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='21be0a49-175d-4ffe-a672-2f1489e1c083') ref 0 aborting False (task:1000)
2021-08-18 23:48:30,278+0300 INFO (jsonrpc/5) [api.virt] FINISH diskSizeExtend return={'status': {'code': 0, 'message': 'Done'}, 'size': '2147483648'} from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, vmId=e3c192b0-46e7-4c9d-bd40-1d7dcc574801 (api:54)

libvirtd.log:

2021-08-18 20:48:30.258+0000: 1536: debug : virDomainBlockResize:6057 : dom=0x7f1ba800be40, (VM: name=resize2, uuid=e3c192b0-46e7-4c9d-bd40-1d7dcc574801), disk=vdb, size=2147483648, flags=0x1
2021-08-18 20:48:30.258+0000: 1536: debug : qemuDomainObjBeginJobInternal:843 : Starting job: job=modify agentJob=none asyncJob=none (vm=0x7f1ba83be870 name=resize2, current job=none agentJob=none async=none)
2021-08-18 20:48:30.258+0000: 1536: debug : qemuDomainObjBeginJobInternal:890 : Started job: modify (async=none vm=0x7f1ba83be870 name=resize2)
2021-08-18 20:48:30.258+0000: 1536: debug : qemuDomainObjEnterMonitorInternal:5809 : Entering monitor (mon=0x7f1bfc04d620 vm=0x7f1ba83be870 name=resize2)
2021-08-18 20:48:30.258+0000: 1536: debug : qemuMonitorBlockResize:2223 : device=<null> nodename=libvirt-4-format size=2147483648
2021-08-18 20:48:30.258+0000: 1536: debug : qemuMonitorBlockResize:2225 : mon:0x7f1bfc04d620 vm:0x7f1ba83be870 fd:44
2021-08-18 20:48:30.258+0000: 1536: info : qemuMonitorSend:950 : QEMU_MONITOR_SEND_MSG: mon=0x7f1bfc04d620 msg={"execute":"block_resize","arguments":{"node-name":"libvirt-4-format","size":2147483648},"id":"libvirt-445"} fd=-1
2021-08-18 20:48:30.258+0000: 51419: info : qemuMonitorIOWrite:437 : QEMU_MONITOR_IO_WRITE: mon=0x7f1bfc04d620 buf={"execute":"block_resize","arguments":{"node-name":"libvirt-4-format","size":2147483648},"id":"libvirt-445"} len=110 ret=110 errno=0
2021-08-18 20:48:30.278+0000: 51419: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"return": {}, "id": "libvirt-445"}]
2021-08-18 20:48:30.278+0000: 51419: info : qemuMonitorJSONIOProcessLine:242 : QEMU_MONITOR_RECV_REPLY: mon=0x7f1bfc04d620 reply={"return": {}, "id": "libvirt-445"}

In the guest, after the resize:

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0     11:0    1 1024M  0 rom
vda    252:0    0   10G  0 disk
├─vda1 252:1    0    1M  0 part
├─vda2 252:2    0  100M  0 part /boot/efi
└─vda3 252:3    0  9.9G  0 part /
vdb    252:16   0    2G  0 disk

Starting the write:

# dd if=/dev/zero bs=1M count=2048 of=/dev/vdb conv=fsync
(VM paused)

After the write, many libvirt events for abnormal stop:

vdsm.log:

2021-08-18 23:48:55,041+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') abnormal vm stop device ua-ace9789b-e320-4e2f-a2bf-85c47fc87f36 error (vm:5188)
2021-08-18 23:48:55,041+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') CPU stopped: onIOError (vm:6023)
2021-08-18 23:48:55,052+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') CPU stopped: onSuspend (vm:6023)
2021-08-18 23:48:55,084+0300 WARN (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') device vdb reported I/O error (vm:4021)
2021-08-18 23:48:55,085+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') abnormal vm stop device ua-ace9789b-e320-4e2f-a2bf-85c47fc87f36 error (vm:5188)
2021-08-18 23:48:55,085+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') CPU stopped: onIOError (vm:6023)

libvirtd.log:

Many errors like:

2021-08-18 20:48:55.038+0000: 51419: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"timestamp": {"seconds": 1629319735, "microseconds": 38077}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-4-format", "reason": "Invalid argument", "operation": "write", "action": "stop"}}]
2021-08-18 20:48:55.038+0000: 51419: info : qemuMonitorJSONIOProcessLine:237 : QEMU_MONITOR_RECV_EVENT: mon=0x7f1bfc04d620 event={"timestamp": {"seconds": 1629319735, "microseconds": 38077}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-4-format", "reason": "Invalid argument", "operation": "write", "action": "stop"}}

This looks like a qemu bug, since the same operation works fine with virtio-scsi.

Expected results:
The VM shouldn't be paused during a resize/extend operation.
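For reference, the resize-then-write flow from the reproduction steps can be sketched offline against a plain local file. This is only a hypothetical, scaled-down approximation (1M/2M instead of 1g/2g, and a temp file instead of a raw preallocated LV on iSCSI behind a virtio disk), so it does NOT reproduce the qemu bug; on a plain file the write past the old boundary succeeds, whereas on the affected build the guest write triggers BLOCK_IO_ERROR "Invalid argument" and pauses the VM:

```shell
# Hypothetical local-file stand-in for the reproduction steps; file path
# comes from mktemp, sizes are scaled down from 1G/2G to 1M/2M.
img=$(mktemp)
truncate -s 1M "$img"     # steps 1-2: start with a 1M "disk"
truncate -s 2M "$img"     # steps 3-4: resize it to 2M
# step 6: write the full new size, as dd in the guest does
dd if=/dev/zero bs=1M count=2 of="$img" conv=fsync,notrunc status=none
size=$(stat -c %s "$img") # here the write past the old 1M boundary succeeds
echo "$size"
```

Printed size is 2097152 (2 MiB), i.e. the full post-resize size was written without error, which is the behavior expected from the real setup as well.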
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Our automation tests passed on rhv-4.4.8-6/7 with qemu-kvm-5.2.0-16.module+el8.4.0+12393+838d9165.8.x86_64. Moving to 'Verified'.
Failing QA for oVirt: we don't have a qemu-kvm package that includes a fix for this bug. We are missing both the latest 8.4 build and the 8.5 one for Stream.
(In reply to Sandro Bonazzola from comment #9) > Failing QA for oVirt: we haven't a qemu-kvm package including a fix for this > bug. > We miss both the 8.4 latest build and the 8.5 one for Stream Are you using a different setup than the QE?
Failing QA for oVirt: we don't have a qemu-kvm package that includes a fix for this bug. QE is testing on RHV, which is not really the same thing. We usually don't care... but here we do.
qemu-kvm-6.0.0-29.el8s shipped live, and is also included in ovirt-node-ng-image-update 4.4.8.3. Closing for the current release.