Created attachment 1814774 [details]
Logs

Description of problem:
As part of our Tier1 automation regression suite, we found 2 cases where a VM remains in a paused state when trying to write data to a resized disk residing on an iSCSI storage domain. The same test passed up to rhv-4.4.8-4 but fails on the latest rhv-4.4.8-5 (as well as on the 4.4.7-2 async).

In the engine log we see a generic error:
2021-08-13 22:55:33,476+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-1) [73350b3e] EVENT_ID: VM_PAUSED_ERROR(139), VM vm_TestCase5062_1322520552 has been paused due to unknown storage error.

VDSM log:
2021-08-13 22:55:33,480+0300 WARN (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') device vdb reported I/O error (vm:4021)
2021-08-13 22:56:42,972+0300 WARN (jsonrpc/6) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') monitor became unresponsive (command timeout, age=69.87595747699925) (vm:6036)

How reproducible:
100%, by automation and by manual steps.

Steps to Reproduce:
1. Create a VM from a rhel8.3 template
2. Run the VM
3. Add a 1G disk on iSCSI with a VirtIO interface (can be thin or preallocated)
4. Extend this disk by 1G (total size should be 2G)
5. SSH to the VM
6. Find the device name of the new disk (/dev/vdb)
7. Create a label: /sbin/parted /dev/vdb --script -- mklabel gpt
8. Create a partition: /sbin/parted /dev/vdb --script -- mkpart primary ext4 0 100%
9. Create a FS: mkfs.ext4 /dev/vdb1
10. Create a target directory: mkdir -p /mount-point_test
11. Find the UUID of /dev/vdb1
12. Add it to fstab: echo "UUID="9702d603-7698-43fa-91ce-bf577a699f46" /mount-point_test ext4 defaults 0 0" >> /etc/fstab
13. mount -a
14. Run dd, copying less than the extended size so the LV does not grow beyond it: /bin/dd bs=1M count=1358 if=/dev/vda of=/mount-point_test/written_test_storage status=none

Actual results:
The VM gets stuck and remains in a paused state. "Run VM" doesn't work; only power off and run again.

Expected results:
The dd operation should succeed and the VM should not remain paused.

Additional info:
engine, vdsm, art, libvirt and qemu logs are attached.
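For convenience, the guest-side part of the steps above collapsed into one script (a sketch; device names and mount point follow the steps, with blkid used to pick up the UUID instead of hard-coding it):

# steps 6-14 above, run inside the guest as root
/sbin/parted /dev/vdb --script -- mklabel gpt
/sbin/parted /dev/vdb --script -- mkpart primary ext4 0 100%
mkfs.ext4 /dev/vdb1
mkdir -p /mount-point_test
uuid=$(blkid -s UUID -o value /dev/vdb1)
echo "UUID=${uuid} /mount-point_test ext4 defaults 0 0" >> /etc/fstab
mount -a
# write less than the extended size so the LV does not grow past it
/bin/dd bs=1M count=1358 if=/dev/vda of=/mount-point_test/written_test_storage status=none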
Evelina, does this issue occur without extending the disk?
Can we narrow down the flow?
(In reply to Eyal Shenitzky from comment #2)
> Evelina, does this issue occur without extending the disk?
> Can we narrow down the flow?

I also see the following warnings in the logs -

2021-08-13 22:55:33,356+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,356+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,361+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onSuspend (vm:6023)
2021-08-13 22:55:33,480+0300 WARN (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') device vdb reported I/O error (vm:4021)
2021-08-13 22:55:33,482+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,482+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,485+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,485+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,488+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,488+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,491+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,491+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,494+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,494+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,497+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,497+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,500+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,500+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,503+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,503+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,505+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,505+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,508+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,508+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)
2021-08-13 22:55:33,511+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') abnormal vm stop device ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988 error (vm:5188)
2021-08-13 22:55:33,511+0300 INFO (libvirt/events) [virt.vm] (vmId='e5537a63-1e3c-4bfd-a080-789735db44e2') CPU stopped: onIOError (vm:6023)

Arik, any known issues?
Evelina, does it occur only on block-based storage?
(In reply to Eyal Shenitzky from comment #3)
> (In reply to Eyal Shenitzky from comment #2)
> > Evelina, does this issue occur without extending the disk?
> > Can we narrow down the flow?

Currently trying without extending. I added the exact flow we use in our automation tests to the description. I tried simply adding a disk, resizing it and running a random dd, but it didn't reproduce. I'll add results without extending when I'm done testing it.

> I also see the following warnings in the logs -
> [...]
> Arik, any known issues?
> Evelina, does it occur only on block-based storage?

Yes, only on iSCSI (on FC it works).
No, I'm not aware of this issue
(In reply to Evelina Shames from comment #4)
> (In reply to Eyal Shenitzky from comment #3)
> > (In reply to Eyal Shenitzky from comment #2)
> > > Evelina, does this issue occur without extending the disk?
> > > Can we narrow down the flow?
> Currently trying without extending. I added the exact flow we use in our
> automation tests.

Same flow without resizing works fine.
Lukas shared on an e-mail thread the bugs that drove changes since the last time this was seen working. Adding them below for completeness:

Bug 1983113 - QEMU 6.0.0 socket_get_fd() fails with the error "socket_get_fd: too many connections" [rhel-8.4.0.z]
Bug 1976256 - Pivot failed: Job ... in state 'standby' cannot accept command verb 'complete' [rhel-8.4.0.z]
Bug 1973269 - RFE: Add support for specifying cluster_size/subcluster allocation for qcow2 volumes via virStorageVolCreateXML [rhel-8.4.0.z]
Bug 1955187 - RHV VM pauses due to 'qemu-kvm' getting EINVAL on i/o to a direct lun with scsi passthrough enabled [rhel-8.4.0.z]
Bug 1973829 - [incremental backup] qemu-kvm hangs when Rebooting the VM during full backup [rhel-8.4.0.z]

Kevin, could you take a look and let us know if there are any logs / traces that could help us identify where this bug belongs?
Created attachment 1815064 [details]
Logs

I also encountered this issue when running the automation TC "Copy existing disk when vm cloned from snapshot".

Here are the errors:

engine:
2021-08-18 09:13:31,133+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ForkJoinPool-1-worker-7) [7cdcba05] EVENT_ID: VM_PAUSED_ERROR(139), VM copy_disk_vm_iscsi has been paused due to unknown storage error.

vdsm:
2021-08-18 09:13:31,088+0300 INFO (libvirt/events) [virt.vm] (vmId='c474e2d3-405c-4dd5-8974-4db289835eda') abnormal vm stop device ua-14949751-f89b-42e4-a4ed-825ff3e7191f error (vm:5188)
2021-08-18 09:13:31,088+0300 INFO (libvirt/events) [virt.vm] (vmId='c474e2d3-405c-4dd5-8974-4db289835eda') CPU stopped: onIOError (vm:6023)
2021-08-18 09:13:31,091+0300 INFO (libvirt/events) [virt.vm] (vmId='c474e2d3-405c-4dd5-8974-4db289835eda') CPU stopped: onSuspend (vm:6023)
...
(In reply to sshmulev from comment #10)

Forgot to mention the versions on which it reproduced:
ovirt-engine-4.4.8.4-0.7.el8ev.noarch
vdsm-4.40.80.5-1.el8ev.x86_64
QEMU emulator version 5.2.0 (qemu-kvm-5.2.0-16.module+el8.4.0+12191+622ad3bb.5)

This can be reproduced by automation; I am trying to reproduce it manually right now.
(In reply to Klaus Heinrich Kiwi from comment #9)
> Kevin, could you take a look and let us know if there are any logs / traces
> that could help us identify where this bug belongs?

Unfortunately, as usual with bugs from RHV, VDSM is the lowest layer from which detailed logs are available. What happens in libvirt and QEMU is mostly guesswork. We don't even have the error message of the I/O error event.

From the VDSM logs, it seems that the affected disk was hotplugged with the following configuration:

<disk device="disk" snapshot="no" type="block">
  <source dev="/rhev/data-center/mnt/blockSD/3a788dc3-14b9-499e-a13b-1427b81c3f70/images/9c48bcb0-226b-47b4-acb4-ce0cf217f988/3f7aeaf1-837c-42e8-b198-68b9683c3704">
    <seclabel model="dac" relabel="no" type="none" />
  </source>
  <target bus="virtio" dev="vdb" />
  <serial>9c48bcb0-226b-47b4-acb4-ce0cf217f988</serial>
  <driver cache="none" error_policy="stop" io="native" name="qemu" type="raw" />
  <alias name="ua-9c48bcb0-226b-47b4-acb4-ce0cf217f988" />
</disk>

So it's a raw image exposed as a virtio-blk disk.

> Lukas shared on an e-mail thread the bugs that drove changes since the last
> time this was seen working. Adding them below for completeness:
> [...]
> Bug 1955187 - RHV VM pauses due to 'qemu-kvm' getting EINVAL on i/o to a
> direct lun with scsi passthrough enabled [rhel-8.4.0.z]

This change is the most suspect one because it changes code around block limits, and getting these wrong can cause I/O errors. One change is that it now detects max_iov for block devices, which is what we're using here, but this value doesn't actually seem to be used anywhere relevant for virtio-blk; the change was made for scsi-generic. It also sets max_hw_transfer, which is likewise unused outside of scsi-generic. max_transfer is no longer set for sg devices, but it wasn't set for block devices before either.

So, after a first look at that series, it's still unclear to me how it would cause this kind of problem.

(In reply to sshmulev from comment #11)
> Forgot to mention the versions on which it reproduced:
> QEMU emulator version 5.2.0 (qemu-kvm-5.2.0-16.module+el8.4.0+12191+622ad3bb.5)

This is the same package version for both reports. Can you confirm whether it still works correctly with qemu-kvm-5.2.0-16.module+el8.4.0+11923+e8b883e4.4? That would pretty much narrow it down to the series I suspected anyway.
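For the record, a minimal way to confirm which build is actually in use when re-running the test (a sketch; the exact procedure for switching module package builds may differ on your hosts):

# on the hypervisor, before and after switching builds
rpm -q qemu-kvm
# a running VM keeps using the binary it was started with, so restart the VM
# and double-check the process itself:
ps -o cmd= -C qemu-kvm | head -n 1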
(In reply to Kevin Wolf from comment #13)
>
> This is the same package version for both reports. Can you confirm whether
> it still works correctly with
> qemu-kvm-5.2.0-16.module+el8.4.0+11923+e8b883e4.4? That would pretty much
> narrow it down to the series I suspected anyway.

So I ran the same test with that version (qemu-kvm-5.2.0-16.module+el8.4.0+11923+e8b883e4.4) and it worked without any error logs.
(In reply to sshmulev from comment #14)
> (In reply to Kevin Wolf from comment #13)
> >
> > This is the same package version for both reports. Can you confirm whether
> > it still works correctly with
> > qemu-kvm-5.2.0-16.module+el8.4.0+11923+e8b883e4.4? That would pretty much
> > narrow it down to the series I suspected anyway.
>
> So I ran the same test with that version
> (qemu-kvm-5.2.0-16.module+el8.4.0+11923+e8b883e4.4) and it worked without
> any error logs.

Is it possible to run the following inside the guest in both the old and the new versions of the package?

# for f in /sys/block/$VIRTIO_BLK_DEV/queue/*; do echo -n "$f: "; cat $f; done

VIRTIO_BLK_DEV is the block device name inside the guest (e.g. vdb).

Changes between these values may help explain why the patches Kevin mentioned could cause a problem.
(In reply to Stefan Hajnoczi from comment #15)
> Is it possible to run the following inside the guest in both the old and the
> new versions of the package?
>
> # for f in /sys/block/$VIRTIO_BLK_DEV/queue/*; do echo -n "$f: "; cat $f;
> done

Simpler way:

grep -s "" /sys/block/$VIRTIO_BLK_DEV/queue/*
(In reply to Stefan Hajnoczi from comment #15)
> (In reply to sshmulev from comment #14)
> > (In reply to Kevin Wolf from comment #13)
> Is it possible to run the following inside the guest in both the old and the
> new versions of the package?
>
> # for f in /sys/block/$VIRTIO_BLK_DEV/queue/*; do echo -n "$f: "; cat $f;
> done
>
> VIRTIO_BLK_DEV is the block device name inside the guest (e.g. vdb).
>
> Changes between these values may help explain why the patches Kevin
> mentioned could cause a problem.

They are identical between the two guest versions.
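For the record, a sketch of how that comparison can be captured (assumes the disk is vdb and the two hypothetical output files are collected once per qemu-kvm build):

# inside the guest, once while running on each qemu-kvm build
grep -s "" /sys/block/vdb/queue/* > /tmp/queue-limits-old.txt   # old build
grep -s "" /sys/block/vdb/queue/* > /tmp/queue-limits-new.txt   # new build
# then compare the two captures
diff -u /tmp/queue-limits-old.txt /tmp/queue-limits-new.txt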
When extending a disk, RHV does:

    flags = libvirt.VIR_DOMAIN_BLOCK_RESIZE_BYTES
    self._dom.blockResize(drive.name, newSizeBytes, flags=flags)

In the attached libvirt logs we see:

$ grep virDomainBlockResize *.log
cow-libvirtd.log:2021-08-18 13:41:42.355+0000: 590882: debug : virDomainBlockResize:6057 : dom=0x7fc0f000bc60, (VM: name=vm_TestCase5063_1816405077, uuid=2dc838b0-7699-4a1c-9780-d27df9df9434), disk=vdb, size=2147483648, flags=0x1
raw-libvirtd.log:2021-08-18 13:16:47.805+0000: 577505: debug : virDomainBlockResize:6057 : dom=0x7f379c00b860, (VM: name=vm_TestCase5062_1816154827, uuid=a51060ff-ff72-4006-b24d-fb968fff3a66), disk=vdb, size=2147483648, flags=0x1

raw-libvirtd.log:
2021-08-18 13:16:47.805+0000: 577505: debug : virDomainBlockResize:6057 : dom=0x7f379c00b860, (VM: name=vm_TestCase5062_1816154827, uuid=a51060ff-ff72-4006-b24d-fb968fff3a66), disk=vdb, size=2147483648, flags=0x1
2021-08-18 13:16:47.806+0000: 579240: info : qemuMonitorIOWrite:437 : QEMU_MONITOR_IO_WRITE: mon=0x7f37a4047050 buf={"execute":"block_resize","arguments":{"node-name":"libvirt-4-format","size":2147483648},"id":"libvirt-414"} len=110 ret=110 errno=0

I don't see a reply for "libvirt-414", or any other message about "block_resize".

Evelina, we also need the vdsm log to understand what's going on. Can you extract it from the environment?
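As a quick check (a sketch, using the command id and file name from the grep above), one way to confirm whether any reply was logged for that command:

$ grep 'libvirt-414' raw-libvirtd.log
$ grep -c 'QEMU_MONITOR_RECV_REPLY' raw-libvirtd.log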
Peter, can you look at the libvirt logs from comment 19?

Looks like the virDomainBlockResize never completed.
(In reply to Nir Soffer from comment #22)
> Peter, can you look at the libvirt logs from comment 19?
>
> Looks like the virDomainBlockResize never completed.

I was going to agree with you, but strangely that log file does not show replies for *any* of the QMP commands. It makes me think the log filters are configured in a way that blocks recording of the replies.
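A quick way to verify that theory on the host that produced the logs (a sketch; assumes the default config path and the collected log file):

# see which filters the daemon was started with
grep -E '^[[:space:]]*log_(filters|outputs)' /etc/libvirt/libvirtd.conf
# count how many QMP replies actually made it into the log
grep -c 'QEMU_MONITOR_RECV_REPLY' libvirtd.log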
(In reply to Evelina Shames from comment #0)
> Steps to Reproduce:
> 1. Create a VM from a rhel8.3 template
> 2. Run the VM
> 3. Add a 1G disk on iSCSI with a VirtIO interface (can be thin or preallocated)
> 4. Extend this disk by 1G (total size should be 2G)

How do you send this request? How do you wait until the extend operation has completed on the host side?

Some RHV APIs are async, and you need to poll the entity until it reports the expected size, or until the job is finished.
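For illustration only, a rough sketch of the kind of polling meant here, against the REST API (engine URL, credentials and DISK_ID are placeholders; the automation presumably uses the Python SDK instead):

# wait until the engine reports the new provisioned size for the disk
while :; do
    size=$(curl -ks -u admin@internal:PASSWORD \
        "https://engine.example.com/ovirt-engine/api/disks/DISK_ID" |
        grep -oP '(?<=<provisioned_size>)[0-9]+')
    [ "${size:-0}" -ge 2147483648 ] && break
    sleep 2
done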
I tested this flow on RHEL 8.5: # rpm -q libvirt-daemon qemu-kvm libvirt-daemon-7.6.0-1.module+el8.5.0+12097+2c77910b.x86_64 qemu-kvm-6.0.0-27.module+el8.5.0+12121+c40c8708.x86_64 Using this libvird.conf: # Libvirt logging for debugging qemu vms # https://www.libvirt.org/kbase/debuglogs.html#targeted-logging-for-debugging-qemu-vms # NOTE: filter order matters, util must be last to avoid noisy object logs. log_filters="1:libvirt 1:qemu 1:conf 1:security 3:event 3:json 3:file 3:object 1:util" log_outputs="1:file:/var/log/libvirt/libvirtd.log" In vdsm we see: 2021-08-18 18:01:24,420+0300 INFO (jsonrpc/1) [api.virt] START diskSizeExtend(driveSpecs={'poolID': '3f8e08e6-5485-429e-a686-63754906b27b', 'volumeID': '2279441e-bcbd-4ee8-a619-bc55bc160b3f', 'imageID': '0873e366-b786-4b18-a3c0-43cbf9124c04', 'domainID': 'feab3738-c158-4d48-8a41-b5a95c057a50'}, newSize='2147483648') from=::ffff:192.168.122.11,42788, flow_id=956c8e3c-fb4c-461f-9605-46cd96f4030c, vmId=af03a553-6c7d-4389-af99-28027428e67c (api:48) 2021-08-18 18:01:24,421+0300 INFO (jsonrpc/1) [vdsm.api] START refreshVolume(sdUUID='feab3738-c158-4d48-8a41-b5a95c057a50', spUUID='3f8e08e6-5485-429e-a686-63754906b27b', imgUUID='0873e366-b786-4b18-a3c0-43cbf9124c04', volUUID='2279441e-bcbd-4ee8-a619-bc55bc160b3f') from=::ffff:192.168.122.11,42788, flow_id=956c8e3c-fb4c-461f-9605-46cd96f4030c, task_id=e80c818e-b38a-492d-90f8-a3421f4cd249 (api:48) 2021-08-18 18:01:24,421+0300 INFO (jsonrpc/1) [storage.LVM] Refreshing LVs (vg=feab3738-c158-4d48-8a41-b5a95c057a50, lvs=('2279441e-bcbd-4ee8-a619-bc55bc160b3f',)) (lvm:1816) 2021-08-18 18:01:24,528+0300 INFO (jsonrpc/1) [vdsm.api] FINISH refreshVolume return=None from=::ffff:192.168.122.11,42788, flow_id=956c8e3c-fb4c-461f-9605-46cd96f4030c, task_id=e80c818e-b38a-492d-90f8-a3421f4cd249 (api:54) 2021-08-18 18:01:24,529+0300 INFO (jsonrpc/1) [vdsm.api] START getVolumeSize(sdUUID='feab3738-c158-4d48-8a41-b5a95c057a50', spUUID='3f8e08e6-5485-429e-a686-63754906b27b', imgUUID='0873e366-b786-4b18-a3c0-43cbf9124c04', volUUID='2279441e-bcbd-4ee8-a619-bc55bc160b3f', options=None) from=::ffff:192.168.122.11,42788, flow_id=956c8e3c-fb4c-461f-9605-46cd96f4030c, task_id=5add47a2-8345-416b-b697-a21ae65f1135 (api:48) 2021-08-18 18:01:24,529+0300 INFO (jsonrpc/1) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '2147483648', 'truesize': '2147483648'} from=::ffff:192.168.122.11,42788, flow_id=956c8e3c-fb4c-461f-9605-46cd96f4030c, task_id=5add47a2-8345-416b-b697-a21ae65f1135 (api:54) 2021-08-18 18:01:24,547+0300 INFO (jsonrpc/1) [api.virt] FINISH diskSizeExtend return={'status': {'code': 0, 'message': 'Done'}, 'size': '2147483648'} from=::ffff:192.168.122.11,42788, flow_id=956c8e3c-fb4c-461f-9605-46cd96f4030c, vmId=af03a553-6c7d-4389-af99-28027428e67c (api:54) In libvirtd.log we see: 2021-08-18 15:01:24.531+0000: 11016: debug : virDomainBlockResize:6055 : dom=0x55ecac7425a0, (VM: name=resize, uuid=af03a553-6c7d-4389-af99-28027428e67c), disk=sdb, size=2147483648, flags=0x1 2021-08-18 15:01:24.532+0000: 12233: info : qemuMonitorIOWrite:438 : QEMU_MONITOR_IO_WRITE: mon=0x7f104004a060 buf={"execute":"block_resize","arguments":{"node-name":"libvirt-1-format","size":2147483648},"id":"libvirt-448"} len=110 ret=110 errno=0 2021-08-18 15:01:24.536+0000: 12233: info : qemuMonitorJSONIOProcessLine:240 : QEMU_MONITOR_RECV_REPLY: mon=0x7f104004a060 reply={"return": {}, "id": "libvirt-448"} In the guest: # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 6G 0 disk ββsda1 8:1 0 1M 0 part ββsda2 8:2 0 1G 0 part /boot ββsda3 
8:3 0 615M 0 part [SWAP] ββsda4 8:4 0 4.4G 0 part / sdb 8:16 0 2G 0 disk sr0 11:0 1 1024M 0 rom # dd if=/dev/zero bs=1M count=2048 of=/dev/sdb conv=fsync 2048+0 records in 2048+0 records out 2147483648 bytes (2.1 GB, 2.0 GiB) copied, 6.94503 s, 309 MB/s I'm not sure why the test described in comment 0 is so complicated. Writing directly to the block device is better for testing.
Forgot to mention that I also tested with virtio; both work in 8.5.
The stack trace is quite unexpected in what it shows.

The main event loop thread shows that QEMU is in the process of exiting. It has stopped the main event loop and is running 'qemu_cleanup', which triggers a bdrv_drain_all:

Thread 1 (Thread 0x7fc283600f00 (LWP 581518)):
#0  0x00007fc27f949b36 in ppoll () from /lib64/libc.so.6
#1  0x000055fa8c11afb9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at ../util/qemu-timer.c:337
#3  0x000055fa8c128f19 in fdmon_poll_wait (ctx=0x55fa8cca7b60, ready_list=0x7ffece771c00, timeout=-1) at ../util/fdmon-poll.c:80
#4  0x000055fa8c120541 in aio_poll (ctx=0x55fa8cca7b60, blocking=blocking@entry=true) at ../util/aio-posix.c:607
#5  0x000055fa8c062fb5 in bdrv_drain_all_begin () at ../block/io.c:638
#6  bdrv_drain_all_begin () at ../block/io.c:603
#7  0x000055fa8bfec25c in qemu_cleanup () at ../softmmu/vl.c:4547
#8  0x000055fa8bd71ab7 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:51

Meanwhile one of the vCPU threads is waiting on a lock:

Thread 5 (Thread 0x7fc270a2d700 (LWP 581529)):
#0  0x00007fc27fc2e65d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fc27fc27979 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x000055fa8c10037d in qemu_mutex_lock_impl (mutex=0x55fa8c982000 <qemu_global_mutex>, file=0x55fa8c1f60d3 "../softmmu/physmem.c", line=2729) at ../util/qemu-thread-posix.c:79
#3  0x000055fa8bf774ee in qemu_mutex_lock_iothread_impl (file=file@entry=0x55fa8c1f60d3 "../softmmu/physmem.c", line=line@entry=2729) at ../softmmu/cpus.c:485
#4  0x000055fa8bf5ea66 in prepare_mmio_access (mr=<optimized out>) at ../softmmu/physmem.c:2729
#5  0x000055fa8bf5f919 in flatview_write_continue (fv=fv@entry=0x7fc21c066840, addr=addr@entry=49216, attrs=..., ptr=ptr@entry=0x7fc283620000, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x55fa8e929fc0) at ../softmmu/physmem.c:2754
#6  0x000055fa8bf5fac6 in flatview_write (fv=0x7fc21c066840, addr=49216, attrs=..., buf=0x7fc283620000, len=1) at ../softmmu/physmem.c:2799
#7  0x000055fa8bf6365f in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at ../softmmu/physmem.c:2891
#8  0x000055fa8bfad904 in kvm_handle_io (count=1, size=1, direction=<optimized out>, data=<optimized out>, attrs=..., port=49216) at ../accel/kvm/kvm-all.c:2297
#9  kvm_cpu_exec (cpu=cpu@entry=0x55fa8cd5a730) at ../accel/kvm/kvm-all.c:2543
#10 0x000055fa8bfcc045 in kvm_vcpu_thread_fn (arg=0x55fa8cd5a730) at ../accel/kvm/kvm-cpus.c:49
#11 0x000055fa8c100294 in qemu_thread_start (args=0x55fa8cd80040) at ../util/qemu-thread-posix.c:521
#12 0x00007fc27fc2514a in start_thread () from /lib64/libpthread.so.0
#13 0x00007fc27f954dc3 in clone () from /lib64/libc.so.6

This looks like a problem with gracefully shutting down the QEMU process, which seems separate from the original block resize issues. The hang in bdrv_drain_all does still suggest storage-related issues, though.

I think we'll need to start a fresh booted VM and capture a correct libvirt debug log that shows the QMP JSON replies too.
Looking in vdsm log from attachment 1815307 [details]: 1. VM started 2021-08-18 16:23:45,448+0300 INFO (vm/bdbde39e) [virt.vm] (vmId='bdbde39e-3528-4264-a4f7-afd4f36c13ea') <?xml version='1.0' encoding='utf-8'?> <domain xmlns:ns0="http://libvirt.org/schemas/domain/qemu/1.0" xmlns:ns1="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm"> <name>vm_TestCase5063_1816234899</name> <uuid>bdbde39e-3528-4264-a4f7-afd4f36c13ea</uuid> <memory>1048576</memory> <currentMemory>1048576</currentMemory> <iothreads>1</iothreads> <maxMemory slots="16">4194304</maxMemory> <vcpu current="1">16</vcpu> <sysinfo type="smbios"> <system> <entry name="manufacturer">Red Hat</entry> <entry name="product">RHEL</entry> <entry name="version">8.4-0.6.el8</entry> <entry name="family">RHV</entry> <entry name="sku">8.4.0</entry> <entry name="serial">3efcbc4a-5fc6-4e66-bded-36bba7700151</entry> <entry name="uuid">bdbde39e-3528-4264-a4f7-afd4f36c13ea</entry> </system> <baseBoard> <entry name="manufacturer">Red Hat</entry> <entry name="product">RHEL-AV</entry> </baseBoard> </sysinfo> <clock adjustment="0" offset="variable"> <timer name="rtc" tickpolicy="catchup" /> <timer name="pit" tickpolicy="delay" /> <timer name="hpet" present="no" /> </clock> <features> <acpi /> </features> <cpu match="exact"> <model>Skylake-Client</model> <feature name="hle" policy="disable" /> <feature name="rtm" policy="disable" /> <feature name="mpx" policy="disable" /> <topology cores="1" sockets="16" threads="1" /> <numa> <cell cpus="0-15" id="0" memory="1048576" /> </numa> </cpu> <cputune /> <ns0:capabilities> <ns0:add capability="blockdev" /> <ns0:add capability="incremental-backup" /> </ns0:capabilities> <devices> <input bus="ps2" type="mouse" /> <channel type="unix"> <target name="ovirt-guest-agent.0" type="virtio" /> <source mode="bind" path="/var/lib/libvirt/qemu/channels/bdbde39e-3528-4264-a4f7-afd4f36c13ea.ovirt-guest-agent.0" /> </channel> <channel type="unix"> <target name="org.qemu.guest_agent.0" type="virtio" /> <source mode="bind" path="/var/lib/libvirt/qemu/channels/bdbde39e-3528-4264-a4f7-afd4f36c13ea.org.qemu.guest_agent.0" /> </channel> <controller index="0" model="qemu-xhci" ports="8" type="usb"> <alias name="ua-33e3ea29-aee5-494a-b014-c12df8aefb7c" /> </controller> <controller index="0" ports="16" type="virtio-serial"> <alias name="ua-4885e698-9917-42d7-8e38-2eeaf8270668" /> </controller> <video> <model heads="1" ram="65536" type="qxl" vgamem="16384" vram="8192" /> <alias name="ua-853c9508-33cf-442b-a04c-b555b1dbf433" /> </video> <memballoon model="virtio"> <stats period="5" /> <alias name="ua-a2fae15e-558f-4963-9fc2-b7b5847e5150" /> </memballoon> <controller index="0" model="virtio-scsi" type="scsi"> <driver iothread="1" /> <alias name="ua-af91d249-9407-4150-9bf1-59d67d4dfe21" /> </controller> <rng model="virtio"> <backend model="random">/dev/urandom</backend> <alias name="ua-ce3c0b21-c420-47a2-9914-fb9eff88f666" /> </rng> <graphics autoport="yes" passwd="*****" passwdValidTo="1970-01-01T00:00:01" port="-1" tlsPort="-1" type="spice"> <channel mode="secure" name="main" /> <channel mode="secure" name="inputs" /> <channel mode="secure" name="cursor" /> <channel mode="secure" name="playback" /> <channel mode="secure" name="record" /> <channel mode="secure" name="display" /> <channel mode="secure" name="smartcard" /> <channel mode="secure" name="usbredir" /> <listen network="vdsm-ovirtmgmt" type="network" /> </graphics> <channel type="spicevmc"> <target name="com.redhat.spice.0" type="virtio" /> 
</channel> <controller model="pcie-root" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <controller model="pcie-root-port" type="pci" /> <disk device="cdrom" snapshot="no" type="file"> <driver error_policy="report" name="qemu" type="raw" /> <source file="" startupPolicy="optional"> <seclabel model="dac" relabel="no" type="none" /> </source> <target bus="sata" dev="sdc" /> <readonly /> <alias name="ua-2450921b-5e61-4e92-ba9d-c6287b5b9630" /> </disk> <disk device="disk" snapshot="no" type="block"> <target bus="virtio" dev="vda" /> <source dev="/rhev/data-center/mnt/blockSD/0b9240a5-862f-4573-9ba0-34c8f853269c/images/33198759-4a4e-4e2e-9605-c18e186b1256/b862f01f-fab3-411c-8e23-2c97d0b2c18b"> <seclabel model="dac" relabel="no" type="none" /> </source> <driver cache="none" error_policy="stop" io="native" iothread="1" name="qemu" type="qcow2" /> <alias name="ua-33198759-4a4e-4e2e-9605-c18e186b1256" /> <boot order="1" /> <serial>33198759-4a4e-4e2e-9605-c18e186b1256</serial> </disk> <interface type="bridge"> <model type="virtio" /> <link state="up" /> <source bridge="ovirtmgmt" /> <alias name="ua-c711839d-37e9-4ac7-815e-4e1aaa9c41b3" /> <mac address="00:1a:4a:24:17:6d" /> <mtu size="1500" /> <filterref filter="vdsm-no-mac-spoofing" /> <bandwidth /> </interface> </devices> <pm> <suspend-to-disk enabled="no" /> <suspend-to-mem enabled="no" /> </pm> <os> <type arch="x86_64" machine="pc-q35-rhel8.4.0">hvm</type> <smbios mode="sysinfo" /> </os> <metadata> <ns1:qos /> <ovirt-vm:vm> <ovirt-vm:minGuaranteedMemoryMb type="int">1024</ovirt-vm:minGuaranteedMemoryMb> <ovirt-vm:clusterVersion>4.6</ovirt-vm:clusterVersion> <ovirt-vm:custom /> <ovirt-vm:device alias="ua-c711839d-37e9-4ac7-815e-4e1aaa9c41b3" mac_address="00:1a:4a:24:17:6d"> <ovirt-vm:custom /> </ovirt-vm:device> <ovirt-vm:device devtype="disk" name="vda"> <ovirt-vm:poolID>bbe89679-f676-489c-9c54-fbfe3366dc08</ovirt-vm:poolID> <ovirt-vm:volumeID>b862f01f-fab3-411c-8e23-2c97d0b2c18b</ovirt-vm:volumeID> <ovirt-vm:imageID>33198759-4a4e-4e2e-9605-c18e186b1256</ovirt-vm:imageID> <ovirt-vm:domainID>0b9240a5-862f-4573-9ba0-34c8f853269c</ovirt-vm:domainID> </ovirt-vm:device> <ovirt-vm:launchPaused>false</ovirt-vm:launchPaused> <ovirt-vm:resumeBehavior>auto_resume</ovirt-vm:resumeBehavior> <ovirt-vm:ballooningEnabled>true</ovirt-vm:ballooningEnabled> </ovirt-vm:vm> </metadata> </domain> 2. 
Hot plug disk 2021-08-18 16:24:49,659+0300 INFO (jsonrpc/1) [api.virt] START hotplugDisk(params={'vmId': 'bdbde39e-3528-4264-a4f7-afd4f36c13ea', 'xml': '<?xml version="1.0" encoding="UTF-8"?><hotplug><devices><disk snapshot="no" type="block" device="disk"><target dev="vda" bus="virtio"></target><source dev="/rhev/data-center/mnt/blockSD/0b9240a5-862f-4573-9ba0-34c8f853269c/images/b72350ce-4397-412f-a04b-76af4e3fb85f/2949a422-7fda-4e13-994c-d5fabab91b07"><seclabel model="dac" type="none" relabel="no"></seclabel></source><driver name="qemu" iothread="1" io="native" type="qcow2" error_policy="stop" cache="none"></driver><alias name="ua-b72350ce-4397-412f-a04b-76af4e3fb85f"></alias><serial>b72350ce-4397-412f-a04b-76af4e3fb85f</serial></disk></devices><metadata xmlns:ovirt-vm="http://ovirt.org/vm/1.0"><ovirt-vm:vm><ovirt-vm:device devtype="disk" name="vda"><ovirt-vm:poolID>bbe89679-f676-489c-9c54-fbfe3366dc08</ovirt-vm:poolID><ovirt-vm:volumeID>2949a422-7fda-4e13-994c-d5fabab91b07</ovirt-vm:volumeID><ovirt-vm:imageID>b72350ce-4397-412f-a04b-76af4e3fb85f</ovirt-vm:imageID><ovirt-vm:domainID>0b9240a5-862f-4573-9ba0-34c8f853269c</ovirt-vm:domainID></ovirt-vm:device></ovirt-vm:vm></metadata></hotplug>'}) from=::ffff:10.46.12.176,56592, flow_id=diskattachments_create_5eb976cf-5886, vmId=bdbde39e-3528-4264-a4f7-afd4f36c13ea (api:48) 2021-08-18 16:24:50,063+0300 INFO (jsonrpc/1) [virt.vm] (vmId='bdbde39e-3528-4264-a4f7-afd4f36c13ea') Hotplug disk xml: <?xml version='1.0' encoding='utf-8'?> <disk device="disk" snapshot="no" type="block"> <source dev="/rhev/data-center/mnt/blockSD/0b9240a5-862f-4573-9ba0-34c8f853269c/images/b72350ce-4397-412f-a04b-76af4e3fb85f/2949a422-7fda-4e13-994c-d5fabab91b07"> <seclabel model="dac" relabel="no" type="none" /> </source> <target bus="virtio" dev="vdb" /> <serial>b72350ce-4397-412f-a04b-76af4e3fb85f</serial> <driver cache="none" error_policy="stop" io="native" iothread="1" name="qemu" type="qcow2" /> <alias name="ua-b72350ce-4397-412f-a04b-76af4e3fb85f" /> </disk> (vm:3851) 2021-08-18 16:24:50,142+0300 INFO (jsonrpc/1) [api.virt] FINISH hotplugDisk return={'status': {'code': 0, 'message': 'Done'}, 'vmList': {}} from=::ffff:10.46.12.176,56592, flow_id=diskattachments_create_5eb976cf-5886, vmId=bdbde39e-3528-4264-a4f7-afd4f36c13ea (api:54) 3. Starting resize 2021-08-18 16:24:57,686+0300 INFO (jsonrpc/5) [api.virt] START diskSizeExtend(driveSpecs={'poolID': 'bbe89679-f676-489c-9c54-fbfe3366dc08', 'volumeID': '2949a422-7fda-4e13-994c-d5fabab91b07', 'imageID': 'b72350ce-4397-412f-a04b-76af4e3fb85f', 'domainID': '0b9240a5-862f-4573-9ba0-34c8f853269c'}, newSize='2147483648') from=::ffff:10.46.12.176,56592, flow_id=diskattachments_update_ef4d3300-48fd, vmId=bdbde39e-3528-4264-a4f7-afd4f36c13ea (api:48) 4. Warnings about non-responsive vm... 
2021-08-18 16:25:13,662+0300 WARN (periodic/0) [virt.periodic.VmDispatcher] could not run <class 'vdsm.virt.periodic.UpdateVolumes'> on ['bdbde39e-3528-4264-a4f7-afd4f36c13ea'] (periodic:290) 2021-08-18 16:25:34,244+0300 INFO (qgapoller/2) [vds] Failed to get guest info for vm=bdbde39e-3528-4264-a4f7-afd4f36c13ea, error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockResize) (qemuguestagent:522) 2021-08-18 16:25:34,375+0300 WARN (jsonrpc/4) [virt.vm] (vmId='bdbde39e-3528-4264-a4f7-afd4f36c13ea') Couldn't update metadata: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockResize) (vm:2263) 2021-08-18 16:25:45,526+0300 WARN (jsonrpc/1) [virt.vm] (vmId='bdbde39e-3528-4264-a4f7-afd4f36c13ea') Couldn't update metadata: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBlockResize) (vm:2263) 2021-08-18 16:25:57,806+0300 WARN (vdsm.Scheduler) [Executor] Worker blocked: <Worker name=jsonrpc/5 running <Task <JsonRpcTask {'jsonrpc': '2.0', 'method': 'VM.diskSizeExtend', 'params': {'vmID': 'bdbde39e-3528-4264-a4f7-afd4f36c13ea', 'driveSpecs': {'poolID': 'bbe89679-f676-489c-9c54-fbfe3366dc08', 'volumeID': '2949a422-7fda-4e13-994c-d5fabab91b07', 'imageID': 'b72350ce-4397-412f-a04b-76af4e3fb85f', 'domainID': '0b9240a5-862f-4573-9ba0-34c8f853269c'}, 'newSize': '2147483648'}, 'id': 'a971f494-b891-4498-af44-30fa2ac21697'} at 0x7fecce79feb8> timeout=60, duration=60.12 at 0x7fecce79fc18> task#=43 at 0x7fecb4aa2470>, traceback: File: "/usr/lib64/python3.6/threading.py", line 887, in _bootstrap self._bootstrap_inner() File: "/usr/lib64/python3.6/threading.py", line 919, in _bootstrap_inner self.run() File: "/usr/lib64/python3.6/threading.py", line 867, in run self._target(*self._args, **self._kwargs) File: "/usr/lib/python3.6/site-packages/vdsm/common/concurrent.py", line 260, in run ret = func(*args, **kwargs) File: "/usr/lib/python3.6/site-packages/vdsm/executor.py", line 301, in _run self._execute_task() File: "/usr/lib/python3.6/site-packages/vdsm/executor.py", line 315, in _execute_task task() File: "/usr/lib/python3.6/site-packages/vdsm/executor.py", line 391, in __call__ self._callable() File: "/usr/lib/python3.6/site-packages/yajsonrpc/__init__.py", line 264, in __call__ self._handler(self._ctx, self._req) File: "/usr/lib/python3.6/site-packages/yajsonrpc/__init__.py", line 307, in _serveRequest response = self._handle_request(req, ctx) File: "/usr/lib/python3.6/site-packages/yajsonrpc/__init__.py", line 349, in _handle_request res = method(**params) File: "/usr/lib/python3.6/site-packages/vdsm/rpc/Bridge.py", line 198, in _dynamicMethod result = fn(*methodArgs) File: "<decorator-gen-381>", line 2, in diskSizeExtend File: "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 50, in method ret = func(*args, **kwargs) File: "<decorator-gen-380>", line 2, in diskSizeExtend File: "/usr/lib/python3.6/site-packages/vdsm/common/api.py", line 124, in method ret = func(*args, **kwargs) File: "/usr/lib/python3.6/site-packages/vdsm/API.py", line 572, in diskSizeExtend return self.vm.diskSizeExtend(driveSpecs, newSize) File: "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 4727, in diskSizeExtend return self._diskSizeExtendCow(drive, newSizeBytes) File: "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 4617, in _diskSizeExtendCow self._dom.blockResize(drive.name, newSizeBytes, flags=flags) File: 
"/usr/lib/python3.6/site-packages/vdsm/virt/virdomain.py", line 101, in f ret = attr(*args, **kwargs) File: "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper ret = f(*args, **kwargs) File: "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper return func(inst, *args, **kwargs) File: "/usr/lib64/python3.6/site-packages/libvirt.py", line 1205, in blockResize ret = libvirtmod.virDomainBlockResize(self._o, disk, size, flags) (executor:363) Other warnings seen after disk resize started: 021-08-18 16:25:00,109+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) 2021-08-18 16:25:00,122+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) 2021-08-18 16:25:00,282+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) 2021-08-18 16:25:00,361+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) ... 2021-08-18 16:25:47,111+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) ... 2021-08-18 16:25:47,304+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) 2021-08-18 16:25:47,498+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) 2021-08-18 16:25:47,629+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) ... 2021-08-18 16:26:30,045+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) ... 2021-08-18 16:26:30,302+0300 WARN (dhcp-monitor) [root] Nic ovirtmgmt is not configured for IPv6 monitoring. (dhcp_monitor:174) iSCSI server details: # iscsiadm -m node 10.46.16.10:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe 10.46.16.10:3260,1034 iqn.1992-08.com.netapp:vserver-rhv-qe 10.46.16.10:3260,1048 iqn.1992-08.com.netapp:vserver-rhv-qe 10.46.16.9:3260,1033 iqn.1992-08.com.netapp:vserver-rhv-qe [2620:52:0:2e10:a0:98ff:febf:f1ea]:3260,1048 iqn.1992-08.com.netapp:vserver-rhv-qe We have one target using ipv6, but it is not used: # iscsiadm -m session tcp: [2] 10.46.16.10:3260,1034 iqn.1992-08.com.netapp:vserver-rhv-qe (non-flash) Based on vdsm log, there was no issue with storage during this time. The vm was fine until we tried the resize. The resize call stuck and the vm stopped responding. Looks like a deadlock in qemu.
(In reply to Daniel Berrangé from comment #32)
> The stack trace is quite unexpected in what it shows.
>
> The main event loop thread shows that QEMU is in the process of exiting. It
> has stopped the main event loop and is running 'qemu_cleanup', which triggers
> a bdrv_drain_all:

This is not surprising; I tried to shut down the VM to enable libvirt logs cleanly for more testing.

> This looks like a problem with gracefully shutting down the QEMU process,
> which seems separate from the original block resize issues.

Yes, we had such issues in the past and they were fixed more than once.

> The hang in bdrv_drain_all does still suggest storage-related issues, though.

We don't have anything showing that storage was not accessible at this time. Vdsm has a thread for each storage domain, checking that we can read from storage every 10 seconds. When storage is not readable, we see a warning about that storage domain every 10 seconds.

> I think we'll need to start a fresh booted VM and capture a correct libvirt
> debug log that shows the QMP JSON replies too.

I agree.
I tried few more time with 2 fresh vms, and reproduced the issue twice. It does not happen with virtio-scsi, only with virtio disk. Steps to reproduce: 1. Create vm 2. Attach 1g raw preallocated disk 3. Resize disk to 2g from RHV UI 4. Wait until the resize is completed 5. Check that the disk was resize in the guest 6. Write 2g of data to disk The VM pauses after starting the write. More details: Run 1: vdsm log: 2021-08-18 23:20:38,945+0300 INFO (jsonrpc/7) [api.virt] START diskSizeExtend(driveSpecs={'poolID': 'bbe89679-f676-489c-9c54-fbfe3366dc08', 'volumeID': 'ba4364d4-bcff-4428-8ecb-006a9e48eaea', 'imageID': '4226aae1-108f-4150-801f-f9e73018855b', 'domainID' : '0b9240a5-862f-4573-9ba0-34c8f853269c'}, newSize='2147483648') from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, vmId=8f5db04a-41cd-4175-a8f3-1dc8be44bb04 (api:48) 2021-08-18 23:20:38,946+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='530dd786-2452-4aea-ad62-4e0c2343b21e') moving from state preparing -> state preparing (task:624) 2021-08-18 23:20:38,947+0300 INFO (jsonrpc/7) [vdsm.api] START refreshVolume(sdUUID='0b9240a5-862f-4573-9ba0-34c8f853269c', spUUID='bbe89679-f676-489c-9c54-fbfe3366dc08', imgUUID='4226aae1-108f-4150-801f-f9e73018855b', volUUID='ba4364d4-bcff-4428-8ecb-0 06a9e48eaea') from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, task_id=530dd786-2452-4aea-ad62-4e0c2343b21e (api:48) 2021-08-18 23:20:38,947+0300 INFO (jsonrpc/7) [storage.LVM] Refreshing LVs (vg=0b9240a5-862f-4573-9ba0-34c8f853269c, lvs=('ba4364d4-bcff-4428-8ecb-006a9e48eaea',)) (lvm:1816) 2021-08-18 23:20:39,076+0300 INFO (jsonrpc/7) [vdsm.api] FINISH refreshVolume return=None from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, task_id=530dd786-2452-4aea-ad62-4e0c2343b21e (api:54) 2021-08-18 23:20:39,076+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='530dd786-2452-4aea-ad62-4e0c2343b21e') finished: None (task:1202) 2021-08-18 23:20:39,076+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='530dd786-2452-4aea-ad62-4e0c2343b21e') moving from state finished -> state finished (task:624) 2021-08-18 23:20:39,076+0300 DEBUG (jsonrpc/7) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738) 2021-08-18 23:20:39,076+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='530dd786-2452-4aea-ad62-4e0c2343b21e') ref 0 aborting False (task:1000) 2021-08-18 23:20:39,077+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='bd720612-8fcd-49a5-8df3-051cf0b5eb7c') moving from state finished -> state preparing (task:624) 2021-08-18 23:20:39,077+0300 INFO (jsonrpc/7) [vdsm.api] START getVolumeSize(sdUUID='0b9240a5-862f-4573-9ba0-34c8f853269c', spUUID='bbe89679-f676-489c-9c54-fbfe3366dc08', imgUUID='4226aae1-108f-4150-801f-f9e73018855b', volUUID='ba4364d4-bcff-4428-8ecb-0 06a9e48eaea') from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, task_id=bd720612-8fcd-49a5-8df3-051cf0b5eb7c (api:48) 2021-08-18 23:20:39,077+0300 INFO (jsonrpc/7) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '2147483648', 'truesize': '2147483648'} from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, task_id=bd720612-8fcd-49a5-8df3-0 51cf0b5eb7c (api:54) 2021-08-18 23:20:39,078+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='bd720612-8fcd-49a5-8df3-051cf0b5eb7c') finished: {'apparentsize': '2147483648', 'truesize': '2147483648'} (task:1202) 2021-08-18 23:20:39,078+0300 DEBUG (jsonrpc/7) 
[storage.TaskManager.Task] (Task='bd720612-8fcd-49a5-8df3-051cf0b5eb7c') moving from state finished -> state finished (task:624) 2021-08-18 23:20:39,078+0300 DEBUG (jsonrpc/7) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738) 2021-08-18 23:20:39,078+0300 DEBUG (jsonrpc/7) [storage.TaskManager.Task] (Task='bd720612-8fcd-49a5-8df3-051cf0b5eb7c') ref 0 aborting False (task:1000) 2021-08-18 23:20:39,089+0300 INFO (jsonrpc/7) [api.virt] FINISH diskSizeExtend return={'status': {'code': 0, 'message': 'Done'}, 'size': '2147483648'} from=::ffff:10.46.12.176,60588, flow_id=ebfbffc9-3402-457b-8514-b4ebaaaa5a63, vmId=8f5db04a-41cd-4175-a8f3-1dc8be44bb04 (api:54) In the guest: # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 2G 0 disk sr0 11:0 1 1024M 0 rom vda 252:0 0 10G 0 disk ββvda1 252:1 0 1M 0 part ββvda2 252:2 0 100M 0 part /boot/efi ββvda3 252:3 0 9.9G 0 part / vdb 252:16 0 2G 0 disk # dd if=/dev/zero bs=1M count=2048 of=/dev/vdb conv=fsync After write was started, we see many events for "abnormal vmstop device" 2021-08-18 23:20:57,294+0300 INFO (libvirt/events) [virt.vm] (vmId='8f5db04a-41cd-4175-a8f3-1dc8be44bb04') abnormal vm stop device ua-4226aae1-108f-4150-801f-f9e73018855b error (vm:5188) 2021-08-18 23:20:57,294+0300 INFO (libvirt/events) [virt.vm] (vmId='8f5db04a-41cd-4175-a8f3-1dc8be44bb04') CPU stopped: onIOError (vm:6023) 2021-08-18 23:20:57,304+0300 INFO (libvirt/events) [virt.vm] (vmId='8f5db04a-41cd-4175-a8f3-1dc8be44bb04') CPU stopped: onSuspend (vm:6023) ... Libvirt log: 1. Successful resize 2021-08-18 20:20:39.079+0000: 1537: debug : virDomainBlockResize:6057 : dom=0x7f1c00002380, (VM: name=resize, uuid=8f5db04a-41cd-4175-a8f3-1dc8be44bb04), disk=vdc, size=2147483648, flags=0x1 2021-08-18 20:20:39.080+0000: 9003: info : qemuMonitorIOWrite:437 : QEMU_MONITOR_IO_WRITE: mon=0x7f1bfc04d0a0 buf={"execute":"block_resize","arguments":{"node-name":"libvirt-6-format","size":2147483648},"id":"libvirt-4187"} len=111 ret=111 errno=0 2021-08-18 20:20:39.085+0000: 9003: info : qemuMonitorJSONIOProcessLine:242 : QEMU_MONITOR_RECV_REPLY: mon=0x7f1bfc04d0a0 reply={"return": {}, "id": "libvirt-4187"} 2. 
IOError 2021-08-18 20:20:57.290+0000: 9003: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"timestamp": {"seconds": 1629318057, "microseconds": 290744}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-6-format", "reason": "Invalid argument", "operation": "write", "action": "stop"}}] 2021-08-18 20:20:57.291+0000: 9003: info : qemuMonitorJSONIOProcessLine:237 : QEMU_MONITOR_RECV_EVENT: mon=0x7f1bfc04d0a0 event={"timestamp": {"seconds": 1629318057, "microseconds": 290744}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": fa lse, "node-name": "libvirt-6-format", "reason": "Invalid argument", "operation": "write", "action": "stop"}} Run 2: vdsm log: In the guest before the resize: # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sr0 11:0 1 1024M 0 rom vda 252:0 0 10G 0 disk ββvda1 252:1 0 1M 0 part ββvda2 252:2 0 100M 0 part /boot/efi ββvda3 252:3 0 9.9G 0 part / vdb 252:16 0 1G 0 disk Testing writing before the resize: [root@vm-13-101 ~]# dd if=/dev/zero bs=1M count=1024 of=/dev/vdb conv=fsync 1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.65022 s, 405 MB/s Successful resize vdsm.log: 2021-08-18 23:48:30,108+0300 DEBUG (jsonrpc/2) [storage.TaskManager.Task] (Task='ce1276cb-f421-4378-88cb-07657477309d') moving from state finished -> state finished (task:624) 2021-08-18 23:48:30,108+0300 DEBUG (jsonrpc/2) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738) 2021-08-18 23:48:30,108+0300 DEBUG (jsonrpc/2) [storage.TaskManager.Task] (Task='ce1276cb-f421-4378-88cb-07657477309d') ref 0 aborting False (task:1000) 2021-08-18 23:48:30,116+0300 DEBUG (check/loop) [storage.asyncevent] Process <subprocess.Popen object at 0x7fcdc44d7208> terminated (count=1) (asyncevent:660) 2021-08-18 23:48:30,116+0300 DEBUG (check/loop) [storage.check] FINISH check '/dev/3a3e5451-7d1c-4fc0-842a-dbea5bf36158/metadata' (rc=0, elapsed=0.04) (check:359) 2021-08-18 23:48:30,133+0300 INFO (jsonrpc/5) [api.virt] START diskSizeExtend(driveSpecs={'poolID': 'bbe89679-f676-489c-9c54-fbfe3366dc08', 'volumeID': '30e5f051-478a-4af8-add9-56327d59115b', 'imageID': 'ace9789b-e320-4e2f-a2bf-85c47fc87f36', 'domainID': '0b9240a5-862f-4573-9ba0-34c8f853269c'}, newSize='2147483648') from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, vmId=e3c192b0-46e7-4c9d-bd40-1d7dcc574801 (api:48) 2021-08-18 23:48:30,134+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='7999cf22-2755-4ecd-8cf0-7654faf2defa') moving from state preparing -> state preparing (task:624) 2021-08-18 23:48:30,134+0300 INFO (jsonrpc/5) [vdsm.api] START refreshVolume(sdUUID='0b9240a5-862f-4573-9ba0-34c8f853269c', spUUID='bbe89679-f676-489c-9c54-fbfe3366dc08', imgUUID='ace9789b-e320-4e2f-a2bf-85c47fc87f36', volUUID='30e5f051-478a-4af8-add9-56327d59115b') from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, task_id=7999cf22-2755-4ecd-8cf0-7654faf2defa (api:48) 2021-08-18 23:48:30,135+0300 INFO (jsonrpc/5) [storage.LVM] Refreshing LVs (vg=0b9240a5-862f-4573-9ba0-34c8f853269c, lvs=('30e5f051-478a-4af8-add9-56327d59115b',)) (lvm:1816) 2021-08-18 23:48:30,255+0300 INFO (jsonrpc/5) [vdsm.api] FINISH refreshVolume return=None from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, task_id=7999cf22-2755-4ecd-8cf0-7654faf2defa (api:54) 2021-08-18 23:48:30,255+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='7999cf22-2755-4ecd-8cf0-7654faf2defa') finished: None (task:1202) 2021-08-18 
23:48:30,256+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='7999cf22-2755-4ecd-8cf0-7654faf2defa') moving from state finished -> state finished (task:624) 2021-08-18 23:48:30,256+0300 DEBUG (jsonrpc/5) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738) 2021-08-18 23:48:30,256+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='7999cf22-2755-4ecd-8cf0-7654faf2defa') ref 0 aborting False (task:1000) 2021-08-18 23:48:30,256+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='21be0a49-175d-4ffe-a672-2f1489e1c083') moving from state finished -> state preparing (task:624) 2021-08-18 23:48:30,257+0300 INFO (jsonrpc/5) [vdsm.api] START getVolumeSize(sdUUID='0b9240a5-862f-4573-9ba0-34c8f853269c', spUUID='bbe89679-f676-489c-9c54-fbfe3366dc08', imgUUID='ace9789b-e320-4e2f-a2bf-85c47fc87f36', volUUID='30e5f051-478a-4af8-add9-56327d59115b') from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, task_id=21be0a49-175d-4ffe-a672-2f1489e1c083 (api:48) 2021-08-18 23:48:30,257+0300 INFO (jsonrpc/5) [vdsm.api] FINISH getVolumeSize return={'apparentsize': '2147483648', 'truesize': '2147483648'} from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, task_id=21be0a49-175d-4ffe-a672-2f1489e1c083 (api:54) 2021-08-18 23:48:30,257+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='21be0a49-175d-4ffe-a672-2f1489e1c083') finished: {'apparentsize': '2147483648', 'truesize': '2147483648'} (task:1202) 2021-08-18 23:48:30,257+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='21be0a49-175d-4ffe-a672-2f1489e1c083') moving from state finished -> state finished (task:624) 2021-08-18 23:48:30,257+0300 DEBUG (jsonrpc/5) [storage.ResourceManager.Owner] Owner.releaseAll resources {} (resourceManager:738) 2021-08-18 23:48:30,257+0300 DEBUG (jsonrpc/5) [storage.TaskManager.Task] (Task='21be0a49-175d-4ffe-a672-2f1489e1c083') ref 0 aborting False (task:1000) 2021-08-18 23:48:30,278+0300 INFO (jsonrpc/5) [api.virt] FINISH diskSizeExtend return={'status': {'code': 0, 'message': 'Done'}, 'size': '2147483648'} from=::ffff:10.46.12.176,60588, flow_id=0f5315ff-1f0d-42a0-953b-345b25f1ae66, vmId=e3c192b0-46e7-4c9d-bd40-1d7dcc574801 (api:54) libvirtd.log: 2021-08-18 20:48:30.258+0000: 1536: debug : virDomainBlockResize:6057 : dom=0x7f1ba800be40, (VM: name=resize2, uuid=e3c192b0-46e7-4c9d-bd40-1d7dcc574801), disk=vdb, size=2147483648, flags=0x1 2021-08-18 20:48:30.258+0000: 1536: debug : qemuDomainObjBeginJobInternal:843 : Starting job: job=modify agentJob=none asyncJob=none (vm=0x7f1ba83be870 name=resize2, current job=none agentJob=none async=none) 2021-08-18 20:48:30.258+0000: 1536: debug : qemuDomainObjBeginJobInternal:890 : Started job: modify (async=none vm=0x7f1ba83be870 name=resize2) 2021-08-18 20:48:30.258+0000: 1536: debug : qemuDomainObjEnterMonitorInternal:5809 : Entering monitor (mon=0x7f1bfc04d620 vm=0x7f1ba83be870 name=resize2) 2021-08-18 20:48:30.258+0000: 1536: debug : qemuMonitorBlockResize:2223 : device=<null> nodename=libvirt-4-format size=2147483648 2021-08-18 20:48:30.258+0000: 1536: debug : qemuMonitorBlockResize:2225 : mon:0x7f1bfc04d620 vm:0x7f1ba83be870 fd:44 2021-08-18 20:48:30.258+0000: 1536: info : qemuMonitorSend:950 : QEMU_MONITOR_SEND_MSG: mon=0x7f1bfc04d620 msg={"execute":"block_resize","arguments":{"node-name":"libvirt-4-format","size":2147483648},"id":"libvirt-445"} fd=-1 2021-08-18 20:48:30.258+0000: 51419: info : qemuMonitorIOWrite:437 : QEMU_MONITOR_IO_WRITE: mon=0x7f1bfc04d620 
buf={"execute":"block_resize","arguments":{"node-name":"libvirt-4-format","size":2147483648},"id":"libvirt-445"} len=110 ret=110 errno=0 2021-08-18 20:48:30.278+0000: 51419: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"return": {}, "id": "libvirt-445"}] 2021-08-18 20:48:30.278+0000: 51419: info : qemuMonitorJSONIOProcessLine:242 : QEMU_MONITOR_RECV_REPLY: mon=0x7f1bfc04d620 reply={"return": {}, "id": "libvirt-445"} In the guest, after the resize: # lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sr0 11:0 1 1024M 0 rom vda 252:0 0 10G 0 disk ββvda1 252:1 0 1M 0 part ββvda2 252:2 0 100M 0 part /boot/efi ββvda3 252:3 0 9.9G 0 part / vdb 252:16 0 2G 0 disk Starting the write: # dd if=/dev/zero bs=1M count=2048 of=/dev/vdb conv=fsync (VM paused) After the write, many libvirt events for abnormal stop: vdsm.log: 2021-08-18 23:48:55,041+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') abnormal vm stop device ua-ace9789b-e320-4e2f-a2bf-85c47fc87f36 error (vm:5188) 2021-08-18 23:48:55,041+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') CPU stopped: onIOError (vm:6023) 2021-08-18 23:48:55,052+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') CPU stopped: onSuspend (vm:6023) 2021-08-18 23:48:55,084+0300 WARN (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') device vdb reported I/O error (vm:4021) 2021-08-18 23:48:55,085+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') abnormal vm stop device ua-ace9789b-e320-4e2f-a2bf-85c47fc87f36 error (vm:5188) 2021-08-18 23:48:55,085+0300 INFO (libvirt/events) [virt.vm] (vmId='e3c192b0-46e7-4c9d-bd40-1d7dcc574801') CPU stopped: onIOError (vm:6023) libvird.log: Many errors like: 2021-08-18 20:48:55.038+0000: 51419: debug : qemuMonitorJSONIOProcessLine:222 : Line [{"timestamp": {"seconds": 1629319735, "microseconds": 38077}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-4-format", "reason": "Invalid argument", "operation": "write", "action": "stop"}}] 2021-08-18 20:48:55.038+0000: 51419: info : qemuMonitorJSONIOProcessLine:237 : QEMU_MONITOR_RECV_EVENT: mon=0x7f1bfc04d620 event={"timestamp": {"seconds": 1629319735, "microseconds": 38077}, "event": "BLOCK_IO_ERROR", "data": {"device": "", "nospace": false, "node-name": "libvirt-4-format", "reason": "Invalid argument", "operation": "write", "action": "stop"}} This looks like a qemu bug, since same operation works fine in virtio-scsi.
Created attachment 1815368 [details] Logs for second reproduction (vdsm, libvirt, backtrace, vm xml)
I reproduced third time, this time tracing qemu during the write, and indeed this write fail with EINVAL. 64522 01:09:03.228453 io_submit(0x7f1c2400e000, 1, [{aio_data=0, aio_lio_opcode=IOCB_CMD_PWRITEV, aio_fildes=90, aio_buf=[{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8192}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=180224},{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8192}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8192}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=16384}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=16384}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=49152}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=65536}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=73728}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=12288}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8192}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=16384}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=16384}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=32768}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=32768}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=32768}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=69632}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8192}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=8192}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, ...], aio_offset=2076180480, aio_flags=IOCB_FLAG_RESFD, aio_resfd=37}]) = -1 EINVAL (Invalid argument) <0.000146> The offset is 1980 MiB, all the buffers are aligned, and the total length of the write is 716800. 
I don't see any reason why this would fail. The device on the host is: # realpath /rhev/data-center/mnt/blockSD/0b9240a5-862f-4573-9ba0-34c8f853269c/images/ee0b7563-ca20-4c31-87d0-60ec154a6f62/900287b2-213b-48f8-8348-cbe690182e4c /dev/dm-35 # grep -s "" /sys/block/dm-35/queue/* /sys/block/dm-35/queue/add_random:0 /sys/block/dm-35/queue/chunk_sectors:0 /sys/block/dm-35/queue/dax:0 /sys/block/dm-35/queue/discard_granularity:1048576 /sys/block/dm-35/queue/discard_max_bytes:33554432 /sys/block/dm-35/queue/discard_max_hw_bytes:33554432 /sys/block/dm-35/queue/discard_zeroes_data:0 /sys/block/dm-35/queue/fua:0 /sys/block/dm-35/queue/hw_sector_size:512 /sys/block/dm-35/queue/io_poll:0 /sys/block/dm-35/queue/io_poll_delay:0 /sys/block/dm-35/queue/iostats:0 /sys/block/dm-35/queue/io_timeout:0 /sys/block/dm-35/queue/logical_block_size:512 /sys/block/dm-35/queue/max_discard_segments:1 /sys/block/dm-35/queue/max_hw_sectors_kb:32767 /sys/block/dm-35/queue/max_integrity_segments:0 /sys/block/dm-35/queue/max_sectors_kb:64 /sys/block/dm-35/queue/max_segments:2048 /sys/block/dm-35/queue/max_segment_size:4096 /sys/block/dm-35/queue/minimum_io_size:4096 /sys/block/dm-35/queue/nomerges:0 /sys/block/dm-35/queue/nr_requests:128 /sys/block/dm-35/queue/nr_zones:0 /sys/block/dm-35/queue/optimal_io_size:65536 /sys/block/dm-35/queue/physical_block_size:4096 /sys/block/dm-35/queue/read_ahead_kb:4096 /sys/block/dm-35/queue/rotational:0 /sys/block/dm-35/queue/rq_affinity:0 /sys/block/dm-35/queue/scheduler:none /sys/block/dm-35/queue/unpriv_sgio:0 /sys/block/dm-35/queue/write_cache:write through /sys/block/dm-35/queue/write_same_max_bytes:8388608 /sys/block/dm-35/queue/write_zeroes_max_bytes:8388608 /sys/block/dm-35/queue/zone_append_max_bytes:0 /sys/block/dm-35/queue/zoned:none But in the guest we see: # grep -s "" /sys/block/vdb/queue/* /sys/block/vdb/queue/add_random:0 /sys/block/vdb/queue/chunk_sectors:0 /sys/block/vdb/queue/dax:0 /sys/block/vdb/queue/discard_granularity:512 /sys/block/vdb/queue/discard_max_bytes:2147483136 /sys/block/vdb/queue/discard_max_hw_bytes:2147483136 /sys/block/vdb/queue/discard_zeroes_data:0 /sys/block/vdb/queue/fua:0 /sys/block/vdb/queue/hw_sector_size:512 /sys/block/vdb/queue/io_poll:0 /sys/block/vdb/queue/io_poll_delay:-1 /sys/block/vdb/queue/iostats:1 /sys/block/vdb/queue/io_timeout:30000 /sys/block/vdb/queue/logical_block_size:512 /sys/block/vdb/queue/max_discard_segments:1 /sys/block/vdb/queue/max_hw_sectors_kb:2147483647 /sys/block/vdb/queue/max_integrity_segments:0 /sys/block/vdb/queue/max_sectors_kb:1280 /sys/block/vdb/queue/max_segments:254 /sys/block/vdb/queue/max_segment_size:4294967295 /sys/block/vdb/queue/minimum_io_size:512 /sys/block/vdb/queue/nomerges:0 /sys/block/vdb/queue/nr_requests:256 /sys/block/vdb/queue/nr_zones:0 /sys/block/vdb/queue/optimal_io_size:0 /sys/block/vdb/queue/physical_block_size:512 /sys/block/vdb/queue/read_ahead_kb:4096 /sys/block/vdb/queue/rotational:1 /sys/block/vdb/queue/rq_affinity:1 /sys/block/vdb/queue/scheduler:[mq-deadline] kyber bfq none /sys/block/vdb/queue/unpriv_sgio:0 /sys/block/vdb/queue/wbt_lat_usec:75000 /sys/block/vdb/queue/write_cache:write back /sys/block/vdb/queue/write_same_max_bytes:0 /sys/block/vdb/queue/write_zeroes_max_bytes:2147483136 /sys/block/vdb/queue/zoned:none Normalizing the device name and sorting, we can see very different values: # diff -u host-sorted.txt guest-sorted.txt --- host-sorted.txt 2021-08-19 01:49:39.903772225 +0300 +++ guest-sorted.txt 2021-08-19 01:49:52.462190593 +0300 @@ -1,36 +1,36 @@ 
/sys/block/DEVICE/queue/add_random:0 /sys/block/DEVICE/queue/chunk_sectors:0 /sys/block/DEVICE/queue/dax:0 -/sys/block/DEVICE/queue/discard_granularity:1048576 -/sys/block/DEVICE/queue/discard_max_bytes:33554432 -/sys/block/DEVICE/queue/discard_max_hw_bytes:33554432 +/sys/block/DEVICE/queue/discard_granularity:512 +/sys/block/DEVICE/queue/discard_max_bytes:2147483136 +/sys/block/DEVICE/queue/discard_max_hw_bytes:2147483136 /sys/block/DEVICE/queue/discard_zeroes_data:0 /sys/block/DEVICE/queue/fua:0 /sys/block/DEVICE/queue/hw_sector_size:512 /sys/block/DEVICE/queue/io_poll:0 -/sys/block/DEVICE/queue/io_poll_delay:0 -/sys/block/DEVICE/queue/iostats:0 -/sys/block/DEVICE/queue/io_timeout:0 +/sys/block/DEVICE/queue/io_poll_delay:-1 +/sys/block/DEVICE/queue/iostats:1 +/sys/block/DEVICE/queue/io_timeout:30000 /sys/block/DEVICE/queue/logical_block_size:512 /sys/block/DEVICE/queue/max_discard_segments:1 -/sys/block/DEVICE/queue/max_hw_sectors_kb:32767 +/sys/block/DEVICE/queue/max_hw_sectors_kb:2147483647 /sys/block/DEVICE/queue/max_integrity_segments:0 -/sys/block/DEVICE/queue/max_sectors_kb:64 -/sys/block/DEVICE/queue/max_segments:2048 -/sys/block/DEVICE/queue/max_segment_size:4096 -/sys/block/DEVICE/queue/minimum_io_size:4096 +/sys/block/DEVICE/queue/max_sectors_kb:1280 +/sys/block/DEVICE/queue/max_segments:254 +/sys/block/DEVICE/queue/max_segment_size:4294967295 +/sys/block/DEVICE/queue/minimum_io_size:512 /sys/block/DEVICE/queue/nomerges:0 -/sys/block/DEVICE/queue/nr_requests:128 +/sys/block/DEVICE/queue/nr_requests:256 /sys/block/DEVICE/queue/nr_zones:0 -/sys/block/DEVICE/queue/optimal_io_size:65536 -/sys/block/DEVICE/queue/physical_block_size:4096 +/sys/block/DEVICE/queue/optimal_io_size:0 +/sys/block/DEVICE/queue/physical_block_size:512 /sys/block/DEVICE/queue/read_ahead_kb:4096 -/sys/block/DEVICE/queue/rotational:0 -/sys/block/DEVICE/queue/rq_affinity:0 -/sys/block/DEVICE/queue/scheduler:none +/sys/block/DEVICE/queue/rotational:1 +/sys/block/DEVICE/queue/rq_affinity:1 +/sys/block/DEVICE/queue/scheduler:[mq-deadline] kyber bfq none /sys/block/DEVICE/queue/unpriv_sgio:0 -/sys/block/DEVICE/queue/write_cache:write through -/sys/block/DEVICE/queue/write_same_max_bytes:8388608 -/sys/block/DEVICE/queue/write_zeroes_max_bytes:8388608 -/sys/block/DEVICE/queue/zone_append_max_bytes:0 +/sys/block/DEVICE/queue/wbt_lat_usec:75000 +/sys/block/DEVICE/queue/write_cache:write back +/sys/block/DEVICE/queue/write_same_max_bytes:0 +/sys/block/DEVICE/queue/write_zeroes_max_bytes:2147483136 /sys/block/DEVICE/queue/zoned:none Qemu version tested: # rpm -q qemu-kvm qemu-kvm-5.2.0-16.module+el8.4.0+12191+622ad3bb.5.x86_64 I see that we have new qemu version now: 15:5.2.0-16.module+el8.4.0+12221+7bf85bbe.6 So this may be a duplicate of a fixed bug.
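For reference, the qemu tracing used here can be repeated with a plain strace attach. A minimal sketch; the pgrep pattern is an assumption and should be adjusted to match the qemu process of the VM under test:

# Attach to the running qemu-kvm process and record only io_submit calls.
# -v prints structures unabbreviated, -tt/-T add timestamps and call duration.
pid=$(pgrep -f 'qemu-kvm.*resize2' | head -n 1)
strace -f -v -tt -T -e trace=io_submit -p "$pid" -o /tmp/qemu-io_submit.trace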
I reproduced this again with the latest qemu-kvm:

# rpm -q qemu-kvm
qemu-kvm-5.2.0-16.module+el8.4.0+12221+7bf85bbe.6.x86_64

I tried 2 times; each time the first disk attached to the VM was resized successfully, and I could write 2G after the resize without any issue. Then I detached the disk, created a new 1G disk, and resized it again. This time writing 2G failed with EINVAL.

I traced qemu during the writes. I also checked the output of:

grep -s "" /sys/block/dm-xx/queue/* | sed -e 's/dm-xx/DEVICE/' | sort -> host.txt

on the host, and:

grep -s "" /sys/block/vdb/queue/* | sed -e 's/vdb/DEVICE/' | sort -> guest.txt

in the guest, and the diff between the two files. I also checked that the guest device does not change after the resize. The host-guest.diff looks similar to the one we got with the previous build (see comment 38).
Created attachment 1815396 [details] device details and traces for comment 39
(In reply to sshmulev from comment #11)
> (In reply to sshmulev from comment #10)
> 
> Just forgot to mention the version it reproduced:
> ovirt-engine-4.4.8.4-0.7.el8ev.noarch
> vdsm-4.40.80.5-1.el8ev.x86_64
> QEMU emulator version 5.2.0
> (qemu-kvm-5.2.0-16.module+el8.4.0+12191+622ad3bb.5)
> 
> This can be reproduced by automation, I am trying to reproduce it manually
> right now.

Here are the steps of the automation test case in which this issue was reproduced:

1) Create an origin VM and 4 disks (raw_virtio, raw_scsi_virtio, qcow2_virtio, qcow2_scsi_virtio) and attach them to the VM.
2) Create a file system on each of those 4 disks, mount them, and create a file on each.
3) Make a template of this VM.
4) Create a cloned VM from this template.
5) Run the VM - the VM enters the paused state at this step.

Although it reproduces in this automation flow, I couldn't reproduce it manually; probably some step is missing.
Does this issue have the same cause? Bug 1989717 - [regression] blk_update_request: I/O error when generating load on iSCS
Adding info on how we debugged this issue; hopefully this will help others in the future.

First we reproduced the issue using the test script (attachment 1815821 [details]). In the guest:

./test.sh /dev/vdb

With a virtio-scsi disk, the tests pass every time. With a virtio disk, the VM pauses after 2-7 iterations.

# ./test.sh /dev/vda
pass 001 ... ok
pass 002 ... ok
pass 003 ... ok
pass 004 ... ok
pass 005 ...

We reproduced the issue multiple times with:
qemu-kvm-5.2.0-16.module+el8.4.0+12221+7bf85bbe.6.x86_64

Then we downgraded qemu to:
qemu-kvm-5.2.0-16.module+el8.4.0+12191+622ad3bb.5.x86_64

This is the last version before the suspected change (bug 1955187). With this version the issue was not reproducible. We ran 100 iterations with strace, and 100 iterations without strace (in case tracing slows down the process, hiding the bug).

We tried to understand why io_submit() fails with EINVAL (comment 38). I wrote a script parsing the strace output, checking that all the offsets and buffer sizes are aligned to 512 bytes, and reporting the number of iovecs in a request and the total size of the request. The stats showed that everything was valid in a failing request:

unalinged requests: []
iov_len_max: 349
iov_len_total: 18497536

I'm not posting the script since it was wrong (more on this later). The only thing missing was the address of the buffers, which strace does not show. We wrote a systemtap script (attachment 1815823 [details]) to get the complete info for the failing io_submit. Here is an example fragment from the output:

[info] tracking aio starts at Thu Aug 19 20:31:16 2021
[IO iothread1] io_submit(0x7ff19000c000, 1, 0x7ff18bffc4b0)
iocb[ 0] = 0x7ff1394e56a0, fd = 22, opcode = 7, offset = 33923072, niovec = 1, iovec = 0x7ff184008d08
iovec[ 0] = 0x7ff184008d08, iov_base = 0x7ff1510b9000, iov_len = 4096
[IO iothread1] io_submit(0x7ff19000c000, 1, 0x7ff18bffc4b0)
iocb[ 0] = 0x7ff18a0d31d0, fd = 19, opcode = 7, offset = 671514624, niovec = 4, iovec = 0x7ff18400f6c0
iovec[ 0] = 0x7ff18400f6c0, iov_base = 0x7ff154fa0000, iov_len = 4096
iovec[ 1] = 0x7ff18400f6d0, iov_base = 0x7ff14f508000, iov_len = 4096
iovec[ 2] = 0x7ff18400f6e0, iov_base = 0x7ff15e73b000, iov_len = 4096
iovec[ 3] = 0x7ff18400f6f0, iov_base = 0x7ff1502ec000, iov_len = 4096
[IO iothread1] io_submit(0x7ff19000c000, 1, 0x7ff18bffc4b0)
iocb[ 0] = 0x7ff19013e1d0, fd = 19, opcode = 7, offset = 671531008, niovec = 10, iovec = 0x7ff18400bf10
iovec[ 0] = 0x7ff18400bf10, iov_base = 0x7ff15c82b000, iov_len = 4096
iovec[ 1] = 0x7ff18400bf20, iov_base = 0x7ff150cea000, iov_len = 4096
iovec[ 2] = 0x7ff18400bf30, iov_base = 0x7ff150758000, iov_len = 4096
iovec[ 3] = 0x7ff18400bf40, iov_base = 0x7ff150460000, iov_len = 4096
iovec[ 4] = 0x7ff18400bf50, iov_base = 0x7ff14f2f4000, iov_len = 4096
iovec[ 5] = 0x7ff18400bf60, iov_base = 0x7ff150832000, iov_len = 4096
iovec[ 6] = 0x7ff18400bf70, iov_base = 0x7ff14f34c000, iov_len = 4096
iovec[ 7] = 0x7ff18400bf80, iov_base = 0x7ff155ff2000, iov_len = 4096
iovec[ 8] = 0x7ff18400bf90, iov_base = 0x7ff152114000, iov_len = 4096
iovec[ 9] = 0x7ff18400bfa0, iov_base = 0x7ff150104000, iov_len = 4096

When reproducing again with tracing, the last request shows:

[IO iothread1] io_submit(0x7ff19000c000, 1, 0x7ff18bffc4a0)
iocb[ 0] = 0x7ff12dfe17b0, fd = 32, opcode = 8, offset = 1153433600, niovec = 1137, iovec = 0x7ff1840289f0
iovec[ 0] = 0x7ff1840289f0, iov_base = 0x7ff15bec1000, iov_len = 258048
iovec[ 1] = 0x7ff184028a00, iov_base = 0x7ff15710c000, iov_len = 212992
iovec[ 2] = 0x7ff184028a10, iov_base = 0x7ff15c0fa000, iov_len = 4096
iovec[ 3] = 0x7ff184028a20, iov_base = 0x7ff15c1f6000, iov_len = 8192
...

This is clearly wrong - more than IOV_MAX (1024): niovec = 1137

Turns out that strace truncates long buffer lists:

{iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}, ...],

so the strace stats were wrong. We could use strace -v to avoid the truncation, but even then the script was wrong, since we may have multiple iocbs in a single io_submit and the script just extracted iov_len=(\d+) from all iocbs in a single io_submit.

Kevin, can you explain why the limiting of the number of iovecs was broken by the fix for bug 1955187?
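For future reference, oversized requests in the systemtap output can be spotted with a one-liner. A minimal sketch, assuming the trace was saved to stap-output.txt (a hypothetical file name) and the "niovec = N" format shown above:

# Flag any iocb whose iovec count exceeds the kernel AIO limit of 1024
# (UIO_MAXIOV); io_submit() rejects such a request with EINVAL.
grep -o 'niovec = [0-9]*' stap-output.txt \
    | awk '$3 + 0 > 1024 { print "oversized iocb:", $3, "iovecs" }'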
RHV needs this fix for 8.4.z. We have customers waiting for bug 1955187 and we cannot deliver the fix without fixing this issue.
I tried 8.4 and 8.5 ,but i do not reproduce this. According to discussion with Nir, this bug is not related to resize operation indeed. Red Hat Enterprise Linux release 8.4 (Ootpa) 4.18.0-305.el8.x86_64 qemu-kvm-common-5.2.0-16.module+el8.4.0+12221+7bf85bbe.6.x86_64 Red Hat Enterprise Linux release 8.5 Beta (Ootpa) 4.18.0-330.el8.x86_64 qemu-kvm-common-6.0.0-27.module+el8.5.0+12121+c40c8708.x86_64 Steps: 1γENV prepare Build iscsi server targetcli ls: o- / ................................................................................................ [...] o- backstores ..................................................................................... [...] | o- block ......................................................................... [Storage Objects: 0] | o- fileio ........................................................................ [Storage Objects: 1] | | o- one ........................................ [/home/iscsi/onex.img (30.0GiB) write-back activated] | | o- alua .......................................................................... [ALUA Groups: 1] | | o- default_tg_pt_gp .............................................. [ALUA state: Active/optimized] | o- pscsi ......................................................................... [Storage Objects: 0] | o- ramdisk ....................................................................... [Storage Objects: 0] o- iscsi ................................................................................... [Targets: 1] | o- iqn.2016-06.one.server:one-a ............................................................. [TPGs: 1] | o- tpg1 ...................................................................... [no-gen-acls, no-auth] | o- acls ................................................................................. [ACLs: 2] | | o- iqn.1994-05.com.redhat:clienta .............................................. [Mapped LUNs: 1] | | | o- mapped_lun0 ......................................................... [lun0 fileio/one (rw)] | | o- iqn.1994-05.com.redhat:clientb .............................................. [Mapped LUNs: 1] | | o- mapped_lun0 ......................................................... [lun0 fileio/one (rw)] | o- luns ................................................................................. [LUNs: 1] | | o- lun0 .................................. [fileio/one (/home/iscsi/onex.img) (default_tg_pt_gp)] | o- portals ........................................................................... [Portals: 1] | o- 0.0.0.0:3260 ............................................................................ [OK] o- loopback ................................................................................ 
[Targets: 2.on the host attach the iscsi disk and create lvm pvcreate /dev/sdb vgcreate vg /dev/sdb lvcreate -L 3G -n lv1 vg lvcreate -L 3G -n lv2 vg (It is not necessary for raw format) qemu-img create -f qcow2 /dev/vg/lv1 2G qemu-img create -f qcow2 /dev/vg/lv2 2G 3.boot vm with lvm /usr/libexec/qemu-kvm \ -name src_vm1 \ -machine pc-q35-rhel8.4.0,accel=kvm,usb=off,dump-guest-core=off \ -m 8g \ -no-user-config -nodefaults \ -vga qxl \ -device pcie-root-port,id=pcie.0-root-port-2,slot=2,bus=pcie.0,multifunction=on \ -device pcie-root-port,id=pcie.0-root-port-2-1,chassis=3,bus=pcie.0,addr=0x2.0x1 \ -device pcie-root-port,id=pcie.0-root-port-2-2,chassis=4,bus=pcie.0,addr=0x2.0x2 \ -device pcie-root-port,id=pcie.0-root-port-3,slot=3,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-4,slot=4,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-5,slot=5,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-7,slot=7,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-8,slot=8,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-9,slot=9,bus=pcie.0 \ -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2-1,addr=0x0 \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -object iothread,id=iothread0 \ -object iothread,id=iothread1 \ -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-2-2,addr=0x0,iothread=iothread0 \ -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-8,addr=0x0 \ -blockdev driver=qcow2,file.driver=file,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,node-name=drive_image1 \ -device scsi-hd,id=os1,drive=drive_image1,bootindex=0 \ \ -blockdev node-name=host_device_stg,driver=host_device,aio=native,filename=/dev/vg/lv1,cache.direct=on,cache.no-flush=off,discard=unmap \ -blockdev node-name=drive_stg,driver=qcow2,cache.direct=on,cache.no-flush=off,file=host_device_stg \ -device virtio-blk-pci,iothread=iothread1,bus=pcie.0-root-port-4,addr=0x0,write-cache=on,id=stg,drive=drive_stg,rerror=stop,werror=stop \ \ \ -vnc :5 \ -qmp tcp:0:5955,server,nowait \ -monitor stdio \ -device virtio-net-pci,mac=9a:b5:b6:b1:b4:b5,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0-root-port-5,addr=0x0 \ -netdev tap,id=idxgXAlm 4.login guest and execute io ./test.sh /dev/vda 5.wait 2-3 minutes and hotplug new disk {"execute": "qmp_capabilities"} {"execute": "blockdev-add", "arguments": {"node-name": "file_stg2", "driver": "host_device", "auto-read-only": true, "discard": "unmap", "aio": "threads", "filename": "/dev/vg/lv2", "cache": {"direct": true, "no-flush": false}}} {"execute": "blockdev-add", "arguments": {"node-name": "data_stg2", "driver": "qcow2", "file": "file_stg2", "read-only": false, "cache": {"direct": true, "no-flush": false}}} {"execute": "device_add", "arguments": {"driver": "virtio-blk-pci", "id": "stg2", "drive": "data_stg2", "write-cache": "on", "iothread": "iothread1", "bus": "pcie.0-root-port-9", "addr": "0x0"}} 5.execute io on the plugged disk ./test.sh /dev/vdb But it does not hit the issue, and same result for raw format of the lvm. Also tested local file also not hit this issue. It looks like related to specific ENV? Any suggestion for the reproducer?
(In reply to qing.wang from comment #61)
> I tried 8.4 and 8.5 ,but i do not reproduce this.
> 
> According to discussion with Nir, this bug is not related to resize
> operation indeed.
> 
> But it does not hit the issue, and same result for raw format of the lvm.
> Also tested local file also not hit this issue.
> It looks like related to specific ENV?

Nope, it reproduced on different ENVs.

> Any suggestion for the reproducer?

Did you try with the flow in comment 40 / the original flow?
(In reply to qing.wang from comment #61)
> But it does not hit the issue, and same result for raw format of the lvm.
> Also tested local file also not hit this issue.
> It looks like related to specific ENV?

Can you check the value you get for /sys/block/sdb/queue/max_segments? The bug can only be reproduced if it's larger than 1024. The storage Nir used had 2048 (see comment 38).
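A quick way to survey that value for every block device on a host is a small loop over /sys/block; a minimal sketch:

# Print max_segments per block device; per the comment above, values larger
# than 1024 are the ones that can trigger this bug.
for q in /sys/block/*/queue/max_segments; do
    dev=$(basename "$(dirname "$(dirname "$q")")")
    printf '%-12s %s\n' "$dev" "$(cat "$q")"
done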
(In reply to qing.wang from comment #52)
> Does this issue have the same cause? Bug 1989717 - [regression]
> blk_update_request: I/O error when generating load on iSCS

Potentially yes, but I don't have enough information to tell for sure.
(In reply to Kevin Wolf from comment #63) > (In reply to qing.wang from comment #61) > > But it does not hit the issue, and same result for raw format of the lvm. > > Also tested local file also not hit this issue. > > It looks like related to specific ENV? > > Can you check the value you get for /sys/block/sdb/queue/max_segments? The > bug can only reproduce if it's larger than 1024. The storage Nir used had > 2048 (see comment 38). on host root@dell-per440-07 /home $ lsblk -n ... sdd 8:48 0 20G 0 disk \u251c\u2500vg-lv1 253:3 0 3G 0 lvm \u2514\u2500vg-lv2 253:4 0 3G 0 lvm for f in /sys/block/sdd/queue/*; do echo -n "$f: "; cat $f; done /sys/block/sdd/queue/add_random: 1 /sys/block/sdd/queue/chunk_sectors: 0 /sys/block/sdd/queue/dax: 0 /sys/block/sdd/queue/discard_granularity: 0 /sys/block/sdd/queue/discard_max_bytes: 0 /sys/block/sdd/queue/discard_max_hw_bytes: 0 /sys/block/sdd/queue/discard_zeroes_data: 0 /sys/block/sdd/queue/fua: 1 /sys/block/sdd/queue/hw_sector_size: 512 /sys/block/sdd/queue/io_poll: 0 /sys/block/sdd/queue/io_poll_delay: -1 /sys/block/sdd/queue/iosched: cat: /sys/block/sdd/queue/iosched: Is a directory /sys/block/sdd/queue/iostats: 1 /sys/block/sdd/queue/io_timeout: 30000 /sys/block/sdd/queue/logical_block_size: 512 /sys/block/sdd/queue/max_discard_segments: 1 /sys/block/sdd/queue/max_hw_sectors_kb: 32767 /sys/block/sdd/queue/max_integrity_segments: 0 /sys/block/sdd/queue/max_sectors_kb: 8192 /sys/block/sdd/queue/max_segments: 2048 /sys/block/sdd/queue/max_segment_size: 4096 /sys/block/sdd/queue/minimum_io_size: 512 /sys/block/sdd/queue/nomerges: 0 /sys/block/sdd/queue/nr_requests: 254 /sys/block/sdd/queue/nr_zones: 0 /sys/block/sdd/queue/optimal_io_size: 8388608 /sys/block/sdd/queue/physical_block_size: 512 /sys/block/sdd/queue/read_ahead_kb: 16384 /sys/block/sdd/queue/rotational: 1 /sys/block/sdd/queue/rq_affinity: 1 /sys/block/sdd/queue/scheduler: [mq-deadline] kyber bfq none /sys/block/sdd/queue/unpriv_sgio: 0 /sys/block/sdd/queue/wbt_lat_usec: 75000 /sys/block/sdd/queue/write_cache: write back /sys/block/sdd/queue/write_same_max_bytes: 33553920 /sys/block/sdd/queue/write_zeroes_max_bytes: 33553920 /sys/block/sdd/queue/zone_append_max_bytes: 0 /sys/block/sdd/queue/zoned: none on guest: ... 
vda 252:0 0 2G 0 disk f in /sys/block/vda/queue/*; do echo -n "$f: "; cat $f; done /sys/block/vda/queue/add_random: 0 /sys/block/vda/queue/chunk_sectors: 0 /sys/block/vda/queue/dax: 0 /sys/block/vda/queue/discard_granularity: 512 /sys/block/vda/queue/discard_max_bytes: 2147483136 /sys/block/vda/queue/discard_max_hw_bytes: 2147483136 /sys/block/vda/queue/discard_zeroes_data: 0 /sys/block/vda/queue/fua: 0 /sys/block/vda/queue/hw_sector_size: 512 /sys/block/vda/queue/io_poll: 0 /sys/block/vda/queue/io_poll_delay: -1 /sys/block/vda/queue/iosched: cat: /sys/block/vda/queue/iosched: Is a directory /sys/block/vda/queue/iostats: 1 /sys/block/vda/queue/io_timeout: 30000 /sys/block/vda/queue/logical_block_size: 512 /sys/block/vda/queue/max_discard_segments: 1 /sys/block/vda/queue/max_hw_sectors_kb: 2147483647 /sys/block/vda/queue/max_integrity_segments: 0 /sys/block/vda/queue/max_sectors_kb: 1280 /sys/block/vda/queue/max_segments: 254 /sys/block/vda/queue/max_segment_size: 4294967295 /sys/block/vda/queue/minimum_io_size: 512 /sys/block/vda/queue/nomerges: 0 /sys/block/vda/queue/nr_requests: 256 /sys/block/vda/queue/nr_zones: 0 /sys/block/vda/queue/optimal_io_size: 0 /sys/block/vda/queue/physical_block_size: 512 /sys/block/vda/queue/read_ahead_kb: 4096 /sys/block/vda/queue/rotational: 1 /sys/block/vda/queue/rq_affinity: 1 /sys/block/vda/queue/scheduler: [mq-deadline] kyber bfq none /sys/block/vda/queue/unpriv_sgio: 0 /sys/block/vda/queue/wbt_lat_usec: 75000 /sys/block/vda/queue/write_cache: write back /sys/block/vda/queue/write_same_max_bytes: 0 /sys/block/vda/queue/write_zeroes_max_bytes: 2147483136 /sys/block/vda/queue/zone_append_max_bytes: 0 /sys/block/vda/queue/zoned: none
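For comparison with comment 38, these two dumps can be normalized and diffed the same way. A minimal sketch, assuming the host device is sdd and the guest disk is vda as shown above, and that guest.txt is copied to the host (e.g. with scp) before diffing:

# On the host:
grep -s "" /sys/block/sdd/queue/* | sed -e 's/sdd/DEVICE/' | sort > host.txt

# In the guest:
grep -s "" /sys/block/vda/queue/* | sed -e 's/vda/DEVICE/' | sort > guest.txt

# On the host, after copying guest.txt over:
diff -u host.txt guest.txt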
after host attach the iscsi disk ,then echo 64 > /sys/block/sdd/queue/max_sectors_kb start testing as #c61 the qemu crashed bt info #0 0x00007fc8ee00037f in raise () at /lib64/libc.so.6 #1 0x00007fc8edfeadb5 in abort () at /lib64/libc.so.6 #2 0x00007fc8edfeac89 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 #3 0x00007fc8edff8a76 in .annobin_assert.c_end () at /lib64/libc.so.6 #4 0x00005642d2ce3498 in qemu_iovec_init_extended (qiov=qiov@entry=0x7fc570c8cbe0, head_buf=head_buf@entry=0x0, head_len=head_len@entry=0, mid_qiov=mid_qiov@entry=0x7fc8d4020e78, mid_offset=<optimized out>, mid_len=mid_len@entry=8192, tail_buf=0x0, tail_len=0) at ../util/iov.c:428 #5 0x00005642d2ce358f in qemu_iovec_init_slice (qiov=qiov@entry=0x7fc570c8cbe0, source=source@entry=0x7fc8d4020e78, offset=<optimized out>, len=len@entry=8192) at ../util/iov.c:515 #6 0x00005642d2c17f86 in bdrv_driver_pwritev (bs=bs@entry=0x5642d470d160, offset=offset@entry=1368064, bytes=bytes@entry=8192, qiov=qiov@entry=0x7fc8d4020e78, qiov_offset=qiov_offset@entry=0, flags=flags@entry=0) at ../block/io.c:1227 #7 0x00005642d2c19df7 in bdrv_aligned_pwritev (child=0x5642d45bba00, req=0x7fc570c8cda0, offset=1368064, bytes=8192, align=<optimized out>, qiov=0x7fc8d4020e78, qiov_offset=0, flags=0) at ../block/io.c:2088 #8 0x00005642d2c1a685 in bdrv_co_pwritev_part (child=0x5642d45bba00, offset=<optimized out>, offset@entry=1368064, bytes=<optimized out>, bytes@entry=8192, qiov=<optimized out>, qiov@entry=0x7fc8d4020e78, qiov_offset=<optimized out>, qiov_offset@entry=0, flags=flags@entry=0) at ../block/io.c:2272 --Type <RET> for more, q to quit, c to continue without paging-- #9 0x00005642d2c0c640 in qcow2_co_pwritev_task (l2meta=<optimized out>, qiov_offset=<optimized out>, qiov=0x7fc8d4020e78, bytes=<optimized out>, offset=<optimized out>, host_offset=<optimized out>, bs=0x5642d4713450) at ../block/qcow2.c:2535 #10 0x00005642d2c0c640 in qcow2_co_pwritev_task_entry (task=<optimized out>) at ../block/qcow2.c:2565 #11 0x00005642d2c1ce91 in aio_task_co (opaque=0x7fc8d40502a0) at ../block/aio_task.c:45 #12 0x00005642d2cebfa3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173 #13 0x00007fc8ee015f50 in __start_context () at /lib64/libc.so.6 #14 0x00007fc5734b42f0 in () #15 0x0000000000000000 in () But i am not sure it is a valid testing, just using same value with comment 38.
(In reply to qing.wang from comment #66) > after host attach the iscsi disk ,then > echo 64 > /sys/block/sdd/queue/max_sectors_kb > > start testing as #c61 the qemu crashed > > bt info > > #0 0x00007fc8ee00037f in raise () at /lib64/libc.so.6 > #1 0x00007fc8edfeadb5 in abort () at /lib64/libc.so.6 > #2 0x00007fc8edfeac89 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 > #3 0x00007fc8edff8a76 in .annobin_assert.c_end () at /lib64/libc.so.6 > #4 0x00005642d2ce3498 in qemu_iovec_init_extended > (qiov=qiov@entry=0x7fc570c8cbe0, head_buf=head_buf@entry=0x0, > head_len=head_len@entry=0, mid_qiov=mid_qiov@entry=0x7fc8d4020e78, > mid_offset=<optimized out>, mid_len=mid_len@entry=8192, tail_buf=0x0, > tail_len=0) at ../util/iov.c:428 > #5 0x00005642d2ce358f in qemu_iovec_init_slice > (qiov=qiov@entry=0x7fc570c8cbe0, source=source@entry=0x7fc8d4020e78, > offset=<optimized out>, len=len@entry=8192) at ../util/iov.c:515 > #6 0x00005642d2c17f86 in bdrv_driver_pwritev > (bs=bs@entry=0x5642d470d160, offset=offset@entry=1368064, > bytes=bytes@entry=8192, qiov=qiov@entry=0x7fc8d4020e78, > qiov_offset=qiov_offset@entry=0, flags=flags@entry=0) at ../block/io.c:1227 > #7 0x00005642d2c19df7 in bdrv_aligned_pwritev > (child=0x5642d45bba00, req=0x7fc570c8cda0, offset=1368064, bytes=8192, > align=<optimized out>, qiov=0x7fc8d4020e78, qiov_offset=0, flags=0) at > ../block/io.c:2088 > #8 0x00005642d2c1a685 in bdrv_co_pwritev_part (child=0x5642d45bba00, > offset=<optimized out>, > offset@entry=1368064, bytes=<optimized out>, bytes@entry=8192, > qiov=<optimized out>, > qiov@entry=0x7fc8d4020e78, qiov_offset=<optimized out>, > qiov_offset@entry=0, flags=flags@entry=0) > at ../block/io.c:2272 > --Type <RET> for more, q to quit, c to continue without paging-- > #9 0x00005642d2c0c640 in qcow2_co_pwritev_task > (l2meta=<optimized out>, qiov_offset=<optimized out>, > qiov=0x7fc8d4020e78, bytes=<optimized out>, offset=<optimized out>, > host_offset=<optimized out>, bs=0x5642d4713450) at ../block/qcow2.c:2535 > #10 0x00005642d2c0c640 in qcow2_co_pwritev_task_entry (task=<optimized out>) > at ../block/qcow2.c:2565 > #11 0x00005642d2c1ce91 in aio_task_co (opaque=0x7fc8d40502a0) at > ../block/aio_task.c:45 > #12 0x00005642d2cebfa3 in coroutine_trampoline (i0=<optimized out>, > i1=<optimized out>) > at ../util/coroutine-ucontext.c:173 > #13 0x00007fc8ee015f50 in __start_context () at /lib64/libc.so.6 > #14 0x00007fc5734b42f0 in () > #15 0x0000000000000000 in () > > > But i am not sure it is a valid testing, just using same value with comment > 38. coredump file: http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/qbugs/1994494/2021-08-23/core.qemu-kvm.0.7571555d3c4c494184247e24c4c38c25.2961.1629724670000000.lz4
Can we get the qa_ack+ please (needed for release+)? Resolution of any remaining issues can use this bug (and the related z-stream bug), since it has blocker+.
This issue indeed related to specific HW. It can not reproduce on all iscsi device. I can reproduce this issue if change the value max_sectors_kb to 64 in my ENV. original value is following for f in /sys/block/sdd/queue/[mlp]*; do echo -n "$f: "; cat $f; done /sys/block/sdd/queue/logical_block_size: 512 /sys/block/sdd/queue/max_discard_segments: 1 /sys/block/sdd/queue/max_hw_sectors_kb: 32767 /sys/block/sdd/queue/max_integrity_segments: 0 /sys/block/sdd/queue/max_sectors_kb: 8192 /sys/block/sdd/queue/max_segments: 2048 /sys/block/sdd/queue/max_segment_size: 4096 /sys/block/sdd/queue/minimum_io_size: 512 /sys/block/sdd/queue/physical_block_size: 512 change the value to 64 echo 64 > /sys/block/sdd/queue/max_sectors_kb for the other backend device like as loop device and scsi_debug. it can not reproduce. Test steps 1.build iscis target server root@dell-per440-07 /home/tmp $ targetcli ls o- / ........................................................................................... [...] o- backstores ................................................................................ [...] | o- block .................................................................... [Storage Objects: 0] | o- fileio ................................................................... [Storage Objects: 1] | | o- one ................................... [/home/iscsi/onex.img (15.0GiB) write-back activated] | | o- alua ..................................................................... [ALUA Groups: 1] | | o- default_tg_pt_gp ......................................... [ALUA state: Active/optimized] | o- pscsi .................................................................... [Storage Objects: 0] | o- ramdisk .................................................................. [Storage Objects: 0] o- iscsi .............................................................................. [Targets: 1] | o- iqn.2016-06.one.server:one-a ........................................................ [TPGs: 1] | o- tpg1 ................................................................. [no-gen-acls, no-auth] | o- acls ............................................................................ [ACLs: 2] | | o- iqn.1994-05.com.redhat:clienta ......................................... [Mapped LUNs: 1] | | | o- mapped_lun0 .................................................... [lun0 fileio/one (rw)] | | o- iqn.1994-05.com.redhat:clientb ......................................... [Mapped LUNs: 1] | | o- mapped_lun0 .................................................... [lun0 fileio/one (rw)] | o- luns ............................................................................ [LUNs: 1] | | o- lun0 ............................. [fileio/one (/home/iscsi/onex.img) (default_tg_pt_gp)] | o- portals ...................................................................... [Portals: 1] | o- 0.0.0.0:3260 ....................................................................... [OK] o- loopback ........................................................................... 
[Targets: 0] 2.attach iscsi disk on host iscsiadm -m discovery -t st -p 127.0.0.1 iscsiadm -m node -T iqn.2016-06.one.server:one-a -p 127.0.0.1:3260 -l 3.change the value to 64 echo 64 > /sys/block/sdd/queue/max_sectors_kb 4.create lvm on the disk pvcreate /dev/sdd vgcreate vg /dev/sdd lvcreate -L 2560M -n lv1 vg;lvcreate -L 2560M -n lv2 vg #optional for qcow2 format qemu-img create -f qcow2 /dev/vg/lv1 2G;qemu-img create -f qcow2 /dev/vg/lv2 2G 5.boot the vm with two lvm as blk device /usr/libexec/qemu-kvm \ -name src_vm1 \ -machine pc-q35-rhel8.4.0,accel=kvm,usb=off,dump-guest-core=off \ -m 8g \ -no-user-config -nodefaults \ -vga qxl \ -device pcie-root-port,id=pcie.0-root-port-2,slot=2,bus=pcie.0,multifunction=on \ -device pcie-root-port,id=pcie.0-root-port-2-1,chassis=3,bus=pcie.0,addr=0x2.0x1 \ -device pcie-root-port,id=pcie.0-root-port-2-2,chassis=4,bus=pcie.0,addr=0x2.0x2 \ -device pcie-root-port,id=pcie.0-root-port-3,slot=3,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-4,slot=4,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-5,slot=5,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-6,slot=6,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-7,slot=7,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-8,slot=8,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-9,slot=9,bus=pcie.0 \ -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2-1,addr=0x0 \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -object iothread,id=iothread0 \ -object iothread,id=iothread1 \ -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-2-2,addr=0x0,iothread=iothread0 \ -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-8,addr=0x0 \ -blockdev driver=qcow2,file.driver=file,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,node-name=drive_image1 \ -device scsi-hd,id=os1,drive=drive_image1,bootindex=0 \ \ -blockdev node-name=host_device_stg,driver=host_device,aio=native,filename=/dev/vg/lv1,cache.direct=on,cache.no-flush=off,discard=unmap \ -blockdev node-name=drive_stg,driver=raw,cache.direct=on,cache.no-flush=off,file=host_device_stg \ -device virtio-blk-pci,iothread=iothread1,bus=pcie.0-root-port-4,addr=0x0,write-cache=on,id=stg,drive=drive_stg,rerror=stop,werror=stop \ \ -blockdev node-name=host_device_stg2,driver=host_device,aio=native,filename=/dev/vg/lv2,cache.direct=on,cache.no-flush=off,discard=unmap \ -blockdev node-name=drive_stg2,driver=raw,cache.direct=on,cache.no-flush=off,file=host_device_stg2 \ -device virtio-blk-pci,iothread=iothread1,bus=pcie.0-root-port-5,addr=0x0,write-cache=on,id=stg2,drive=drive_stg2,rerror=stop,werror=stop \ \ \ -vnc :5 \ -qmp tcp:0:5955,server,nowait \ -monitor stdio \ -device virtio-net-pci,mac=9a:b5:b6:b1:b4:b5,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0-root-port-6,addr=0x0 \ -netdev tap,id=idxgXAlm 6.execute io on two disk in guest root@bootp-73-199-218 /home $ cat test.sh for i in $(seq 100); do printf "pass %03d ... " $i dd if=/dev/zero bs=1M count=2048 of=$1 conv=fsync status=none if [[ "$?" != "0" ]];then break;fi echo "ok" done ./test.sh /dev/vda ./test.sh /dev/vdb reproduce on Red Hat Enterprise Linux release 8.5 Beta (Ootpa) 4.18.0-330.el8.x86_64 qemu-kvm-common-6.0.0-27.module+el8.5.0+12121+c40c8708.x86_64 Passed testing on qemu-kvm-common-5.2.0-16.el8.6.bz1994494.x86_64 in comment #46
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.
Passed test on Red Hat Enterprise Linux release 8.5 Beta (Ootpa) 4.18.0-330.el8.x86_64 qemu-kvm-common-6.0.0-29.module+el8.5.0+12386+43574bac.x86_64 Test steps: 1.build iscis target server root@dell-per440-07 /home/tmp $ targetcli ls o- / ........................................................................................... [...] o- backstores ................................................................................ [...] | o- block .................................................................... [Storage Objects: 0] | o- fileio ................................................................... [Storage Objects: 1] | | o- one ................................... [/home/iscsi/onex.img (15.0GiB) write-back activated] | | o- alua ..................................................................... [ALUA Groups: 1] | | o- default_tg_pt_gp ......................................... [ALUA state: Active/optimized] | o- pscsi .................................................................... [Storage Objects: 0] | o- ramdisk .................................................................. [Storage Objects: 0] o- iscsi .............................................................................. [Targets: 1] | o- iqn.2016-06.one.server:one-a ........................................................ [TPGs: 1] | o- tpg1 ................................................................. [no-gen-acls, no-auth] | o- acls ............................................................................ [ACLs: 2] | | o- iqn.1994-05.com.redhat:clienta ......................................... [Mapped LUNs: 1] | | | o- mapped_lun0 .................................................... [lun0 fileio/one (rw)] | | o- iqn.1994-05.com.redhat:clientb ......................................... [Mapped LUNs: 1] | | o- mapped_lun0 .................................................... [lun0 fileio/one (rw)] | o- luns ............................................................................ [LUNs: 1] | | o- lun0 ............................. [fileio/one (/home/iscsi/onex.img) (default_tg_pt_gp)] | o- portals ...................................................................... [Portals: 1] | o- 0.0.0.0:3260 ....................................................................... [OK] o- loopback ........................................................................... 
[Targets: 0] 2.attach iscsi disk on host iscsiadm -m discovery -t st -p 127.0.0.1 iscsiadm -m node -T iqn.2016-06.one.server:one-a -p 127.0.0.1:3260 -l 3.change the value to 64 echo 64 > /sys/block/sdd/queue/max_sectors_kb 4.create lvms on the disk pvcreate /dev/sdd vgcreate vg /dev/sdd lvcreate -L 2560M -n lv1 vg;lvcreate -L 2560M -n lv2 vg lvcreate -L 2560M -n lv3 vg;lvcreate -L 2560M -n lv4 vg qemu-img create -f qcow2 /dev/vg/lv1 2G;qemu-img create -f qcow2 /dev/vg/lv2 2G qemu-img create -f qcow2 /dev/vg/lv3 2G;qemu-img create -f qcow2 /dev/vg/lv4 2G 5.boot the vm with two lvm as blk device /usr/libexec/qemu-kvm \ -name src_vm1 \ -machine pc-q35-rhel8.4.0,accel=kvm,usb=off,dump-guest-core=off \ -m 8g \ -no-user-config -nodefaults \ -vga qxl \ -device pcie-root-port,id=pcie.0-root-port-2,slot=2,bus=pcie.0,multifunction=on \ -device pcie-root-port,id=pcie.0-root-port-2-1,chassis=3,bus=pcie.0,addr=0x2.0x1 \ -device pcie-root-port,id=pcie.0-root-port-2-2,chassis=4,bus=pcie.0,addr=0x2.0x2 \ -device pcie-root-port,id=pcie.0-root-port-3,slot=3,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-4,slot=4,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-5,slot=5,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-6,slot=6,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-7,slot=7,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-8,slot=8,bus=pcie.0 \ -device pcie-root-port,id=pcie.0-root-port-9,slot=9,bus=pcie.0 \ -device qemu-xhci,id=usb1,bus=pcie.0-root-port-2-1,addr=0x0 \ -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \ -object iothread,id=iothread0 \ -object iothread,id=iothread1 \ -device virtio-scsi-pci,id=scsi0,bus=pcie.0-root-port-2-2,addr=0x0,iothread=iothread0 \ -device virtio-scsi-pci,id=scsi1,bus=pcie.0-root-port-8,addr=0x0 \ -blockdev driver=qcow2,file.driver=file,cache.direct=off,cache.no-flush=on,file.filename=/home/kvm_autotest_root/images/rhel840-64-virtio-scsi.qcow2,node-name=drive_image1 \ -device scsi-hd,id=os1,drive=drive_image1,bootindex=0 \ \ -blockdev node-name=host_device_stg,driver=host_device,aio=native,filename=/dev/vg/lv1,cache.direct=on,cache.no-flush=off,discard=unmap \ -blockdev node-name=drive_stg,driver=raw,cache.direct=on,cache.no-flush=off,file=host_device_stg \ -device virtio-blk-pci,iothread=iothread1,bus=pcie.0-root-port-4,addr=0x0,write-cache=on,id=stg,drive=drive_stg,rerror=stop,werror=stop \ \ -blockdev node-name=host_device_stg2,driver=host_device,aio=native,filename=/dev/vg/lv2,cache.direct=on,cache.no-flush=off,discard=unmap \ -blockdev node-name=drive_stg2,driver=qcow2,cache.direct=on,cache.no-flush=off,file=host_device_stg2 \ -device virtio-blk-pci,iothread=iothread1,bus=pcie.0-root-port-5,addr=0x0,write-cache=on,id=stg2,drive=drive_stg2,rerror=stop,werror=stop \ \ -blockdev node-name=host_device_stg3,driver=host_device,aio=native,filename=/dev/vg/lv3,cache.direct=on,cache.no-flush=off,discard=unmap \ -blockdev node-name=drive_stg3,driver=raw,cache.direct=on,cache.no-flush=off,file=host_device_stg3 \ -device scsi-hd,write-cache=on,id=stg3,drive=drive_stg3,rerror=stop,werror=stop \ \ -blockdev node-name=host_device_stg4,driver=host_device,aio=native,filename=/dev/vg/lv4,cache.direct=on,cache.no-flush=off,discard=unmap \ -blockdev node-name=drive_stg4,driver=qcow2,cache.direct=on,cache.no-flush=off,file=host_device_stg4 \ -device scsi-hd,write-cache=on,id=stg4,drive=drive_stg4,rerror=stop,werror=stop \ \ \ -vnc :5 \ -qmp tcp:0:5955,server,nowait \ -monitor stdio \ -device 
virtio-net-pci,mac=9a:b5:b6:b1:b4:b5,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0-root-port-6,addr=0x0 \ -netdev tap,id=idxgXAlm 6.execute io on disks in guest root@bootp-73-199-218 /home $ cat test.sh for i in $(seq 100); do printf "pass %03d ... " $i dd if=/dev/zero bs=1M count=2048 of=$1 conv=fsync status=none if [[ "$?" != "0" ]];then break;fi echo "ok" done ./test.sh /dev/vda ./test.sh /dev/vdb ./test.sh /dev/sdb ./test.sh /dev/sdc All io passed and guest keep running status
*** Bug 1999114 has been marked as a duplicate of this bug. ***
*** Bug 1999051 has been marked as a duplicate of this bug. ***
*** Bug 1989717 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4684