Description of problem:
After migration the VM is no longer running on the source host, as expected, but vdsm still tries to set the block threshold for the disk there. The resulting failure should not be logged as an error.

Version-Release number of selected component (if applicable):
vdsm-4.40.70.6-1.el8ev.x86_64
ovirt-engine-4.4.7.6-0.11.el8ev
QEMU emulator version 5.2.0 (qemu-kvm-5.2.0-16.module+el8.4.0+11536+725e25d9.2)

How reproducible:
Multiple extensions must happen during the migration to hit the issue.

Steps to Reproduce:
1. Create a VM with a thin disk. It is better to make the disk big enough to allow many attempts until an extension happens during migration.

2. Preferably, on all hosts the VM is migrated to/from and on the SPM host, edit:

    vi /etc/vdsm/vdsm.conf.d/volume-utilization.conf

   I put in these values and then restarted vdsm:

    [irs]
    volume_utilization_percent = 50
    volume_utilization_chunk_mb = 128

3. Make the VM migration slow, so that multiple extends can happen during the migration. This can be done with the stress tool, for example:

    dnf install stress
    stress --vm 1 --vm-bytes 128M

   This step is only needed if the migration finishes too fast (I could reproduce the issue without it).

4. Write data in the VM to trigger extension:

    for i in $(seq -w 11); do
        echo "writing..."
        dd if=/dev/zero bs=1M count=64 of=data.$i oflag=direct conv=fsync
        echo "Waiting..."
        sleep 6
    done

5. Start the script in the VM and start the migration right after.

Actual results:

2021-07-06 13:39:34,683+0300 ERROR (mailbox-hsm/3) [virt.vm] (vmId='e7ef44e5-42e1-440e-8426-d718ffbe5dba') Failed to set block threshold for drive 'sda' (/rhev/data-center/mnt/blockSD/aa73b4d0-a22d-4f65-b254-594b34f0e6a8/images/8735fa39-a148-43df-a78e-4f2e5ccdf688/0c3111fb-ef7e-49cf-9638-b3ca25afb983): Requested operation is not valid: domain is not running (drivemonitor:128)
2021-07-06 13:39:34,683+0300 ERROR (mailbox-hsm/3) [virt.vm] (vmId='e7ef44e5-42e1-440e-8426-d718ffbe5dba') cannot cont while Down (vm:1709)

2021-07-08 17:19:52,589+0300 ERROR (mailbox-hsm/1) [virt.vm] (vmId='2eb53807-6c20-4976-9a8b-31531b5fdef1') Failed to set block threshold for drive 'sda' (/rhev/data-center/mnt/blockSD/aa73b4d0-a22d-4f65-b254-594b34f0e6a8/images/10883a68-06ec-460b-a115-d23260482fa9/6f350411-34ce-4b13-8214-17aa0ea829b9): Domain not found: no domain with matching uuid '2eb53807-6c20-4976-9a8b-31531b5fdef1' (New_VM1) (drivemonitor:128)
2021-07-08 17:19:52,590+0300 ERROR (mailbox-hsm/1) [virt.vm] (vmId='2eb53807-6c20-4976-9a8b-31531b5fdef1') cannot cont while Down (vm:1709)

Expected results:
This is an expected condition, so it should not be logged as an ERROR.
The bug shows that when we try to set the block threshold, we are still using virdomain.Notifying. But once migration is completed, the VM must switch to virdomain.Disconnected or virdomain.Defined, which avoids this issue by raising a specific virdomain.NotConnectedError that can be safely silenced. The error currently displayed here comes from libvirt, and it does not have a specific error code that can be used to silence it.
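To illustrate the idea, here is a minimal, self-contained sketch of the pattern. It is not the vdsm code: the `Disconnected` and `NotConnectedError` classes below are simplified stand-ins that only borrow the names mentioned above, and `set_block_threshold` is a hypothetical helper. The point is only that a dedicated exception type lets the caller downgrade the expected failure to a debug message without inspecting libvirt error codes.

```python
import logging

log = logging.getLogger("virt.vm")


class NotConnectedError(Exception):
    """Stand-in for virdomain.NotConnectedError (simplified for this sketch)."""


class Disconnected:
    """Stand-in for the wrapper a VM would switch to once it has left this host.

    Assumption: every libvirt call made through it fails in the same,
    specific way instead of reaching libvirt at all.
    """

    def __init__(self, vmid):
        self.vmid = vmid

    def __getattr__(self, name):
        # Only called for attributes that do not exist, i.e. libvirt methods.
        def fail(*args, **kwargs):
            raise NotConnectedError(f"VM {self.vmid!r} is not connected")
        return fail


def set_block_threshold(dom, drive_name, threshold):
    """Hypothetical helper: return True if the threshold was set."""
    try:
        dom.setBlockThreshold(drive_name, threshold)
    except NotConnectedError as e:
        # Expected after migration, so log at DEBUG rather than ERROR.
        log.debug(
            "Domain not connected, skipping set block threshold "
            "for drive %r: %s", drive_name, e)
        return False
    return True
```

With a wrapper like this in place, any other exception raised while setting the threshold would still propagate and could be logged as an error, so only the expected "VM is gone" case is silenced.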
Vojtech, can you please verify this bug with a proper comment?
(In reply to Eyal Shenitzky from comment #2)
> Vojtech, can you please verify this bug with a proper comment?

Now there is no error message, only this debug message:

2021-08-04 10:18:28,730-0400 DEBUG (mailbox-hsm/0) [virt.vm] (vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') Domain not connected, skipping set block threshold fordrive 'sdd': Requested operation is not valid: domain is not running (drivemonitor:128)

As for switching domains (see Nir's comment #1), this is WIP and I created BZ #2000046 to track this.
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022. Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.