Bug 1981079

Summary: Expected condition after migration is logged as an error, and there is still an attempt to set the block threshold on the migrated VM's disk.
Product: [oVirt] ovirt-engine
Reporter: sshmulev
Component: BLL.Storage
Assignee: Vojtech Juranek <vjuranek>
Status: CLOSED CURRENTRELEASE
QA Contact: Avihai <aefrat>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.4.7.6
CC: bugs, bzlotnik, nsoffer, vjuranek
Target Milestone: ovirt-4.5.0
Keywords: ZStream
Target Release: 4.5.0
Flags: pm-rhel: ovirt-4.5?
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-04-20 06:33:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description sshmulev 2021-07-11 08:47:02 UTC
Description of problem:
After migration the VM is no longer running on the source host, as expected. However, there is still an attempt to set the block threshold for the VM's disk, and the resulting failure is logged as an error even though it is an expected condition.

Version-Release number of selected component (if applicable):
Versions:
vdsm-4.40.70.6-1.el8ev.x86_64
ovirt-engine-4.4.7.6-0.11.el8ev
QEMU emulator version 5.2.0 (qemu-kvm-5.2.0-16.module+el8.4.0+11536+725e25d9.2)

How reproducible:
Need to make multiple extensions during the migration to hit the issue. 

Steps to Reproduce:
1. Create a VM with a thin disk. Make the disk big enough to allow many attempts until an extension happens during migration.
2. On every host the VM is migrated to or from, and on the SPM host, create
    /etc/vdsm/vdsm.conf.d/volume-utilization.conf with these values, then restart vdsm:
    [irs]
    volume_utilization_percent = 50
    volume_utilization_chunk_mb = 128
    
    
3. Make the VM migration slow, so that multiple extends can happen during
   the migration.
   This can be done with the stress tool, for example:
   dnf install stress
   stress --vm 1 --vm-bytes 128M
    
   This step is only needed if the migration finishes too fast (I could reproduce the issue without it).

4.  Write data in the VM to trigger extension.
	for i in $(seq -w 11); do
	    echo "writing..."
	    dd if=/dev/zero bs=1M count=64 of=data.$i oflag=direct conv=fsync
	    echo "Waiting..."
	    sleep 6
	done
5. Start the script in the VM, then start the migration right after.
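
For repeated runs, the drop-in file from step 2 can be generated instead of edited by hand. This is only a minimal sketch: the path and the two values are taken from step 2, while the helper name write_dropin is made up for illustration. It does not restart vdsm, which still has to be done on each host afterwards.

```python
import configparser
from pathlib import Path

# Values copied from step 2 of this report.
SETTINGS = {
    "volume_utilization_percent": "50",
    "volume_utilization_chunk_mb": "128",
}


def write_dropin(path):
    """Write the [irs] overrides from step 2 to the given drop-in file.

    write_dropin is a hypothetical helper, not part of vdsm.
    """
    parser = configparser.ConfigParser()
    parser["irs"] = SETTINGS
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        parser.write(f)
    return path
```

On a host this would be called as write_dropin("/etc/vdsm/vdsm.conf.d/volume-utilization.conf"), run as root, followed by a vdsm restart.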
	
	

Actual results:

2021-07-06 13:39:34,683+0300 ERROR (mailbox-hsm/3) [virt.vm] (vmId='e7ef44e5-42e1-440e-8426-d718ffbe5dba') Failed to set block threshold for drive 'sda' (/rhev/data-center/mnt/blockSD/aa73b4d0-a22d-4f65-b254-594b34f0e6a8/images/8735fa39-a148-43df-a78e-4f2e5ccdf688/0c3111fb-ef7e-49cf-9638-b3ca25afb983): Requested operation is not valid: domain is not running (drivemonitor:128)
2021-07-06 13:39:34,683+0300 ERROR (mailbox-hsm/3) [virt.vm] (vmId='e7ef44e5-42e1-440e-8426-d718ffbe5dba') cannot cont while Down (vm:1709)


2021-07-08 17:19:52,589+0300 ERROR (mailbox-hsm/1) [virt.vm] (vmId='2eb53807-6c20-4976-9a8b-31531b5fdef1') Failed to set block threshold for drive 'sda' (/rhev/data-center/mnt/blockSD/aa73b4d0-a22d-4f65-b254-594b34f0e6a8/images/10883a68-06ec-460b-a115-d23260482fa9/6f350411-34ce-4b13-8214-17aa0ea829b9): Domain not found: no domain with matching uuid '2eb53807-6c20-4976-9a8b-31531b5fdef1' (New_VM1) (drivemonitor:128)
2021-07-08 17:19:52,590+0300 ERROR (mailbox-hsm/1) [virt.vm] (vmId='2eb53807-6c20-4976-9a8b-31531b5fdef1') cannot cont while Down (vm:1709)


Expected results:

This is an expected condition, so shouldn't be logged as an ERROR.
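
To check whether a host hit this condition, the vdsm log can be scanned for the message quoted under "Actual results". The following is a rough sketch that assumes the log line layout seen in the excerpts above; the function name threshold_failures and the regular expression are illustrative only and not part of vdsm.

```python
import re

# Matches vdsm log lines shaped like the ones quoted in this report:
# "<date> <time> <LEVEL> (<thread>) [<logger>] (vmId='<uuid>') <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+)\s+(?P<level>[A-Z]+)\s+\((?P<thread>[^)]*)\)\s+"
    r"\[(?P<logger>[^\]]*)\]\s+\(vmId='(?P<vm_id>[^']*)'\)\s+(?P<message>.*)$"
)


def threshold_failures(lines, level="ERROR"):
    """Yield (timestamp, vm_id) for block threshold failures at the given level."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # Not a VM-scoped log line in the assumed layout.
        if m["level"] == level and "Failed to set block threshold" in m["message"]:
            yield m["timestamp"], m["vm_id"]
```

Passing an open log file (for example /var/log/vdsm/vdsm.log, if that is where vdsm logs on the host) yields one entry per failure logged at ERROR level; an empty result after a migration with extensions suggests the condition is no longer reported as an error.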

Comment 1 Nir Soffer 2021-07-15 23:42:05 UTC
The bug shows that when we try to set the block threshold, we are still using
virdomain.Notifying. But when migration is completed, the VM must switch
to virdomain.Disconnected or virdomain.Defined, which avoids this issue
by raising a specific virdomain.NotConnectedError that can be safely
silenced.

The error displayed here comes from libvirt, and it does not have
a specific error code that can be used to silence it.
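
The difference described in this comment can be illustrated with a small, self-contained sketch. This is not vdsm's actual code: every class below is a stand-in that only borrows the names mentioned in the comment, GenericLibvirtError is a made-up placeholder for an error coming from libvirt, and set_threshold is a hypothetical caller.

```python
import logging

log = logging.getLogger("virt.vm")


class NotConnectedError(Exception):
    """Stand-in for the virdomain.NotConnectedError named in the comment."""


class GenericLibvirtError(Exception):
    """Placeholder for a libvirt error with no specific error code."""


class Disconnected:
    """Stand-in wrapper used once the VM has left this host."""

    def setBlockThreshold(self, dev, threshold):
        raise NotConnectedError("VM is not connected")


class Notifying:
    """Stand-in wrapper that still forwards the call to libvirt."""

    def setBlockThreshold(self, dev, threshold):
        raise GenericLibvirtError(
            "Requested operation is not valid: domain is not running")


def set_threshold(dom, dev, threshold):
    """Hypothetical caller: silence only the specific, expected error."""
    try:
        dom.setBlockThreshold(dev, threshold)
    except NotConnectedError as e:
        # Expected after migration, so it can be logged quietly.
        log.debug("Domain not connected, skipping drive %r: %s", dev, e)
        return "skipped"
    except GenericLibvirtError as e:
        # Cannot be told apart from a real failure, so it stays an error.
        log.error("Failed to set block threshold for drive %r: %s", dev, e)
        return "failed"
    return "ok"
```

With the Disconnected stand-in the call returns "skipped" and logs at debug level; with the Notifying stand-in the same situation surfaces as a generic error, which is the behavior reported here.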

Comment 2 Eyal Shenitzky 2021-08-31 20:58:56 UTC
Vojtech, can you please verify this bug with a proper comment?

Comment 3 Vojtech Juranek 2021-09-01 09:28:22 UTC
(In reply to Eyal Shenitzky from comment #2)
> Vojtech, can you please verify this bug with a proper comment?

Now there is no error message, only this debug message:

2021-08-04 10:18:28,730-0400 DEBUG  (mailbox-hsm/0) [virt.vm] (vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') Domain not connected, skipping set block threshold fordrive 'sdd': Requested operation is not valid: domain is not running (drivemonitor:128)


As for switching domains (see Nir's comment #1), this is WIP and I created BZ #2000046 to track this.

Comment 6 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.