Bug 1981079 - Expected condition after migration is logged as an error and try to set threshold to migrated VM's disk.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.4.7.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ovirt-4.5.0
Target Release: 4.5.0
Assignee: Vojtech Juranek
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-07-11 08:47 UTC by sshmulev
Modified: 2022-04-20 06:33 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-04-20 06:33:59 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.5?




Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 115683 | 0 | master | MERGED | virt: don't log error during disk extension when VM is not running | 2021-08-05 07:21:17 UTC
oVirt gerrit | 115684 | 0 | master | MERGED | virt: skip VM resume in cases when it's not supported | 2021-08-05 07:21:20 UTC

Description sshmulev 2021-07-11 08:47:02 UTC
Description of problem:
After migration the VM is no longer running on the source host, as expected. However, vdsm still tries to set the block threshold for the VM's disk, and the resulting failure is logged as an ERROR even though this is an expected condition.

Version-Release number of selected component (if applicable):
Versions:
vdsm-4.40.70.6-1.el8ev.x86_64
ovirt-engine-4.4.7.6-0.11.el8ev
QEMU emulator version 5.2.0 (qemu-kvm-5.2.0-16.module+el8.4.0+11536+725e25d9.2)

How reproducible:
Need to make multiple extensions during the migration to hit the issue. 

Steps to Reproduce:
1. Create a VM with a thin disk. It is best to make the disk big enough to allow many attempts until an extension happens during the migration.
2. It is best to apply the following on every host the VM is migrated to or from, and on the SPM host:
    Edit /etc/vdsm/vdsm.conf.d/volume-utilization.conf, put in these values, and then restart vdsm.
    [irs]
    volume_utilization_percent = 50
    volume_utilization_chunk_mb = 128
    
    
3. Make the VM migration slow, so that multiple extensions can happen
   during the migration.
   This can be done using the stress tool, for example:
   dnf install stress
   stress --vm 1 --vm-bytes 128M

   This step is optional and only needed if the migration finishes too fast (I could reproduce the issue without it).

4. Write data in the VM to trigger extension, for example with this script:
	for i in $(seq -w 11); do
	    echo "writing..."
	    dd if=/dev/zero bs=1M count=64 of=data.$i oflag=direct conv=fsync
	    echo "Waiting..."
	    sleep 6
	done
5. Start the script in the VM and start the migration right after.
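
The configuration from step 2 can be scripted so that it is applied the same way on every host. The following is only a minimal sketch: the helper name write_volume_utilization_conf and its optional directory argument are made up for illustration, and the service name vdsmd is an assumption about the host setup.

```shell
#!/bin/bash
# Sketch: write the drop-in file from step 2.
# write_volume_utilization_conf is a hypothetical helper name.
# The optional first argument overrides the target directory, which
# defaults to the path given in step 2.
write_volume_utilization_conf() {
    local conf_dir="${1:-/etc/vdsm/vdsm.conf.d}"
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/volume-utilization.conf" <<'EOF'
[irs]
volume_utilization_percent = 50
volume_utilization_chunk_mb = 128
EOF
}

# Usage on each host, as root (service name assumed to be vdsmd):
#   write_volume_utilization_conf
#   systemctl restart vdsmd
```

Defining a function and leaving the calls commented out keeps the sketch safe to source without touching /etc.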
	
	

Actual results:

2021-07-06 13:39:34,683+0300 ERROR (mailbox-hsm/3) [virt.vm] (vmId='e7ef44e5-42e1-440e-8426-d718ffbe5dba') Failed to set block threshold for drive 'sda' (/rhev/data-center/mnt/blockSD/aa73b4d0-a22d-4f65-b254-594b34f0e6a8/images/8735fa39-a148-43df-a78e-4f2e5ccdf688/0c3111fb-ef7e-49cf-9638-b3ca25afb983): Requested operation is not valid: domain is not running (drivemonitor:128)
2021-07-06 13:39:34,683+0300 ERROR (mailbox-hsm/3) [virt.vm] (vmId='e7ef44e5-42e1-440e-8426-d718ffbe5dba') cannot cont while Down (vm:1709)


2021-07-08 17:19:52,589+0300 ERROR (mailbox-hsm/1) [virt.vm] (vmId='2eb53807-6c20-4976-9a8b-31531b5fdef1') Failed to set block threshold for drive 'sda' (/rhev/data-center/mnt/blockSD/aa73b4d0-a22d-4f65-b254-594b34f0e6a8/images/10883a68-06ec-460b-a115-d23260482fa9/6f350411-34ce-4b13-8214-17aa0ea829b9): Domain not found: no domain with matching uuid '2eb53807-6c20-4976-9a8b-31531b5fdef1' (New_VM1) (drivemonitor:128)
2021-07-08 17:19:52,590+0300 ERROR (mailbox-hsm/1) [virt.vm] (vmId='2eb53807-6c20-4976-9a8b-31531b5fdef1') cannot cont while Down (vm:1709)


Expected results:

This is an expected condition, so shouldn't be logged as an ERROR.
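
One way to check whether a host hit the issue is to count the ERROR lines quoted above in the vdsm log. This is only a sketch: count_threshold_errors is a hypothetical helper, and /var/log/vdsm/vdsm.log is assumed to be the log location on the host.

```shell
#!/bin/bash
# Sketch: count ERROR-level "Failed to set block threshold" lines.
# count_threshold_errors is a hypothetical helper name.
# The optional first argument is the log file to scan; the default
# path is an assumption about where vdsm logs on the host.
count_threshold_errors() {
    local log_file="${1:-/var/log/vdsm/vdsm.log}"
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the helper's status 0.
    grep -c -E ' ERROR .*Failed to set block threshold' "$log_file" || true
}

# Usage on the source host after the migration:
#   count_threshold_errors
```

A count of 0 after repeating the steps above would match the expected result.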

Comment 1 Nir Soffer 2021-07-15 23:42:05 UTC
The bug shows that when we try to set the block threshold, we are still using
virdomain.Notifying. But when migration is completed, the VM must switch
to virdomain.Disconnected or virdomain.Defined, which avoids this issue
by raising a specific virdomain.NotConnectedError that can be safely
silenced.

The error displayed here comes from libvirt, and it does not have
a specific error code that can be used to silence the error.

Comment 2 Eyal Shenitzky 2021-08-31 20:58:56 UTC
Vojtech, can you please verify this bug with a proper comment?

Comment 3 Vojtech Juranek 2021-09-01 09:28:22 UTC
(In reply to Eyal Shenitzky from comment #2)
> Vojtech, can you please verify this bug with a proper comment?

Now there is no error message, only this debug message:

2021-08-04 10:18:28,730-0400 DEBUG  (mailbox-hsm/0) [virt.vm] (vmId='2dad9038-3e3a-4b5e-8d20-b0da37d9ef79') Domain not connected, skipping set block threshold fordrive 'sdd': Requested operation is not valid: domain is not running (drivemonitor:128)


As for switching domains (see Nir's comment #1), this is WIP and I created BZ #2000046 to track this.

Comment 6 Sandro Bonazzola 2022-04-20 06:33:59 UTC
This bugzilla is included in oVirt 4.5.0 release, published on April 20th 2022.

Since the problem described in this bug report should be resolved in oVirt 4.5.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

