Bug 2120040

Summary:	Missing execution job message for ReduceImage
Product:	[oVirt] ovirt-engine	Reporter:	sshmulev
Component:	BLL.Storage	Assignee:	Arik <ahadas>
Status:	CLOSED NEXTRELEASE	QA Contact:	Ilia Markelov <imarkelo>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	4.5.2.1	CC:	ahadas, bugs, dfodor, sfishbai
Target Milestone:	ovirt-4.5.3	Keywords:	Automation, ZStream
Target Release:	---	Flags:	pm-rhel: ovirt-4.5?
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	ovirt-engine-4.5.3	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-09-19 14:31:51 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	Storage	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description sshmulev 2022-08-21 07:01:43 UTC

Description of problem:
When reducing a disk via a REST API request, the disk is not locked during the operation; it always remains in 'ok' status.


Version-Release number of selected component (if applicable):
ovirt-engine-4.5.2.2-0.1.el8ev
vdsm-4.50.2.2-1.el8ev

How reproducible:
100%

Steps to Reproduce:
1. Send the following API request:
POST /ovirt-engine/api/disks/<disk_ID>/reduce
body:
<action>
    <async>true</async>
</action>
2. Check the logs for a disk lock during the reduce disk operation
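
The request in step 1 could be built from a script roughly as follows. This is a minimal sketch using only the Python standard library: the engine URL and the function name build_reduce_request are hypothetical, authentication and TLS setup are omitted, and the optional Correlation-Id header is an assumption about how a caller might tag the request, not something stated in the steps above.

```python
import urllib.request

# Request body taken from the reproduction steps.
REDUCE_BODY = b"<action>\n    <async>true</async>\n</action>"


def build_reduce_request(engine_url, disk_id, correlation_id=None):
    """Build (but do not send) the POST request that reduces a disk."""
    url = f"{engine_url.rstrip('/')}/ovirt-engine/api/disks/{disk_id}/reduce"
    headers = {"Content-Type": "application/xml", "Accept": "application/xml"}
    if correlation_id is not None:
        # Assumption: the engine accepts a Correlation-Id request header.
        headers["Correlation-Id"] = correlation_id
    return urllib.request.Request(
        url, data=REDUCE_BODY, headers=headers, method="POST"
    )
```

The returned object can be passed to urllib.request.urlopen once credentials and certificate handling are added for the environment under test.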


Actual results:
The disk is in 'ok' status during the reduce disk operation

Expected results:
The disk should be locked during the reduce disk operation

Additional info:
The issue was found in automation test "TestCase25756".
The issue is also seen in ovirt-engine-4.4.10.7-0.4.el8ev, so it is not a regression.

Comment 1 sshmulev 2022-08-21 12:10:56 UTC

Because the disk is not locked, the automation fails.
For example, the automation script does the following:
1. Reduce the disk.
2. Wait for the disk status to be 'ok'.
3. Start the VM that has the reduced disk. The reduce operation sometimes takes a bit longer to finish, so starting the VM fails. If the disk status had actually changed to locked, the script would have waited for it to return to 'ok' before running the VM.

Comment 2 Arik 2022-08-21 12:13:32 UTC

right, when in-memory locks are used the client should monitor the execution of operations differently - how about setting a correlation id and monitoring when the job associated with that correlation id completes instead?
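
The monitoring suggested here might look roughly like the sketch below. It is not tied to any real client library: fetch_jobs is a hypothetical caller-supplied function that returns the engine's jobs as dictionaries, and the key names and final status values are assumptions to adjust to what the engine actually reports.

```python
import time

# Assumption: statuses that mean the job is no longer running.
DONE_STATUSES = {"finished", "failed", "aborted"}


def wait_for_job(fetch_jobs, correlation_id, timeout=300.0, interval=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job tagged with correlation_id reaches a final status.

    fetch_jobs is a hypothetical callable returning a list of dicts with
    "correlation_id" and "status" keys. Returns the final status in lower
    case, or raises TimeoutError if the job does not complete in time.
    """
    deadline = clock() + timeout
    while True:
        for job in fetch_jobs():
            if job.get("correlation_id") == correlation_id:
                status = str(job.get("status", "")).lower()
                if status in DONE_STATUSES:
                    return status
        if clock() >= deadline:
            raise TimeoutError(
                f"job {correlation_id!r} did not complete in {timeout}s"
            )
        sleep(interval)
```

The test would start the VM only after wait_for_job returns "finished", instead of waiting for the disk status to become 'ok'.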

Comment 3 sshmulev 2022-08-22 09:28:37 UTC

The automation test can be changed to monitor whether the reduce job is done, but I thought that in general a disk should be locked during storage operations, shouldn't it?

Comment 4 Arik 2022-08-22 09:38:12 UTC

(In reply to sshmulev from comment #3)
> The automation test can be changed to monitor whether the reduce job is
> done, but I thought that in general a disk should be locked during storage
> operations, shouldn't it?

yes, it should, and it is locked, but with an in-memory lock (which prevents other actions from operating on the disk), so the lock is not reflected in the disk's status
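
The distinction can be illustrated with a toy model. This is not ovirt-engine code; the class and method names are made up for illustration. It only shows how an in-memory lock can block conflicting actions while the status a client queries stays 'ok'.

```python
class ToyEngine:
    """Toy model of an in-memory lock next to a visible disk status."""

    def __init__(self):
        self.disk_status = {}      # status a client can query
        self.memory_locks = set()  # in-process locks, invisible to clients

    def add_disk(self, disk_id):
        self.disk_status[disk_id] = "ok"

    def start_reduce(self, disk_id):
        """Take the in-memory lock; the visible status is left untouched."""
        if disk_id in self.memory_locks:
            return False  # another action already holds the lock
        self.memory_locks.add(disk_id)
        return True

    def finish_reduce(self, disk_id):
        """Release the in-memory lock."""
        self.memory_locks.discard(disk_id)

    def run_vm_with(self, disk_id):
        """A conflicting action is refused while the lock is held."""
        return disk_id not in self.memory_locks
```

In this model a client polling disk_status sees 'ok' for the whole reduce, which is why the automation's wait in step 2 returns immediately while run_vm_with still fails.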

Comment 5 sshmulev 2022-08-22 10:34:02 UTC

ok, got it, so maybe we can close the bug then.
will refactor automation to rely on reduce job logs to make sure it's done.

Comment 6 Arik 2022-08-22 14:08:18 UTC

(In reply to sshmulev from comment #5)
> will refactor automation to rely on reduce job logs to make sure it's done.

ok, we talked offline and agreed on setting a correlation id rather than relying on logs, but other than that sounds good - let's see if that works as an alternative to checking the disk status

Comment 7 Arik 2022-08-28 11:58:42 UTC

just need to add a job for ReduceImage

Comment 8 Arik 2022-08-29 07:57:25 UTC

The message would be:
Reducing the actual size of Disk <disk alias>
See:
https://github.com/oVirt/ovirt-engine/commit/8d580cbcaaf9d45792622c26cc3e6864154b357e

Comment 9 Casper (RHV QE bot) 2022-09-19 14:31:51 UTC

This bug has low overall severity and passed an automated regression suite, and is not going to be further verified by QE. If you believe special care is required, feel free to re-open to ON_QA status.