Bug 2137207 - The RemoveDisk job finishes before the disk was removed from the DB
Summary: The RemoveDisk job finishes before the disk was removed from the DB
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.5.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ovirt-4.5.3
Assignee: Mark Kemel
QA Contact: Shir Fishbain
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-10-24 07:32 UTC by sshmulev
Modified: 2022-11-16 12:17 UTC
CC List: 3 users

Fixed In Version: ovirt-engine-4.5.3.2
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-11-16 12:17:27 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:



Links
- Github oVirt ovirt-engine pull 715 (Merged): core: add callback to RemoveDiskCommand. Last updated 2022-10-26 07:20:48 UTC
- Github oVirt ovirt-engine pull 716 (open): Backport: add callback to RemoveDiskCommand. Last updated 2022-10-26 07:43:06 UTC
- Red Hat Issue Tracker RHV-47989. Last updated 2022-10-24 07:58:29 UTC
- Red Hat Product Errata RHSA-2022:8502. Last updated 2022-11-16 12:17:37 UTC

Description sshmulev 2022-10-24 07:32:07 UTC
Description of problem:
The RemoveDisk job finishes before the disk is actually removed from the DB, which leads to spurious errors in any flow that relies on the job status.
Some tests in our automation remove a VM right after this operation, and the VM removal fails because the disk is still locked.

Version-Release number of selected component (if applicable):
ovirt-engine-4.5.3.1-2.el8ev
vdsm-4.50.3.4-1.el8ev


How reproducible:
100%

Steps to Reproduce:
1. Create a VM and attach a disk to it.
2. Remove the disk.
3. As soon as the jobs in the DB report the RemoveDisk job as finished, remove the VM.
(This may need to run as an automated flow to reproduce it.)

Actual results:
Since the fix for bug 1836318 (https://github.com/oVirt/ovirt-engine/pull/656), removing the VM fails because the disk is still locked: the disk removal is still in progress even though the DB reports the operation as done.

This issue causes 161 failures in our tier2 runs and 145 in tier3.
As a result, many leftovers remain and none of the tests are valid for verification, which blocks us from delivering the version.

We tried adding a sleep after the disk removal, but no fixed sleep is reliable: tests use different disk sizes and different flows, so a sleep could lead to other failures, automation bugs, and extra refactoring and stabilization work. Since this is a global function used by different teams in RHV QE, changing it could also cause conflicts.
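A common alternative to a fixed sleep in test automation is to poll for the condition the next step actually depends on, with an upper-bound timeout. The sketch below is a generic, hypothetical helper (`wait_until` and `disk_is_gone` are illustrative names, not part of the RHV QE framework or the oVirt SDK). Given this bug, the predicate should check the disk itself, since the job status alone was not a reliable signal.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for a real check such as "the disk no longer exists
# and is not locked"; here the disk simply "disappears" after 0.2 seconds.
_gone_at = time.monotonic() + 0.2


def disk_is_gone() -> bool:
    return time.monotonic() >= _gone_at


if __name__ == "__main__":
    # Waits only as long as needed, regardless of disk size or flow.
    print(wait_until(disk_is_gone, timeout=5.0, interval=0.05))
```

Unlike a sleep, the wait adapts to how long each removal actually takes, and the timeout still bounds the worst case.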

In addition, we cannot assume that customers are unlikely to hit the same issue, because we cannot know which flow they run after removing a disk.

Expected results:
When the RemoveDisk job is removed from the DB, the disk should be unlocked as well.
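The linked pull requests are titled "add callback to RemoveDiskCommand". The ordering problem such a callback addresses can be illustrated with a small simulation. This is a hypothetical Python model, not the ovirt-engine code (which is Java and uses its own command and callback infrastructure): `FakeDb`, `remove_disk_async`, `remove_disk_premature`, and `remove_disk_with_callback` are illustrative names. In the premature variant the job is marked finished as soon as the removal is submitted; in the callback variant it is marked finished only after the disk row is gone.

```python
import threading
import time


class FakeDb:
    """Hypothetical in-memory stand-in for the engine DB: disk rows and job rows."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.disks = {"disk-1": "LOCKED"}  # disk_id -> status
        self.jobs = {"job-1": "STARTED"}   # job_id -> status

    def job_status(self, job_id: str) -> str:
        with self.lock:
            return self.jobs[job_id]

    def disk_present(self, disk_id: str) -> bool:
        with self.lock:
            return disk_id in self.disks


def remove_disk_async(db: FakeDb, disk_id: str, delay: float, on_done=None) -> threading.Thread:
    """Simulates the removal completing `delay` seconds after it is submitted."""
    def worker() -> None:
        time.sleep(delay)
        with db.lock:
            del db.disks[disk_id]
        if on_done is not None:
            on_done()  # runs only after the disk row has been deleted
    t = threading.Thread(target=worker)
    t.start()
    return t


def remove_disk_premature(db: FakeDb, disk_id: str, job_id: str, delay: float = 0.2) -> threading.Thread:
    """Problematic ordering: the job is ended as soon as the removal is submitted."""
    t = remove_disk_async(db, disk_id, delay)
    with db.lock:
        db.jobs[job_id] = "FINISHED"
    return t


def remove_disk_with_callback(db: FakeDb, disk_id: str, job_id: str, delay: float = 0.2) -> threading.Thread:
    """Callback ordering: the job is ended from the completion callback."""
    def on_done() -> None:
        with db.lock:
            db.jobs[job_id] = "FINISHED"
    return remove_disk_async(db, disk_id, delay, on_done)
```

With `remove_disk_premature`, a caller that sees the job as FINISHED can still find the disk present, which is the window the automation hit. With `remove_disk_with_callback`, a FINISHED job always implies the disk has already been removed.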

Comment 8 Arik 2022-10-28 20:31:53 UTC
From a functional point of view, the severity of this bug is rather low: the disk is removed a few milliseconds after the job completes, so it is unlikely to affect user flows. Setting the severity accordingly. However, this bug was prioritized because many test cases in our automation failed because of it, and adjusting those test cases was complicated.

Comment 9 sshmulev 2022-10-30 11:01:50 UTC
Verified.

Tier2 and tier3 have stabilized after this fix.
TCs that previously failed due to this bug now pass as they did before.

Versions:
rhv-4.5.3-4
ovirt-engine-4.5.3.2-1.el8ev.noarch
vdsm-4.50.3.4-1.el8ev.x86_64

Comment 13 errata-xmlrpc 2022-11-16 12:17:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.3] bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8502

