Bug 1840732

Summary: VM can be started during offline disk migration when the disk is locked
Product: [oVirt] ovirt-engine
Component: General
Version: 4.3.9.1
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: ladislav.humenik
Assignee: Bella Khizgiyaev <bkhizgiy>
QA Contact: Ilan Zuckerman <izuckerm>
CC: bugs, bzlotnik, eshenitz, marcel.hanke, michal.skrivanek, mtessun, sfishbai
Target Milestone: ovirt-4.4.2
Target Release: 4.4.2.1
Fixed In Version: ovirt-engine-4.4.2.1
Flags: pm-rhel: ovirt-4.4+, mtessun: planning_ack+
oVirt Team: Storage
Type: Bug
Last Closed: 2020-09-18 07:13:10 UTC
Attachments: engine-log

Description ladislav.humenik 2020-05-27 14:04:32 UTC
Description of problem:
A VM can be started while an offline disk move job is in progress. This can lead to corrupted data inside the VM, and eventually oVirt will pause the VM to prevent further I/O.

Version-Release number of selected component (if applicable):
4.3.9-1.el7

How reproducible:


Steps to Reproduce:
1. power off VM
2. trigger disk move
3. start the VM

Actual results:
The VM starts.

Expected results:
An error message along the lines of "VM cannot be started while an underlying storage operation is in progress", or similar.

Additional info:
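For illustration, the missing guard can be sketched as follows. This is a minimal Python sketch of the intended behavior only — the real check belongs in ovirt-engine's Java run-VM validation, and the function name, constant, and message text here are hypothetical:

```python
# Hypothetical sketch of the run-VM guard this bug asks for.
# imagestatus = 2 is assumed to mean LOCKED in the engine's images table
# (value taken from the oVirt ImageStatus enum, not from this report).
LOCKED = 2

def can_run_vm(disk_image_statuses):
    """Refuse to start a VM while any of its disk images is locked,
    e.g. during an offline disk move."""
    if any(status == LOCKED for status in disk_image_statuses):
        return False, ("Cannot run VM: one or more disks are locked. "
                       "Please try again in a few minutes.")
    return True, None

# A VM whose only disk is mid-move must be rejected:
ok, msg = can_run_vm([LOCKED])
print(ok)  # False
```

The point of the sketch is simply that the start path should consult disk image status, not only the VM status in vm_dynamic.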

Comment 2 RHEL Program Management 2020-05-28 04:21:46 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 3 Michal Skrivanek 2020-05-28 04:24:36 UTC
Is it not ImageLocked?

Comment 4 ladislav.humenik 2020-05-28 07:01:30 UTC
Created attachment 1692935 [details]
engine-log

Comment 5 ladislav.humenik 2020-05-28 07:03:15 UTC
The disk status ImageLocked is expected, but the VM status is Down (0):

-- VM status during the disk move
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'blafaselbest';
   vm_name    |               vm_guid                | status 
--------------+--------------------------------------+--------
 blafaselbest | b8bb01a7-d93d-4e48-87d4-97c134004544 |      0
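For reference, the numeric status in vm_dynamic decodes as follows (values assumed from the upstream ovirt-engine VMStatus enum, not from this report). Down (0) is exactly the problem: nothing in the VM's own state blocks a start request, even though the disk image itself is locked:

```python
# Assumed subset of oVirt's VMStatus enum
# (org.ovirt.engine.core.common.businessentities.VMStatus).
VM_STATUS = {0: "Down", 1: "Up", 2: "PoweringUp", 9: "WaitForLaunch", 15: "ImageLocked"}

print(VM_STATUS[0])  # the VM above reports Down
```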

Comment 6 Bella Khizgiyaev 2020-07-09 10:44:36 UTC
It seems the issue was fixed in the current version (4.4).
I wasn't able to find the patch with the fix, but when trying to reproduce with the steps below I did receive a message:

1. power off VM
2. trigger disk move
3. start the VM

When trying to power up the VM I received the error message "Failed to power up VM1: disk is currently locked, please try later",
and the VM is not powered on until the disk move completes.

Can you please verify on your 4.4 environment whether this bug still happens?

Comment 7 ladislav.humenik 2020-07-16 10:47:27 UTC
Hi, tested in 4.4.1.8-1.el8

The VM was powered off and a disk move was triggered:

engine=# select * from images where image_guid = '4d6a139b-66b3-487a-ac8e-22d9f31722d7';
-[ RECORD 1 ]---------+--------------------------------------
image_guid            | 4d6a139b-66b3-487a-ac8e-22d9f31722d7
creation_date         | 2020-07-16 10:27:30+00
size                  | 214748364800
it_guid               | 00000000-0000-0000-0000-000000000000
parentid              | 00000000-0000-0000-0000-000000000000
imagestatus           | 2
lastmodified          | 2020-07-16 10:27:30.042+00
vm_snapshot_id        | bff44c63-f44c-4d85-ae48-8d2f45ce47cc
volume_type           | 2
volume_format         | 5
image_group_id        | 50213168-616c-4a15-a302-b3f26fec9ee2
_create_date          | 2020-07-16 10:27:30.044064+00
_update_date          | 2020-07-16 10:27:37.937232+00
active                | t
volume_classification | 0
qcow_compat           | 0

and the status of the VM during that time:
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
 vm_name |               vm_guid                | status 
---------+--------------------------------------+--------
 move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d |      0
(1 row)

engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
 vm_name |               vm_guid                | status 
---------+--------------------------------------+--------
 move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d |      9
(1 row)

engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
 vm_name |               vm_guid                | status 
---------+--------------------------------------+--------
 move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d |      2
(1 row)
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
 vm_name |               vm_guid                | status 
---------+--------------------------------------+--------
 move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d |      1

and the image is still being copied:

engine=# select * from images where image_guid = '4d6a139b-66b3-487a-ac8e-22d9f31722d7';
-[ RECORD 1 ]---------+--------------------------------------
image_guid            | 4d6a139b-66b3-487a-ac8e-22d9f31722d7
creation_date         | 2020-07-16 10:27:30+00
size                  | 214748364800
it_guid               | 00000000-0000-0000-0000-000000000000
parentid              | 00000000-0000-0000-0000-000000000000
imagestatus           | 2
lastmodified          | 2020-07-16 10:27:30.042+00
vm_snapshot_id        | bff44c63-f44c-4d85-ae48-8d2f45ce47cc
volume_type           | 2
volume_format         | 5
image_group_id        | 50213168-616c-4a15-a302-b3f26fec9ee2
_create_date          | 2020-07-16 10:27:30.044064+00
_update_date          | 2020-07-16 10:27:37.937232+00
active                | t
volume_classification | 0
qcow_compat           | 0
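The four status queries above can be decoded with the same VMStatus values (assumed from the upstream ovirt-engine source): the engine walks the VM through a completely normal start sequence even though the image is still locked:

```python
# Assumed VMStatus values from the upstream ovirt-engine source.
VM_STATUS = {0: "Down", 1: "Up", 2: "PoweringUp", 9: "WaitForLaunch"}

observed = [0, 9, 2, 1]  # statuses returned by the queries above, in order
print(" -> ".join(VM_STATUS[s] for s in observed))
# Down -> WaitForLaunch -> PoweringUp -> Up
```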

Comment 8 Benny Zlotnik 2020-07-27 14:00:35 UTC
Are you running this scenario on an HA VM?

Comment 9 ladislav.humenik 2020-07-27 14:04:05 UTC
Yes, all our VMs have the HA checkbox enabled.

Comment 10 Bella Khizgiyaev 2020-07-29 15:46:33 UTC
The issue was fixed; it should now work for both HA and regular VMs.

Comment 11 Ilan Zuckerman 2020-08-11 08:48:51 UTC
Verified on rhv-release-4.4.2-2-001.noarch :

1. power off VM
2. trigger disk move
3. start the VM

Expected: The VM should not be allowed to start in the middle of a disk migration.
Actual: The VM does not start. The user gets the following message in the UI: "Cannot run VM: The following disks are locked: latest-rhel-guest-image-8.2-infra. Please try again in a few minutes."

After the disk migration completes, the VM starts as expected when started again.

Comment 12 Sandro Bonazzola 2020-09-18 07:13:10 UTC
This bug is included in the oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.