Bug 1840732

Summary: VM can be started during offline disk migration when the disk is locked
Product: [oVirt] ovirt-engine
Component: General
Version: 4.3.9.1
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Reporter: ladislav.humenik
Assignee: Bella Khizgiyaev <bkhizgiy>
QA Contact: Ilan Zuckerman <izuckerm>
CC: bugs, bzlotnik, eshenitz, marcel.hanke, michal.skrivanek, mtessun, sfishbai
Target Milestone: ovirt-4.4.2
Target Release: 4.4.2.1
Fixed In Version: ovirt-engine-4.4.2.1
Flags: pm-rhel: ovirt-4.4+, mtessun: planning_ack+
oVirt Team: Storage
Type: Bug
Last Closed: 2020-09-18 07:13:10 UTC
Attachments: engine-log

Description ladislav.humenik 2020-05-27 14:04:32 UTC
Description of problem:
A VM can be started while an offline disk move job is in progress. This can lead to corrupted data inside the VM, and eventually oVirt will pause the VM to prevent further I/O.

Version-Release number of selected component (if applicable):
4.3.9-1.el7

How reproducible:


Steps to Reproduce:
1. power off VM
2. trigger disk move
3. start the VM

Actual results:
The VM starts.

Expected results:
An error message along the lines of "VM cannot be started while an underlying storage operation is in progress", or similar.

Additional info:
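For illustration, the missing guard can be sketched as follows. This is a minimal Python sketch of the intended behavior only — the real check belongs in ovirt-engine's Java run-VM validation, and the function name, constant, and message text here are hypothetical:

```python
# Hypothetical sketch of the run-VM guard this bug asks for.
# imagestatus = 2 is assumed to mean LOCKED in the engine's images table
# (value taken from the oVirt ImageStatus enum, not from this report).
LOCKED = 2

def can_run_vm(disk_image_statuses):
    """Refuse to start a VM while any of its disk images is locked,
    e.g. during an offline disk move."""
    if any(status == LOCKED for status in disk_image_statuses):
        return False, ("Cannot run VM: one or more disks are locked. "
                       "Please try again in a few minutes.")
    return True, None

# A VM whose only disk is mid-move must be rejected:
ok, msg = can_run_vm([LOCKED])
print(ok)  # False
```

The point of the sketch is simply that the start path should consult disk image status, not only the VM status in vm_dynamic.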

Comment 2 RHEL Program Management 2020-05-28 04:21:46 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 3 Michal Skrivanek 2020-05-28 04:24:36 UTC
Is it not ImageLocked?

Comment 4 ladislav.humenik 2020-05-28 07:01:30 UTC
Created attachment 1692935 [details]
engine-log

Comment 5 ladislav.humenik 2020-05-28 07:03:15 UTC
The disk status ImageLocked is expected, but the VM status is Down (0):

-- VM status during the disk move
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'blafaselbest';
   vm_name    |               vm_guid                | status 
--------------+--------------------------------------+--------
 blafaselbest | b8bb01a7-d93d-4e48-87d4-97c134004544 |      0
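For reference, the numeric status in vm_dynamic decodes as follows (values assumed from the upstream ovirt-engine VMStatus enum, not from this report). Down (0) is exactly the problem: nothing in the VM's own state blocks a start request, even though the disk image itself is locked:

```python
# Assumed subset of oVirt's VMStatus enum
# (org.ovirt.engine.core.common.businessentities.VMStatus).
VM_STATUS = {0: "Down", 1: "Up", 2: "PoweringUp", 9: "WaitForLaunch", 15: "ImageLocked"}

print(VM_STATUS[0])  # the VM above reports Down
```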

Comment 6 Bella Khizgiyaev 2020-07-09 10:44:36 UTC
It seems the issue was fixed in the current version (4.4).
I wasn't able to find the patch with the fix, but when trying to reproduce with the steps below I did receive a message:

1. power off VM
2. trigger disk move
3. start the VM

When trying to power up the VM I received the error message "Failed to power up VM1: disk is currently locked, please try later",
and the VM is not powered on until the disk move completes.

Can you please verify on your 4.4 environment whether this bug still happens?

Comment 7 ladislav.humenik 2020-07-16 10:47:27 UTC
Hi, tested in 4.4.1.8-1.el8

The VM was powered off and a disk move was triggered:

engine=# select * from images where image_guid = '4d6a139b-66b3-487a-ac8e-22d9f31722d7';
-[ RECORD 1 ]---------+--------------------------------------
image_guid            | 4d6a139b-66b3-487a-ac8e-22d9f31722d7
creation_date         | 2020-07-16 10:27:30+00
size                  | 214748364800
it_guid               | 00000000-0000-0000-0000-000000000000
parentid              | 00000000-0000-0000-0000-000000000000
imagestatus           | 2
lastmodified          | 2020-07-16 10:27:30.042+00
vm_snapshot_id        | bff44c63-f44c-4d85-ae48-8d2f45ce47cc
volume_type           | 2
volume_format         | 5
image_group_id        | 50213168-616c-4a15-a302-b3f26fec9ee2
_create_date          | 2020-07-16 10:27:30.044064+00
_update_date          | 2020-07-16 10:27:37.937232+00
active                | t
volume_classification | 0
qcow_compat           | 0

and the status of the VM during that time:
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
 vm_name |               vm_guid                | status 
---------+--------------------------------------+--------
 move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d |      0
(1 row)

engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
 vm_name |               vm_guid                | status 
---------+--------------------------------------+--------
 move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d |      9
(1 row)

engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
 vm_name |               vm_guid                | status 
---------+--------------------------------------+--------
 move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d |      2
(1 row)
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
 vm_name |               vm_guid                | status 
---------+--------------------------------------+--------
 move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d |      1

and the image is still being copied:

engine=# select * from images where image_guid = '4d6a139b-66b3-487a-ac8e-22d9f31722d7';
-[ RECORD 1 ]---------+--------------------------------------
image_guid            | 4d6a139b-66b3-487a-ac8e-22d9f31722d7
creation_date         | 2020-07-16 10:27:30+00
size                  | 214748364800
it_guid               | 00000000-0000-0000-0000-000000000000
parentid              | 00000000-0000-0000-0000-000000000000
imagestatus           | 2
lastmodified          | 2020-07-16 10:27:30.042+00
vm_snapshot_id        | bff44c63-f44c-4d85-ae48-8d2f45ce47cc
volume_type           | 2
volume_format         | 5
image_group_id        | 50213168-616c-4a15-a302-b3f26fec9ee2
_create_date          | 2020-07-16 10:27:30.044064+00
_update_date          | 2020-07-16 10:27:37.937232+00
active                | t
volume_classification | 0
qcow_compat           | 0
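The four status queries above can be decoded with the same VMStatus values (assumed from the upstream ovirt-engine source): the engine walks the VM through a completely normal start sequence even though the image is still locked:

```python
# Assumed VMStatus values from the upstream ovirt-engine source.
VM_STATUS = {0: "Down", 1: "Up", 2: "PoweringUp", 9: "WaitForLaunch"}

observed = [0, 9, 2, 1]  # statuses returned by the queries above, in order
print(" -> ".join(VM_STATUS[s] for s in observed))
# Down -> WaitForLaunch -> PoweringUp -> Up
```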

Comment 8 Benny Zlotnik 2020-07-27 14:00:35 UTC
Are you running this scenario on an HA VM?

Comment 9 ladislav.humenik 2020-07-27 14:04:05 UTC
Yes, all our VMs have the HA checkbox enabled.

Comment 10 Bella Khizgiyaev 2020-07-29 15:46:33 UTC
The issue was fixed; it should now work for both HA and regular VMs.

Comment 11 Ilan Zuckerman 2020-08-11 08:48:51 UTC
Verified on rhv-release-4.4.2-2-001.noarch :

1. power off VM
2. trigger disk move
3. start the VM

Expected: The VM should not be allowed to start in the middle of a disk migration.
Actual: The VM does not start. The user gets the following message in the UI: "Cannot run VM: The following disks are locked: latest-rhel-guest-image-8.2-infra. Please try again in a few minutes."

After the disk migration completes, the VM starts as expected when started again.

Comment 12 Sandro Bonazzola 2020-09-18 07:13:10 UTC
This bug is included in the oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.