Bug 1840732
| Summary: | VM can be started during offline disk migration when the disk is locked | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | ladislav.humenik | ||||
| Component: | General | Assignee: | Bella Khizgiyaev <bkhizgiy> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Ilan Zuckerman <izuckerm> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 4.3.9.1 | CC: | bugs, bzlotnik, eshenitz, marcel.hanke, michal.skrivanek, mtessun, sfishbai | ||||
| Target Milestone: | ovirt-4.4.2 | Flags: | pm-rhel: ovirt-4.4+, mtessun: planning_ack+ | ||||
| Target Release: | 4.4.2.1 | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | ovirt-engine-4.4.2.1 | Doc Type: | If docs needed, set a value | ||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2020-09-18 07:13:10 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | | ||||||
Description
ladislav.humenik
2020-05-27 14:04:32 UTC
Description of problem:
A VM can be started while an offline disk move job is ongoing. This can lead to corrupted data inside the VM, and eventually oVirt will pause the VM to prevent I/O.

Version-Release number of selected component (if applicable):
4.3.9-1.el7

How reproducible:

Steps to Reproduce:
1. Power off the VM
2. Trigger a disk move
3. Start the VM

Actual results:
The VM is started.

Expected results:
An error message: the VM cannot be started while an underlying storage operation is in progress, or similar.

Additional info:

The documentation text flag should only be set after the 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Is it not ImageLocked?

Created attachment 1692935 [details]
engine-log

Status ImageLocked is expected, but we get Down = 0.

#status while disk move
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'blafaselbest';
   vm_name    |               vm_guid                | status
--------------+--------------------------------------+--------
 blafaselbest | b8bb01a7-d93d-4e48-87d4-97c134004544 |      0

It seems that the issue was fixed in the current version (4.4). I wasn't able to find the patch with the fix, but when trying to reproduce the steps below I did receive a message:
1. Power off the VM
2. Trigger a disk move
3. Start the VM
When trying to power up the VM, I received the error message "Failed to power up VM1: disk is currently locked, please try later", and the VM isn't powered on until the disk move is complete. Can you please verify on your 4.4 environment whether this bug still happens?

Hi, tested in 4.4.1.8-1.el8.
VM powered off and triggered move
engine=# select * from images where image_guid = '4d6a139b-66b3-487a-ac8e-22d9f31722d7';
image_guid | creation_date | size | it_guid | parentid | imagestatus | lastmodifie
d | vm_snapshot_id | volume_type | volume_format | image_group_id | _create_date | _update_date | active
| volume_classification | qcow_compat
--------------------------------------+------------------------+--------------+--------------------------------------+--------------------------------------+-------------+-------------------
---------+--------------------------------------+-------------+---------------+--------------------------------------+-------------------------------+-------------------------------+--------
+-----------------------+-------------
4d6a139b-66b3-487a-ac8e-22d9f31722d7 | 2020-07-16 10:27:30+00 | 214748364800 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 2 | 2020-07-16 10:27:3
0.042+00 | bff44c63-f44c-4d85-ae48-8d2f45ce47cc | 2 | 5 | 50213168-616c-4a15-a302-b3f26fec9ee2 | 2020-07-16 10:27:30.044064+00 | 2020-07-16 10:27:37.937232+00 | t
| 0 | 0
(1 row)
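For readability, the raw `imagestatus` column in the dump above can be decoded inline. The following is a sketch, assuming the numeric values of oVirt's ImageStatus enum (1 = OK, 2 = LOCKED, 4 = ILLEGAL); the table and column names come from the query above:

```sql
-- Decode imagestatus for the image being moved.
-- Enum values (1 = OK, 2 = LOCKED, 4 = ILLEGAL) are an assumption
-- based on the oVirt engine's ImageStatus enum, not taken from this bug.
select image_guid,
       case imagestatus
            when 1 then 'OK'
            when 2 then 'LOCKED'
            when 4 then 'ILLEGAL'
            else 'OTHER (' || imagestatus || ')'
       end as decoded_status
from images
where image_guid = '4d6a139b-66b3-487a-ac8e-22d9f31722d7';
```

Under that assumption, `imagestatus = 2` in the dump means the image is LOCKED while the move is running, which is exactly why the engine should refuse to start the VM.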
and the status of the VM during that time:
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
vm_name | vm_guid | status
---------+--------------------------------------+--------
move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d | 0
(1 row)
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
vm_name | vm_guid | status
---------+--------------------------------------+--------
move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d | 9
(1 row)
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
vm_name | vm_guid | status
---------+--------------------------------------+--------
move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d | 2
(1 row)
engine=# select vm_name, vm.vm_guid, status from vm_static vm inner join vm_dynamic vd on vd.vm_guid=vm.vm_guid where vm_name like 'move-me';
vm_name | vm_guid | status
---------+--------------------------------------+--------
move-me | 8d06d81a-2734-42ab-a5ab-5e468bf81b9d | 1
and the image is still being copied:
engine=# select * from images where image_guid = '4d6a139b-66b3-487a-ac8e-22d9f31722d7';
image_guid | creation_date | size | it_guid | parentid | imagestatus | lastmodifie
d | vm_snapshot_id | volume_type | volume_format | image_group_id | _create_date | _update_date | active
| volume_classification | qcow_compat
--------------------------------------+------------------------+--------------+--------------------------------------+--------------------------------------+-------------+-------------------
---------+--------------------------------------+-------------+---------------+--------------------------------------+-------------------------------+-------------------------------+--------
+-----------------------+-------------
4d6a139b-66b3-487a-ac8e-22d9f31722d7 | 2020-07-16 10:27:30+00 | 214748364800 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 2 | 2020-07-16 10:27:3
0.042+00 | bff44c63-f44c-4d85-ae48-8d2f45ce47cc | 2 | 5 | 50213168-616c-4a15-a302-b3f26fec9ee2 | 2020-07-16 10:27:30.044064+00 | 2020-07-16 10:27:37.937232+00 | t
| 0 | 0
(1 row)
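The VM `status` values observed in the sequence of queries above (0, 9, 2, 1) can be decoded the same way. This is a sketch, assuming the numeric values of oVirt's VMStatus enum (0 = Down, 1 = Up, 2 = PoweringUp, 9 = WaitForLaunch, 15 = ImageLocked); these values are not confirmed anywhere in this bug:

```sql
-- Decode the vm_dynamic.status column for the test VM.
-- The status-code-to-name mapping below is an assumption based on
-- the oVirt engine's VMStatus enum.
select vm_name,
       status,
       case status
            when 0  then 'Down'
            when 1  then 'Up'
            when 2  then 'PoweringUp'
            when 9  then 'WaitForLaunch'
            when 15 then 'ImageLocked'
            else 'OTHER'
       end as decoded_status
from vm_static vm
inner join vm_dynamic vd on vd.vm_guid = vm.vm_guid
where vm_name like 'move-me';
```

Read this way, the sequence shows the VM going Down → WaitForLaunch → PoweringUp → Up while the image row still shows the move in progress, i.e. the start was not blocked.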
Are you running this scenario on an HA VM?

Yes, all our VMs have the HA check-box enabled.

The issue was fixed; it should now work on both HA and regular VMs.

Verified on rhv-release-4.4.2-2-001.noarch:
1. Power off the VM
2. Trigger a disk move
3. Start the VM
Expected: The VM shouldn't be allowed to start in the middle of the disk migration.
Actual: The VM is not starting. The user gets the following message from the UI: "Cannot run VM: The following disks are locked: latest-rhel-guest-image-8.2-infra. Please try again in a few minutes." After the disk migrates, the VM is started as expected (when attempting to start it once more).

This bugzilla is included in oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report. |
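For an administrator who wants to check ahead of time which locked disks would block a VM from starting, a query along the following lines could be used. This is a sketch only: the `images` and `vm_static` tables appear in the queries above, but the `vm_device` table and its columns (`vm_id`, `device_id`, `type`) are assumptions about the engine schema, and `imagestatus = 2` meaning LOCKED is likewise an assumption:

```sql
-- Hypothetical helper: list locked disks attached to a given VM.
-- vm_device table/column names and the LOCKED status code (2)
-- are assumptions, not taken from this bug report.
select vs.vm_name,
       i.image_group_id,
       i.imagestatus
from vm_static vs
join vm_device d on d.vm_id = vs.vm_guid and d.type = 'disk'
join images i on i.image_group_id = d.device_id
where vs.vm_name = 'move-me'
  and i.imagestatus = 2;
```

A non-empty result would indicate the start attempt should fail with the "disks are locked" validation error described in the verification above.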