+++ This bug is a downstream clone. The original bug is: +++
+++ bug 1695026 +++

======================================================================

Description of problem:

Before the LSM, the VM had the structure below; the snapshot IDs were f9feefbe and 8619b674:

===
              image_guid              |            image_group_id            |               parentid               |            vm_snapshot_id            | imagestatus
--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+-------------
 b31e31e6-3f23-4b48-8f08-8f8c87524b14 | 83ca424e-408e-43a6-8371-5f3cc90677a5 | 0d4b6990-9d8c-417d-afbf-5cafb8207fc3 | f9feefbe-16b7-4343-bcd7-7d43213ee765 |           1
 0d4b6990-9d8c-417d-afbf-5cafb8207fc3 | 83ca424e-408e-43a6-8371-5f3cc90677a5 | 00000000-0000-0000-0000-000000000000 | 8619b674-c6d8-44d0-98d9-fee3dbd9d4ea |           1
 0d0a93c9-8a7a-4401-9b7c-cc251b23e7f4 | 73ac2a39-d52c-4aa3-bd90-2e42a9af81f6 | 00000000-0000-0000-0000-000000000000 | 8619b674-c6d8-44d0-98d9-fee3dbd9d4ea |           1
 8f0b95c1-d63c-4bd5-8f6d-28c1995796d7 | 73ac2a39-d52c-4aa3-bd90-2e42a9af81f6 | 0d0a93c9-8a7a-4401-9b7c-cc251b23e7f4 | f9feefbe-16b7-4343-bcd7-7d43213ee765 |           1
===

The source storage domain had 0 GB of free disk space. A live storage migration of disk 73ac2a39 was then executed, which created snapshot 3067f179.

(structure during the LSM operation)

===
              image_guid              |            image_group_id            |               parentid               |            vm_snapshot_id            | imagestatus
--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+-------------
 b31e31e6-3f23-4b48-8f08-8f8c87524b14 | 83ca424e-408e-43a6-8371-5f3cc90677a5 | 0d4b6990-9d8c-417d-afbf-5cafb8207fc3 | 3067f179-8620-483c-bacd-6aea46b499be |           1
 0d4b6990-9d8c-417d-afbf-5cafb8207fc3 | 83ca424e-408e-43a6-8371-5f3cc90677a5 | 00000000-0000-0000-0000-000000000000 | 8619b674-c6d8-44d0-98d9-fee3dbd9d4ea |           1
 ed7c6d73-0d17-470b-8152-42fa45d2caea | 73ac2a39-d52c-4aa3-bd90-2e42a9af81f6 | 8f0b95c1-d63c-4bd5-8f6d-28c1995796d7 | 3067f179-8620-483c-bacd-6aea46b499be |           2
 8f0b95c1-d63c-4bd5-8f6d-28c1995796d7 | 73ac2a39-d52c-4aa3-bd90-2e42a9af81f6 | 0d0a93c9-8a7a-4401-9b7c-cc251b23e7f4 | f9feefbe-16b7-4343-bcd7-7d43213ee765 |           2
 0d0a93c9-8a7a-4401-9b7c-cc251b23e7f4 | 73ac2a39-d52c-4aa3-bd90-2e42a9af81f6 | 00000000-0000-0000-0000-000000000000 | 8619b674-c6d8-44d0-98d9-fee3dbd9d4ea |           2
===

The snapshot operation during the LSM failed because of the low storage space on the source storage domain. As part of the rollback, the engine deleted snapshot 3067f179 from the snapshots table:

===
engine=# select * from snapshots where snapshot_id = '3067f179-8620-483c-bacd-6aea46b499be';
 snapshot_id | vm_id | snapshot_type | status | description | creation_date | app_list | vm_configuration | _create_date | _update_date | memory_metadata_disk_id | memory_dump_disk_id | vm_configuration_broken
-------------+-------+---------------+--------+-------------+---------------+----------+------------------+--------------+--------------+-------------------------+---------------------+-------------------------
(0 rows)
===

However, the other image, 83ca424e, still points to 3067f179, which no longer exists in the snapshots table.
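Such a dangling reference can be spotted directly in the engine database. A minimal sketch, using only the images and snapshots tables and the columns already shown in the queries above (read-only, safe to run):

===
-- List images whose vm_snapshot_id has no matching row in the snapshots
-- table, i.e. images left pointing at a snapshot that was deleted.
SELECT i.image_guid, i.image_group_id, i.vm_snapshot_id
FROM images i
LEFT JOIN snapshots s ON s.snapshot_id = i.vm_snapshot_id
WHERE s.snapshot_id IS NULL;
===

In the state above, this returns the row for image b31e31e6, whose vm_snapshot_id is the deleted 3067f179.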
(structure after the failed snapshot operation)

===
              image_guid              |            image_group_id            |               parentid               |            vm_snapshot_id            | imagestatus
--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+-------------
 b31e31e6-3f23-4b48-8f08-8f8c87524b14 | 83ca424e-408e-43a6-8371-5f3cc90677a5 | 0d4b6990-9d8c-417d-afbf-5cafb8207fc3 | 3067f179-8620-483c-bacd-6aea46b499be |           1
 0d4b6990-9d8c-417d-afbf-5cafb8207fc3 | 83ca424e-408e-43a6-8371-5f3cc90677a5 | 00000000-0000-0000-0000-000000000000 | 8619b674-c6d8-44d0-98d9-fee3dbd9d4ea |           1
 8f0b95c1-d63c-4bd5-8f6d-28c1995796d7 | 73ac2a39-d52c-4aa3-bd90-2e42a9af81f6 | 0d0a93c9-8a7a-4401-9b7c-cc251b23e7f4 | f9feefbe-16b7-4343-bcd7-7d43213ee765 |           1
 0d0a93c9-8a7a-4401-9b7c-cc251b23e7f4 | 73ac2a39-d52c-4aa3-bd90-2e42a9af81f6 | 00000000-0000-0000-0000-000000000000 | 8619b674-c6d8-44d0-98d9-fee3dbd9d4ea |           1
===

Now, if you try to delete the old snapshot of disk 83ca424e, the operation fails with an NPE and the image is marked as illegal:

===
              image_guid              |            image_group_id            |               parentid               |            vm_snapshot_id            | imagestatus
--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------+-------------
 b31e31e6-3f23-4b48-8f08-8f8c87524b14 | 83ca424e-408e-43a6-8371-5f3cc90677a5 | 0d4b6990-9d8c-417d-afbf-5cafb8207fc3 | 3067f179-8620-483c-bacd-6aea46b499be |           1
 0d4b6990-9d8c-417d-afbf-5cafb8207fc3 | 83ca424e-408e-43a6-8371-5f3cc90677a5 | 00000000-0000-0000-0000-000000000000 | 8619b674-c6d8-44d0-98d9-fee3dbd9d4ea |           4
 0d0a93c9-8a7a-4401-9b7c-cc251b23e7f4 | 73ac2a39-d52c-4aa3-bd90-2e42a9af81f6 | 00000000-0000-0000-0000-000000000000 | f9feefbe-16b7-4343-bcd7-7d43213ee765 |           1
===

===
2019-04-01 00:00:55,761-04 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-50) [356357f5-391b-458a-9f03-fcbd96759637] null: java.lang.NullPointerException
        at org.ovirt.engine.core.common.action.MergeParameters.<init>(MergeParameters.java:30) [common.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.buildMergeParameters(RemoveSnapshotSingleDiskLiveCommand.java:191) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.performNextOperation(RemoveSnapshotSingleDiskLiveCommand.java:114) [bll.jar:]
        at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:32) [bll.jar:]
        at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:77) [bll.jar:]
===

Manual database changes are required to correct the structure; a sketch of the correction follows.
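The correction amounts to repointing the orphaned image row back at the snapshot it belonged to before the failed LSM, and clearing the illegal status. An illustrative sketch using the IDs above (the exact rows to touch depend on the database at hand, and imagestatus 1 = OK / 4 = ILLEGAL is the engine's usual encoding; take a backup and verify every ID before running anything like this):

===
-- Repoint image b31e31e6 from the deleted LSM snapshot (3067f179)
-- back to the snapshot it belonged to before the LSM (f9feefbe).
UPDATE images
   SET vm_snapshot_id = 'f9feefbe-16b7-4343-bcd7-7d43213ee765'
 WHERE image_guid = 'b31e31e6-3f23-4b48-8f08-8f8c87524b14';

-- Clear the illegal status (4) left behind by the failed snapshot
-- removal, restoring the image to OK (1).
UPDATE images
   SET imagestatus = 1
 WHERE image_guid = '0d4b6990-9d8c-417d-afbf-5cafb8207fc3';
===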
Version-Release number of selected component (if applicable):

Tested on rhvm-4.3.1.1-0.1.el7.noarch. The customer is on 4.2, so the issue exists in older releases as well.

How reproducible:

100%

Steps to Reproduce:
1. Create a VM with 2 disks.
2. Create a snapshot of this VM including both disks.
3. Fill the storage domain so that there is 0 GB of free space.
4. Try to migrate one of the disks. It fails during the snapshot operation since there is no free space on the source storage domain.
5. Check the structure of the images in the database. The other disk now points to a nonexistent snapshot.
6. Try to delete the snapshot created in step 2. One disk fails with an NPE.

Actual results:

Low disk space on the source storage domain during Live Storage Migration can leave images pointing at a nonexistent snapshot.

Expected results:

The engine should not initiate the LSM if there is low disk space on the source storage domain. We already prevent a manual snapshot operation when disk space on the storage domain is low; however, the snapshot created during the LSM does not check for free space on the source storage domain before the operation starts (a possible manual pre-check is sketched after this comment). Also, the engine should not leave an inconsistent structure behind if the snapshot operation fails during the LSM.

Additional info:

(Originally by Nijin Ashok)
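On the validation gap described above: the free space the engine reports for a domain can also be checked manually before attempting an LSM. A minimal sketch, assuming the engine schema's storage_domain_static and storage_domain_dynamic tables with an available_disk_size column tracked in GB (these names are not shown in this report, so verify them against your schema first):

===
-- Hypothetical pre-check: list storage domains whose reported free
-- space is below a chosen threshold (table/column names and the GB
-- unit are assumptions, matching the "0 GB of free space" audit text).
SELECT st.storage_name, dyn.available_disk_size AS free_gb
FROM storage_domain_static st
JOIN storage_domain_dynamic dyn ON dyn.id = st.id
WHERE dyn.available_disk_size < 10;  -- threshold in GB
===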
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Found non-acked flags: '{'rhevm-4.3.z': '?'}', ] For more info please contact: rhv-devops
Tested using: ovirt-engine-4.3.4.1-0.1.el7.noarch

Tested using the above scenario:

1. Create a VM with 2 disks.

2. Create a snapshot of this VM including both disks.

Got 2 snapshots, Active VM and snap_1. From the database:

snapshots:

engine=# select snapshot_id, vm_id, status, description from snapshots where vm_id='22778cd4-25b3-4d00-86f4-0f3b009077c1';
             snapshot_id              |                vm_id                 | status | description
--------------------------------------+--------------------------------------+--------+-------------
 db1b3c93-2466-443b-b8cc-ddbff9f21345 | 22778cd4-25b3-4d00-86f4-0f3b009077c1 | OK     | snap_1
 cb23b281-351d-4bc4-b391-f7da887460a3 | 22778cd4-25b3-4d00-86f4-0f3b009077c1 | OK     | Active VM
(2 rows)

images:

engine=# select image_guid, image_group_id, parentid, vm_snapshot_id from images;
              image_guid              |            image_group_id            |               parentid               |            vm_snapshot_id
--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------
 5c76875d-c18e-4f2c-9d98-50a212e85523 | b7a2feba-2b62-48e4-8749-e7a9ad7f9e88 | 00000000-0000-0000-0000-000000000000 | db1b3c93-2466-443b-b8cc-ddbff9f21345
 da45376d-0820-4b19-b510-29657e03f05c | c8ceb520-a7ee-4a9d-98a5-62804c36e098 | 1464b05a-509b-4f9a-892c-2a8a67b982f2 | cb23b281-351d-4bc4-b391-f7da887460a3
 ef55999e-5d48-4e04-86c6-45b4fbe52781 | b7a2feba-2b62-48e4-8749-e7a9ad7f9e88 | 5c76875d-c18e-4f2c-9d98-50a212e85523 | cb23b281-351d-4bc4-b391-f7da887460a3
 1464b05a-509b-4f9a-892c-2a8a67b982f2 | c8ceb520-a7ee-4a9d-98a5-62804c36e098 | 00000000-0000-0000-0000-000000000000 | db1b3c93-2466-443b-b8cc-ddbff9f21345

******* start the VM *********

3. Fill the storage domain so that there is 0 GB of free space:

Created a new preallocated disk using all available space in the storage domain (iscsi_0).

Engine log:

2019-06-02 08:55:21,476+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [] EVENT_ID: IRS_DISK_SPACE_LOW_ERROR(201), Critical, Low disk space. iscsi_0 domain has 0 GB of free space.

4. Try to migrate one of the disks. It should fail since there is no free space on the source storage domain.

The operation failed with this error in the engine log, as expected:

2019-06-02 08:58:30,321+03 WARN [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (default task-40) [d330c72f-1861-4e0d-8b60-4e1e74954209] Validation of action 'LiveMigrateDisk' failed for user admin@internal-authz. Reasons: VAR__ACTION__MOVE,VAR__TYPE__DISK,ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName iscsi_0

5. Check the structure of the images in the database (in the original bug, the other disk would be pointing to a nonexistent snapshot).
At this point, still got the 2 snapshots from step 2 and the same images:

engine=# select image_guid, image_group_id, parentid, vm_snapshot_id from images where image_group_id in ('b7a2feba-2b62-48e4-8749-e7a9ad7f9e88', 'c8ceb520-a7ee-4a9d-98a5-62804c36e098');
              image_guid              |            image_group_id            |               parentid               |            vm_snapshot_id
--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------
 5c76875d-c18e-4f2c-9d98-50a212e85523 | b7a2feba-2b62-48e4-8749-e7a9ad7f9e88 | 00000000-0000-0000-0000-000000000000 | db1b3c93-2466-443b-b8cc-ddbff9f21345
 da45376d-0820-4b19-b510-29657e03f05c | c8ceb520-a7ee-4a9d-98a5-62804c36e098 | 1464b05a-509b-4f9a-892c-2a8a67b982f2 | cb23b281-351d-4bc4-b391-f7da887460a3
 ef55999e-5d48-4e04-86c6-45b4fbe52781 | b7a2feba-2b62-48e4-8749-e7a9ad7f9e88 | 5c76875d-c18e-4f2c-9d98-50a212e85523 | cb23b281-351d-4bc4-b391-f7da887460a3
 1464b05a-509b-4f9a-892c-2a8a67b982f2 | c8ceb520-a7ee-4a9d-98a5-62804c36e098 | 00000000-0000-0000-0000-000000000000 | db1b3c93-2466-443b-b8cc-ddbff9f21345
(4 rows)

6. Try to delete the snapshot created in step 2.

Removing the snapshot from step 2 (snap_1) failed due to low space on storage; no NPE was seen in the logs.

From the UI: "Cannot remove Snapshot. Low disk space on Storage Domain iscsi_0."

From the engine log:

2019-06-02 09:43:15,436+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-48) [c28cbf70-fba7-4ab5-824b-7b31835e3926] Lock Acquired to object 'EngineLock:{exclusiveLocks='[22778cd4-25b3-4d00-86f4-0f3b009077c1=VM]', sharedLocks=''}'
2019-06-02 09:43:15,589+03 WARN [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-48) [c28cbf70-fba7-4ab5-824b-7b31835e3926] Validation of action 'RemoveSnapshot' failed for user admin@internal-authz. Reasons: VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE,ACTION_TYPE_FAILED_DISK_SPACE_LOW_ON_STORAGE_DOMAIN,$storageName iscsi_0
2019-06-02 09:43:15,590+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-48) [c28cbf70-fba7-4ab5-824b-7b31835e3926] Lock freed to object 'EngineLock:{exclusiveLocks='[22778cd4-25b3-4d00-86f4-0f3b009077c1=VM]', sharedLocks=''}'

Powered off the VM, freed some space on the storage, and tried to delete the snapshot again:

2019-06-02 09:50:42,377+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-53) [21194e6f-775b-439b-9f50-25e96212c3bb] Lock Acquired to object 'EngineLock:{exclusiveLocks='[22778cd4-25b3-4d00-86f4-0f3b009077c1=VM]', sharedLocks=''}'
2019-06-02 09:50:42,487+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-53) [21194e6f-775b-439b-9f50-25e96212c3bb] Running command: RemoveSnapshotCommand internal: false. Entities affected : ID: 22778cd4-25b3-4d00-86f4-0f3b009077c1 Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER
2019-06-02 09:50:42,499+03 INFO [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-53) [21194e6f-775b-439b-9f50-25e96212c3bb] Lock freed to object 'EngineLock:{exclusiveLocks='[22778cd4-25b3-4d00-86f4-0f3b009077c1=VM]', sharedLocks=''}'
....
2019-06-02 09:51:47,575+03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-62) [21194e6f-775b-439b-9f50-25e96212c3bb] EVENT_ID: USER_REMOVE_SNAPSHOT_FINISHED_SUCCESS(356), Snapshot 'snap_1' deletion for VM 'test_vm' has been completed.
Snapshots after the delete:

engine=# select snapshot_id, vm_id, status, description from snapshots where vm_id='22778cd4-25b3-4d00-86f4-0f3b009077c1';
             snapshot_id              |                vm_id                 | status | description
--------------------------------------+--------------------------------------+--------+-------------
 cb23b281-351d-4bc4-b391-f7da887460a3 | 22778cd4-25b3-4d00-86f4-0f3b009077c1 | OK     | Active VM
(1 row)

images:

engine=# select image_guid, image_group_id, parentid, vm_snapshot_id from images where image_group_id in ('b7a2feba-2b62-48e4-8749-e7a9ad7f9e88', 'c8ceb520-a7ee-4a9d-98a5-62804c36e098');
              image_guid              |            image_group_id            |               parentid               |            vm_snapshot_id
--------------------------------------+--------------------------------------+--------------------------------------+--------------------------------------
 5c76875d-c18e-4f2c-9d98-50a212e85523 | b7a2feba-2b62-48e4-8749-e7a9ad7f9e88 | 00000000-0000-0000-0000-000000000000 | cb23b281-351d-4bc4-b391-f7da887460a3
 1464b05a-509b-4f9a-892c-2a8a67b982f2 | c8ceb520-a7ee-4a9d-98a5-62804c36e098 | 00000000-0000-0000-0000-000000000000 | cb23b281-351d-4bc4-b391-f7da887460a3
(2 rows)

Works as expected. Moving to VERIFIED.
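For completeness, the dangling-reference check sketched in the description can be re-run here; with every remaining image pointing at the existing 'Active VM' snapshot (cb23b281), it should return zero rows:

===
-- Expect zero rows: no image points at a snapshot that no longer exists.
SELECT i.image_guid, i.vm_snapshot_id
FROM images i
LEFT JOIN snapshots s ON s.snapshot_id = i.vm_snapshot_id
WHERE s.snapshot_id IS NULL;
===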
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1566