We noticed backups started failing after some tests on 1 VM. Fails with: 2021-01-22 12:29:03,496+01 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.StartVmBackupVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-51) [ce60eb94-2579-4d50-a6f3-21f0ef0a431f] Command 'StartVmBackupVDSCommand(HostName = ovirt-ovn003, VmBackupVDSParameters:{hostId='e8facc41-7088-4c5a-9a69-680d1d5e6dee', backupId='ca6dbdc9-d067-40c0-804d-0da802cc6b6d', requireConsistency='false'})' execution failed: VDSGenericException: VDSErrorException: Failed to StartVmBackupVDS, error = Checkpoint Error: {'parent_checkpoint_id': None, 'leaf_checkpoint_id': '0e9cedb2-dcd9-4736-bdb1-72654e74b3ba', 'vm_id': '011fc4cd-4831-49b3-8456-ebdcdf9a310b', 'reason': 'Parent checkpoint ID does not match the actual leaf checkpoint'}, code = 1610 After digging in the logs what caused this, it seems to be some race condition: 1. Deleting the checkpoint: 2021-01-21 18:48:43,016+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (default task-41) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] Lock Acquired to object 'EngineLock:{exclusiveLocks='', sharedLocks='[011fc4cd-4831-49b3-8456-ebdcdf9a310b=VM]'}' 2021-01-21 18:48:43,021+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (default task-41) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] Running command: DeleteVmCheckpointCommand internal: false. Entities affected : ID: 011fc4cd-4831-49b3-8456-ebdcdf9a310b Type: VMAction group BACKUP_DISK with role type ADMIN, ID: a12c4bf3-e6a4-4d0e-9994-beb5ebc0cfb8 Type: DiskAction group BACKUP_DISK with role type ADMIN 2021-01-21 18:48:43,032+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-41) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] EVENT_ID: VM_CHECKPOINT_DELETED(10,797), Checkpoint 0e9cedb2-dcd9-4736-bdb1-72654e74b3ba for VM vm-test1 deleted (User: admin@internal-authz). 2021-01-21 18:48:43,978+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-73) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] Redefine previous VM checkpoints for VM '011fc4cd-4831-49b3-8456-ebdcdf9a310b' 2021-01-21 18:48:45,991+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] Deleting VmCheckpoint '0e9cedb2-dcd9-4736-bdb1-72654e74b3ba' 2021-01-21 18:48:45,991+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DeleteVmCheckpointsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] START, DeleteVmCheckpointsVDSCommand(HostName = ovirt-ovn003, VmCheckpointsVDSParameters:{hostId='e8facc41-7088-4c5a-9a69-680d1d5e6dee', vmId='011fc4cd-4831-49b3-8456-ebdcdf9a310b'}), log id: 66cfbf75 2021-01-21 18:48:45,997+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.DeleteVmCheckpointsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] FINISH, DeleteVmCheckpointsVDSCommand, return: 2021-01-21 18:48:59,077+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-29) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] Ending command 'org.ovirt.engine.core.bll.DeleteVmCheckpointCommand' successfully. 2021-01-21 18:48:59,081+01 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-29) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] Trying to release a shared lock for key: '011fc4cd-4831-49b3-8456-ebdcdf9a310bVM' , but lock does not exist 2021-01-21 18:48:59,081+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-29) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[011fc4cd-4831-49b3-8456-ebdcdf9a310b=VM]'}' 2021-01-21 18:48:59,088+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-29) [2478aaec-eccb-4360-ad5a-b0574bb8dc15] EVENT_ID: VM_CHECKPOINT_DELETED(10,797), Checkpoint 0e9cedb2-dcd9-4736-bdb1-72654e74b3ba for VM vm-test1 deleted (User: admin@internal-authz). 2. Deleting the same checkpoint, at the same moment: 2021-01-21 18:48:48,916+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (default task-41) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] Lock Acquired to object 'EngineLock:{exclusiveLocks='', sharedLocks='[011fc4cd-4831-49b3-8456-ebdcdf9a310b=VM]'}' 2021-01-21 18:48:48,922+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (default task-41) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] Running command: DeleteVmCheckpointCommand internal: false. Entities affected : ID: 011fc4cd-4831-49b3-8456-ebdcdf9a310b Type: VMAction group BACKUP_DISK with role type ADMIN, ID: a12c4bf3-e6a4-4d0e-9994-beb5ebc0cfb8 Type: DiskAction group BACKUP_DISK with role type ADMIN 2021-01-21 18:48:48,930+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-41) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] EVENT_ID: VM_CHECKPOINT_DELETED(10,797), Checkpoint 0e9cedb2-dcd9-4736-bdb1-72654e74b3ba for VM vm-test1 deleted (User: admin@internal-authz). 2021-01-21 18:48:49,002+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] Redefine previous VM checkpoints for VM '011fc4cd-4831-49b3-8456-ebdcdf9a310b' 2021-01-21 18:48:51,053+01 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] Command 'DeleteVmCheckpoint' id: 'f13aa032-9ed6-4d2d-987a-9ea4ff381752' with children [] failed when attempting to perform the next operation, marking as 'ACTIVE' 2021-01-21 18:48:51,053+01 ERROR [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] null: java.lang.NullPointerException 2021-01-21 18:48:51,053+01 ERROR [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] Command 'DeleteVmCheckpoint' id 'f13aa032-9ed6-4d2d-987a-9ea4ff381752' failed executing step 'DELETE_CHECKPOINT' 2021-01-21 18:48:51,053+01 INFO [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-22) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] Command 'DeleteVmCheckpoint' id: 'f13aa032-9ed6-4d2d-987a-9ea4ff381752' child commands '[]' executions were completed, status 'FAILED' 2021-01-21 18:48:52,057+01 ERROR [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-25) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] Ending command 'org.ovirt.engine.core.bll.DeleteVmCheckpointCommand' with failure. 2021-01-21 18:48:52,060+01 WARN [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-25) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] Trying to release a shared lock for key: '011fc4cd-4831-49b3-8456-ebdcdf9a310bVM' , but lock does not exist 2021-01-21 18:48:52,060+01 INFO [org.ovirt.engine.core.bll.DeleteVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-25) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[011fc4cd-4831-49b3-8456-ebdcdf9a310b=VM]'}' 2021-01-21 18:48:52,066+01 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-25) [18fca750-21f0-4c3e-98f9-98e1fdcb0f45] EVENT_ID: VM_CHECKPOINT_DELETED(10,797), Checkpoint 0e9cedb2-dcd9-4736-bdb1-72654e74b3ba for VM vm-test1 deleted (User: admin@internal-authz). The redefine triggered by 18fca750-21f0-4c3e-98f9-98e1fdcb0f45 2021-01-21 18:48:49,005+01 INFO [org.ovirt.engine.core.bll.RedefineVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [70691619] Running command: RedefineVmCheckpointCommand internal: true. Entities affected : ID: 011fc4cd-4831-49b3-8456-ebdcdf9a310b Type: VMAction group BACKUP_DISK with role type ADMIN, ID: a12c4bf3-e6a4-4d0e-9994-beb5ebc0cfb8 Type: DiskAction group BACKUP_DISK with role type ADMIN 2021-01-21 18:48:49,006+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVmCheckpointsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [70691619] START, ListVmCheckpointsVDSCommand(HostName = ovirt-ovn003, VdsAndVmIDVDSParametersBase:{hostId='e8facc41-7088-4c5a-9a69-680d1d5e6dee', vmId='011fc4cd-4831-49b3-8456-ebdcdf9a310b'}), log id: 1906e285 2021-01-21 18:48:49,009+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ListVmCheckpointsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [70691619] FINISH, ListVmCheckpointsVDSCommand, return: [], log id: 1906e285 2021-01-21 18:48:49,009+01 INFO [org.ovirt.engine.core.bll.RedefineVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [70691619] Redefine VM '011fc4cd-4831-49b3-8456-ebdcdf9a310b' checkpoint '0e9cedb2-dcd9-4736-bdb1-72654e74b3ba' 2021-01-21 18:48:49,011+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RedefineVmCheckpointsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [70691619] START, RedefineVmCheckpointsVDSCommand(HostName = ovirt-ovn003, VmCheckpointsVDSParameters:{hostId='e8facc41-7088-4c5a-9a69-680d1d5e6dee', vmId='011fc4cd-4831-49b3-8456-ebdcdf9a310b'}), log id: 1f052f79 2021-01-21 18:48:49,040+01 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RedefineVmCheckpointsVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [70691619] FINISH, RedefineVmCheckpointsVDSCommand, return: 2021-01-21 18:48:49,041+01 INFO [org.ovirt.engine.core.bll.RedefineVmCheckpointCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-78) [70691619] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[011fc4cd-4831-49b3-8456-ebdcdf9a310b=VM]'}' On VDSM: 2021-01-21 18:48:52,432+0100 INFO (jsonrpc/4) [api.virt] START delete_checkpoints(checkpoint_ids=['0e9cedb2-dcd9-4736-bdb1-72654e74b3ba']) from=::ffff:x.91,42386, flow_id=2478aaec-eccb-4360-ad5a-b0574bb8dc15, vmId=011fc4cd-4831-49b3-8456-ebdcdf9a310b (api:48) 2021-01-21 18:48:52,436+0100 INFO (jsonrpc/4) [api.virt] FINISH delete_checkpoints return={'result': {'checkpoint_ids': ['0e9cedb2-dcd9-4736-bdb1-72654e74b3ba']}, 'status': {'code': 0, 'message': 'Done'}} from=::ffff:x.91,42386, flow_id=2478aaec-eccb-4360-ad5a-b0574bb8dc15, vmId=011fc4cd-4831-49b3-8456-ebdcdf9a310b (api:54) 2021-01-21 18:48:55,447+0100 INFO (jsonrpc/7) [api.virt] START list_checkpoints() from=::ffff:x.91,42386, flow_id=70691619, vmId=011fc4cd-4831-49b3-8456-ebdcdf9a310b (api:48) 2021-01-21 18:48:55,448+0100 INFO (jsonrpc/7) [api.virt] FINISH list_checkpoints return={'result': [], 'status': {'code': 0, 'message': 'Done'}} from=::ffff:x.91,42386, flow_id=70691619, vmId=011fc4cd-4831-49b3-8456-ebdcdf9a310b (api:54) 2021-01-21 18:48:55,453+0100 INFO (jsonrpc/1) [api.virt] START redefine_checkpoints(checkpoints=[{'xml': '<domaincheckpoint>\n <name>0e9cedb2-dcd9-4736-bdb1-72654e74b3ba</name>\n <description>checkpoint for backup '5c9f4ff5-67db-48cd-83b1-9633eab6f219'</description>\n <creationTime>1611251255</........ ]) from=::ffff:x.91,42386, flow_id=70691619, vmId=011fc4cd-4831-49b3-8456-ebdcdf9a310b (api:48) 2021-01-21 18:48:55,479+0100 INFO (jsonrpc/1) [api.virt] FINISH redefine_checkpoints return={'result': {'checkpoint_ids': ['0e9cedb2-dcd9-4736-bdb1-72654e74b3ba']}, 'status': {'code': 0, 'message': 'Done'}} from=::ffff:x.91,42386, flow_id=70691619, vmId=011fc4cd-4831-49b3-8456-ebdcdf9a310b (api:54) So it seems like after the checkpoint was deleted, the redefine caused it to get added again. But it was completely removed from the oVirt database. I guess some more locking is needed to make both deleted do not cause issues. The checkpoint is gone from the database, but its still active if you check via libvirt.
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.
Please describe steps to verify this. thanks.
Steps to reproduce: 1) Create a VM with disk enabled for backup 2) Run the VM 3) start a full backup 4) finalize the backup 5) start and incremental backup 6) finalize the incremental backup 7) Turn off the VM 8) Start the VM 9) When the VM is UP remove the root checkpoint and immediately, remove it again from a different shell/client. Expected results - The first call to remove the checkpoint should succeed, the second should fail.
Verified on rhv-4.4.6-4 according steps from comment #3 Second (immediate) attempt of removing the root checkpoint failed as expected. 9) When the VM is UP remove the root checkpoint and immediately, remove it again from a different shell/client. First shell: [root@storage-ge13-vdsm2 examples]# python3 remove_checkpoint.py 0d688b5a-799d-4f83-956c-0c999afa3461 -c engine [ 0.0 ] Removing root checkpoint for VM '0d688b5a-799d-4f83-956c-0c999afa3461' [ 0.0 ] Looking up checkpoints 0d688b5a-799d-4f83-956c-0c999afa3461 [ 2.6 ] Removing root checkpoint 'b7c5a5a6-8199-4ccb-8877-6fa264d85be6' [ 4.6 ] Root checkpoint removed successfully Second shell: [root@storage-ge13-vdsm2 examples]# python3 remove_checkpoint.py 0d688b5a-799d-4f83-956c-0c999afa3461 -c engine [ 0.0 ] Removing root checkpoint for VM '0d688b5a-799d-4f83-956c-0c999afa3461' [ 0.0 ] Looking up checkpoints 0d688b5a-799d-4f83-956c-0c999afa3461 [ 1.6 ] Removing root checkpoint 'b7c5a5a6-8199-4ccb-8877-6fa264d85be6' Traceback (most recent call last): File "remove_checkpoint.py", line 63, in <module> checkpoint_service.remove() File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 34479, in remove self._internal_remove(headers, query, wait) File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 271, in _internal_remove return future.wait() if wait else future File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait return self._code(response) File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 268, in callback self._check_fault(response) File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault self._raise_error(response, body) File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error raise error ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot remove ${type}. The VM is during a checkpoint removal operation.]". HTTP response code is 409. Engine log: 2021-04-19 11:14:40,994+03 INFO [org.ovirt.engine.core.bll.storage.backup.DeleteVmCheckpointCommand] (default task-86) [fe7dca46-912e-49c0-b664-3d38464d5b10] Redefine previous VM checkpoints for VM '0d688b5a-799d-4f83-956c-0c999afa3461' 2021-04-19 11:14:41,008+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-85) [] Operation Failed: [Cannot remove ${type}. The VM is during a checkpoint removal operation.]
This bugzilla is included in oVirt 4.4.6 release, published on May 4th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.