Following exception appeared in the engine.log, details are in attached customer case: 2021-09-04 03:18:19,019+02 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@f684c2bf, log id: 1342fe90 2021-09-04 03:18:19,019+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@f684c2bf, log id: 5fb14bf5 2021-09-04 03:18:19,044+02 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] transaction rolled back 2021-09-04 03:18:19,044+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] Failed invoking callback end method 'onSucceeded' for command 'f9c566d0-963c-4c19-8db3-3cfafa0b447a' with exception 'null', the callback is marked for end method retries 2021-09-04 03:18:19,054+02 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] Command 'RemoveSnapshot' (id: 'e1013db3-7c37-4ad8-a44d-5054be2c6e2f') waiting on child command id: 'f9c566d0-963c-4c19-8db3-3cfafa0b447a' type:'RemoveSnapshotSingleDiskLive' to complete 2021-09-04 03:19:38,932+02 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@f684c2bf, log id: 784329b1 2021-09-04 03:19:38,956+02 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] transaction rolled back 2021-09-04 03:19:38,965+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] Failed invoking callback end method 'onSucceeded' for command 'f9c566d0-963c-4c19-8db3-3cfafa0b447a' with exception 'null', the callback is marked for end method retries but max number of retries have been attempted. The command will be marked as Failed. 2021-09-04 03:19:38,966+02 INFO [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] Exception in invoking callback of command RemoveSnapshotSingleDiskLive (f9c566d0-963c-4c19-8db3-3cfafa0b447a): NullPointerException:␠ 2021-09-04 03:19:38,966+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] Error invoking callback method 'onSucceeded' for 'SUCCEEDED' command 'f9c566d0-963c-4c19-8db3-3cfafa0b447a' 2021-09-04 03:19:38,966+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] Exception: java.lang.NullPointerException ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.isRemoveTopImageMemoryNeeded(RemoveSnapshotSingleDiskCommandBase.java:267) ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.removeTopImageMemoryIfNeeded(RemoveSnapshotSingleDiskCommandBase.java:290) ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.handleBackwardMerge(RemoveSnapshotSingleDiskCommandBase.java:257) ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.lambda$syncDbRecords$1(RemoveSnapshotSingleDiskCommandBase.java:182) ␉·······at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:181) ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.syncDbRecords(RemoveSnapshotSingleDiskCommandBase.java:172) ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.onSucceeded(RemoveSnapshotSingleDiskLiveCommand.java:232) ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback.onSucceeded(RemoveSnapshotSingleDiskLiveCommandCallback.java:27) ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.endCallback(CommandCallbacksPoller.java:69) ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:166) ␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) ␉·······at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ␉·······at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ␉·······at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) ␉·······at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) ␉·······at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ␉·······at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ␉·······at java.base/java.lang.Thread.run(Thread.java:829) ␉·······at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)
Martin, Can you provide engine and VDSM logs?
Hello I've attached another case that is suffering the same issue and produce db snapshot data inconsistencies that interrupt backup operations and create case with a huge amount of VMs affected.
Fixing this should be fairly easy but we need to figure out the flow this happens in, seems to be related to backward merge
1. Edit /usr/lib/python3.6/site-packages/vdsm/virt/jobs/snapshot.py Add a two minute sleep in line 493: time.sleep(120) self._vm.run_dom_snapshot(snapxml, snap_flags) 2. Create a snapshot on a VM 3. Run the VM on the edited host 4. Create a live snapshot, wait a minute and shutdown the VM The operation should fail, but there should be a snapshot that should be removed without issues
(In reply to Benny Zlotnik from comment #13) > The operation should fail, but there should be a snapshot that should be > removed without issues to be more concrete: ... but there should be a snapshot that can be removed manually without issues
Verified - RemoveSnapshotSingleDisk command ran without issues and the live snapshot 's2_live' deletion for VM 'new_vm' was completed successfully. ovirt-engine-4.5.1.2-0.11.el8ev.noarch vdsm-4.50.1.3-1.el8ev.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5555