Bug 2001923 - NPE during RemoveSnapshotSingleDisk command
Summary: NPE during RemoveSnapshotSingleDisk command
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.7
Hardware: Unspecified
OS: Unspecified
medium
high
Target Milestone: ovirt-4.5.1
: ---
Assignee: Benny Zlotnik
QA Contact: Shir Fishbain
URL:
Whiteboard:
Depends On: 2081294
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-09-07 13:11 UTC by Martin Perina
Modified: 2024-12-20 20:57 UTC (History)
7 users (show)

Fixed In Version: ovirt-engine-4.5.1.2
Doc Type: Bug Fix
Doc Text:
Previously, when a failed VM snapshot was removed from the Manager database while the volume remained on the storage, subsequent operations failed because there was a discrepancy between the storage and the database. Now, the VM snapshot is retained if the volume is not removed from the storage.
Clone Of:
Environment:
Last Closed: 2022-07-14 12:54:31 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github oVirt ovirt-engine pull 403 0 None open core: do not remove snapshot from DB if related images were not removed 2022-06-13 09:53:35 UTC
Red Hat Issue Tracker RHV-43398 0 None None None 2021-09-07 13:12:49 UTC
Red Hat Product Errata RHSA-2022:5555 0 None None None 2022-07-14 12:55:15 UTC

Description Martin Perina 2021-09-07 13:11:51 UTC
Following exception appeared in the engine.log, details are in attached customer case:

2021-09-04 03:18:19,019+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] FINISH, GetVolumeInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@f684c2bf, log id: 1342fe90                         
2021-09-04 03:18:19,019+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@f684c2bf, log id: 5fb14bf5                           
2021-09-04 03:18:19,044+02 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] transaction rolled back                                                                                                                                            
2021-09-04 03:18:19,044+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] Failed invoking callback end method 'onSucceeded' for command 'f9c566d0-963c-4c19-8db3-3cfafa0b447a' with exception 'null', the callback is marked for end method retries 
2021-09-04 03:18:19,054+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-37) [c49799a3-82bd-459a-a9cb-385c960dd366] Command 'RemoveSnapshot' (id: 'e1013db3-7c37-4ad8-a44d-5054be2c6e2f') waiting on child command id: 'f9c566d0-963c-4c19-8db3-3cfafa0b447a' type:'RemoveSnapshotSingleDiskLive' to complete

2021-09-04 03:19:38,932+02 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@f684c2bf, log id: 784329b1                           
2021-09-04 03:19:38,956+02 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] transaction rolled back                                                                                                                                            
2021-09-04 03:19:38,965+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] Failed invoking callback end method 'onSucceeded' for command 'f9c566d0-963c-4c19-8db3-3cfafa0b447a' with exception 'null', the callback is marked for end method retries but max number of retries have been attempted. The command will be marked as Failed.
2021-09-04 03:19:38,966+02 INFO  [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] Exception in invoking callback of command RemoveSnapshotSingleDiskLive (f9c566d0-963c-4c19-8db3-3cfafa0b447a): NullPointerException:␠                                  
2021-09-04 03:19:38,966+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] Error invoking callback method 'onSucceeded' for 'SUCCEEDED' command 'f9c566d0-963c-4c19-8db3-3cfafa0b447a'                                                            
2021-09-04 03:19:38,966+02 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-88) [c49799a3-82bd-459a-a9cb-385c960dd366] Exception: java.lang.NullPointerException                                                                                                                              
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.isRemoveTopImageMemoryNeeded(RemoveSnapshotSingleDiskCommandBase.java:267)                                                                                                                                                                                     
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.removeTopImageMemoryIfNeeded(RemoveSnapshotSingleDiskCommandBase.java:290)                                                                                                                                                                                     
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.handleBackwardMerge(RemoveSnapshotSingleDiskCommandBase.java:257)     
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.lambda$syncDbRecords$1(RemoveSnapshotSingleDiskCommandBase.java:182)  
␉·······at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:181)                                 
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskCommandBase.syncDbRecords(RemoveSnapshotSingleDiskCommandBase.java:172)           
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.onSucceeded(RemoveSnapshotSingleDiskLiveCommand.java:232)             
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback.onSucceeded(RemoveSnapshotSingleDiskLiveCommandCallback.java:27)                                                                                                                                                                                       
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.endCallback(CommandCallbacksPoller.java:69)                                            
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethodsImpl(CommandCallbacksPoller.java:166)                             
␉·······at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109)                                 
␉·······at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)                                                                                            
␉·······at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)                                                                                                   
␉·······at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383)                                                                                                                                          
␉·······at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534)                                                                                                                                                 
␉·······at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)                                                                                    
␉·······at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)                                                                                    
␉·······at java.base/java.lang.Thread.run(Thread.java:829)                                                                                                                              
␉·······at org.glassfish.javax.enterprise.concurrent.0.redhat-1//org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250)

Comment 2 Eyal Shenitzky 2021-09-13 14:10:59 UTC
Martin,
Can you provide engine and VDSM logs?

Comment 7 José Enrique 2022-02-01 17:21:11 UTC
Hello I've attached another case that is suffering the same issue and produce db snapshot data inconsistencies that interrupt backup operations and create case with a huge amount of VMs affected.

Comment 8 Arik 2022-02-01 20:37:41 UTC
Fixing this should be fairly easy but we need to figure out the flow this happens in, seems to be related to backward merge

Comment 13 Benny Zlotnik 2022-06-21 07:48:09 UTC
1. Edit /usr/lib/python3.6/site-packages/vdsm/virt/jobs/snapshot.py
Add a two minute sleep in line 493:
                time.sleep(120)
                self._vm.run_dom_snapshot(snapxml, snap_flags)

2. Create a snapshot on a VM
3. Run the VM on the edited host
4. Create a live snapshot, wait a minute and shutdown the VM

The operation should fail, but there should be a snapshot that should be removed without issues

Comment 14 Arik 2022-06-21 08:12:43 UTC
(In reply to Benny Zlotnik from comment #13)
> The operation should fail, but there should be a snapshot that should be
> removed without issues

to be more concrete: ... but there should be a snapshot that can be removed manually without issues

Comment 20 Shir Fishbain 2022-06-23 11:33:47 UTC
Verified - RemoveSnapshotSingleDisk command ran without issues and the live snapshot 's2_live' deletion for VM 'new_vm' was completed successfully.

ovirt-engine-4.5.1.2-0.11.el8ev.noarch
vdsm-4.50.1.3-1.el8ev.x86_64

Comment 24 errata-xmlrpc 2022-07-14 12:54:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.1] security, bug fix and update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5555


Note You need to log in before you can comment on or make changes to this bug.