Created attachment 1728640 [details] logs Description of problem: Multiple NullPointerExceptions [1] are seen on the engine log when removing the VM which was previously fully backed up, and its backup checkpoints were removed. As we are not sure what are the consequences of those exceptions, a customer might get a feeling that some disastrous event had just took place. Version-Release number of selected component (if applicable): rhv-release-4.4.3-13-001.noarch libvirt-6.6.0-6.module vdsm-4.40.35.1-1.el8ev.x86_64 How reproducible: 100% Steps to Reproduce: 0. Enable incrementalbackup feature on the engine: engine-config -s "IsIncrementalBackupSupported=true" --cver 4.5 && systemctl restart ovirt-engine 1. Create a VM from template and start it 2. Make a full backup of its disks + Check that a checkpoint is created 3. Run SDK example script for removing the root checkpoint (make sure first to create a config file required for the script in /root/.config/ovirt.conf: $ Python3 remove_checkpoint.py <VM uuid> -c engine 4. Remove the VM Actual results: VM removed, but many NullPointerExceptions are seen in the engine log Expected results: No NullPointerExceptions should appear Additional info: Attaching log of engine.log vdsm.log libvirt.log in debug level. Note that in those logs, the backup and checkpoints removal was done twice. But in order to reproduce it, once is enough. [1]: 2020-11-12 09:08:28,030+02 INFO [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (default task-32) [vms_delete_8e8db172-5248-4aab ] Adding task '87a6d0d8-6c49-4937-9faf-d71259820f72' (Parent Command 'RemoveVm', Parameters Type 'org.ovirt.engine.core.common.async tasks.AsyncTaskParameters'), polling hasn't started yet.. 2020-11-12 09:08:28,043+02 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [84900f53-bff9-4d50-94c6-95fc17a1def3] An exception has occurred while trying to create a command object for command 'DeleteVmCheckpoint' with parameters 'VmCheckpointParameters:{commandId='1065e5ee-2b42-4cc0-ade2-4c8c9b693959', user='admin', commandType='Unknown', vmId='399891fe-e891-4791-a822-5eb1a2e22c3b'}': WELD-000049: Unable to invoke protected final void org.ovirt.engine.core.bll.CommandBase.postConstruct() on org.ovirt.engine.core.bll.DeleteVmCheckpointCommand@1264be5d 2020-11-12 09:08:28,043+02 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [84900f53-bff9-4d50-94c6-95fc17a1def3] Exception: java.lang.NullPointerException at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.DeleteVmCheckpointCommand.init(DeleteVmCheckpointCommand.java:74) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.postConstruct(CommandBase.java:218) at jdk.internal.reflect.GeneratedMethodAccessor147.invoke(Unknown Source) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:83) at org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:66) at org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122) at deployment.engine.ear//org.ovirt.engine.core.di.Injector.injectMembers(Injector.java:41) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandsFactory.createCommand(CommandsFactory.java:104) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandsRepository.retrieveCommand(CommandsRepository.java:167) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandsRepository.retrieveCommand(CommandsRepository.java:154) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.retrieveCommand(CommandCoordinatorImpl.java:127) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCoordinatorUtil.retrieveCommand(CommandCoordinatorUtil.java:309) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.ChildCommandsCallbackBase.getCommand(ChildCommandsCallbackBase.java:189) at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.ChildCommandsCallbackBase.onSucceeded(ChildCommandsCallbackBase
DeleteVmCheckpoint is called after RemoveVm already removed the VM from the database and the former has no protection for this scenario
(In reply to Arik from comment #1) > DeleteVmCheckpoint is called after RemoveVm already removed the VM from the > database and the former has no protection for this scenario RemoveVm called after DeleteVmCheckpoint finished its execution but before the command ended successfully. DeleteVmCheckpoint has a validation for the status of the VM (UP only), might be related to the command locks.
(In reply to Eyal Shenitzky from comment #2) > RemoveVm called after DeleteVmCheckpoint finished its execution but before > the command ended successfully. > DeleteVmCheckpoint has a validation for the status of the VM (UP only), > might be related to the command locks. Since we cannot prevent from powering-off the VM until the command ends, we must handle all the options when the VM powered off during the command execution.
Eyal, is there particular verification scenario for this in your mind? Or just watching the logs during the steps i mentioned in the description would be sufficient?
(In reply to Ilan Zuckerman from comment #4) > Eyal, is there particular verification scenario for this in your mind? > Or just watching the logs during the steps i mentioned in the description > would be sufficient? I used the steps you provided.
The test was done on storage-ge-2 environment: version for testing : [root@storage-ge-2 ~]# rpm -qa | grep release rhv-release-4.4.6-6-001.noarch redhat-release-8.4-0.6.el8.x86_64 redhat-release-eula-8.4-0.6.el8.x86_64 I verified the bug with the steps mentioned above. Verification Summary and conclusions - After completing the flow mentioned - the expected log output matched the actual log output (I got no exceptions and no ERRORS in the engine log). Bug verified.
This bugzilla is included in oVirt 4.4.6 release, published on May 4th 2021. Since the problem described in this bug report should be resolved in oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.