Bug 1897049 - [CBT][incremental backup] Multiple NullPointerExceptions during VM removal after backing up the VM and removing the backup checkpoints
Summary: [CBT][incremental backup] Multiple NullPointerExceptions during VM removal af...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.4.3
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ovirt-4.4.6
: 4.4.6.4
Assignee: Eyal Shenitzky
QA Contact: Amit Sharir
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-11-12 07:51 UTC by Ilan Zuckerman
Modified: 2021-08-02 07:04 UTC (History)
6 users (show)

Fixed In Version: ovirt-engine-4.4.6.4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-05 05:36:14 UTC
oVirt Team: Storage
Embargoed:
asharir: testing_plan_complete+


Attachments (Terms of Use)
logs (437.10 KB, application/zip)
2020-11-12 07:51 UTC, Ilan Zuckerman
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 113777 0 master MERGED core: refactor delete VM checkpoint to not use CoCo framework 2021-03-10 09:26:36 UTC
oVirt gerrit 113778 0 master MERGED core: take an exclusive lock on the VM during backup operations 2021-03-10 09:26:43 UTC

Description Ilan Zuckerman 2020-11-12 07:51:22 UTC
Created attachment 1728640 [details]
logs

Description of problem:
Multiple NullPointerExceptions [1] are seen on the engine log when removing the VM which was previously fully backed up, and its backup checkpoints were removed.
As we are not sure what are the consequences of those exceptions, a customer might get a feeling that some disastrous event had just took place.

Version-Release number of selected component (if applicable):
rhv-release-4.4.3-13-001.noarch
libvirt-6.6.0-6.module
vdsm-4.40.35.1-1.el8ev.x86_64


How reproducible:
100%

Steps to Reproduce:
0. Enable incrementalbackup feature on the engine:
engine-config -s "IsIncrementalBackupSupported=true" --cver 4.5 && systemctl restart ovirt-engine
1. Create a VM from template and start it
2. Make a full backup of its disks + Check that a checkpoint is created
3. Run SDK example script for removing the root checkpoint (make sure first to create a config file required for the script in /root/.config/ovirt.conf:
$ Python3 remove_checkpoint.py <VM uuid> -c engine
4. Remove the VM

Actual results:
VM removed, but many NullPointerExceptions are seen in the engine log

Expected results:
No NullPointerExceptions should appear

Additional info:
Attaching log of engine.log vdsm.log libvirt.log in debug level.
Note that in those logs, the backup and checkpoints removal was done twice. But in order to reproduce it, once is enough.

[1]:
2020-11-12 09:08:28,030+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (default task-32) [vms_delete_8e8db172-5248-4aab
] Adding task '87a6d0d8-6c49-4937-9faf-d71259820f72' (Parent Command 'RemoveVm', Parameters Type 'org.ovirt.engine.core.common.async
tasks.AsyncTaskParameters'), polling hasn't started yet..
2020-11-12 09:08:28,043+02 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [84900f53-bff9-4d50-94c6-95fc17a1def3] An exception has occurred while trying to create a command object for command 'DeleteVmCheckpoint' with parameters 'VmCheckpointParameters:{commandId='1065e5ee-2b42-4cc0-ade2-4c8c9b693959', user='admin', commandType='Unknown', vmId='399891fe-e891-4791-a822-5eb1a2e22c3b'}': WELD-000049: Unable to invoke protected final void org.ovirt.engine.core.bll.CommandBase.postConstruct() on org.ovirt.engine.core.bll.DeleteVmCheckpointCommand@1264be5d
2020-11-12 09:08:28,043+02 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-89) [84900f53-bff9-4d50-94c6-95fc17a1def3] Exception: java.lang.NullPointerException
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.DeleteVmCheckpointCommand.init(DeleteVmCheckpointCommand.java:74)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandBase.postConstruct(CommandBase.java:218)
        at jdk.internal.reflect.GeneratedMethodAccessor147.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:83)
        at org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:66)
        at org.jboss.weld.core.4.Final-redhat-00001//org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122)
        at deployment.engine.ear//org.ovirt.engine.core.di.Injector.injectMembers(Injector.java:41)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.CommandsFactory.createCommand(CommandsFactory.java:104)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandsRepository.retrieveCommand(CommandsRepository.java:167)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandsRepository.retrieveCommand(CommandsRepository.java:154)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.retrieveCommand(CommandCoordinatorImpl.java:127)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.tasks.CommandCoordinatorUtil.retrieveCommand(CommandCoordinatorUtil.java:309)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.ChildCommandsCallbackBase.getCommand(ChildCommandsCallbackBase.java:189)
        at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.ChildCommandsCallbackBase.onSucceeded(ChildCommandsCallbackBase

Comment 1 Arik 2020-11-12 08:35:10 UTC
DeleteVmCheckpoint is called after RemoveVm already removed the VM from the database and the former has no protection for this scenario

Comment 2 Eyal Shenitzky 2020-11-12 08:50:38 UTC
(In reply to Arik from comment #1)
> DeleteVmCheckpoint is called after RemoveVm already removed the VM from the
> database and the former has no protection for this scenario

RemoveVm called after DeleteVmCheckpoint finished its execution but before the command ended successfully.
DeleteVmCheckpoint has a validation for the status of the VM (UP only), might be related to the command locks.

Comment 3 Eyal Shenitzky 2020-11-12 12:39:45 UTC
(In reply to Eyal Shenitzky from comment #2)
> RemoveVm called after DeleteVmCheckpoint finished its execution but before
> the command ended successfully.
> DeleteVmCheckpoint has a validation for the status of the VM (UP only),
> might be related to the command locks.

Since we cannot prevent from powering-off the VM until the command ends,
we must handle all the options when the VM powered off during the command execution.

Comment 4 Ilan Zuckerman 2021-04-06 13:12:45 UTC
Eyal, is there particular verification scenario for this in your mind?
Or just watching the logs during the steps i mentioned in the description would be sufficient?

Comment 5 Eyal Shenitzky 2021-04-07 05:49:40 UTC
(In reply to Ilan Zuckerman from comment #4)
> Eyal, is there particular verification scenario for this in your mind?
> Or just watching the logs during the steps i mentioned in the description
> would be sufficient?

I used the steps you provided.

Comment 6 Amit Sharir 2021-04-26 13:53:19 UTC
The test was done on storage-ge-2 environment:

version for testing :
[root@storage-ge-2 ~]# rpm -qa | grep release
rhv-release-4.4.6-6-001.noarch
redhat-release-8.4-0.6.el8.x86_64
redhat-release-eula-8.4-0.6.el8.x86_64

I verified the bug with the steps mentioned above. 



Verification Summary and conclusions -

After completing the flow mentioned - the expected log output matched the actual log output (I got no exceptions and no ERRORS in the engine log).
Bug verified.

Comment 7 Sandro Bonazzola 2021-05-05 05:36:14 UTC
This bugzilla is included in oVirt 4.4.6 release, published on May 4th 2021.

Since the problem described in this bug report should be resolved in oVirt 4.4.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.