Bug 2083652

Summary: VM backup doesn't work after many backups of the VM. 'GetVmOvfByVmIdQuery' failed: null
Product: [oVirt] ovirt-engine Reporter: Yury.Panchenko
Component: Backup-Restore.VMs Assignee: Arik <ahadas>
Status: CLOSED CURRENTRELEASE QA Contact: Avihai <aefrat>
Severity: high Docs Contact: bugs <bugs>
Priority: unspecified    
Version: 4.4.10.7 CC: aefrat, ahadas, bugs, nsoffer, Yury.Panchenko
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-05-15 16:28:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
  engine logs (Flags: none)

Description Yury.Panchenko 2022-05-10 14:28:51 UTC
Created attachment 1878328 [details]
engine logs

Description of problem:
There are two VMs.
The VMs are powered on.
The VMs have been backed up every 10 minutes for 2 days.
After that, the backups started to fail because oVirt couldn't return the VM objects.

Version-Release number of selected component (if applicable):
RHV 4.4.10
ovirt-4.4.10

Actual results:
2022-05-10 11:02:38,816+02 ERROR [org.ovirt.engine.core.bll.GetVmOvfByVmIdQuery] (default task-7384) [45e09bd2-c8a6-4801-b451-914822736ce6] Query 'GetVmOvfByVmIdQuery' failed: null
2022-05-10 11:02:38,817+02 ERROR [org.ovirt.engine.core.bll.GetVmOvfByVmIdQuery] (default task-7384) [45e09bd2-c8a6-4801-b451-914822736ce6] Exception: java.lang.NullPointerException
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.ovf.OvfOvirtWriter.writeDisk(OvfOvirtWriter.java:109)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.ovf.OvfWriter.lambda$buildDisk$3(OvfWriter.java:176)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.ovf.OvfWriter.buildDisk(OvfWriter.java:174)
	at org.ovirt.engine.core.utils//org.ovirt.engine.core.utils.ovf.IOvfBuilder.build(IOvfBuilder.java:33)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.utils.ovf.OvfManager.exportVm(OvfManager.java:77)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.ovfstore.OvfHelper.buildMetadataDictionaryForVm(OvfHelper.java:228)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.storage.ovfstore.OvfHelper.generateOvfConfigurationForVm(OvfHelper.java:185)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.GetVmOvfByVmIdQuery.generateOvfConfig(GetVmOvfByVmIdQuery.java:50)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.GetVmOvfByVmIdQuery.executeQueryCommand(GetVmOvfByVmIdQuery.java:40)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.QueriesCommandBase.executeCommand(QueriesCommandBase.java:106)
	at org.ovirt.engine.core.dal//org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.executor.DefaultBackendQueryExecutor.execute(DefaultBackendQueryExecutor.java:14)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.Backend.runQueryImpl(Backend.java:520)
	at deployment.engine.ear.bll.jar//org.ovirt.engine.core.bll.Backend.runQuery(Backend.java:489)
	at jdk.internal.reflect.GeneratedMethodAccessor63.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.jboss.as.ee.2.GA-redhat-00003//org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
	at org.jboss.invocation.3.Final-redhat-00001//org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
	at org.jboss.invocation.3.Final-redhat-00001//org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509)
	at org.jboss.as.weld.ejb.2.GA-redhat-00003//org.jboss.as.weld.ejb.DelegatingInterceptorInvocationContext.proceed(DelegatingInterceptorInvocationContext.java:92)
	at org.jboss.weld.core.6.Final-redhat-00001//org.jboss.weld.interceptor.proxy.WeldInvocationContextImpl.interceptorChainCompleted(WeldInvocationContextImpl.java:107)
	at org.jboss.weld.core.6.Final-redhat-00001//org.jboss.weld.interceptor.proxy.WeldInvocationContextImpl.proceed(WeldInvocationContextImpl.java:126)
	at org.ovirt.engine.core.common//org.ovirt.engine.core.common.di.interceptor.LoggingInterceptor.apply(LoggingInterceptor.java:12)
	at jdk.internal.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

Expected results:
Successful backup

Comment 1 Arik 2022-05-10 20:33:21 UTC
Seems like DiskVmElement is missing.
I'm not sure what initiates the calls to GetVmOvfByVmId in this environment, but as of 11:02:38, when the first NPE occurs, it is called quite often - and I don't see anything suspicious in the log around that time, just executions of UpdateDisk.
So the engine.log doesn't seem to hold a hint about what went wrong there.
Yury, can you identify the VM it fails on and attach a DB dump?
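
A quick way to check this hypothesis against the engine database would be a query like the following; this is only a sketch that assumes the standard engine schema, where disk attachments live in the disk_vm_element table, and '<suspected-vm-id>' is a placeholder for the VM in question:

-- List the disk attachments recorded for the suspected VM; a disk that the VM
-- clearly has but that does not appear here would support the missing
-- DiskVmElement theory.
select disk_id, vm_id from disk_vm_element where vm_id = '<suspected-vm-id>';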

Comment 2 Yury.Panchenko 2022-05-13 13:00:16 UTC
Hello there,
I've provided the db backup via email to Arik.
Thanks.

Comment 3 Arik 2022-05-15 16:28:12 UTC
Thanks Yury, the db dump has shed more light on this.

The problem with the two VMs you mentioned, for which GetVmOvfByVmId fails, is that each of them has a disk associated with more than one incomplete image transfer:

veeam=# select disk_id, phase from image_transfers where disk_id='283ed51b-5852-4e36-bad2-3c02ce5cf1cd';
               disk_id                | phase 
--------------------------------------+-------
 283ed51b-5852-4e36-bad2-3c02ce5cf1cd |     7
 283ed51b-5852-4e36-bad2-3c02ce5cf1cd |     7
(2 rows)

veeam=# select disk_id, phase from image_transfers where disk_id='b7c31938-012c-4226-9ef4-a71d0b740906';
               disk_id                | phase 
--------------------------------------+-------
 b7c31938-012c-4226-9ef4-a71d0b740906 |     7
 b7c31938-012c-4226-9ef4-a71d0b740906 |     7
(2 rows)

You can execute the following to fix that:
delete from image_transfers where disk_id in ('283ed51b-5852-4e36-bad2-3c02ce5cf1cd', 'b7c31938-012c-4226-9ef4-a71d0b740906');
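
To review what will be removed before committing, the same statement can be wrapped in a transaction (a sketch that uses only the columns shown in the dumps above):

-- Inspect the stuck rows, then delete them; roll back instead of committing
-- if the select returns something unexpected.
begin;
select disk_id, phase from image_transfers
 where disk_id in ('283ed51b-5852-4e36-bad2-3c02ce5cf1cd',
                   'b7c31938-012c-4226-9ef4-a71d0b740906');
delete from image_transfers
 where disk_id in ('283ed51b-5852-4e36-bad2-3c02ce5cf1cd',
                   'b7c31938-012c-4226-9ef4-a71d0b740906');
commit;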

The root cause, image transfers getting stuck in phase 7 (FINALIZING_SUCCESS), is fixed in 4.5.0; see bz 2043984.
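
If other disks hit the same condition, a similar query (assuming phase 7 maps to FINALIZING_SUCCESS, as noted above) lists every disk that still has transfers stuck in that phase:

-- Find any other disks with image transfers left in phase 7.
select disk_id, count(*) as stuck_transfers
  from image_transfers
 where phase = 7
 group by disk_id;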