Bug 1758052 - Create VM rollback fails with NPE leaving VM locked - Exception in invoking callback of command AddVmFromTemplate- Exception: javax.ejb.EJBTransactionRolledbackException
Summary: Create VM rollback fails with NPE leaving VM locked - Exception in invoking c...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.3.6.6
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ovirt-4.3.6
: 4.3.6.7
Assignee: Benny Zlotnik
QA Contact: Avihai
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-10-03 07:19 UTC by Avihai
Modified: 2019-10-25 13:44 UTC (History)
1 user (show)

Fixed In Version: ovirt-engine-4.3.6.7
Clone Of:
Environment:
Last Closed: 2019-10-25 13:44:37 UTC
oVirt Team: Storage
Embargoed:
pm-rhel: ovirt-4.3+
pm-rhel: blocker?
aefrat: testing_ack+


Attachments (Terms of Use)
TC11141_logs (2.00 MB, application/zip)
2019-10-03 07:19 UTC, Avihai
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 103806 0 'None' MERGED core: use isRunning instead != VMStatus.DOWN 2020-04-22 15:23:16 UTC
oVirt gerrit 103811 0 'None' MERGED core: use isRunning instead != VMStatus.DOWN 2020-04-22 15:23:16 UTC
oVirt gerrit 103899 0 'None' MERGED core: fix template creation rollback 2020-04-22 15:23:17 UTC

Description Avihai 2019-10-03 07:19:05 UTC
Created attachment 1622194 [details]
TC11141_logs

Description of problem:
seen in 2 TC's so far that fail to create VM from template and leaves the VM locked, this is why IMHO this bug is urgent.

In both cases after VM from template fails we see the following exception:
2019-10-01 21:09:57,201+03 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-83) [] EVENT_ID: USER_ADD_VM_FINISHED_FAILURE(60), Failed to complete VM vm_TestCase18982_0121045989 creation.
2019-10-01 21:09:57,229+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-83) [vms_create_8df9a84a-3778-40fb] Failed invoking callback end method 'onFailed' for command '6767e13a-d40e-4408-a11d-c6febe7e031b' with exception 'null', the callback is marked for end method retries but max number of retries have been attempted. The command will be marked as Failed.
2019-10-01 21:09:57,229+03 INFO  [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-83) [vms_create_8df9a84a-3778-40fb] Exception in invoking callback of command AddVmFromTemplate (6767e13a-d40e-4408-a11d-c6febe7e031b): NullPointerException: 
2019-10-01 21:09:57,229+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-83) [vms_create_8df9a84a-3778-40fb] Error invoking callback method 'onFailed' for 'FAILED' command '6767e13a-d40e-4408-a11d-c6febe7e031b'
2019-10-01 21:09:57,229+03 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-83) [vms_create_8df9a84a-3778-40fb] Exception: javax.ejb.EJBTransactionRolledbackException
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInCallerTx(CMTTxInterceptor.java:203) [wildfly-ejb3-7.2.4.GA-redhat-00002.jar:7.2.4.GA-redhat-00002]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:420) [wildfly-ejb3-7.2.4.GA-redhat-00002.jar:7.2.4.GA-redhat-00002]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:148) [wildfly-ejb3-7.2.4.GA-redhat-00002.jar:7.2.4.GA-redhat-00002]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
        at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509)
        at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:81) [weld-ejb.jar:3.0.6.Final-redhat-00002]
        at org.jboss.as.weld.ejb.EjbRequestScopeActivationInterceptor.processInvocation(EjbRequestScopeActivationInterceptor.java:89)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [wildfly-ejb3-7.2.4.GA-redhat-00002.jar:7.2.4.GA-redhat-00002]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
        at org.jboss.as.ejb3.component.invocationmetrics.WaitTimeInterceptor.processInvocation(WaitTimeInterceptor.java:47) [wildfly-ejb3-7.2.4.GA-redhat-00002.jar:7.2.4.GA-redhat-00002]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
        at org.jboss.as.ejb3.security.SecurityContextInterceptor.processInvocation(SecurityContextInterceptor.java:100) [wildfly-ejb3-7.2.4.GA-redhat-00002.jar:7.2.4.GA-redhat-00002]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
        at org.jboss.as.ejb3.deployment.processors.StartupAwaitInterceptor.processInvocation(StartupAwaitInterceptor.java:22) [wildfly-ejb3-7.2.4.GA-redhat-00002.jar:7.2.4.GA-redhat-00002]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [wildfly-ejb3-7.2.4.GA-redhat-00002.jar:7.2.4.GA-redhat-00002]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:67) [wildfly-ejb3-7.2.4.GA-redhat-00002.jar:7.2.4.GA-redhat-00002]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
:
.............................................................................
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_222]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [rt.jar:1.8.0_222]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.access$201(ManagedScheduledThreadPoolExecutor.java:383) [javax.enterprise.concurrent.jar:1.0
.0.redhat-1]
        at org.glassfish.enterprise.concurrent.internal.ManagedScheduledThreadPoolExecutor$ManagedScheduledFutureTask.run(ManagedScheduledThreadPoolExecutor.java:534) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_222]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_222]
        at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_222]
        at org.glassfish.enterprise.concurrent.ManagedThreadFactoryImpl$ManagedThread.run(ManagedThreadFactoryImpl.java:250) [javax.enterprise.concurrent.jar:1.0.0.redhat-1]
Caused by: java.lang.NullPointerException

Version-Release number of selected component (if applicable):


How reproducible:
Hard to reproduce, seen only in 2 TC's so far.
Trying to reproduce by running the TC's ~12 times so far did not reproduce the issue,

Steps to Reproduce:
1.Create VM from template - and make sure operation fails(like restart SPM VDSM)


Actual results:
VM creation fails and VM and it's images remain locked.
NPE seen in the engine log.

Expected results:
VM creation fails rollback(VM removed) smoothly without issues.
No NPE should be seen.

Additional info:
So far issue seen in two cases in Tier3 storage suit:
1)TC11141
Copy disks from a VM that was created from a template
2)TC18982:
Create VM from Template and kill 'vdsmd' on SPM with only SPM and no HSM available

Comment 1 Avihai 2019-10-03 07:21:39 UTC
Version-Release number of selected component (if applicable):
ovirt-engine-4.3.6.6-0.1.el7.noarch
vdsm-4.30.33-1.el7ev.x86_64
libvirt-4.5.0-23.el7_7.1.x86_64
qemu-img-rhev-2.12.0-33.el7.x86_64

Comment 2 Avihai 2019-10-03 09:42:04 UTC
I managed to reproduce this issue easily on another enviroment running TestCase18982.
Create VM from Template and kill 'vdsmd' on SPM with only SPM and no HSM available

Comment 3 Tal Nisan 2019-10-03 11:58:29 UTC
Hard to reproduce without a specific test case means it's not urgent, setting to high

Comment 4 Avihai 2019-10-03 12:18:00 UTC
(In reply to Tal Nisan from comment #3)
> Hard to reproduce without a specific test case means it's not urgent,
> setting to high

See comment 2, this is easy to reproduce.
Create VM from Template and kill 'vdsmd' on SPM , setting it back to urgent.

This is urgent also because it effects:
1) VM which is locked with all its disks lock.
2) Locked VM can not be worked on or removed.
3) Deactivating a storage domain that has this VM will fail as this VM is not in the "down" state.

Comment 5 RHEL Program Management 2019-10-03 15:02:33 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 6 Avihai 2019-10-06 08:46:58 UTC
To clarify we had 2 TC's here each running create VM from template:

1)TestCase11141 - create VM from template as thin + restart VDSM cause VM to be locked.
Should be fixed by https://gerrit.ovirt.org/#/c/103811/

2)TestCase18982 - create VM from template as clone + restart VDSM cause VM to be locked - NOT fixed yet , moving bug back to assigned

Comment 7 Avihai 2019-10-08 11:41:10 UTC
Verified on 4.3.6.7-0.1.el7.

Tested both manual cases restarting VDSM while creating a VM from templated as thin/clone - passed, VM is not locked.
Also ran failed automation TestCase18982 and TestCase11141 - passed.

Comment 8 Sandro Bonazzola 2019-10-25 13:44:37 UTC
This bugzilla is included in oVirt 4.3.6 async release, published on October 8th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.6 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.


Note You need to log in before you can comment on or make changes to this bug.