Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 868887

Summary: [Storage] [Clone VM from snapshot] VM stuck in image lock status forever after restarting vdsm daemon while VM is being cloned from snapshot.
Product: Red Hat Enterprise Virtualization Manager Reporter: Leonid Natapov <lnatapov>
Component: ovirt-engineAssignee: Vered Volansky <vered>
Status: CLOSED CURRENTRELEASE QA Contact: Leonid Natapov <lnatapov>
Severity: high Docs Contact:
Priority: high    
Version: 3.1.0CC: abaron, amureini, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, sgrinber, vered, yeylon, ykaul
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: SF7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
logs
none
engine.log
none
engine log specified in comment 11
none
engine logs for comment 11 none

Description Leonid Natapov 2012-10-22 11:23:59 UTC
[Storage] [Clone VM from snapshot] VM stucked in image lock status forever after restarting vdsm daemon while VM is being cloned  from snapshot.
---
Scenario:
I am creating a new VM from existing VM's snapshots e.g. cloning a VM from snapshot. A process of creation new VM started (new VM and it's disks enter image locked status). At this point,vdsm daemon on the host (single host in the system) is being restarted. After vdsm daemom is up, VM stays in image locked status,although its disks has OK status. On the backend side I have an exception in AddVmFromSnapshot function. Here is the crash. 
Full logs (vdsm and engine) attached.

------------------

2012-10-22 12:45:15,107 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-3-thread-49) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type AddVmFromSnapshot threw an exception: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to RemoveVMVDS, error = Unknown pool id, pool not connected: ('0b0d4fd0-1b96-11e2-9077-af929e80aa9d',)
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:211) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:1366) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.VmCommand.RemoveVmInSpm(VmCommand.java:205) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.VmCommand.RemoveVmInSpm(VmCommand.java:201) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.AddVmCommand.endWithFailure(AddVmCommand.java:693) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.AddVmFromSnapshotCommand.endWithFailure(AddVmFromSnapshotCommand.java:242) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.internalEndWithFailure(CommandBase.java:498) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.endActionInTransactionScope(CommandBase.java:446) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1302) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [engine-utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInRequired(TransactionSupport.java:142) [engine-utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:109) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.endAction(CommandBase.java:390) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.Backend.endAction(Backend.java:356) [engine-bll.jar:]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_03-icedtea]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_03-icedtea]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_03-icedtea]
        at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_03-icedtea]
        at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:374) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.ovirt.engine.core.utils.ThreadLocalSessionCleanerInterceptor.injectWebContextToThreadLocal(ThreadLocalSessionCleanerInterceptor.java:11) [engine-utils.jar:]
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) [:1.7.0_03-icedtea]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_03-icedtea]
        at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_03-icedtea]
        at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptorFactory$ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptorFactory.java:123) [jboss-as-ee.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36) [jboss-as-ee.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53) [jboss-as-ee.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:211) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:363) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:194) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.1.1.Final-redhat-1]
        at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173) [jboss-as-ee.jar:7.1.1.Final-redhat-1]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-1]
        at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.1.1.Final-redhat-1]
        at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view9.endAction(Unknown Source) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask.EndCommandAction(EntityAsyncTask.java:134) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask.access$000(EntityAsyncTask.java:23) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask$1.run(EntityAsyncTask.java:104) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:64) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_03-icedtea]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_03-icedtea]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_03-icedtea]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_03-icedtea]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_03-icedtea]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_03-icedtea]

Comment 1 Leonid Natapov 2012-10-22 11:26:20 UTC
Created attachment 631417 [details]
logs

Comment 2 Ayal Baron 2012-10-22 16:51:56 UTC
looks like the brother of bug 840525

Comment 3 Leonid Natapov 2012-10-23 09:36:16 UTC
Another scenario - same problem: while cloning  VM from snapshot I block connection from the host to storage domain (using iptables) and after a minute or so restore the connection. The source VM goes to paused mode (which I guess is ok since we have a storage issue) and the destination VM goes to image lock. Both VMs stays at their statues forever. Paused VM can not be started because we fail to rollback and delete the snapshot from the database. We also failed to create a process for a task. (AsyncTaskManager failed to create a task). 
So ,I have a situation when I have two VMs one of them in paused mode and can not be started and the second one in image locked status,so basically both VMs are dead.

Comment 5 Leonid Natapov 2012-10-23 10:15:39 UTC
I have to mention that first scenario happened when source VM was down,so cloning was done from the regular snapshot and the second scenario happened  when the source VM was up,so it was a live snapshot (that's why soutrce vm entered the pause mode),so in the first case only the destination VM is  affected and in the second case both VMs are affected.

Comment 6 Vered Volansky 2012-10-23 13:29:30 UTC
Created attachment 632124 [details]
engine.log

Comment 7 Vered Volansky 2012-10-23 14:02:02 UTC
Sorry, refresh issues, my comment is gone with the wind.

Failing to reproduce by the following steps:

1. Cloning a snapshot while VM is down.
2. While VM and disks are still locked, restarting the vdsm deamon (using service vdsmd restart).

Results:
daemon restart is OK.
In the UI - After a while the new VM seems to e in the VMs list and then disappears, while the last message dictates "Failed to complete VM <name> creation."

UI behaviour seems fine, but engine log (attached in my last comment) shows NPEs.

Second flow:
1. Cloning a snapshot while VM is up.
2. The same as above.

Results, the same as above, though each step is taking a lot longer, I.e.:
VM and disks are locked for a long time before the clone shows up in the VM list, and then it takes quite a while until it disappears from the list and the Failed to complete VM <name> creation." appears. I didn't try anything during that time. just let it run its course.

Comment 8 Ayal Baron 2012-10-23 20:34:11 UTC
Leonid, please see Vered's comment above.
Are the steps the same as you did? if not, please provide full steps to reproduce.
If yes then there are still issues to fix here but I don't see why this would be a test blocker?

Comment 9 Leonid Natapov 2012-10-23 22:27:01 UTC
First of all in the second scenario I am blocking connection to storage domain and not restarting vdsm daemon. The behavior is different. Than,try no to do service vdsmd restart but service vdsmd stop and after a minute or so service vdsmd start. Same in the second scenario. Block the connection to SD ,wait a minute or so and than restore the connection to SD.

Comment 10 Vered Volansky 2012-10-27 17:45:08 UTC
Created attachment 634294 [details]
engine log specified in comment 11

Comment 11 Vered Volansky 2012-10-27 17:46:16 UTC
Behaviour doesn't seem consistent and hard to reproduce.

Each Scenario starts with -
1. Start clone from snapshot.
2. While the new VM is locked:

Scenario 1 - VM is down.
Stop connection to host using iptables.

Results:
New VM creation is failed with relevant message.
Tasks In host exist with task result cleanSuccess.
engine log attached as engine_down_iptables.log.

Scenario 2 - VM is down.
1. in host - service vdsmd stop.
2. wait for about a minute.
3. service vdsmd start.

Same results as scenario 1.
engine log attached as engine_down_stopD.log.

Scenario 3 - VM is up.
1. in host - service vdsmd stop.
2. wait for about a minute.
3. service vdsmd start.

Resutls:
New VM remains in status IMAGE_LOCKED. Some of it's disks are also locked.
Original VM snapshots tab is all disabled.
engine log attached as engine_up_stopD.log.
Tasks in host exist with status.

Scenario 4 - VM is up.
Otherwise the same as scenario 1.
Results are also the same as scenario 1.
engine log attached as engine_down_stopD.log.

Comment 12 Vered Volansky 2012-10-27 17:50:26 UTC
Created attachment 634295 [details]
engine logs for comment 11

Comment 16 Vered Volansky 2013-02-07 08:40:13 UTC
The was bug reproduced under the following conditions:
Vm down, vdsm was stopped using iptables, snapshot of 5 pre-allocated disks of 3G each.
There was an NPE in BaseImagesCommand.setImageStatus() resulting from checking the wrong disk image for null before applying.
This was fixed in: http://gerrit.ovirt.org/#/c/11833/
Change-Id: Ica20a2542c07a355720503c25a14fbae3394c4d4

Comment 17 Vered Volansky 2013-02-07 11:34:37 UTC
merged u/s

Comment 18 Leonid Natapov 2013-03-05 11:24:41 UTC
sf9. fixed. was unable to reproduce.

Comment 19 Itamar Heim 2013-06-11 08:22:02 UTC
3.2 has been released

Comment 20 Itamar Heim 2013-06-11 08:24:33 UTC
3.2 has been released