Bug 868887
| Summary: | [Storage] [Clone VM from snapshot] VM stuck in image lock status forever after restarting vdsm daemon while VM is being cloned from snapshot. | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Leonid Natapov <lnatapov> | ||||||||||
| Component: | ovirt-engine | Assignee: | Vered Volansky <vered> | ||||||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Leonid Natapov <lnatapov> | ||||||||||
| Severity: | high | Docs Contact: | |||||||||||
| Priority: | high | ||||||||||||
| Version: | 3.1.0 | CC: | abaron, amureini, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, sgrinber, vered, yeylon, ykaul | ||||||||||
| Target Milestone: | --- | ||||||||||||
| Target Release: | 3.2.0 | ||||||||||||
| Hardware: | Unspecified | ||||||||||||
| OS: | Unspecified | ||||||||||||
| Whiteboard: | storage | ||||||||||||
| Fixed In Version: | SF7 | Doc Type: | Bug Fix | ||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | Environment: | ||||||||||||
| Last Closed: | Type: | Bug | |||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Attachments: |
|
||||||||||||
Created attachment 631417 [details]
logs
looks like the brother of bug 840525 Another scenario - same problem: while cloning VM from snapshot I block connection from the host to storage domain (using iptables) and after a minute or so restore the connection. The source VM goes to paused mode (which I guess is ok since we have a storage issue) and the destination VM goes to image lock. Both VMs stays at their statues forever. Paused VM can not be started because we fail to rollback and delete the snapshot from the database. We also failed to create a process for a task. (AsyncTaskManager failed to create a task). So ,I have a situation when I have two VMs one of them in paused mode and can not be started and the second one in image locked status,so basically both VMs are dead. I have to mention that first scenario happened when source VM was down,so cloning was done from the regular snapshot and the second scenario happened when the source VM was up,so it was a live snapshot (that's why soutrce vm entered the pause mode),so in the first case only the destination VM is affected and in the second case both VMs are affected. Created attachment 632124 [details]
engine.log
Sorry, refresh issues, my comment is gone with the wind. Failing to reproduce by the following steps: 1. Cloning a snapshot while VM is down. 2. While VM and disks are still locked, restarting the vdsm deamon (using service vdsmd restart). Results: daemon restart is OK. In the UI - After a while the new VM seems to e in the VMs list and then disappears, while the last message dictates "Failed to complete VM <name> creation." UI behaviour seems fine, but engine log (attached in my last comment) shows NPEs. Second flow: 1. Cloning a snapshot while VM is up. 2. The same as above. Results, the same as above, though each step is taking a lot longer, I.e.: VM and disks are locked for a long time before the clone shows up in the VM list, and then it takes quite a while until it disappears from the list and the Failed to complete VM <name> creation." appears. I didn't try anything during that time. just let it run its course. Leonid, please see Vered's comment above. Are the steps the same as you did? if not, please provide full steps to reproduce. If yes then there are still issues to fix here but I don't see why this would be a test blocker? First of all in the second scenario I am blocking connection to storage domain and not restarting vdsm daemon. The behavior is different. Than,try no to do service vdsmd restart but service vdsmd stop and after a minute or so service vdsmd start. Same in the second scenario. Block the connection to SD ,wait a minute or so and than restore the connection to SD. Created attachment 634294 [details] engine log specified in comment 11 Behaviour doesn't seem consistent and hard to reproduce. Each Scenario starts with - 1. Start clone from snapshot. 2. While the new VM is locked: Scenario 1 - VM is down. Stop connection to host using iptables. Results: New VM creation is failed with relevant message. Tasks In host exist with task result cleanSuccess. engine log attached as engine_down_iptables.log. Scenario 2 - VM is down. 1. in host - service vdsmd stop. 2. wait for about a minute. 3. service vdsmd start. Same results as scenario 1. engine log attached as engine_down_stopD.log. Scenario 3 - VM is up. 1. in host - service vdsmd stop. 2. wait for about a minute. 3. service vdsmd start. Resutls: New VM remains in status IMAGE_LOCKED. Some of it's disks are also locked. Original VM snapshots tab is all disabled. engine log attached as engine_up_stopD.log. Tasks in host exist with status. Scenario 4 - VM is up. Otherwise the same as scenario 1. Results are also the same as scenario 1. engine log attached as engine_down_stopD.log. Created attachment 634295 [details] engine logs for comment 11 The was bug reproduced under the following conditions: Vm down, vdsm was stopped using iptables, snapshot of 5 pre-allocated disks of 3G each. There was an NPE in BaseImagesCommand.setImageStatus() resulting from checking the wrong disk image for null before applying. This was fixed in: http://gerrit.ovirt.org/#/c/11833/ Change-Id: Ica20a2542c07a355720503c25a14fbae3394c4d4 merged u/s sf9. fixed. was unable to reproduce. 3.2 has been released 3.2 has been released |
[Storage] [Clone VM from snapshot] VM stucked in image lock status forever after restarting vdsm daemon while VM is being cloned from snapshot. --- Scenario: I am creating a new VM from existing VM's snapshots e.g. cloning a VM from snapshot. A process of creation new VM started (new VM and it's disks enter image locked status). At this point,vdsm daemon on the host (single host in the system) is being restarted. After vdsm daemom is up, VM stays in image locked status,although its disks has OK status. On the backend side I have an exception in AddVmFromSnapshot function. Here is the crash. Full logs (vdsm and engine) attached. ------------------ 2012-10-22 12:45:15,107 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-3-thread-49) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type AddVmFromSnapshot threw an exception: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to RemoveVMVDS, error = Unknown pool id, pool not connected: ('0b0d4fd0-1b96-11e2-9077-af929e80aa9d',) at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:211) [engine-bll.jar:] at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:1366) [engine-bll.jar:] at org.ovirt.engine.core.bll.VmCommand.RemoveVmInSpm(VmCommand.java:205) [engine-bll.jar:] at org.ovirt.engine.core.bll.VmCommand.RemoveVmInSpm(VmCommand.java:201) [engine-bll.jar:] at org.ovirt.engine.core.bll.AddVmCommand.endWithFailure(AddVmCommand.java:693) [engine-bll.jar:] at org.ovirt.engine.core.bll.AddVmFromSnapshotCommand.endWithFailure(AddVmFromSnapshotCommand.java:242) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.internalEndWithFailure(CommandBase.java:498) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.endActionInTransactionScope(CommandBase.java:446) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1302) [engine-bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [engine-utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInRequired(TransactionSupport.java:142) [engine-utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:109) [engine-utils.jar:] at org.ovirt.engine.core.bll.CommandBase.endAction(CommandBase.java:390) [engine-bll.jar:] at org.ovirt.engine.core.bll.Backend.endAction(Backend.java:356) [engine-bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_03-icedtea] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_03-icedtea] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_03-icedtea] at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_03-icedtea] at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:374) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.ovirt.engine.core.utils.ThreadLocalSessionCleanerInterceptor.injectWebContextToThreadLocal(ThreadLocalSessionCleanerInterceptor.java:11) [engine-utils.jar:] at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) [:1.7.0_03-icedtea] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_03-icedtea] at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_03-icedtea] at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptorFactory$ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptorFactory.java:123) [jboss-as-ee.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:36) [jboss-as-ee.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:53) [jboss-as-ee.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:211) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1] at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:363) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1] at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:194) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.1.1.Final-redhat-1] at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:173) [jboss-as-ee.jar:7.1.1.Final-redhat-1] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-1] at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.1.1.Final-redhat-1] at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view9.endAction(Unknown Source) [engine-bll.jar:] at org.ovirt.engine.core.bll.EntityAsyncTask.EndCommandAction(EntityAsyncTask.java:134) [engine-bll.jar:] at org.ovirt.engine.core.bll.EntityAsyncTask.access$000(EntityAsyncTask.java:23) [engine-bll.jar:] at org.ovirt.engine.core.bll.EntityAsyncTask$1.run(EntityAsyncTask.java:104) [engine-bll.jar:] at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:64) [engine-utils.jar:] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_03-icedtea] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_03-icedtea] at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_03-icedtea] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [rt.jar:1.7.0_03-icedtea] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [rt.jar:1.7.0_03-icedtea] at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_03-icedtea]