Description of problem:
Disk remains locked when engine fails during CloneImageGroupVolumesStructureCommand stage of the cold move command

Version-Release number of selected component (if applicable):
vdsm-4.19.2-1.gitd9c3ccb.el7.centos.x86_64
ovirt-engine-4.1.0.3-0.0.master.20170122091652.gitc6fc2c2.el7.centos.noarch

How reproducible:
All the time

Steps to Reproduce:
1. Create VM with preallocated disk and power VM off
2. Select to Move the block disk to another domain
3. Restart the ENGINE as soon as the CloneImageGroupVolumesStructureCommand is reported on the engine. Move fails on one disk and second disk remains in locked state

Actual results:
The Move fails and Disk is reported as locked, progress bar also stuck on 16%

Expected results:
Move should succeed

Additional info:
From Engine Log:
2017-01-23 13:55:35,310+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand] (org.ovirt.thread.pool-6-thread-41) [649abdb5] Running command: CloneImageGroupVolumesStructureCommand internal: true.
2017-01-23 13:55:35,995+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand] (DefaultQuartzScheduler9) [649abdb5] Starting child command 1 of 1, image 'a538cc5b-cb08-479a-bc39-2097a99a18e9'
2017-01-23 13:55:36,087+02 INFO [org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand] (org.ovirt.thread.pool-6-thread-41) [23fc7592] Running command: CloneImageGroupVolumesStructureCommand internal: true.

RESTARTED THE ENGINE AT THIS POINT *********
Created attachment 1243582
server, vdsm, engine.log

Adding logs
Freddy, I guess it's the same cause as in bug 1415502?
Kevin, did you restart only the engine or also the HSM?
Seems like this should've been fixed by BZ 1393459. Ravi, aside from our checks please take a look.
(In reply to Fred Rolland from comment #3)
> Kevin, did you restart only the engine or also the HSM?

Only the engine was restarted
I am unable to reproduce this on master. Moving the disks between nfs storage domains completes as expected after server restart.
What is the status of this issue? It seems very severe.
According to the log, there is an infra issue. However Ravi was not able to reproduce. Kevin, can you please test again on latest version ?
(In reply to Fred Rolland from comment #8)
> According to the log, there is an infra issue.
> However Ravi was not able to reproduce.
>
> Kevin, can you please test again on latest version ?

Tested again with the following code:
--------------------------------------------------------
ovirt-engine-4.1.0.4-0.1.el7.noarch
rhevm-4.1.0.4-0.1.el7.noarch
vdsm-4.19.4-1.el7ev.x86_64

Ran the same scenario as before with the new code:
-------------------------------------------------------
Steps to Reproduce:
1. Create VM with preallocated disk and power VM off
2. Select to Move the block disk to another domain
3. Restart the ENGINE as soon as the CloneImageGroupVolumesStructureCommand is reported on the engine.

THE MOVE WORKS fine now.
This Bug reproduced in our automation:

Description of problem:
Disk remains locked when engine fails during stage 'CopyImageGroupWithDataCmd' of the cold move command

Version-Release number of selected component (if applicable):
vdsm - 4.19.11-1.el7ev.x86_64
Ovirt-engine- 4.19.11-1.el7ev.x86_64

How reproducible:
70%

Steps to Reproduce:
1. Create VM with 4 disks(virtio-SCSI cow, virtio-SCSI raw, virtio cow, virtio raw)
2. Select to Move the block disk to another domain
3. Restart the ENGINE as soon as the CopyImageGroupWithDataCmd is reported in the engine log.

Actual results:
The Move fails and Disk is reported as locked

Expected results:
Move should succeed
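For anyone trying to hit the same window by script, step 3 can be sketched roughly as below. This is only an illustration, not the automation that produced this report: the function names are made up, and the log path and restart command mentioned afterwards are assumptions about a typical engine host.

```python
import time
from typing import Callable, Iterable, Iterator, Optional

# Command name taken from the steps above; change it to target another stage.
MARKER = "CopyImageGroupWithDataCmd"


def first_match(lines: Iterable[str], marker: str) -> Optional[str]:
    """Return the first line that contains marker, or None if the input ends."""
    for line in lines:
        if marker in line:
            return line.rstrip("\n")
    return None


def follow(path: str, poll_seconds: float = 0.2) -> Iterator[str]:
    """Yield lines appended to the file at path, starting from its current end.

    Simplification: a line that is only partly written when it is read
    is yielded in pieces, so a marker split across two reads is missed.
    """
    with open(path, "r", errors="replace") as handle:
        handle.seek(0, 2)  # skip existing content, only watch new lines
        while True:
            line = handle.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)


def restart_when_seen(path: str, marker: str,
                      restart: Callable[[], None]) -> Optional[str]:
    """Block until marker shows up in the log, then call restart()."""
    hit = first_match(follow(path), marker)
    restart()
    return hit
```

One would call restart_when_seen() with the engine log path (assumed here to be /var/log/ovirt-engine/engine.log) and a callable that restarts the engine service (assumed to be something like "systemctl restart ovirt-engine"), after starting the Move. The restart is passed in as a callable so the matching logic can be checked without touching a real engine.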
Created attachment 1275842
vdsm and engine log
Exception from engine log:

2017-05-03 10:17:52,534+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand] (DefaultQuartzScheduler1) [disks_syncAction_197a1ec3-b940-4d6f] Exception: java.lang.NullPointerException
    at org.ovirt.engine.core.common.job.Step.addStep(Step.java:223) [common.jar:]
    at org.ovirt.engine.core.bll.job.ExecutionHandler.addSubStep(ExecutionHandler.java:386) [bll.jar:]
    at org.ovirt.engine.core.bll.job.ExecutionHandler.addTaskStep(ExecutionHandler.java:369) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.createTask(CoCoAsyncTaskHelper.java:66) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.createTask(CommandCoordinatorImpl.java:261) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CommandCoordinatorUtil.createTask(CommandCoordinatorUtil.java:111) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.createTaskImpl(CommandBase.java:1773) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.createTask(CommandBase.java:1740) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.createTask(CommandBase.java:1652) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.createTask(CommandBase.java:1701) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand.executeCommand(CreateVolumeContainerCommand.java:82) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:]
    at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) [bll.jar:]
    at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:511) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:493) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:697) [bll.jar:]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437)
    at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:70) [wildfly-weld-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:80) [wildfly-weld-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:93) [wildfly-weld-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437)
    at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:73) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.jboss.as.weld.ejb.EjbRequestScopeActivationInterceptor.processInvocation(EjbRequestScopeActivationInterceptor.java:83) [wildfly-weld-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ee.concurrent.ConcurrentContextInterceptor.processInvocation(ConcurrentContextInterceptor.java:45) [wildfly-ee-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:52)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:263) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:243) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.invocationmetrics.WaitTimeInterceptor.processInvocation(WaitTimeInterceptor.java:43) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.security.SecurityContextInterceptor.processInvocation(SecurityContextInterceptor.java:100) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.deployment.processors.StartupAwaitInterceptor.processInvocation(StartupAwaitInterceptor.java:22) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:66) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.ContextClassLoaderInterceptor.processInvocation(ContextClassLoaderInterceptor.java:64)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:356)
    at org.wildfly.security.manager.WildFlySecurityManager.doChecked(WildFlySecurityManager.java:636)
    at org.jboss.invocation.AccessCheckingInterceptor.processInvocation(AccessCheckingInterceptor.java:61)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:356)
    at org.jboss.invocation.PrivilegedWithCombinerInterceptor.processInvocation(PrivilegedWithCombinerInterceptor.java:80)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:198)
    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:185)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
    at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:73)
    at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view4.runInternalAction(Unknown Source) [bll.jar:]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
    at org.jboss.weld.util.reflection.Reflections.invokeAndUnwrap(Reflections.java:433) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.jboss.weld.bean.proxy.EnterpriseBeanProxyMethodHandler.invoke(EnterpriseBeanProxyMethodHandler.java:128) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.jboss.weld.bean.proxy.EnterpriseTargetBeanInstance.invoke(EnterpriseTargetBeanInstance.java:56) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:100) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.ovirt.engine.core.bll.BackendCommandObjectsHandler$BackendInternal$BackendLocal$2049259618$Proxy$_$$_Weld$EnterpriseProxy$.runInternalAction(Unknown Source) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInternalAction(CommandBase.java:2452) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInternalActionWithTasksContext(CommandBase.java:2477) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInternalActionWithTasksContext(CommandBase.java:2472) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand.createImage(CloneImageGroupVolumesStructureCommand.java:126) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand.performNextOperation(CloneImageGroupVolumesStructureCommand.java:89) [bll.jar:]
    at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:29) [bll.jar:]
    at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:63) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:114) [bll.jar:]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
    at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
    at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_121]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'ovirt-engine-4.1.2' doesn't contain patch 'https://gerrit.ovirt.org/76608']
gitweb: https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/ovirt-engine-4.1.2

For more info please contact: infra
(In reply to rhev-integ from comment #14)
> INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following
> reason:
>
> [Tag 'ovirt-engine-4.1.2' doesn't contain patch

Wrong tag, please re-run on ovirt-engine-4.1.2.2

> 'https://gerrit.ovirt.org/76608']
> gitweb:
> https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/
> ovirt-engine-4.1.2
>
> For more info please contact: infra
Moving to ON_QA. This bug is in ovirt-engine-4.1.2.2 tag from which the engine was built.
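For reference, the kind of check the bot reports ("tag doesn't contain patch") can be approximated locally with git's own "git tag --contains <commit>". The sketch below is not the rhev-integ script; the helper names are invented, and it assumes you have a clone of ovirt-engine and already know the commit hash that the Gerrit change was merged as.

```python
import subprocess
from typing import Set


def parse_tag_list(output: str) -> Set[str]:
    """Turn the text printed by 'git tag --contains' into a set of tag names."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def tags_containing(commit: str, repo_dir: str = ".") -> Set[str]:
    """Ask git which tags in repo_dir contain the given commit."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "tag", "--contains", commit],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. unknown commit or not a repo
    )
    return parse_tag_list(result.stdout)


def tag_has_commit(tag: str, containing: Set[str]) -> bool:
    """True if tag is among the tags that contain the commit."""
    return tag in containing
```

With a real clone, tag_has_commit("ovirt-engine-4.1.2.2", tags_containing(commit_hash, repo_dir)) would answer the question for the tag the build was actually made from; commit_hash and repo_dir are placeholders to fill in.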
Verified with the following code:
-----------------------------------
ovirt-engine-4.1.2.1-0.1.el7.noarch
rhevm-4.1.2.1-0.1.el7.noarch
vdsm-4.19.14-1.el7ev.x86_64

Verified with the following scenario:
----------------------------------------
Steps to Reproduce:
1. Create VM with 4 disks(virtio-SCSI cow, virtio-SCSI raw, virtio cow, virtio raw)
2. Select to Move the block disk to another domain
3. Restart the ENGINE as soon as the CopyImageGroupWithDataCmd is reported in the engine log

>>>> The disk is successfully moved

Moving to VERIFIED!
A patch was merged to fix the tag issue. Anton, can you follow up on this and verify it works properly for the next build?
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days