Bug 1415691
Summary: | New HSM infra - Disk remains locked when engine fails during engine task | |
---|---|---|---
Product: | [oVirt] ovirt-engine | Reporter: | Kevin Alon Goldblatt <kgoldbla>
Component: | BLL.Storage | Assignee: | Fred Rolland <frolland>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Kevin Alon Goldblatt <kgoldbla>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 4.1.0 | CC: | amarchuk, amureini, bugs, eedri, eshenitz, frolland, kgoldbla, laravot, rhev-integ, rnori, tnisan, ylavi
Target Milestone: | ovirt-4.1.2 | Keywords: | Reopened
Target Release: | 4.1.2.2 | Flags: | rule-engine: ovirt-4.1+, rule-engine: exception+
Hardware: | All | |
OS: | Unspecified | |
Whiteboard: | storage | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2017-05-23 08:22:46 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Attachments: | | |
Description
Kevin Alon Goldblatt
2017-01-23 12:57:15 UTC
Created attachment 1243582 [details]
server, vdsm, engine.log
Adding logs
Freddy, I guess it's the same cause as bug 1415502?

Kevin, did you restart only the engine or also the HSM?

Seems like this should've been fixed by BZ 1393459. Ravi, aside from our checks please take a look.

(In reply to Fred Rolland from comment #3)
> Kevin, did you restart only the engine or also the HSM?

Only the engine was restarted.

I am unable to reproduce this on master. Moving the disks between NFS storage domains completes as expected after server restart.

What is the status of this issue? It seems very severe.

According to the log, there is an infra issue. However, Ravi was not able to reproduce it.

Kevin, can you please test again on the latest version?

(In reply to Fred Rolland from comment #8)
> According to the log, there is an infra issue.
> However Ravi was not able to reproduce.
>
> Kevin, can you please test again on latest version ?

Tested again with the following code:
--------------------------------------------------------
ovirt-engine-4.1.0.4-0.1.el7.noarch
rhevm-4.1.0.4-0.1.el7.noarch
vdsm-4.19.4-1.el7ev.x86_64

Ran the same scenario as before with the new code:
-------------------------------------------------------
Steps to Reproduce:
1. Create VM with preallocated disk and power VM off
2. Select to Move the block disk to another domain
3. Restart the ENGINE as soon as the CloneImageGroupVolumesStructureCommand is reported on the engine.

THE MOVE WORKS fine now.

This bug reproduced in our automation:

Description of problem:
Disk remains locked when engine fails during stage 'CopyImageGroupWithDataCmd' of the cold move command

Version-Release number of selected component (if applicable):
vdsm - 4.19.11-1.el7ev.x86_64
Ovirt-engine- 4.19.11-1.el7ev.x86_64

How reproducible:
70%

Steps to Reproduce:
1. Create VM with 4 disks (virtio-SCSI cow, virtio-SCSI raw, virtio cow, virtio raw)
2. Select to Move the block disk to another domain
3. Restart the ENGINE as soon as the CopyImageGroupWithDataCmd is reported in the engine log.

Actual results:
The move fails and the disk is reported as locked

Expected results:
Move should succeed

Created attachment 1275842 [details]
vdsm and engine log
Exception from engine log:

2017-05-03 10:17:52,534+03 ERROR [org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand] (DefaultQuartzScheduler1) [disks_syncAction_197a1ec3-b940-4d6f] Exception: java.lang.NullPointerException
    at org.ovirt.engine.core.common.job.Step.addStep(Step.java:223) [common.jar:]
    at org.ovirt.engine.core.bll.job.ExecutionHandler.addSubStep(ExecutionHandler.java:386) [bll.jar:]
    at org.ovirt.engine.core.bll.job.ExecutionHandler.addTaskStep(ExecutionHandler.java:369) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CoCoAsyncTaskHelper.createTask(CoCoAsyncTaskHelper.java:66) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CommandCoordinatorImpl.createTask(CommandCoordinatorImpl.java:261) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CommandCoordinatorUtil.createTask(CommandCoordinatorUtil.java:111) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.createTaskImpl(CommandBase.java:1773) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.createTask(CommandBase.java:1740) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.createTask(CommandBase.java:1652) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.createTask(CommandBase.java:1701) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.disk.image.CreateVolumeContainerCommand.executeCommand(CreateVolumeContainerCommand.java:82) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1251) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1391) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:2055) [bll.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:164) [utils.jar:]
    at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:103) [utils.jar:]
    at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1451) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:397) [bll.jar:]
    at org.ovirt.engine.core.bll.executor.DefaultBackendActionExecutor.execute(DefaultBackendActionExecutor.java:13) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:511) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:493) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:697) [bll.jar:]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437)
    at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:70) [wildfly-weld-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:80) [wildfly-weld-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.weld.ejb.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:93) [wildfly-weld-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:437)
    at org.jboss.weld.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:73) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.jboss.as.weld.ejb.EjbRequestScopeActivationInterceptor.processInvocation(EjbRequestScopeActivationInterceptor.java:83) [wildfly-weld-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ee.concurrent.ConcurrentContextInterceptor.processInvocation(ConcurrentContextInterceptor.java:45) [wildfly-ee-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InitialInterceptor.processInvocation(InitialInterceptor.java:21)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
    at org.jboss.as.ee.component.interceptors.ComponentDispatcherInterceptor.processInvocation(ComponentDispatcherInterceptor.java:52)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.singleton.SingletonComponentInstanceAssociationInterceptor.processInvocation(SingletonComponentInstanceAssociationInterceptor.java:53) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:263) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:243) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.invocationmetrics.WaitTimeInterceptor.processInvocation(WaitTimeInterceptor.java:43) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.security.SecurityContextInterceptor.processInvocation(SecurityContextInterceptor.java:100) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.deployment.processors.StartupAwaitInterceptor.processInvocation(StartupAwaitInterceptor.java:22) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:66) [wildfly-ejb3-7.0.5.GA-redhat-2.jar:7.0.5.GA-redhat-2]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.ContextClassLoaderInterceptor.processInvocation(ContextClassLoaderInterceptor.java:64)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:356)
    at org.wildfly.security.manager.WildFlySecurityManager.doChecked(WildFlySecurityManager.java:636)
    at org.jboss.invocation.AccessCheckingInterceptor.processInvocation(AccessCheckingInterceptor.java:61)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.InterceptorContext.run(InterceptorContext.java:356)
    at org.jboss.invocation.PrivilegedWithCombinerInterceptor.processInvocation(PrivilegedWithCombinerInterceptor.java:80)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
    at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:198)
    at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:185)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:340)
    at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61)
    at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:73)
    at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view4.runInternalAction(Unknown Source) [bll.jar:]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
    at org.jboss.weld.util.reflection.Reflections.invokeAndUnwrap(Reflections.java:433) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.jboss.weld.bean.proxy.EnterpriseBeanProxyMethodHandler.invoke(EnterpriseBeanProxyMethodHandler.java:128) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.jboss.weld.bean.proxy.EnterpriseTargetBeanInstance.invoke(EnterpriseTargetBeanInstance.java:56) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.jboss.weld.bean.proxy.ProxyMethodHandler.invoke(ProxyMethodHandler.java:100) [weld-core-impl.jar:2.3.3.Final-redhat-1]
    at org.ovirt.engine.core.bll.BackendCommandObjectsHandler$BackendInternal$BackendLocal$2049259618$Proxy$_$$_Weld$EnterpriseProxy$.runInternalAction(Unknown Source) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInternalAction(CommandBase.java:2452) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInternalActionWithTasksContext(CommandBase.java:2477) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.runInternalActionWithTasksContext(CommandBase.java:2472) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand.createImage(CloneImageGroupVolumesStructureCommand.java:126) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.disk.image.CloneImageGroupVolumesStructureCommand.performNextOperation(CloneImageGroupVolumesStructureCommand.java:89) [bll.jar:]
    at org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback.childCommandsExecutionEnded(SerialChildCommandsExecutionCallback.java:29) [bll.jar:]
    at org.ovirt.engine.core.bll.ChildCommandsCallbackBase.doPolling(ChildCommandsCallbackBase.java:63) [bll.jar:]
    at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:114) [bll.jar:]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_121]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_121]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_121]
    at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
    at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
    at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_121]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]

Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'ovirt-engine-4.1.2' doesn't contain patch 'https://gerrit.ovirt.org/76608']
gitweb: https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/ovirt-engine-4.1.2

For more info please contact: infra

(In reply to rhev-integ from comment #14)
> INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following
> reason:
>
> [Tag 'ovirt-engine-4.1.2' doesn't contain patch

Wrong tag, please re-run on ovirt-engine-4.1.2.2

> 'https://gerrit.ovirt.org/76608']
> gitweb:
> https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/
> ovirt-engine-4.1.2
>
> For more info please contact: infra

Moving to ON_QA. This bug is in the ovirt-engine-4.1.2.2 tag from which the engine was built.
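Most of the NullPointerException trace from the engine log consists of JBoss/WildFly interceptor and reflection frames. The sketch below is one possible way to pull only the engine's own frames out of a trace that was pasted as a single line. It is a reading aid, not part of the fix; the function names and the `org.ovirt.engine.` prefix filter are choices made for this illustration.

```python
import re

# One Java stack frame: "at <qualified.method>(<location>)".
# The lookbehind stops "at" from matching inside a longer identifier, while
# still accepting a frame glued to the previous "[jar:version]" tag.
FRAME_RE = re.compile(r"(?<![\w.$])at\s+([\w.$]+)\(([^()]*)\)")


def split_frames(flat_trace):
    """Return (qualified_method, location) pairs in call order."""
    return FRAME_RE.findall(flat_trace)


def engine_frames(flat_trace, prefix="org.ovirt.engine."):
    """Keep only frames whose method belongs to the given package prefix."""
    return [frame for frame in split_frames(flat_trace) if frame[0].startswith(prefix)]


# Short excerpt in the same shape as the trace from the engine log.
sample = (
    "java.lang.NullPointerException "
    "at org.ovirt.engine.core.common.job.Step.addStep(Step.java:223) [common.jar:] "
    "at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_121]"
    "at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:511) [bll.jar:]"
)

for method, location in engine_frames(sample):
    print(f"{method} ({location})")
```

Run against the full trace, this kind of filter leaves the path from CloneImageGroupVolumesStructureCommand.createImage through CreateVolumeContainerCommand.executeCommand down to Step.addStep, where the exception is raised.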
Verified with the following code:
-----------------------------------
ovirt-engine-4.1.2.1-0.1.el7.noarch
rhevm-4.1.2.1-0.1.el7.noarch
vdsm-4.19.14-1.el7ev.x86_64

Verified with the following scenario:
----------------------------------------
Steps to Reproduce:
1. Create VM with 4 disks (virtio-SCSI cow, virtio-SCSI raw, virtio cow, virtio raw)
2. Select to Move the block disk to another domain
3. Restart the ENGINE as soon as the CopyImageGroupWithDataCmd is reported in the engine log

>>>> The disk is successfully moved

Moving to VERIFIED!

A patch was merged to fix the tag issue. Anton, can you follow up on this and verify it works properly for the next build?

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
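For automation that repeats this scenario, the pass/fail condition is whether the disk leaves the locked state after the engine comes back. A minimal sketch of such a check follows. It assumes the caller supplies a `get_status` callable that returns the disk status as a string (for example, read through the oVirt API); that wiring, the status strings `"locked"` and `"ok"`, and the timeout values are all assumptions for illustration, not taken from the test code used in this bug.

```python
import time


def wait_until_unlocked(get_status, timeout=600.0, interval=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns something other than "locked".

    Returns the first non-locked status. Raises TimeoutError if the disk is
    still locked once `timeout` seconds have passed. `sleep` and `clock`
    are injectable so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status != "locked":
            return status
        if clock() >= deadline:
            raise TimeoutError(f"disk still locked after {timeout} s")
        sleep(interval)


# Usage with a fake status source standing in for the real API call.
fake_statuses = iter(["locked", "locked", "ok"])
final = wait_until_unlocked(lambda: next(fake_statuses),
                            timeout=10, interval=0, sleep=lambda s: None)
print(final)
```

Raising on timeout, instead of returning the last status, makes the failure mode reported in this bug (disk stuck in locked) show up as an explicit test error.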