Description of problem:

While verifying this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1695018 I noticed a side effect where the disk gets stuck in a "Cannot move Virtual Disk. Disk ${DiskName} is being moved or copied" state, even though its status is OK in both the UI and the database. This happens when I try to move the disk at a specific point in time, just as the snapshot is being removed. Before that point, all of my attempts failed as expected because the disk was locked.

From the engine log:

Starting the LSM:
2019-05-30 15:23:48,579+03 INFO [org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand] (default task-57) [3cbe9333-f0be-4796-ac0d-06f75c323aba] Running command: LiveMigrateDiskCommand internal: true. Entities affected : ID: f38a13cf-7774-40fc-8836-ac455ee47957 Type: DiskAction group DISK_LIVE_STORAGE_MIGRATION with role type USER

Creating snapshot:
2019-05-30 15:23:48,715+03 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotForVmCommand] (default task-57) [3cbe9333-f0be-4796-ac0d-06f75c323aba] Running command: CreateSnapshotForVmCommand internal: true. Entities affected : ID: 1ebc059a-4ae1-4391-bb83-9f966f54eb7e Type: VMAction group MANIPULATE_VM_SNAPSHOTS with role type USER

Ending create snapshot command:
2019-05-30 15:24:08,714+03 INFO [org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-88) [3cbe9333-f0be-4796-ac0d-06f75c323aba] Ending command 'org.ovirt.engine.core.bll.snapshots.CreateSnapshotCommand' successfully.

----- Try to move the disk to another storage domain using REST-API -----

2019-05-30 15:26:35,151+03 ERROR [org.ovirt.engine.core.bll.CommandsFactory] (default task-57) [d43e1908-ccb6-4424-b1ea-cc272d307e19] Exception: java.lang.NullPointerException
    at org.ovirt.engine.core.bll.storage.disk.image.BaseImagesCommand.getImageGroupId(BaseImagesCommand.java:180) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.disk.MoveOrCopyDiskCommand.getImageGroupId(MoveOrCopyDiskCommand.java:126) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.getVm(LiveMigrateDiskCommand.java:522) [bll.jar:]
    at org.ovirt.engine.core.bll.storage.lsm.LiveMigrateDiskCommand.init(LiveMigrateDiskCommand.java:140) [bll.jar:]
    at org.ovirt.engine.core.bll.CommandBase.postConstruct(CommandBase.java:212) [bll.jar:]
    at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source) [:1.8.0_212]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_212]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_212]
    at org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.invokeMethods(DefaultLifecycleCallbackInvoker.java:83) [weld-core-impl.jar:3.0.5.Final-redhat-00001]
    at org.jboss.weld.injection.producer.DefaultLifecycleCallbackInvoker.postConstruct(DefaultLifecycleCallbackInvoker.java:66) [weld-core-impl.jar:3.0.5.Final-redhat-00001]
    at org.jboss.weld.injection.producer.BasicInjectionTarget.postConstruct(BasicInjectionTarget.java:122) [weld-core-impl.jar:3.0.5.Final-redhat-00001]
    at org.ovirt.engine.core.di.Injector.injectMembers(Injector.java:31) [vdsbroker.jar:]
    at org.ovirt.engine.core.bll.CommandsFactory.createCommand(CommandsFactory.java:100) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:449) [bll.jar:]
    at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:656) [bll.jar:]
    at sun.reflect.GeneratedMethodAccessor186.invoke(Unknown Source) [:1.8.0_212]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_212]
    at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_212]
    at org.jboss.as.ee.component.ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptor.java:52)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509)
    at org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.delegateInterception(Jsr299BindingsInterceptor.java:78)
    at org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.doMethodInterception(Jsr299BindingsInterceptor.java:88)
    at org.jboss.as.weld.interceptors.Jsr299BindingsInterceptor.processInvocation(Jsr299BindingsInterceptor.java:101)
    at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:63)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.as.ejb3.component.invocationmetrics.ExecutionTimeInterceptor.processInvocation(ExecutionTimeInterceptor.java:43) [wildfly-ejb3-7.2.1.GA-redhat-00004.jar:7.2.1.GA-redhat-00004]
    ..... very long trace.....

And another exception:

2019-05-30 15:26:35,242+03 ERROR [org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand] (default task-57) [d43e1908-ccb6-4424-b1ea-cc272d307e19] Command 'org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand' failed: java.lang.NullPointerException
2019-05-30 15:26:35,242+03 ERROR [org.ovirt.engine.core.bll.storage.disk.MoveDiskCommand] (default task-57) [d43e1908-ccb6-4424-b1ea-cc272d307e19] Exception: javax.ejb.EJBException: java.lang.NullPointerException
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:223) [wildfly-ejb3-7.2.1.GA-redhat-00004.jar:7.2.1.GA-redhat-00004]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:418) [wildfly-ejb3-7.2.1.GA-redhat-00004.jar:7.2.1.GA-redhat-00004]
    at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:148) [wildfly-ejb3-7.2.1.GA-redhat-00004.jar:7.2.1.GA-redhat-00004]
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:509)
    at org.jboss.weld.module.ejb.AbstractEJBRequestScopeActivationInterceptor.aroundInvoke(AbstractEJBRequestScopeActivationInterceptor.java:72) [weld-ejb.jar:3.0.5.Final-redhat-00001]
    at org.jboss.as.weld.ejb.EjbRequestScopeActivationInterceptor.processInvocation(EjbRequestScopeActivationInterceptor.java:89)
    at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:422)
    at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [wildfly-ejb3-7.2.1.GA-redhat-00004.jar:7.2.1.GA-redhat-00004]
    ..... very long trace.....
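Note on the trace above: the NPE is thrown from BaseImagesCommand.getImageGroupId(), reached from LiveMigrateDiskCommand.init() via getVm(), i.e. while the command object is still being constructed (postConstruct). The snippet below is a hypothetical, simplified illustration of that kind of failure, not the actual engine source; the class and lookup are stand-ins. The point it shows is that an image lookup returning null (which matches the timing described here, where the request lands while the snapshot created for the LSM is being removed) turns into an unguarded dereference during command initialization, so the NPE surfaces before any validation or locking can run.

// Hypothetical, simplified illustration only -- NOT the actual oVirt engine code.
// It mimics the pattern suggested by the trace: an image lookup that can return
// null while the LSM auto-generated snapshot is being removed, followed by an
// unguarded dereference during command initialization.
public class LsmNpeSketch {

    static final class DiskImage {
        private final String imageGroupId;
        DiskImage(String imageGroupId) { this.imageGroupId = imageGroupId; }
        String getImageGroupId() { return imageGroupId; }
    }

    // Stand-in for the DAO lookup; returns null to simulate the race where the
    // image is no longer resolvable at the moment the MoveDisk request arrives.
    static DiskImage lookupImage(String imageId) {
        return null;
    }

    static String getImageGroupId(String imageId) {
        DiskImage image = lookupImage(imageId);
        return image.getImageGroupId();   // NullPointerException when image == null
    }

    public static void main(String[] args) {
        // Throws java.lang.NullPointerException, mirroring the unguarded dereference.
        getImageGroupId("f38a13cf-7774-40fc-8836-ac455ee47957");
    }
}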
The request response:

2019-05-30 15:26:35,283+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-57) [] Operation Failed: [Internal Engine Error]

From this point on, the engine log is flooded with this:

2019-05-30 15:26:35,481+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-38) [af802aea-a0cf-4e38-89f4-1185fc941366] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[d9e041e8-4446-46ef-ad80-c28aaa5eaa05=LIVE_STORAGE_MIGRATION]', sharedLocks=''}'
2019-05-30 15:26:35,508+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-38) [af802aea-a0cf-4e38-89f4-1185fc941366] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[d9e041e8-4446-46ef-ad80-c28aaa5eaa05=LIVE_STORAGE_MIGRATION]', sharedLocks=''}'
2019-05-30 15:26:38,457+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-38) [af802aea-a0cf-4e38-89f4-1185fc941366] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[d9e041e8-4446-46ef-ad80-c28aaa5eaa05=LIVE_STORAGE_MIGRATION]', sharedLocks=''}'
2019-05-30 15:26:38,487+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-38) [af802aea-a0cf-4e38-89f4-1185fc941366] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[d9e041e8-4446-46ef-ad80-c28aaa5eaa05=LIVE_STORAGE_MIGRATION]', sharedLocks=''}'
2019-05-30 15:26:38,588+03 INFO [org.ovirt.engine.core.bll.lock.InMemoryLockManager] (default task-38) [af802aea-a0cf-4e38-89f4-1185fc941366] Failed to acquire lock and wait lock 'EngineLock:{exclusiveLocks='[d9e041e8-4446-46ef-ad80-c28aaa5eaa05=LIVE_STORAGE_MIGRATION]', sharedLocks=''}'

----- Another LSM try after the Internal Engine Error -----

2019-05-30 15:35:26,153+03 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-66) [] Operation Failed: [Cannot move Virtual Disk. Disk ${DiskName} is being moved or copied.]

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<action>
    <fault>
        <detail>[Cannot move Virtual Disk. Disk ${DiskName} is being moved or copied.]</detail>
        <reason>Operation Failed</reason>
    </fault>
    <status>failed</status>
    <storage_domain id="661cc303-9356-4aa8-8dfd-eb3bc1163d45"/>
</action>

Version-Release number of selected component (if applicable):
Happened on both 4.3.4 and 4.2.8:

4.3.4:
ovirt-engine-4.3.4.1-0.1.el7.noarch
vdsm-4.30.16-3.el7ev.x86_64

4.2.8:
ovirt-engine-4.2.8.7-0.1.el7ev.noarch
vdsm-4.20.49-3.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a disk on the iscsi_0 SD
2. Start the VM and wait for "up" status
3. Issue LSM on the disk to iscsi_1 using REST-API (a sketch of the request is included under "Additional info" below)
4. Issue another LSM to iscsi_2 using REST-API - it will fail, as the disk is locked
5. Wait for "RemoveSnapshotCommand" in the engine log
6. Issue another LSM to iscsi_2 using REST-API - the request gets stuck and finally returns "Internal Engine Error"
7. If step 6 does not finish - abort the request
8. Issue LSM on the disk to iscsi_2 using REST-API - this time it fails with "Cannot move Virtual Disk. Disk ${DiskName} is being moved or copied"

From this point on, the disk is in OK status but still can't be moved (via both REST-API and the UI).

Actual results:
The 1st LSM finishes successfully, but the 2nd one causes a side effect where the disk can't be moved.

Expected results:
The disk should be available for moving regardless of the attempts made during the last LSM.

Additional info:
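For reference, below is a minimal sketch of how the LSM requests in the steps above can be issued against the REST API, using the disk "move" action (POST /ovirt-engine/api/disks/{disk-id}/move with an <action> body naming the target storage domain, the same shape as the response XML above). The engine URL, credentials and IDs are placeholders, and the snippet assumes the engine certificate is trusted by the JVM; it illustrates the request shape rather than the exact client used during testing.

// Hedged reproduction helper -- engine URL, credentials, disk and storage
// domain IDs below are placeholders taken from / modeled on the log excerpts.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class MoveDiskViaRestApi {
    public static void main(String[] args) throws Exception {
        String engine = "https://engine.example.com";             // placeholder FQDN
        String diskId = "f38a13cf-7774-40fc-8836-ac455ee47957";   // disk ID from the log above
        String targetSd = "661cc303-9356-4aa8-8dfd-eb3bc1163d45"; // target SD ID from the response above
        String user = "admin@internal";                           // placeholder credentials
        String password = "changeme";

        // Body of the documented disk "move" action.
        String body = "<action><storage_domain id=\"" + targetSd + "\"/></action>";
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(engine + "/ovirt-engine/api/disks/" + diskId + "/move"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/xml")
                .header("Accept", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // While the disk is locked, the expected reply is a failed <action> with the
        // "is being moved or copied" fault shown in the description.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}

Any HTTP client (or the oVirt SDKs, which expose the same operation as a move action on the disk service) should produce an equivalent request.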
Tested using:
ovirt-engine-4.3.6-0.1.el7.noarch

Verified using the steps in the bug description, attempting the move before, during and after the RemoveSnapshotCommand multiple times.

Actual result:
Throughout the LSM process the image stayed locked, and the response (REST API) was "Cannot move Virtual Disk. Disk test_vm_Disk1 is being moved or copied".

Moving to VERIFIED.
This bugzilla is included in the oVirt 4.3.6 release, published on September 26th 2019. Since the problem described in this bug report should be resolved in the oVirt 4.3.6 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.