Created attachment 757059 [details] logs Description of problem: This happened to me in my setup and I found a way to reproduce it. a vm crashed during syncImageData and engine started reporting the following in a loop while vm disk remained in image locked: 2013-06-05 10:12:03,564 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-4-thread-49) [5ab1c721] CreateCommand failed: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: Vds with id: 00000000-0000-0000-0000-000000000000 was not found I found a way to reproduce this 100% which is to kill -9 the vm pid during syncImageData. Version-Release number of selected component (if applicable): sf17.4 How reproducible: 100% Steps to Reproduce: 1. on a two hosts cluster with multiple iscsi storage create a vm with one disk 2. run the vm 3. when the vm changes status to up live storage migrate the disk 4. during syncImageData step, kill -9 the vm's pid Actual results: 1. engine shows the following error in the logs in a loop: 2013-06-05 10:12:03,564 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-4-thread-49) [5ab1c721] CreateCommand failed: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: Vds with id: 00000000-0000-0000-0000-000000000000 was not found 2. vm's disk remains in image locked until we run unlock_entity script. Expected results: we should be able to recover in case a vm crashes duruing LSM. Additional info:logs 2013-06-05 10:12:03,563 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-4-thread-49) [5ab1c721] Cannot get vdsManager for vdsid=00000000-0000-0000-0000-000000000000 2013-06-05 10:12:03,564 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-4-thread-49) [5ab1c721] CreateCommand failed: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: Vds with id: 00000000-0000-0000-0000-000000000000 was not found at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.initializeVdsBroker(VdsBrokerCommand.java:50) [engine-vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.<init>(VdsBrokerCommand.java:29) [engine-vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskVDSCommand.<init>(VmReplicateDiskVDSCommand.java:9) [engine-vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand.<init>(VmReplicateDiskFinishVDSCommand.java:8) [engine-vdsbroker.jar:] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) [rt.jar:1.7.0_19] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) [rt.jar:1.7.0_19] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [rt.jar:1.7.0_19] at java.lang.reflect.Constructor.newInstance(Constructor.java:525) [rt.jar:1.7.0_19] at org.ovirt.engine.core.vdsbroker.ResourceManager.CreateCommand(ResourceManager.java:333) [engine-vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:370) [engine-vdsbroker.jar:] at org.ovirt.engine.core.bll.lsm.VmReplicateDiskStartTaskHandler.revertTask(VmReplicateDiskStartTaskHandler.java:79) [engine-bll.jar:] at org.ovirt.engine.core.bll.lsm.VmReplicateDiskStartTaskHandler.endWithFailure(VmReplicateDiskStartTaskHandler.java:107) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.internalEndWithFailure(CommandBase.java:569) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.endActionInTransactionScope(CommandBase.java:485) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1515) [engine-bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:166) [engine-utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:108) [engine-utils.jar:] at org.ovirt.engine.core.bll.CommandBase.endAction(CommandBase.java:429) [engine-bll.jar:] at org.ovirt.engine.core.bll.Backend.endAction(Backend.java:379) [engine-bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_19] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_19] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_19] at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_19] at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-8] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:58) [jboss-as-ee.jar:7.2.0.Final-redhat-8] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:374) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.ovirt.engine.core.utils.ThreadLocalSessionCleanerInterceptor.injectWebContextToThreadLocal(ThreadLocalSessionCleanerInterceptor.java:11) [engine-utils.jar:] at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source) [:1.7.0_19] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_19] at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_19] at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptorFactory$ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptorFactory.java:123) [jboss-as-ee.jar:7.2.0.Final-redhat- engine=# SELECT * from images; image_guid | creation_date | size | it_guid | parentid | imagestatus | lastmodified | vm_snapshot_id | volume_type | volume_format | image_group_id | _create_date | _update_ date | quota_id | active --------------------------------------+------------------------+-------------+--------------------------------------+--------------------------------------+-------------+------ ------------------+--------------------------------------+-------------+---------------+--------------------------------------+-------------------------------+----------------- --------------+--------------------------------------+-------- 00000000-0000-0000-0000-000000000000 | 2008-04-01 00:00:00+03 | 85899345920 | 00000000-0000-0000-0000-000000000000 | | 0 | | | 2 | 4 | | 2013-02-26 04:56:09.820466+02 | | | t f82a0d58-0791-4137-b1e6-22a8794acd2a | 2013-05-27 14:07:13+03 | 16106127360 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 1 | 2013- 05-27 14:07:15+03 | df1e0727-64ac-4cae-a35f-7f31058bbd69 | 2 | 4 | cd5f0e01-8f73-4aa0-b837-b7f9bf4bc333 | 2013-05-27 14:07:05.401915+03 | 2013-05-29 10:55 :22.031875+03 | 00000000-0000-0000-0000-000000000000 | t d23f4d33-797b-4325-93d5-4e2e5d8e352b | 2013-05-30 19:00:13+03 | 1073741824 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 1 | 2013- 05-30 19:00:15+03 | b4ead7f9-62ec-4be8-b292-c78aa61571f4 | 2 | 4 | e76ada86-21cf-45b5-a241-74932cf45af9 | 2013-05-30 19:00:04.370783+03 | 2013-05-30 19:00 :15.472749+03 | 00000000-0000-0000-0000-000000000000 | t 864baac9-0229-4e0f-9be9-2ab0452c959d | 2013-06-05 10:06:56+03 | 4294967296 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 1 | 2013- 06-05 10:06:56+03 | 189f21ab-22b8-4e55-a994-079b105c396d | 1 | 5 | dcc04651-062b-4124-84aa-d97d3ae37173 | 2013-06-05 10:06:49.142729+03 | 2013-06-05 10:08 :51.89915+03 | 00000000-0000-0000-0000-000000000000 | f e76974e4-18f7-4bee-9a79-7e5a02a7d655 | 2013-06-05 10:09:00+03 | 4294967296 | 00000000-0000-0000-0000-000000000000 | 864baac9-0229-4e0f-9be9-2ab0452c959d | 2 | 2013- 06-05 10:09:00+03 | 44762adb-2711-41a8-9263-32c30130f9af | 2 | 4 | dcc04651-062b-4124-84aa-d97d3ae37173 | 2013-06-05 10:08:51.89915+03 | 2013-06-05 10:09 :14.203064+03 | 00000000-0000-0000-0000-000000000000 | t (5 rows) 3198f747-07ab-41eb-995b-f1470250c32f : verb = syncImageData id = 3198f747-07ab-41eb-995b-f1470250c32f
This reproduces with the latest upstream (commit hash 085420fc98657a4e0281d38abb56f5d0ae0825ee). Adding recent logs from my setup, and marking as high priority as this is a scenario that can very well happen, and the system does not seem to have a way to automatically recover.
Created attachment 771095 [details] Recent engine log with described failure.
*** Bug 989966 has been marked as a duplicate of this bug. ***
Verified on is9.1 - after killing vm process live migration fails (after a few minutes) and disk is unlocked.
Closing - RHEV 3.3 Released