+++ This bug was initially created as a clone of Bug #970889 +++ Description of problem: This happened to me in my setup and I found a way to reproduce it. a vm crashed during syncImageData and engine started reporting the following in a loop while vm disk remained in image locked: 2013-06-05 10:12:03,564 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-4-thread-49) [5ab1c721] CreateCommand failed: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: Vds with id: 00000000-0000-0000-0000-000000000000 was not found I found a way to reproduce this 100% which is to kill -9 the vm pid during syncImageData. Version-Release number of selected component (if applicable): sf17.4 How reproducible: 100% Steps to Reproduce: 1. on a two hosts cluster with multiple iscsi storage create a vm with one disk 2. run the vm 3. when the vm changes status to up live storage migrate the disk 4. during syncImageData step, kill -9 the vm's pid Actual results: 1. engine shows the following error in the logs in a loop: 2013-06-05 10:12:03,564 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-4-thread-49) [5ab1c721] CreateCommand failed: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: Vds with id: 00000000-0000-0000-0000-000000000000 was not found 2. vm's disk remains in image locked until we run unlock_entity script. Expected results: we should be able to recover in case a vm crashes duruing LSM. Additional info:logs 2013-06-05 10:12:03,563 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-4-thread-49) [5ab1c721] Cannot get vdsManager for vdsid=00000000-0000-0000-0000-000000000000 2013-06-05 10:12:03,564 ERROR [org.ovirt.engine.core.vdsbroker.ResourceManager] (pool-4-thread-49) [5ab1c721] CreateCommand failed: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: Vds with id: 00000000-0000-0000-0000-000000000000 was not found at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.initializeVdsBroker(VdsBrokerCommand.java:50) [engine-vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.<init>(VdsBrokerCommand.java:29) [engine-vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskVDSCommand.<init>(VmReplicateDiskVDSCommand.java:9) [engine-vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand.<init>(VmReplicateDiskFinishVDSCommand.java:8) [engine-vdsbroker.jar:] at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) [rt.jar:1.7.0_19] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) [rt.jar:1.7.0_19] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [rt.jar:1.7.0_19] at java.lang.reflect.Constructor.newInstance(Constructor.java:525) [rt.jar:1.7.0_19] at org.ovirt.engine.core.vdsbroker.ResourceManager.CreateCommand(ResourceManager.java:333) [engine-vdsbroker.jar:] at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:370) [engine-vdsbroker.jar:] at org.ovirt.engine.core.bll.lsm.VmReplicateDiskStartTaskHandler.revertTask(VmReplicateDiskStartTaskHandler.java:79) [engine-bll.jar:] at org.ovirt.engine.core.bll.lsm.VmReplicateDiskStartTaskHandler.endWithFailure(VmReplicateDiskStartTaskHandler.java:107) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.internalEndWithFailure(CommandBase.java:569) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.endActionInTransactionScope(CommandBase.java:485) [engine-bll.jar:] at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1515) [engine-bll.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:166) [engine-utils.jar:] at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:108) [engine-utils.jar:] at org.ovirt.engine.core.bll.CommandBase.endAction(CommandBase.java:429) [engine-bll.jar:] at org.ovirt.engine.core.bll.Backend.endAction(Backend.java:379) [engine-bll.jar:] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_19] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [rt.jar:1.7.0_19] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_19] at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_19] at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-8] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.WeavedInterceptor.processInvocation(WeavedInterceptor.java:53) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.interceptors.UserInterceptorFactory$1.processInvocation(UserInterceptorFactory.java:58) [jboss-as-ee.jar:7.2.0.Final-redhat-8] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:374) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.ovirt.engine.core.utils.ThreadLocalSessionCleanerInterceptor.injectWebContextToThreadLocal(ThreadLocalSessionCleanerInterceptor.java:11) [engine-utils.jar:] at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source) [:1.7.0_19] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_19] at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_19] at org.jboss.as.ee.component.ManagedReferenceLifecycleMethodInterceptorFactory$ManagedReferenceLifecycleMethodInterceptor.processInvocation(ManagedReferenceLifecycleMethodInterceptorFactory.java:123) [jboss-as-ee.jar:7.2.0.Final-redhat- engine=# SELECT * from images; image_guid | creation_date | size | it_guid | parentid | imagestatus | lastmodified | vm_snapshot_id | volume_type | volume_format | image_group_id | _create_date | _update_ date | quota_id | active --------------------------------------+------------------------+-------------+--------------------------------------+--------------------------------------+-------------+------ ------------------+--------------------------------------+-------------+---------------+--------------------------------------+-------------------------------+----------------- --------------+--------------------------------------+-------- 00000000-0000-0000-0000-000000000000 | 2008-04-01 00:00:00+03 | 85899345920 | 00000000-0000-0000-0000-000000000000 | | 0 | | | 2 | 4 | | 2013-02-26 04:56:09.820466+02 | | | t f82a0d58-0791-4137-b1e6-22a8794acd2a | 2013-05-27 14:07:13+03 | 16106127360 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 1 | 2013- 05-27 14:07:15+03 | df1e0727-64ac-4cae-a35f-7f31058bbd69 | 2 | 4 | cd5f0e01-8f73-4aa0-b837-b7f9bf4bc333 | 2013-05-27 14:07:05.401915+03 | 2013-05-29 10:55 :22.031875+03 | 00000000-0000-0000-0000-000000000000 | t d23f4d33-797b-4325-93d5-4e2e5d8e352b | 2013-05-30 19:00:13+03 | 1073741824 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 1 | 2013- 05-30 19:00:15+03 | b4ead7f9-62ec-4be8-b292-c78aa61571f4 | 2 | 4 | e76ada86-21cf-45b5-a241-74932cf45af9 | 2013-05-30 19:00:04.370783+03 | 2013-05-30 19:00 :15.472749+03 | 00000000-0000-0000-0000-000000000000 | t 864baac9-0229-4e0f-9be9-2ab0452c959d | 2013-06-05 10:06:56+03 | 4294967296 | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | 1 | 2013- 06-05 10:06:56+03 | 189f21ab-22b8-4e55-a994-079b105c396d | 1 | 5 | dcc04651-062b-4124-84aa-d97d3ae37173 | 2013-06-05 10:06:49.142729+03 | 2013-06-05 10:08 :51.89915+03 | 00000000-0000-0000-0000-000000000000 | f e76974e4-18f7-4bee-9a79-7e5a02a7d655 | 2013-06-05 10:09:00+03 | 4294967296 | 00000000-0000-0000-0000-000000000000 | 864baac9-0229-4e0f-9be9-2ab0452c959d | 2 | 2013- 06-05 10:09:00+03 | 44762adb-2711-41a8-9263-32c30130f9af | 2 | 4 | dcc04651-062b-4124-84aa-d97d3ae37173 | 2013-06-05 10:08:51.89915+03 | 2013-06-05 10:09 :14.203064+03 | 00000000-0000-0000-0000-000000000000 | t (5 rows) 3198f747-07ab-41eb-995b-f1470250c32f : verb = syncImageData id = 3198f747-07ab-41eb-995b-f1470250c32f --- Additional comment from Allon Mureinik on 2013-07-09 11:12:20 EDT --- This reproduces with the latest upstream (commit hash 085420fc98657a4e0281d38abb56f5d0ae0825ee). Adding recent logs from my setup, and marking as high priority as this is a scenario that can very well happen, and the system does not seem to have a way to automatically recover. --- Additional comment from Allon Mureinik on 2013-07-09 11:15:20 EDT --- --- Additional comment from Allon Mureinik on 2013-07-30 10:44:04 EDT ---
Verified using sf20 Error appears in logs but only once, Disk became available and didn't remain locked (after few minutes)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1210.html