Bug 962497 - engine: engine restart during DeleteImage step will cause reporting issues and ArrayIndexOutOfBoundsException: -1
Summary: engine: engine restart during DeleteImage step will cause reporting issues an...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: 3.3.0
Assignee: Daniel Erez
QA Contact: yeylon@redhat.com
URL:
Whiteboard: storage
Depends On: 987783
Blocks: 971874
TreeView+ depends on / blocked
 
Reported: 2013-05-13 16:49 UTC by Dafna Ron
Modified: 2016-04-18 06:45 UTC (History)
13 users (show)

Fixed In Version: is2
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 971874 (view as bug list)
Environment:
Last Closed: 2014-01-21 22:13:12 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
logs (1.48 MB, application/x-gzip)
2013-05-13 16:49 UTC, Dafna Ron
no flags Details
logs (2.47 MB, application/x-gzip)
2013-05-22 12:35 UTC, Dafna Ron
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 15133 0 None None None Never

Description Dafna Ron 2013-05-13 16:49:00 UTC
Created attachment 747297 [details]
logs

Description of problem:

I restarted ovirt-engine service right when the DeleteImage step of Live Storage migration is sent. 
we have the following issues:

1. ArrayIndexOutOfBoundsException: -1 which repeats in endless loop
2. we report to the user that we finsihed the move (also in loop)
3. once I stopped the vm we report that the same action failed in endless loop.
4. the syncData task is not cleared.  

since I think all of these issues are caused by the OutOfBounds error I am opening one bug for all.

Version-Release number of selected component (if applicable):

sf16

How reproducible:

100%

Steps to Reproduce:
1. on iscsi storage, start a vm with one disk on two hosts (my vm was running on the hsm)
2. live migrate the vms disk
3. right when the DeleteImage for the source disk is sent, restart ovirt-engine
  
Actual results:

1. ArrayIndexOutOfBoundsException: -1 which repeats in endless loop
2. we report to the user that we finsihed the move (also in loop)
3. once I stopped the vm we report that the same action failed in endless loop. 
4. the syncData task is not cleared

Expected results:

if we succeeded we should report the success and clear all tasks 

Additional info: full logs


[root@cougar01 ~]# vdsClient -s 0 getAllTasksInfo
28f4c0eb-8f3c-4728-bb60-b6f408601137 :
	verb = syncImageData
	id = 28f4c0eb-8f3c-4728-bb60-b6f408601137

root@cougar01 ~]# vdsClient -s 0 getAllTasksStatuses
{'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': {'28f4c0eb-8f3c-4728-bb60-b6f408601137': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '28f4c0eb-8f3c-4728-bb60-b6f408601137', 'taskResult': 'success', 'taskState': 'finished'}}}

2013-05-13 19:24:15,271 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-48) Unable to get value of property: vds for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-05-13 19:24:15,323 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-48) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: javax.ejb.EJBException: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:191) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:237) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:218) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:182) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view8.endAction(Unknown Source) [engine-bll.jar:]

Comment 1 Sean Cohen 2013-05-22 11:44:43 UTC
Dafna, aside from log messages, what's the impact?
when you say endless loop, does it indeed go on forever?
how often?
what happens after engine restart?
how long did you wait?

Comment 2 Dafna Ron 2013-05-22 12:23:00 UTC
(In reply to Sean Cohen from comment #1)
> Dafna, aside from log messages, what's the impact?

the event log will report the task as succeeded and than failed (when we shut down the vm) until we restart engine 

> when you say endless loop, does it indeed go on forever?

until we restart engine

> how often?
every 5 seconds (the log will report the exception and each time an event will appear)

> what happens after engine restart?

the events disappear and engine sends deleteImage to the destination domain although db already changed the mapping to the destination (this will cause a mismatch in the disks and the vm will not run until gss will fix the mapping in db or move the disks between the storages)

> how long did you wait?

it does not matter, if we cannot clean the task it will run forever. but I waited at least 5 minutes before restarting engine to clear the task.

Comment 3 Dafna Ron 2013-05-22 12:34:17 UTC
as I said in the comment above we have a mismatch between the db and vdsm on the location of the disks. 
this issue is caused by engine because when we get the exception we will report the dataSync as successful and update the db with the new location, however, when we shut down the vm the alerts change to failure to move the disk and when we restart the engine to clean the task the DeleteImage is sent which causes the wrong disks to be removed. 

I will attach the full logs

vdsm: 

[root@cougar02 ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  3eb0913d-e032-4fa5-8026-745e7c7fee74 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   3.00g                                             
  8e93db93-fb9d-4566-aa02-f63dd6b77aa8 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   1.00g                                             
  f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g                                             
  ids                                  38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m                                             
  inbox                                38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  leases                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g                                             
  master                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   1.00g                                             
  metadata                             38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m                                             
  outbox                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  3eb0913d-e032-4fa5-8026-745e7c7fee74 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   3.00g                                             
  8e93db93-fb9d-4566-aa02-f63dd6b77aa8 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   1.00g                                             
  f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g                                             
  ids                                  81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m                                             
  inbox                                81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  leases                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a----   2.00g                                             
  master                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao---   1.00g                                             
  metadata                             81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m                                             
  outbox                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  lv_root                              vg0                                  -wi-ao--- 457.71g                                             
  lv_swap                              vg0                                  -wi-ao---   7.85g       


[root@cougar02 ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g                                             
  ids                                  38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m                                             
  inbox                                38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  leases                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g                                             
  master                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   1.00g                                             
  metadata                             38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m                                             
  outbox                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  3eb0913d-e032-4fa5-8026-745e7c7fee74 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   3.00g                                             
  8e93db93-fb9d-4566-aa02-f63dd6b77aa8 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   1.00g                                             
  f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g                                             
  ids                                  81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m                                             
  inbox                                81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  leases                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a----   2.00g                                             
  master                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao---   1.00g                                             
  metadata                             81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m                                             
  outbox                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  lv_root                              vg0                                  -wi-ao--- 457.71g                                             
  lv_swap                              vg0                                  -wi-ao---   7.85g        




[root@cougar02 ~]# vdsClient -s 0 getStorageDomainInfo 38755249-4bb3-4841-bf5b-05f4a521514d
	uuid = 38755249-4bb3-4841-bf5b-05f4a521514d
	vguuid = 4Ceo5k-vMud-sVKL-blqX-JwKo-ETN6-8ao4L9
	lver = -1
	state = OK
	version = 3
	role = Master
	pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
	spm_id = -1
	type = ISCSI
	class = Data
	master_ver = 0
	name = Dafna-32-03

[root@cougar02 ~]# vdsClient -s 0 getStorageDomainInfo 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	uuid = 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	vguuid = Eq3hKP-LB4o-CSHb-U91Z-JKKN-4eQV-Pecf1l
	lver = 6
	state = OK
	version = 3
	role = Master
	pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
	spm_id = 2
	type = ISCSI
	class = Data
	master_ver = 3697
	name = Dafna-32-02

cougar02 (new spm after restart):

Thread-484::INFO::2013-05-21 17:40:14,549::logUtils::40::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='38755249-4bb3-4841-bf5b-05f4a521514d', spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', imgUUID='39194824-8601-4b3e-8d37-4
bddc71dfe68', postZero='false', force='false')


engine: 

create of volume: 

45e7c7fee74, newImageDescription = , imageId = 8e93db93-fb9d-4566-aa02-f63dd6b77aa8, sourceImageGroupId = 39194824-8601-4b3e-8d37-4bddc71dfe68), log id: 5d433107
2013-05-21 17:36:01,391 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (pool-4-thread-47) [75b18eea] -- CreateSnapshotVDSCommand::ExecuteIrsBrokerCommand: calling 'createVolume' with two new parameters: description and UUID
2013-05-21 17:36:01,391 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (pool-4-thread-47) [75b18eea] -- createVolume parameters:
                sdUUID=81ef11d0-4c0c-47b4-8953-d61a6af442d8
                spUUID=7fd33b43-a9f4-4eb7-a885-e9583a929ceb
                imgGUID=39194824-8601-4b3e-8d37-4bddc71dfe68
                size=16,106,127,360 bytes
                volFormat=COW
                volType=Sparse
                volUUID=3eb0913d-e032-4fa5-8026-745e7c7fee74
                descr=
                srcImgGUID=39194824-8601-4b3e-8d37-4bddc71dfe68
                srcVolUUID=8e93db93-fb9d-4566-aa02-f63dd6b77aa8
                



Exception: 

2013-05-21 17:37:54,819 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-47) Unable to get value of property: vds for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-05-21 17:37:54,841 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-47) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: javax.ejb.EJBException: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:191) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:237) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:218) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:182) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view9.endAction(Unknown Source) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask.EndCommandAction(EntityAsyncTask.java:147) [engine-bll.jar:]

delete of image group: 

2013-05-21 17:40:10,299 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-48) [4644a8d1] Reverting task deleteImage, handler: org.ovirt.engine.core.bll.lsm.CreateImagePlaceholderTaskHandler
2013-05-21 17:40:10,324 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-4-thread-48) [4644a8d1] START, DeleteImageGroupVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 38755249-4bb3-4841-bf5b-05f4a521514d, imageGroupId = 39194824-8601-4b3e-8d37-4bddc71dfe68, postZeros = false, forceDelete = false), log id: 4bbb6f76
2013-05-21 17:40:12,476 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-4-thread-48) [4644a8d1] FINISH, DeleteImageGroupVDSCommand, log id: 4bbb6f76
2013-05-21 17:40:12,497 INFO  [org.ovirt.engine.core.bll.EntityMultiAsyncTasks] (pool-4-thread-48) [4644a8d1] EntityMultiAsyncTasks::AttachTask: Attaching task f5e11442-843c-448c-ae32-85707457cd9e to entity 8e93db93-fb9d-4566-aa02-f63dd6b77aa8.

Comment 4 Dafna Ron 2013-05-22 12:35:07 UTC
Created attachment 751711 [details]
logs

Comment 8 vvyazmin@redhat.com 2013-07-23 14:52:39 UTC
Can’t be verified, this bug depend on BZ987460, tested on RHEVM 3.3 - IS6 environment.

Comment 9 vvyazmin@redhat.com 2013-08-18 15:07:15 UTC
Tested on FCP Data Center

Verified, tested on RHEVM 3.3 - IS10 environment:

RHEVM:  rhevm-3.3.0-0.15.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.10-1.el6ev.noarch
VDSM:  vdsm-4.12.0-61.git8178ec2.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

Comment 12 Itamar Heim 2014-01-21 22:13:12 UTC
Closing - RHEV 3.3 Released

Comment 13 Itamar Heim 2014-01-21 22:21:09 UTC
Closing - RHEV 3.3 Released


Note You need to log in before you can comment on or make changes to this bug.