Bug 965676

Summary: engine: vm image remains locked with 'ArrayIndexOutOfBoundsException: -1' when vdsm restarts on the SPM after completion of task "syncData"
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: ovirt-engine
Assignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE
QA Contact: yeylon <yeylon>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.2.0
CC: abaron, acathrow, amureini, cpelland, derez, dron, iheim, jkt, lpeer, Rhev-m-bugs, scohen, srevivo, yeylon
Target Milestone: ---
Keywords: ZStream
Target Release: 3.3.0
Flags: scohen: Triaged+
Hardware: x86_64
OS: Unspecified
Whiteboard: storage
Fixed In Version: is2
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 971883 (view as bug list)
Environment:
Last Closed: 2014-01-21 22:15:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 987783
Bug Blocks: 971883
Attachments:
    logs (flags: none)
    CloneImageGroupStructureVDSCommand logs (flags: none)

Description Dafna Ron 2013-05-21 14:14:06 UTC
Created attachment 751191 [details]
logs

Description of problem:

The VM is running on the HSM host; I restarted vdsm on the SPM right as the syncData task (part of live storage migration) completed.
The VM's disk remains locked, the engine throws an endless stream of 'ArrayIndexOutOfBoundsException: -1' exceptions, and the following event log message repeats in a loop:

User admin@internal finished moving disk NFS-RHEL6_iSCSI_Disk1 to domain Dafna-32-03.

After I shut down the VM, the following failure event is logged in a loop:

User admin@internal have failed to move disk NFS-RHEL6_iSCSI_Disk1 to domain Dafna-32-03.


Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:
1. In an iSCSI storage pool with two hosts, create a VM from a template.
2. Run the VM on the HSM host.
3. Start a live storage migration of the VM's disk.
4. Restart vdsm on the SPM as soon as the syncData task changes state to completed successfully on the SPM (one way to time this is sketched below).
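
A minimal sketch of how to hit the window in step 4, assuming the default vdsm log location (/var/log/vdsm/vdsm.log) and the RHEL 6 service name; the exact wording of the task-completion log line varies between versions, so the grep pattern below is an assumption to adjust on the system under test:

# Terminal 1 (SPM host): follow the vdsm log and watch for the syncData
# task reaching its final state.
tail -f /var/log/vdsm/vdsm.log | grep -i --line-buffered 'syncdata'

# Terminal 2 (SPM host): restart vdsm the moment the completion line appears.
service vdsmd restart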

Actual results:

The VM is stuck with its image locked and the engine keeps throwing java.lang.ArrayIndexOutOfBoundsException: -1.

The event log repeats the same events in a loop.

Expected results:

The engine should recover from the vdsm restart.

Additional info: logs


After the vdsm restart, the engine log shows:

2013-05-21 16:43:19,170 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-49) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: javax.ejb.EJBException: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:191) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:237) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:218) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:182) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
        at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view8.endAction(Unknown Source) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask.EndCommandAction(EntityAsyncTask.java:147) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask.access$000(EntityAsyncTask.java:26) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask$1.run(EntityAsyncTask.java:107) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:71) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_19]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_19]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_19]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_19]

Comment 1 Dafna Ron 2013-05-21 16:14:27 UTC
*** Bug 965681 has been marked as a duplicate of this bug. ***

Comment 4 Dafna Ron 2013-05-23 09:24:37 UTC
The same issue happens when I restart vdsm right when the FINISH of CloneImageGroupStructureVDSCommand is logged.

In this case the disks are not removed from either the source or the target domain, which means we cannot move the disks to that domain again (a 'volume already exists' error) and GSS intervention is needed.
We also cannot be 100% sure that GSS will clean the correct disks (if the original disks are removed by accident, we will be left with empty LVs, since syncData has not run yet).

Attaching the logs from the CloneImageGroupStructureVDSCommand test as well.
Here is the lvs output from the host; a sketch of one way to identify which LVs belong to the image group follows the vdsClient output below:


vm created: 

[root@cougar01 ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g                                             
  ids                                  38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m                                             
  inbox                                38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  leases                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g                                             
  master                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   1.00g                                             
  metadata                             38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m                                             
  outbox                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   1.00g                                             
  f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g                                             
  ids                                  81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m                                             
  inbox                                81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  leases                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a----   2.00g                                             
  master                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao---   1.00g                                             
  metadata                             81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m                                             
  outbox                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  lv_root                              vg0                                  -wi-ao--- 449.82g                                             
  lv_swap                              vg0                                  -wi-ao---  15.74g     

after CloneImageGroupStructureVDSCommand: 


[root@cougar01 ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  0d5899ae-490a-4015-926d-d523138050c6 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   1.00g                                             
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g                                             
  f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g                                             
  ids                                  38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m                                             
  inbox                                38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  leases                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g                                             
  master                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   1.00g                                             
  metadata                             38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m                                             
  outbox                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  0d5899ae-490a-4015-926d-d523138050c6 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   1.00g                                             
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g                                             
  f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g                                             
  ids                                  81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m                                             
  inbox                                81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  leases                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a----   2.00g                                             
  master                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao---   1.00g                                             
  metadata                             81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m                                             
  outbox                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  lv_root                              vg0                                  -wi-ao--- 449.82g                                             
  lv_swap                              vg0                                  -wi-ao---  15.74g       


after failure of move: 

[root@cougar01 ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  0d5899ae-490a-4015-926d-d523138050c6 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   1.00g                                             
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g                                             
  f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g                                             
  ids                                  38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m                                             
  inbox                                38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  leases                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g                                             
  master                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   1.00g                                             
  metadata                             38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m                                             
  outbox                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m                                             
  0d5899ae-490a-4015-926d-d523138050c6 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   1.00g                                             
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g                                             
  f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g                                             
  ids                                  81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m                                             
  inbox                                81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  leases                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a----   2.00g                                             
  master                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao---   1.00g                                             
  metadata                             81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m                                             
  outbox                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m                                             
  lv_root                              vg0                                  -wi-ao--- 449.82g                                             
  lv_swap                              vg0                                  -wi-ao---  15.74g          

 

[root@cougar01 ~]# vdsClient -s 0 getStorageDomainInfo 38755249-4bb3-4841-bf5b-05f4a521514d
	uuid = 38755249-4bb3-4841-bf5b-05f4a521514d
	vguuid = 4Ceo5k-vMud-sVKL-blqX-JwKo-ETN6-8ao4L9
	lver = -1
	state = OK
	version = 3
	role = Master
	pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
	spm_id = -1
	type = ISCSI
	class = Data
	master_ver = 0
	name = Dafna-32-03

[root@cougar01 ~]# vdsClient -s 0 getStorageDomainInfo 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	uuid = 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	vguuid = Eq3hKP-LB4o-CSHb-U91Z-JKKN-4eQV-Pecf1l
	lver = 43
	state = OK
	version = 3
	role = Master
	pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
	spm_id = 1
	type = ISCSI
	class = Data
	master_ver = 3697
	name = Dafna-32-02
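
To reduce the risk of GSS cleaning the wrong disks, the LVs belonging to the migrated image group can be cross-checked before anything is removed. A minimal sketch, assuming the usual VDSM convention of tagging block-domain LVs with the image-group UUID (an IU_<image-uuid> tag) and the parent volume (a PU_<parent-uuid> tag); the tag layout should be confirmed on the affected version, and <image-group-uuid> is a placeholder for the disk's image ID taken from the engine:

# Show every LV in both VGs together with its LVM tags; on VDSM block
# storage domains the image-group UUID is expected in an IU_<uuid> tag.
lvs -o lv_name,vg_name,lv_tags 38755249-4bb3-4841-bf5b-05f4a521514d 81ef11d0-4c0c-47b4-8953-d61a6af442d8

# Keep only the LVs that belong to the migrated disk's image group.
lvs -o lv_name,vg_name,lv_tags --noheadings 38755249-4bb3-4841-bf5b-05f4a521514d 81ef11d0-4c0c-47b4-8953-d61a6af442d8 | grep 'IU_<image-group-uuid>'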

Comment 5 Dafna Ron 2013-05-23 09:25:34 UTC
Created attachment 752056 [details]
CloneImageGroupStructureVDSCommand logs

Comment 10 Daniel Erez 2013-07-09 11:29:30 UTC
*** Bug 967600 has been marked as a duplicate of this bug. ***

Comment 11 vvyazmin@redhat.com 2013-07-23 14:50:49 UTC
Can't be verified: this bug depends on BZ987460. Tested on RHEVM 3.3 - IS6 environment.

Comment 12 vvyazmin@redhat.com 2013-08-18 17:04:38 UTC
Tested on FCP Data Center

Verified, tested on RHEVM 3.3 - IS10 environment:

RHEVM:  rhevm-3.3.0-0.15.master.el6ev.noarch
PythonSDK:  rhevm-sdk-python-3.3.0.10-1.el6ev.noarch
VDSM:  vdsm-4.12.0-61.git8178ec2.el6ev.x86_64
LIBVIRT:  libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM:  qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK:  sanlock-2.8-1.el6.x86_64

Comment 15 Itamar Heim 2014-01-21 22:15:30 UTC
Closing - RHEV 3.3 Released

Comment 16 Itamar Heim 2014-01-21 22:22:39 UTC
Closing - RHEV 3.3 Released