Bug 965676
Summary: engine: vm image remains locked with 'ArrayIndexOutOfBoundsException: -1' when vdsm restarts on the SPM after completion of the "syncData" task

Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Dafna Ron <dron>
Assignee: Daniel Erez <derez>
QA Contact: yeylon <yeylon>
CC: abaron, acathrow, amureini, cpelland, derez, dron, iheim, jkt, lpeer, Rhev-m-bugs, scohen, srevivo, yeylon
Keywords: ZStream
Flags: scohen: Triaged+
Target Milestone: ---
Target Release: 3.3.0
Whiteboard: storage
oVirt Team: Storage
Fixed In Version: is2
Doc Type: Bug Fix
Story Points: ---
Type: Bug
Regression: ---
Clones: 971883 (view as bug list)
Bug Depends On: 987783
Bug Blocks: 971883
Last Closed: 2014-01-21 22:15:30 UTC
Attachments: 751191 (logs), 752056 (CloneImageGroupStructureVDSCommand logs)
*** Bug 965681 has been marked as a duplicate of this bug. ***

The same issue happens when I restart vdsm right when the FINISH of CloneImageGroupStructureVDSCommand is logged. In this case the disks are not removed from either the target or the source domain, which means we cannot move the disks to that domain (a "volume already exists" error) and GSS intervention is needed. We also cannot be 100% sure that the correct disks will be cleaned by GSS (so if the original disks are removed by accident, we will be left with empty LVs, since syncData has not run yet). Attaching the logs from the CloneImageGroupStructureVDSCommand test as well.

Here is the lvs output from the hosts.

vm created:

[root@cougar01 ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g
  ids                                  38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m
  inbox                                38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
  leases                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g
  master                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   1.00g
  metadata                             38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m
  outbox                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   1.00g
  f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g
  ids                                  81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m
  inbox                                81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
  leases                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a----   2.00g
  master                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao---   1.00g
  metadata                             81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m
  outbox                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
  lv_root                              vg0                                  -wi-ao--- 449.82g
  lv_swap                              vg0                                  -wi-ao---  15.74g

after CloneImageGroupStructureVDSCommand:

[root@cougar01 ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  0d5899ae-490a-4015-926d-d523138050c6 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   1.00g
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g
  f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g
  ids                                  38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m
  inbox                                38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
  leases                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g
  master                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   1.00g
  metadata                             38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m
  outbox                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
  0d5899ae-490a-4015-926d-d523138050c6 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   1.00g
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g
  f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g
  ids                                  81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m
  inbox                                81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
  leases                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a----   2.00g
  master                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao---   1.00g
  metadata                             81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m
  outbox                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
  lv_root                              vg0                                  -wi-ao--- 449.82g
  lv_swap                              vg0                                  -wi-ao---  15.74g

after failure of move (identical to the listing above):

[root@cougar01 ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  0d5899ae-490a-4015-926d-d523138050c6 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   1.00g
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g
  f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi------   2.00g
  ids                                  38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m
  inbox                                38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
  leases                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   2.00g
  master                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a----   1.00g
  metadata                             38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m
  outbox                               38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
  0d5899ae-490a-4015-926d-d523138050c6 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   1.00g
  4ddf8f4f-5fdc-46e2-9885-2016b67e0192 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g
  f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------   2.00g
  ids                                  81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m
  inbox                                81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
  leases                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a----   2.00g
  master                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao---   1.00g
  metadata                             81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m
  outbox                               81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
  lv_root                              vg0                                  -wi-ao--- 449.82g
  lv_swap                              vg0                                  -wi-ao---  15.74g

[root@cougar01 ~]# vdsClient -s 0 getStorageDomainInfo 38755249-4bb3-4841-bf5b-05f4a521514d
	uuid = 38755249-4bb3-4841-bf5b-05f4a521514d
	vguuid = 4Ceo5k-vMud-sVKL-blqX-JwKo-ETN6-8ao4L9
	lver = -1
	state = OK
	version = 3
	role = Master
	pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
	spm_id = -1
	type = ISCSI
	class = Data
	master_ver = 0
	name = Dafna-32-03

[root@cougar01 ~]# vdsClient -s 0 getStorageDomainInfo 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	uuid = 81ef11d0-4c0c-47b4-8953-d61a6af442d8
	vguuid = Eq3hKP-LB4o-CSHb-U91Z-JKKN-4eQV-Pecf1l
	lver = 43
	state = OK
	version = 3
	role = Master
	pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
	spm_id = 1
	type = ISCSI
	class = Data
	master_ver = 3697
	name = Dafna-32-02

Created attachment 752056 [details]
CloneImageGroupStructureVDSCommand logs
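The leftover image LVs described above can be spotted mechanically. As a rough sketch (not part of oVirt or vdsm; the class and method names here are invented for illustration), the following compares `lvs --noheadings -o lv_name,vg_name,lv_attr,lv_size` style output across the source and target VGs and reports image LVs present in both, i.e. the half-copied volumes an interrupted move would leave for GSS to clean up:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Illustrative sketch only: finds image LVs that exist in BOTH the
 * source and the target VG, skipping the storage-domain internal LVs
 * (ids, inbox, leases, master, metadata, outbox).
 */
public class LvLeftovers {
    private static final Set<String> INTERNAL =
        Set.of("ids", "inbox", "leases", "master", "metadata", "outbox");

    /** Each line is expected to start with "lv_name vg_name ...". */
    static Set<String> duplicated(List<String> lvsLines, String srcVg, String dstVg) {
        Set<String> src = new HashSet<>();
        Set<String> dst = new HashSet<>();
        for (String line : lvsLines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2 || INTERNAL.contains(fields[0])) {
                continue; // malformed line or domain-internal LV
            }
            if (fields[1].equals(srcVg)) {
                src.add(fields[0]);
            }
            if (fields[1].equals(dstVg)) {
                dst.add(fields[0]);
            }
        }
        src.retainAll(dst); // keep only LVs seen in both VGs
        return src;
    }
}
```

Run against the "after failure of move" listing above, this would flag 0d5899ae, 4ddf8f4f and f82a0d58 as existing on both domains.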
*** Bug 967600 has been marked as a duplicate of this bug. ***

Can't be verified; this bug depends on BZ987460. Tested on a RHEVM 3.3 - IS6 environment, FCP Data Center.

Verified; tested on a RHEVM 3.3 - IS10 environment:
RHEVM: rhevm-3.3.0-0.15.master.el6ev.noarch
PythonSDK: rhevm-sdk-python-3.3.0.10-1.el6ev.noarch
VDSM: vdsm-4.12.0-61.git8178ec2.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-18.el6_4.9.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64
SANLOCK: sanlock-2.8-1.el6.x86_64

Closing - RHEV 3.3 Released
Created attachment 751191 [details]
logs

Description of problem:
A vm is running on the HSM host. I restarted vdsm right when the syncData task (part of live storage migration) completed. The vm's disk remains locked, the engine hits an endless 'ArrayIndexOutOfBoundsException: -1', and the event log loops with:

  User admin@internal finished moving disk NFS-RHEL6_iSCSI_Disk1 to domain Dafna-32-03.

After I shut down the vm, the event log loops with failure events instead:

  User admin@internal have failed to move disk NFS-RHEL6_iSCSI_Disk1 to domain Dafna-32-03.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. On iSCSI storage with two hosts, create a vm from a template
2. Run the vm on the HSM host
3. Start a live storage migration of the vm's disk
4. Restart vdsm on the SPM when the syncData task has changed state to completed successfully

Actual results:
The vm is stuck in image-locked state, we get java.lang.ArrayIndexOutOfBoundsException: -1, and the event log runs the same events in a loop.

Expected results:
We should be able to recover from the vdsm restart.

Additional info:
Engine log after the vdsm restart:

2013-05-21 16:43:19,170 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-49) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: javax.ejb.EJBException: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:191) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:237) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
	at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:218) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
	at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:182) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
	at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
	at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
	at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view8.endAction(Unknown Source) [engine-bll.jar:]
	at org.ovirt.engine.core.bll.EntityAsyncTask.EndCommandAction(EntityAsyncTask.java:147) [engine-bll.jar:]
	at org.ovirt.engine.core.bll.EntityAsyncTask.access$000(EntityAsyncTask.java:26) [engine-bll.jar:]
	at org.ovirt.engine.core.bll.EntityAsyncTask$1.run(EntityAsyncTask.java:107) [engine-bll.jar:]
	at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:71) [engine-utils.jar:]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_19]
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_19]
	at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_19]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_19]
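For context on the exception itself: 'ArrayIndexOutOfBoundsException: -1' is the classic signature of a "not found" sentinel being used directly as an array index. The following is a hypothetical illustration (NOT the actual EntityAsyncTask code) of how a lookup that comes back empty, for example because the vdsm restart cleared the task state, could produce exactly this exception:

```java
/**
 * Hypothetical illustration -- not ovirt-engine code. A linear search
 * returns -1 when the wanted id is absent; using that sentinel as an
 * index throws ArrayIndexOutOfBoundsException: -1.
 */
public class SentinelIndexBug {
    static int indexOfTask(String[] taskIds, String wanted) {
        for (int i = 0; i < taskIds.length; i++) {
            if (taskIds[i].equals(wanted)) {
                return i;
            }
        }
        return -1; // "not found" sentinel
    }

    static String taskState(String[] taskIds, String[] states, String wanted) {
        int idx = indexOfTask(taskIds, wanted);
        // Missing guard: idx == -1 should be handled before indexing.
        return states[idx]; // throws when the task has vanished
    }
}
```

If this is the pattern at play, the fix would be to guard the -1 case (and fail the command cleanly) rather than index the array unconditionally.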