Bug 962497
| Summary: | engine: engine restart during DeleteImage step will cause reporting issues and ArrayIndexOutOfBoundsException: -1 | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> | ||||||
| Component: | ovirt-engine | Assignee: | Daniel Erez <derez> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | yeylon <yeylon> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 3.2.0 | CC: | abaron, acathrow, amureini, cpelland, derez, dron, iheim, jkt, lpeer, Rhev-m-bugs, scohen, srevivo, yeylon | ||||||
| Target Milestone: | --- | Keywords: | ZStream | ||||||
| Target Release: | 3.3.0 | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | storage | ||||||||
| Fixed In Version: | is2 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | |||||||||
| : | 971874 (view as bug list) | Environment: | |||||||
| Last Closed: | 2014-01-21 22:13:12 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | 987783 | ||||||||
| Bug Blocks: | 971874 | ||||||||
| Attachments: |
|
||||||||
Dafna, aside from log messages, what's the impact? when you say endless loop, does it indeed go on forever? how often? what happens after engine restart? how long did you wait? (In reply to Sean Cohen from comment #1) > Dafna, aside from log messages, what's the impact? the event log will report the task as succeeded and than failed (when we shut down the vm) until we restart engine > when you say endless loop, does it indeed go on forever? until we restart engine > how often? every 5 seconds (the log will report the exception and each time an event will appear) > what happens after engine restart? the events disappear and engine sends deleteImage to the destination domain although db already changed the mapping to the destination (this will cause a mismatch in the disks and the vm will not run until gss will fix the mapping in db or move the disks between the storages) > how long did you wait? it does not matter, if we cannot clean the task it will run forever. but I waited at least 5 minutes before restarting engine to clear the task. as I said in the comment above we have a mismatch between the db and vdsm on the location of the disks.
this issue is caused by engine because when we get the exception we will report the dataSync as successful and update the db with the new location, however, when we shut down the vm the alerts change to failure to move the disk and when we restart the engine to clean the task the DeleteImage is sent which causes the wrong disks to be removed.
I will attach the full logs
vdsm:
[root@cougar02 ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
3eb0913d-e032-4fa5-8026-745e7c7fee74 38755249-4bb3-4841-bf5b-05f4a521514d -wi------ 3.00g
8e93db93-fb9d-4566-aa02-f63dd6b77aa8 38755249-4bb3-4841-bf5b-05f4a521514d -wi------ 1.00g
f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi------ 2.00g
ids 38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m
inbox 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
leases 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 2.00g
master 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 1.00g
metadata 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m
outbox 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
3eb0913d-e032-4fa5-8026-745e7c7fee74 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------ 3.00g
8e93db93-fb9d-4566-aa02-f63dd6b77aa8 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------ 1.00g
f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------ 2.00g
ids 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m
inbox 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
leases 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 2.00g
master 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 1.00g
metadata 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m
outbox 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
lv_root vg0 -wi-ao--- 457.71g
lv_swap vg0 -wi-ao--- 7.85g
[root@cougar02 ~]# lvs
LV VG Attr LSize Pool Origin Data% Move Log Cpy%Sync Convert
f82a0d58-0791-4137-b1e6-22a8794acd2a 38755249-4bb3-4841-bf5b-05f4a521514d -wi------ 2.00g
ids 38755249-4bb3-4841-bf5b-05f4a521514d -wi-ao--- 128.00m
inbox 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
leases 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 2.00g
master 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 1.00g
metadata 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 512.00m
outbox 38755249-4bb3-4841-bf5b-05f4a521514d -wi-a---- 128.00m
3eb0913d-e032-4fa5-8026-745e7c7fee74 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------ 3.00g
8e93db93-fb9d-4566-aa02-f63dd6b77aa8 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------ 1.00g
f82a0d58-0791-4137-b1e6-22a8794acd2a 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi------ 2.00g
ids 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 128.00m
inbox 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
leases 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 2.00g
master 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-ao--- 1.00g
metadata 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 512.00m
outbox 81ef11d0-4c0c-47b4-8953-d61a6af442d8 -wi-a---- 128.00m
lv_root vg0 -wi-ao--- 457.71g
lv_swap vg0 -wi-ao--- 7.85g
[root@cougar02 ~]# vdsClient -s 0 getStorageDomainInfo 38755249-4bb3-4841-bf5b-05f4a521514d
uuid = 38755249-4bb3-4841-bf5b-05f4a521514d
vguuid = 4Ceo5k-vMud-sVKL-blqX-JwKo-ETN6-8ao4L9
lver = -1
state = OK
version = 3
role = Master
pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
spm_id = -1
type = ISCSI
class = Data
master_ver = 0
name = Dafna-32-03
[root@cougar02 ~]# vdsClient -s 0 getStorageDomainInfo 81ef11d0-4c0c-47b4-8953-d61a6af442d8
uuid = 81ef11d0-4c0c-47b4-8953-d61a6af442d8
vguuid = Eq3hKP-LB4o-CSHb-U91Z-JKKN-4eQV-Pecf1l
lver = 6
state = OK
version = 3
role = Master
pool = ['7fd33b43-a9f4-4eb7-a885-e9583a929ceb']
spm_id = 2
type = ISCSI
class = Data
master_ver = 3697
name = Dafna-32-02
cougar02 (new spm after restart):
Thread-484::INFO::2013-05-21 17:40:14,549::logUtils::40::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='38755249-4bb3-4841-bf5b-05f4a521514d', spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', imgUUID='39194824-8601-4b3e-8d37-4
bddc71dfe68', postZero='false', force='false')
engine:
create of volume:
45e7c7fee74, newImageDescription = , imageId = 8e93db93-fb9d-4566-aa02-f63dd6b77aa8, sourceImageGroupId = 39194824-8601-4b3e-8d37-4bddc71dfe68), log id: 5d433107
2013-05-21 17:36:01,391 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (pool-4-thread-47) [75b18eea] -- CreateSnapshotVDSCommand::ExecuteIrsBrokerCommand: calling 'createVolume' with two new parameters: description and UUID
2013-05-21 17:36:01,391 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (pool-4-thread-47) [75b18eea] -- createVolume parameters:
sdUUID=81ef11d0-4c0c-47b4-8953-d61a6af442d8
spUUID=7fd33b43-a9f4-4eb7-a885-e9583a929ceb
imgGUID=39194824-8601-4b3e-8d37-4bddc71dfe68
size=16,106,127,360 bytes
volFormat=COW
volType=Sparse
volUUID=3eb0913d-e032-4fa5-8026-745e7c7fee74
descr=
srcImgGUID=39194824-8601-4b3e-8d37-4bddc71dfe68
srcVolUUID=8e93db93-fb9d-4566-aa02-f63dd6b77aa8
Exception:
2013-05-21 17:37:54,819 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-47) Unable to get value of property: vds for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-05-21 17:37:54,841 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-47) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: javax.ejb.EJBException: java.lang.ArrayIndexOutOfBoundsException: -1
at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:191) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:237) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:218) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:182) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-3]
at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view9.endAction(Unknown Source) [engine-bll.jar:]
at org.ovirt.engine.core.bll.EntityAsyncTask.EndCommandAction(EntityAsyncTask.java:147) [engine-bll.jar:]
delete of image group:
2013-05-21 17:40:10,299 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-4-thread-48) [4644a8d1] Reverting task deleteImage, handler: org.ovirt.engine.core.bll.lsm.CreateImagePlaceholderTaskHandler
2013-05-21 17:40:10,324 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-4-thread-48) [4644a8d1] START, DeleteImageGroupVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 38755249-4bb3-4841-bf5b-05f4a521514d, imageGroupId = 39194824-8601-4b3e-8d37-4bddc71dfe68, postZeros = false, forceDelete = false), log id: 4bbb6f76
2013-05-21 17:40:12,476 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-4-thread-48) [4644a8d1] FINISH, DeleteImageGroupVDSCommand, log id: 4bbb6f76
2013-05-21 17:40:12,497 INFO [org.ovirt.engine.core.bll.EntityMultiAsyncTasks] (pool-4-thread-48) [4644a8d1] EntityMultiAsyncTasks::AttachTask: Attaching task f5e11442-843c-448c-ae32-85707457cd9e to entity 8e93db93-fb9d-4566-aa02-f63dd6b77aa8.
Created attachment 751711 [details]
logs
Can’t be verified, this bug depend on BZ987460, tested on RHEVM 3.3 - IS6 environment. Tested on FCP Data Center Verified, tested on RHEVM 3.3 - IS10 environment: RHEVM: rhevm-3.3.0-0.15.master.el6ev.noarch PythonSDK: rhevm-sdk-python-3.3.0.10-1.el6ev.noarch VDSM: vdsm-4.12.0-61.git8178ec2.el6ev.x86_64 LIBVIRT: libvirt-0.10.2-18.el6_4.9.x86_64 QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.5.x86_64 SANLOCK: sanlock-2.8-1.el6.x86_64 Closing - RHEV 3.3 Released Closing - RHEV 3.3 Released |
Created attachment 747297 [details] logs Description of problem: I restarted ovirt-engine service right when the DeleteImage step of Live Storage migration is sent. we have the following issues: 1. ArrayIndexOutOfBoundsException: -1 which repeats in endless loop 2. we report to the user that we finsihed the move (also in loop) 3. once I stopped the vm we report that the same action failed in endless loop. 4. the syncData task is not cleared. since I think all of these issues are caused by the OutOfBounds error I am opening one bug for all. Version-Release number of selected component (if applicable): sf16 How reproducible: 100% Steps to Reproduce: 1. on iscsi storage, start a vm with one disk on two hosts (my vm was running on the hsm) 2. live migrate the vms disk 3. right when the DeleteImage for the source disk is sent, restart ovirt-engine Actual results: 1. ArrayIndexOutOfBoundsException: -1 which repeats in endless loop 2. we report to the user that we finsihed the move (also in loop) 3. once I stopped the vm we report that the same action failed in endless loop. 4. the syncData task is not cleared Expected results: if we succeeded we should report the success and clear all tasks Additional info: full logs [root@cougar01 ~]# vdsClient -s 0 getAllTasksInfo 28f4c0eb-8f3c-4728-bb60-b6f408601137 : verb = syncImageData id = 28f4c0eb-8f3c-4728-bb60-b6f408601137 root@cougar01 ~]# vdsClient -s 0 getAllTasksStatuses {'status': {'message': 'OK', 'code': 0}, 'allTasksStatus': {'28f4c0eb-8f3c-4728-bb60-b6f408601137': {'message': '1 jobs completed successfully', 'code': 0, 'taskID': '28f4c0eb-8f3c-4728-bb60-b6f408601137', 'taskResult': 'success', 'taskState': 'finished'}}} 2013-05-13 19:24:15,271 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-48) Unable to get value of property: vds for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand 2013-05-13 19:24:15,323 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-48) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: javax.ejb.EJBException: java.lang.ArrayIndexOutOfBoundsException: -1 at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:191) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3] at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:237) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3] at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3] at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:218) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.2.0.Final-redhat-3] at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:182) [jboss-as-ee.jar:7.2.0.Final-redhat-3] at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2] at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-3] at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view8.endAction(Unknown Source) [engine-bll.jar:]