Bug 972696 - engine: after we fail to LSM a disk for a vm in pause state engine fails to clean cloneImageStructure task with ArrayIndexOutOfBoundsException: -1 (can't migrate disks because of orphan images on target domain)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.3.0
Assignee: Daniel Erez
QA Contact: Aharon Canan
URL:
Whiteboard: storage
Duplicates: 970974
Depends On: 966618
Blocks: 902971
 
Reported: 2013-06-10 12:25 UTC by Idith Tal-Kohen
Modified: 2018-12-04 15:30 UTC
CC: 18 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when the engine failed to clean a cloneImageStructure task, an ArrayIndexOutOfBoundsException was left in the engine logs. Such exceptions no longer appear in the engine logs. (Note: the images associated with these failures are still left on the target domains. Cleanup of target domains in such situations is expected to be implemented in a future version of Red Hat Enterprise Virtualization Manager.)
Clone Of: 966618
Environment:
Last Closed: 2013-07-16 13:41:06 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
vdsm log (11.33 MB, text/plain)
2013-06-27 14:37 UTC, Aharon Canan
engine log (1.44 MB, text/plain)
2013-06-27 14:40 UTC, Aharon Canan


Links
Red Hat Product Errata RHBA-2013:1048 (SHIPPED_LIVE): rhevm 3.2.1 bug fix update, last updated 2013-07-19 05:45:47 UTC
oVirt gerrit 15133

Description Idith Tal-Kohen 2013-06-10 12:25:31 UTC
+++ This bug was initially created as a clone of Bug #966618 +++

Description of problem:

After a failed attempt to live storage migrate (LSM) a disk of a VM in the paused state, the cloneImageStructure task cannot be cleaned without manual intervention by GSS (stopTask/clearTask on the VDSM host and a restart of the engine; a rough sketch of this cleanup is shown below).
If the user then starts the VM and tries to LSM the disk again, a "volume already exists" error is returned.
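
For reference, the manual cleanup GSS performs amounts roughly to the following. This is only a sketch, assuming vdsClient is available on the SPM host and that the stuck task UUID has already been identified from the vdsm log (the UUID here is the example from the traceback further down in this report); after clearing the task on vdsm, the engine service has to be restarted.

# Rough sketch of the manual workaround, not a supported tool.
import subprocess

TASK_ID = "ab5d4053-b881-4503-9ba0-7427b2514801"  # stuck cloneImageStructure task from the log below

def vds(*args):
    # 'vdsClient -s 0' talks to the local vdsm over SSL; drop '-s' if vdsm runs without SSL
    return subprocess.check_output(["vdsClient", "-s", "0"] + list(args))

print(vds("getAllTasksStatuses"))  # confirm the task is still known to vdsm
vds("stopTask", TASK_ID)           # stop the stuck task
vds("clearTask", TASK_ID)          # remove it from vdsm's task list

# Then restart the engine so it drops its stale async-task entry:
#   service ovirt-engine restart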

Version-Release number of selected component (if applicable):

sf17.1

How reproducible:

100%

Steps to Reproduce:
1. On an iSCSI storage domain with two hosts, create a VM and start it with Run Once in paused mode on the HSM host.
2. Try to live storage migrate the VM's disk.
3. Manually clean the task from the SPM and restart the engine.
4. Start the VM.
5. Try to migrate the disk again.

Actual results:

The engine fails to clean the task, which leaves LVs behind on the target domain, and the images cannot be moved to the new domain without GSS involvement (a sketch for spotting the leftover LVs is shown below).
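
The leftover volumes can be spotted on the target block domain: in VDSM block storage the domain is an LVM volume group named by the domain UUID, and each volume is an LV tagged with IU_<image UUID>. A minimal sketch, with both UUIDs as placeholders to be filled in, run on a host that can see the target domain's storage:

# Minimal sketch: list leftover LVs of the migrated image on the target block domain.
# The UUIDs are placeholders; IU_ is vdsm's image-UUID tag prefix on block volumes.
import subprocess

DST_SD_UUID = "<target-domain-uuid>"   # block storage domain == LVM VG named by the SD UUID
IMG_UUID = "<image-uuid>"              # image group of the disk that failed to migrate

out = subprocess.check_output(
    ["lvs", "--noheadings", "-o", "lv_name,lv_tags", DST_SD_UUID])
for line in out.decode().splitlines():
    fields = line.split()
    if len(fields) < 2:
        continue
    lv_name, lv_tags = fields[0], fields[1]
    if "IU_" + IMG_UUID in lv_tags:
        # A retried LSM will hit VolumeAlreadyExists because of this leftover LV.
        print("leftover volume on target domain:", lv_name)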
 
Expected results:

Even if LSM fails, the engine should still be able to clear the task and roll back, so that the user can migrate the disk again.

Additional info: logs


ab5d4053-b881-4503-9ba0-7427b2514801::ERROR::2013-05-23 17:36:09,641::task::850::TaskManager.Task::(_setError) Task=`ab5d4053-b881-4503-9ba0-7427b2514801`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1773, in cloneImageStructure
    image.Image(repoPath).cloneStructure(sdUUID, imgUUID, dstSdUUID)
  File "/usr/share/vdsm/storage/image.py", line 649, in cloneStructure
    self._createTargetImage(sdCache.produce(dstSdUUID), sdUUID, imgUUID)
  File "/usr/share/vdsm/storage/image.py", line 517, in _createTargetImage
    srcVolUUID=volParams['parent'])
  File "/usr/share/vdsm/storage/blockSD.py", line 610, in createVolume
    volUUID, desc, srcImgUUID, srcVolUUID)
  File "/usr/share/vdsm/storage/volume.py", line 418, in create
    raise se.VolumeAlreadyExists(volUUID)
VolumeAlreadyExists: Volume already exists: ('30e4d88e-e807-4fb9-9b41-39c988c338ad',)
ab5d4053-b881-4503-9ba0-7427b2514801::DEBUG::2013-05-23 17:36:09,642::task::869::TaskManager.Task::(_run) Task=`ab5d4053-b881-4503-9ba0-7427b2514801`::Task._run: ab5d4053-b881-4503-9ba0-7427b2514801 () {} failed - stopping task


engine: 

2013-05-23 17:28:32,019 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-38) Unable to get value of property: vds for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-05-23 17:28:32,033 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-38) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: javax.ejb.EJBException: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:191) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:237) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:218) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:182) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view8.endAction(Unknown Source) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask.EndCommandAction(EntityAsyncTask.java:147) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask.access$000(EntityAsyncTask.java:26) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask$1.run(EntityAsyncTask.java:107) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:71) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_19]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_19]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_19]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_19]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_19]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_19]
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1

Comment 3 Aharon Canan 2013-06-27 14:37:51 UTC
Created attachment 766168
vdsm log

Comment 4 Aharon Canan 2013-06-27 14:40:00 UTC
Created attachment 766169
engine log

Reproduced using sf18.2.

Logs are attached.

from logs - 
468f6937-14cd-47b7-892b-406c46f167c1::ERROR::2013-06-27 17:23:53,557::task::850::TaskManager.Task::(_setError) Task=`468f6937-14cd-47b7-892b-406c46f167c1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1773, in cloneImageStructure
    image.Image(repoPath).cloneStructure(sdUUID, imgUUID, dstSdUUID)
  File "/usr/share/vdsm/storage/image.py", line 649, in cloneStructure
    self._createTargetImage(sdCache.produce(dstSdUUID), sdUUID, imgUUID)
  File "/usr/share/vdsm/storage/image.py", line 517, in _createTargetImage
    srcVolUUID=volParams['parent'])
  File "/usr/share/vdsm/storage/blockSD.py", line 609, in createVolume
    volUUID, desc, srcImgUUID, srcVolUUID)
  File "/usr/share/vdsm/storage/volume.py", line 418, in create
    raise se.VolumeAlreadyExists(volUUID)
VolumeAlreadyExists: Volume already exists: ('3a31b139-b457-4989-941e-0fd99d5021e7',)
468f6937-14cd-47b7-892b-406c46f167c1::DEBUG::2013-06-27 17:23:53,558::task::869::TaskManager.Task::(_run) Task=`468f6937-14cd-47b7-892b-406c46f167c1`::Task._run: 468f6937-14cd-47b7-892b-406c46f167c1 () {} failed - stopping task

Comment 5 Ayal Baron 2013-07-01 09:49:22 UTC
The original issue was fixed; the remaining issue is that VDSM does not clean up the leftovers of the disk on the target domain (a conceptual sketch of the missing rollback is shown below).
We need to reproduce this and understand under what conditions it happens.
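
For illustration only, the missing cleanup would amount to something like the wrapper below. The two callables are stand-ins for vdsm's clone and delete-image operations (this is not actual vdsm code): on a cloneImageStructure failure, the partially created image should be removed from the destination domain before the task is stopped, so that a retried LSM does not hit VolumeAlreadyExists.

# Conceptual sketch only; the callables are stand-ins, not actual vdsm functions.
import logging

log = logging.getLogger("Storage.Image")

def clone_with_target_rollback(clone_structure, delete_image,
                               repo_path, sd_uuid, img_uuid, dst_sd_uuid):
    """Clone the image structure and, on failure, delete the partially created
    image from the destination domain so a later retry can succeed."""
    try:
        clone_structure(repo_path, sd_uuid, img_uuid, dst_sd_uuid)
    except Exception:
        try:
            delete_image(dst_sd_uuid, img_uuid)
        except Exception:
            log.exception("rollback of partial image %s on domain %s failed",
                          img_uuid, dst_sd_uuid)
        raise  # re-raise the original clone failure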

Comment 8 Allon Mureinik 2013-07-08 12:13:04 UTC
*** Bug 970974 has been marked as a duplicate of this bug. ***

Comment 10 errata-xmlrpc 2013-07-16 13:41:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1048.html

