Bug 972696 - engine: after we fail to LSM a disk for a vm in pause state engine fails to clean cloneImageStructure task with ArrayIndexOutOfBoundsException: -1 (can't migrate disks because of orphan images on target domain)
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64 Linux
Priority: unspecified    Severity: medium
Target Milestone: ---
Target Release: 3.3.0
Assigned To: Daniel Erez
QA Contact: Aharon Canan
Whiteboard: storage
Keywords: Triaged, ZStream
Duplicates: 970974
Depends On: 966618
Blocks: 902971
 
Reported: 2013-06-10 08:25 EDT by Idith Tal-Kohen
Modified: 2016-02-10 14:30 EST
18 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, when the engine failed to clean up a cloneImageStructure task, an ArrayIndexOutOfBoundsException was raised and left in the engine logs, and the task could not be cleared. The engine now clears such tasks without raising this exception. (Note: the images associated with the failed tasks are still left on the target domains. Cleanup of the target domains in such situations is expected to be implemented in a future version of Red Hat Enterprise Virtualization Manager.)
Story Points: ---
Clone Of: 966618
Environment:
Last Closed: 2013-07-16 09:41:06 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
vdsm log (11.33 MB, text/plain) - 2013-06-27 10:37 EDT, Aharon Canan
engine log (1.44 MB, text/plain) - 2013-06-27 10:40 EDT, Aharon Canan


External Trackers
oVirt gerrit 15133 (Priority: None, Status: None, Summary: None, Last Updated: Never)

Description Idith Tal-Kohen 2013-06-10 08:25:31 EDT
+++ This bug was initially created as a clone of Bug #966618 +++

Description of problem:

After we fail to live storage migrate (LSM) a disk of a VM in paused state, the cloneImageStructure task cannot be cleaned without manual intervention by GSS (stopTask/clearTask on the VDSM host and a restart of the engine).
If the user then starts the VM and tries to LSM the disk again, a "volume already exists" error is returned.
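
For reference, the kind of manual cleanup GSS performs on the SPM host can be scripted roughly as below. This is only an illustrative sketch, not an official tool: it assumes the vdsClient CLI with its getAllTasksStatuses, stopTask and clearTask verbs, and the task UUID must be taken from the actual vdsm/engine logs.

# Illustrative sketch of the manual cleanup mentioned above (not an official tool).
# Assumes vdsClient is available on the SPM host; the task UUID below is the stuck
# cloneImageStructure task taken from the logs.
import subprocess

def run(*args):
    """Run a vdsClient command against the local vdsm and print its output."""
    out = subprocess.check_output(["vdsClient", "-s", "0"] + list(args))
    print(out)
    return out

task_uuid = "ab5d4053-b881-4503-9ba0-7427b2514801"  # example task UUID from the logs below

run("getAllTasksStatuses")      # confirm the task is still known to vdsm
run("stopTask", task_uuid)      # ask vdsm to stop the stuck task
run("clearTask", task_uuid)     # clear it from vdsm's task list
# After this, the engine is restarted, as described above.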

Version-Release number of selected component (if applicable):

sf17.1

How reproducible:

100%

Steps to Reproduce:
1. On iSCSI storage with two hosts, create a VM and start it with run-once as paused on the HSM host
2. Try to live migrate (LSM) the VM's disk
3. Manually clean the task from the SPM and restart the engine
4. Start the VM
5. Try to migrate the disk again (a rough scripted sketch of these steps follows below)
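
A rough scripted version of the reproduction (steps 1, 2, 4 and 5) might look like the following. This is only a sketch against the Python ovirt-engine-sdk of that era; the engine URL, credentials, VM and storage-domain names are placeholders, and the exact SDK parameter names (e.g. pause on the start action, move on the disk) should be verified against the installed SDK version.

# Rough reproduction sketch (assumed ovirt-engine-sdk 3.x Python bindings;
# names, credentials and parameter spellings are illustrative, not verified).
from ovirtsdk.api import API
from ovirtsdk.xml import params

api = API(url="https://engine.example.com/api",
          username="admin@internal", password="password", insecure=True)

vm = api.vms.get(name="lsm-test-vm")

# Step 1: run-once the VM so that it starts in the paused state on the HSM host.
vm.start(params.Action(pause=True))

# Step 2: try to live migrate the VM's disk to the second iSCSI domain
# (this is the LSM that fails while the VM is paused).
disk = vm.disks.list()[0]
disk.move(params.Action(storage_domain=params.StorageDomain(name="iscsi-target-sd")))

# Steps 3-5 are manual: clean the stuck task on the SPM, restart the engine,
# start the VM normally, and retry the disk migration - the retry then fails
# with VolumeAlreadyExists because of the leftover image on the target domain.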

Actual results:

The engine fails to clean the task, which leaves LVs on the target domain, and the images cannot be moved to the new domain without GSS involvement.
 
Expected results:

Even if the LSM fails, we should still be able to clear the task and roll back, so that the user can retry the migration.

Additional info: logs


ab5d4053-b881-4503-9ba0-7427b2514801::ERROR::2013-05-23 17:36:09,641::task::850::TaskManager.Task::(_setError) Task=`ab5d4053-b881-4503-9ba0-7427b2514801`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1773, in cloneImageStructure
    image.Image(repoPath).cloneStructure(sdUUID, imgUUID, dstSdUUID)
  File "/usr/share/vdsm/storage/image.py", line 649, in cloneStructure
    self._createTargetImage(sdCache.produce(dstSdUUID), sdUUID, imgUUID)
  File "/usr/share/vdsm/storage/image.py", line 517, in _createTargetImage
    srcVolUUID=volParams['parent'])
  File "/usr/share/vdsm/storage/blockSD.py", line 610, in createVolume
    volUUID, desc, srcImgUUID, srcVolUUID)
  File "/usr/share/vdsm/storage/volume.py", line 418, in create
    raise se.VolumeAlreadyExists(volUUID)
VolumeAlreadyExists: Volume already exists: ('30e4d88e-e807-4fb9-9b41-39c988c338ad',)
ab5d4053-b881-4503-9ba0-7427b2514801::DEBUG::2013-05-23 17:36:09,642::task::869::TaskManager.Task::(_run) Task=`ab5d4053-b881-4503-9ba0-7427b2514801`::Task._run: ab5d4053-b881-4503-9ba0-7427b2514801 () {} failed - stopping task
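
The VolumeAlreadyExists failure on the retry comes from the leftover volume that the first, failed LSM attempt created on the target domain. A much-simplified illustration of that kind of pre-create existence check (not the actual vdsm blockSD/volume code) is:

# Simplified illustration of the failure mode above (not the real vdsm code).
# The retry of cloneImageStructure tries to create the same volume UUID on the
# target domain, finds the leftover from the first attempt, and refuses.
class VolumeAlreadyExists(Exception):
    pass

def create_volume(existing_volumes, vol_uuid):
    """Refuse to create a volume whose UUID is already present on the domain."""
    if vol_uuid in existing_volumes:
        raise VolumeAlreadyExists("Volume already exists: (%r,)" % vol_uuid)
    existing_volumes.add(vol_uuid)
    return vol_uuid

target_domain_volumes = {"30e4d88e-e807-4fb9-9b41-39c988c338ad"}  # leftover from the failed LSM
create_volume(target_domain_volumes, "30e4d88e-e807-4fb9-9b41-39c988c338ad")  # raises VolumeAlreadyExists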


engine: 

2013-05-23 17:28:32,019 WARN  [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (pool-4-thread-38) Unable to get value of property: vds for class org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand
2013-05-23 17:28:32,033 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-4-thread-38) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: javax.ejb.EJBException: java.lang.ArrayIndexOutOfBoundsException: -1
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.handleExceptionInNoTx(CMTTxInterceptor.java:191) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.invokeInNoTx(CMTTxInterceptor.java:237) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.supports(CMTTxInterceptor.java:374) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.as.ejb3.tx.CMTTxInterceptor.processInvocation(CMTTxInterceptor.java:218) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.CurrentInvocationContextInterceptor.processInvocation(CurrentInvocationContextInterceptor.java:41) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.ShutDownInterceptorFactory$1.processInvocation(ShutDownInterceptorFactory.java:64) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ejb3.component.interceptors.LoggingInterceptor.processInvocation(LoggingInterceptor.java:59) [jboss-as-ejb3.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.NamespaceContextInterceptor.processInvocation(NamespaceContextInterceptor.java:50) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.TCCLInterceptor.processInvocation(TCCLInterceptor.java:45) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ViewService$View.invoke(ViewService.java:165) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.as.ee.component.ViewDescription$1.processInvocation(ViewDescription.java:182) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.ChainedInterceptor.processInvocation(ChainedInterceptor.java:61) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.as.ee.component.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:72) [jboss-as-ee.jar:7.2.0.Final-redhat-8]
        at org.ovirt.engine.core.bll.interfaces.BackendInternal$$$view8.endAction(Unknown Source) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask.EndCommandAction(EntityAsyncTask.java:147) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask.access$000(EntityAsyncTask.java:26) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.EntityAsyncTask$1.run(EntityAsyncTask.java:107) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.threadpool.ThreadPoolUtil$InternalWrapperRunnable.run(ThreadPoolUtil.java:71) [engine-utils.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_19]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_19]
        at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_19]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_19]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_19]
        at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_19]
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
Comment 3 Aharon Canan 2013-06-27 10:37:51 EDT
Created attachment 766168 [details]
vdsm log
Comment 4 Aharon Canan 2013-06-27 10:40:00 EDT
Created attachment 766169 [details]
engine log

Reproduced using sf18.2.

Logs attached.

From the logs:
468f6937-14cd-47b7-892b-406c46f167c1::ERROR::2013-06-27 17:23:53,557::task::850::TaskManager.Task::(_setError) Task=`468f6937-14cd-47b7-892b-406c46f167c1`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1773, in cloneImageStructure
    image.Image(repoPath).cloneStructure(sdUUID, imgUUID, dstSdUUID)
  File "/usr/share/vdsm/storage/image.py", line 649, in cloneStructure
    self._createTargetImage(sdCache.produce(dstSdUUID), sdUUID, imgUUID)
  File "/usr/share/vdsm/storage/image.py", line 517, in _createTargetImage
    srcVolUUID=volParams['parent'])
  File "/usr/share/vdsm/storage/blockSD.py", line 609, in createVolume
    volUUID, desc, srcImgUUID, srcVolUUID)
  File "/usr/share/vdsm/storage/volume.py", line 418, in create
    raise se.VolumeAlreadyExists(volUUID)
VolumeAlreadyExists: Volume already exists: ('3a31b139-b457-4989-941e-0fd99d5021e7',)
468f6937-14cd-47b7-892b-406c46f167c1::DEBUG::2013-06-27 17:23:53,558::task::869::TaskManager.Task::(_run) Task=`468f6937-14cd-47b7-892b-406c46f167c1`::Task._run: 468f6937-14cd-47b7-892b-406c46f167c1 () {} failed - stopping task
Comment 5 Ayal Baron 2013-07-01 05:49:22 EDT
The original issue was fixed; the remaining issue is that VDSM does not clean up the leftovers of the disk on the target domain.
We need to reproduce this and understand under what conditions it happens.
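
When investigating such leftovers on a block (iSCSI) target domain, the orphan image can usually be spotted by listing LVs in the domain's VG and matching them against the image UUID from the logs. A rough diagnostic sketch, assuming the usual vdsm convention of tagging block-storage LVs with an IU_<imageUUID> tag (verify against your vdsm version before relying on it):

# Rough sketch for spotting leftover LVs of a given image on a block storage
# domain. Assumes vdsm's IU_<imageUUID> LV-tag convention; purely diagnostic.
import subprocess

def leftover_lvs(vg_name, image_uuid):
    """Return LV names in the storage domain VG that carry the image's IU_ tag."""
    out = subprocess.check_output(
        ["lvs", "--noheadings", "-o", "lv_name,lv_tags", vg_name])
    leftovers = []
    for line in out.decode().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        lv_name, tags = fields[0], fields[1]
        if "IU_" + image_uuid in tags.split(","):
            leftovers.append(lv_name)
    return leftovers

# Example (placeholder VG/image UUIDs):
# print(leftover_lvs("<target-domain-sd-uuid>", "<image-uuid-from-engine-log>"))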
Comment 8 Allon Mureinik 2013-07-08 08:13:04 EDT
*** Bug 970974 has been marked as a duplicate of this bug. ***
Comment 10 errata-xmlrpc 2013-07-16 09:41:06 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1048.html
