Bug 879167

Summary: engine [Live Storage Migration]: cannot run vm, create template or export a vm after live storage migration failure
Product: Red Hat Enterprise Virtualization Manager Reporter: Dafna Ron <dron>
Component: ovirt-engine Assignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE QA Contact: Dafna Ron <dron>
Severity: urgent Docs Contact:
Priority: high    
Version: 3.1.0 CC: abaron, dyasny, ewarszaw, hateya, iheim, lpeer, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: storage
Fixed In Version: sf3 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 915354    
Bug Blocks: 915537    
Attachments:
Description Flags
logs from failure none
image does not exist logs none
logs and dbdump none
new logs from 3.2 none

Description Dafna Ron 2012-11-22 09:08:13 UTC
Created attachment 649588 [details]
logs from failure

Description of problem:

After a live storage migration failure I stopped my VM.
Trying to re-run it, create a template from it or export it gives an error that the image does not exist.

Version-Release number of selected component (if applicable):

si24.4

How reproducible:

100%

Steps to Reproduce:
1. Run several VMs and move their disks.
2. After live storage migration fails, stop the VMs and try to re-run them.
  
Actual results:

The engine rolled back the live storage migration after getting an error from vdsm that it failed to remove a logical volume. As a result, when we shut down the VM and then try to run it, create a template from it or export it, the engine sends the wrong VG UUID to vdsm.

Expected results:

The engine should not roll back at every step of live storage migration.

Additional info: logs from the original failure + logs after shutting down the VM.

failure to delete the volume: 

Thread-17922::ERROR::2012-11-21 17:05:04,056::task::853::TaskManager.Task::(_setError) Task=`c07ee9e7-fff6-4d00-96d1-b88f36cd36de`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1349, in deleteImage
    dom.deleteImage(sdUUID, imgUUID, volsByImg)
  File "/usr/share/vdsm/storage/blockSD.py", line 945, in deleteImage
    deleteVolumes(sdUUID, toDel)
  File "/usr/share/vdsm/storage/blockSD.py", line 177, in deleteVolumes
    lvm.removeLVs(sdUUID, vols)
  File "/usr/share/vdsm/storage/lvm.py", line 1010, in removeLVs
    raise se.CannotRemoveLogicalVolume(vgName, str(lvNames))
CannotRemoveLogicalVolume: Cannot remove Logical Volume: ('d40978c8-3fab-483b-b786-2f1e1c5cf130', "('34ff2273-e1cd-41b9-9c30-61defdc85948', '98d1cf94-5e59-4f85-8696-698b0269e347')")
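The error payload above is a tuple whose second element is itself the string form of a tuple, which follows from the `raise se.CannotRemoveLogicalVolume(vgName, str(lvNames))` line in the traceback. A minimal Python sketch for pulling the VG name and LV names out of such a line when triaging logs; the helper name `parse_cannot_remove_lv` is hypothetical and not part of vdsm:

```python
import ast

PREFIX = "Cannot remove Logical Volume: "


def parse_cannot_remove_lv(line):
    """Return (vg_name, [lv_names]) from a CannotRemoveLogicalVolume line."""
    idx = line.find(PREFIX)
    if idx == -1:
        raise ValueError("not a CannotRemoveLogicalVolume message: %r" % line)
    payload = line[idx + len(PREFIX):]
    # Outer literal: (vgName, str(lvNames))
    vg_name, lv_names_repr = ast.literal_eval(payload)
    # Inner literal: the stringified collection of LV names
    lv_names = list(ast.literal_eval(lv_names_repr))
    return vg_name, lv_names


# The last line of the traceback above, copied from the log.
line = (
    "CannotRemoveLogicalVolume: Cannot remove Logical Volume: "
    "('d40978c8-3fab-483b-b786-2f1e1c5cf130', "
    "\"('34ff2273-e1cd-41b9-9c30-61defdc85948', "
    "'98d1cf94-5e59-4f85-8696-698b0269e347')\")"
)
vg, lvs = parse_cannot_remove_lv(line)
print(vg)
print(lvs)
```

The VG reported here (d40978c8-...) is the source domain of the migration, so the LVs that could not be removed are the source volumes.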


image does not exist error: 

7b870179-d4e8-488c-9b6e-a99b0a1a2fc5::ERROR::2012-11-22 10:39:09,989::task::853::TaskManager.Task::(_setError) Task=`7b870179-d4e8-488c-9b6e-a99b0a1a2fc5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 320, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1741, in moveImage
    image.Image(repoPath).move(srcDomUUID, dstDomUUID, imgUUID, vmUUID, op, postZero, force)
  File "/usr/share/vdsm/storage/image.py", line 635, in move
    chains = self._createTargetImage(destDom, srcSdUUID, imgUUID)
  File "/usr/share/vdsm/storage/image.py", line 484, in _createTargetImage
    srcChain = self.getChain(srcSdUUID, imgUUID)
  File "/usr/share/vdsm/storage/image.py", line 314, in getChain
    raise se.ImageDoesNotExistInSD(imgUUID, sdUUID)
ImageDoesNotExistInSD: Image does not exist in domain: 'image=270835d7-b3bb-4e1c-a34d-f09d0538affd, domain=8c0ef67f-03c1-4fbf-b099-3e3668405cfc'

engine is sending sdUUID a5f10bab-bd9d-4834-b1d9-b29d0ec887dc

2012-11-22 10:19:48,246 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (ajp-/127.0.0.1:8702-5) [71667ddd] -- copyImage parameters:
                sdUUID=a5f10bab-bd9d-4834-b1d9-b29d0ec887dc
                spUUID=edf0ee04-0cc2-4e13-877d-1e89541aea55
                vmGUID=818ebfe3-c74c-4230-a272-8287463f77e8
                srcImageGUID=8e2e185f-6789-4b99-b684-079990ada9a5
                srcVolUUID=6f5e0d04-1d04-4573-b3ab-37d1b3e79387
                dstImageGUID=e70899b1-fe8d-4f8c-98ec-d79e2e070885
                dstVolUUID=d46b8d44-3131-4ccb-ba5f-37c165fe9357
                descr=Auto-generated for Live Storage Migration of NFS-RHEL6_iSCSI_Disk1
                

lv is under vg d40978c8-3fab-483b-b786-2f1e1c5cf130:

[root@gold-vdsc ~]# lvs |grep 98d1cf94-5e59-4f85-8696-698b0269e347
  98d1cf94-5e59-4f85-8696-698b0269e347 d40978c8-3fab-483b-b786-2f1e1c5cf130 -wi-a---   2.00g
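The mismatch can be checked mechanically: find the VG that really holds the LV and compare it with the sdUUID the engine sent. A small Python sketch, assuming (as this report implies) that on a block domain the VG name is the storage domain UUID. The helper `vg_for_lv` is hypothetical and only parses `lvs` text output; it does not call LVM itself:

```python
def vg_for_lv(lvs_output, lv_name):
    """Return the VG name holding lv_name in default `lvs` output, or None."""
    for line in lvs_output.splitlines():
        fields = line.split()
        # Default `lvs` columns start with: LV, VG, Attr, LSize
        if len(fields) >= 2 and fields[0] == lv_name:
            return fields[1]
    return None


# Output line copied from the lvs command above.
lvs_output = (
    "  98d1cf94-5e59-4f85-8696-698b0269e347 "
    "d40978c8-3fab-483b-b786-2f1e1c5cf130 -wi-a---   2.00g\n"
)
actual_vg = vg_for_lv(lvs_output, "98d1cf94-5e59-4f85-8696-698b0269e347")
engine_sd_uuid = "a5f10bab-bd9d-4834-b1d9-b29d0ec887dc"  # from copyImage above

print("LV lives in VG:", actual_vg)
print("engine matches:", actual_vg == engine_sd_uuid)
```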

Comment 1 Dafna Ron 2012-11-22 09:10:16 UTC
Created attachment 649592 [details]
image does not exist logs

Comment 2 Eduardo Warszawski 2012-11-23 07:38:08 UTC
The removal of the src image failed since the LVs were still open.
Afterwards the destination image was successfully removed.
The engine still looks for the image at the destination.


Thread-16680::INFO::2012-11-21 16:44:45,322::logUtils::37::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='d40978c8-3fab-483b-b786-2f1e1c5cf130', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', size='16106127360', volFormat=4, preallocate=2, diskType=2, volUUID='34ff2273-e1cd-41b9-9c30-61defdc85948', desc='', srcImgUUID='a8eb3963-520a-435d-aba4-80a0dd2d9983', srcVolUUID='39f89a6a-7fbb-43c0-a5ea-19b271f51829')
Thread-17083::INFO::2012-11-21 16:51:17,001::logUtils::37::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='d40978c8-3fab-483b-b786-2f1e1c5cf130', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', volUUID='34ff2273-e1cd-41b9-9c30-61defdc85948')
Thread-17083::INFO::2012-11-21 16:51:17,419::logUtils::39::dispatcher::(wrapper) Run and protect: prepareImage, Return response: {'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/d40978c8-3fab-483b-b786-2f1e1c5cf130/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/34ff2273-e1cd-41b9-9c30-61defdc85948', 'chain': [{'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/d40978c8-3fab-483b-b786-2f1e1c5cf130/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/39f89a6a-7fbb-43c0-a5ea-19b271f51829', 'domainID': 'd40978c8-3fab-483b-b786-2f1e1c5cf130', 'volumeID': '39f89a6a-7fbb-43c0-a5ea-19b271f51829', 'imageID': '270835d7-b3bb-4e1c-a34d-f09d0538affd'}, {'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/d40978c8-3fab-483b-b786-2f1e1c5cf130/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/34ff2273-e1cd-41b9-9c30-61defdc85948', 'domainID': 'd40978c8-3fab-483b-b786-2f1e1c5cf130', 'volumeID': '34ff2273-e1cd-41b9-9c30-61defdc85948', 'imageID': '270835d7-b3bb-4e1c-a34d-f09d0538affd'}]}
Thread-17595::INFO::2012-11-21 17:00:27,241::logUtils::37::dispatcher::(wrapper) Run and protect: createVolume(sdUUID='d40978c8-3fab-483b-b786-2f1e1c5cf130', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', size='16106127360', volFormat=4, preallocate=2, diskType=2, volUUID='98d1cf94-5e59-4f85-8696-698b0269e347', desc='', srcImgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', srcVolUUID='34ff2273-e1cd-41b9-9c30-61defdc85948')
Thread-17639::INFO::2012-11-21 17:00:56,601::logUtils::37::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='d40978c8-3fab-483b-b786-2f1e1c5cf130', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', volUUID='98d1cf94-5e59-4f85-8696-698b0269e347')
Thread-17639::INFO::2012-11-21 17:00:58,525::logUtils::39::dispatcher::(wrapper) Run and protect: prepareImage, Return response: {'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/d40978c8-3fab-483b-b786-2f1e1c5cf130/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/98d1cf94-5e59-4f85-8696-698b0269e347', 'chain': [{'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/d40978c8-3fab-483b-b786-2f1e1c5cf130/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/39f89a6a-7fbb-43c0-a5ea-19b271f51829', 'domainID': 'd40978c8-3fab-483b-b786-2f1e1c5cf130', 'volumeID': '39f89a6a-7fbb-43c0-a5ea-19b271f51829', 'imageID': '270835d7-b3bb-4e1c-a34d-f09d0538affd'}, {'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/d40978c8-3fab-483b-b786-2f1e1c5cf130/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/34ff2273-e1cd-41b9-9c30-61defdc85948', 'domainID': 'd40978c8-3fab-483b-b786-2f1e1c5cf130', 'volumeID': '34ff2273-e1cd-41b9-9c30-61defdc85948', 'imageID': '270835d7-b3bb-4e1c-a34d-f09d0538affd'}, {'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/d40978c8-3fab-483b-b786-2f1e1c5cf130/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/98d1cf94-5e59-4f85-8696-698b0269e347', 'domainID': 'd40978c8-3fab-483b-b786-2f1e1c5cf130', 'volumeID': '98d1cf94-5e59-4f85-8696-698b0269e347', 'imageID': '270835d7-b3bb-4e1c-a34d-f09d0538affd'}]}
Thread-17662::INFO::2012-11-21 17:01:26,372::logUtils::37::dispatcher::(wrapper) Run and protect: cloneImageStructure(spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', sdUUID='d40978c8-3fab-483b-b786-2f1e1c5cf130', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', dstSdUUID='8c0ef67f-03c1-4fbf-b099-3e3668405cfc')
Thread-17705::INFO::2012-11-21 17:02:01,007::logUtils::37::dispatcher::(wrapper) Run and protect: prepareImage(sdUUID='8c0ef67f-03c1-4fbf-b099-3e3668405cfc', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', volUUID='98d1cf94-5e59-4f85-8696-698b0269e347')
Thread-17705::INFO::2012-11-21 17:02:01,731::logUtils::39::dispatcher::(wrapper) Run and protect: prepareImage, Return response: {'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/8c0ef67f-03c1-4fbf-b099-3e3668405cfc/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/98d1cf94-5e59-4f85-8696-698b0269e347', 'chain': [{'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/8c0ef67f-03c1-4fbf-b099-3e3668405cfc/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/39f89a6a-7fbb-43c0-a5ea-19b271f51829', 'domainID': '8c0ef67f-03c1-4fbf-b099-3e3668405cfc', 'volumeID': '39f89a6a-7fbb-43c0-a5ea-19b271f51829', 'imageID': '270835d7-b3bb-4e1c-a34d-f09d0538affd'}, {'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/8c0ef67f-03c1-4fbf-b099-3e3668405cfc/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/34ff2273-e1cd-41b9-9c30-61defdc85948', 'domainID': '8c0ef67f-03c1-4fbf-b099-3e3668405cfc', 'volumeID': '34ff2273-e1cd-41b9-9c30-61defdc85948', 'imageID': '270835d7-b3bb-4e1c-a34d-f09d0538affd'}, {'path': '/rhev/data-center/edf0ee04-0cc2-4e13-877d-1e89541aea55/8c0ef67f-03c1-4fbf-b099-3e3668405cfc/images/270835d7-b3bb-4e1c-a34d-f09d0538affd/98d1cf94-5e59-4f85-8696-698b0269e347', 'domainID': '8c0ef67f-03c1-4fbf-b099-3e3668405cfc', 'volumeID': '98d1cf94-5e59-4f85-8696-698b0269e347', 'imageID': '270835d7-b3bb-4e1c-a34d-f09d0538affd'}]}
Thread-17705::INFO::2012-11-21 17:02:31,017::logUtils::37::dispatcher::(wrapper) Run and protect: teardownImage(sdUUID='8c0ef67f-03c1-4fbf-b099-3e3668405cfc', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', volUUID=None)
Thread-17722::INFO::2012-11-21 17:02:32,872::logUtils::37::dispatcher::(wrapper) Run and protect: syncImageData(spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', sdUUID='d40978c8-3fab-483b-b786-2f1e1c5cf130', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', dstSdUUID='8c0ef67f-03c1-4fbf-b099-3e3668405cfc', syncType='INTERNAL')
Thread-17922::INFO::2012-11-21 17:04:52,415::logUtils::37::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='d40978c8-3fab-483b-b786-2f1e1c5cf130', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', postZero='false', force='false')
Thread-17939::INFO::2012-11-21 17:05:04,896::logUtils::37::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='8c0ef67f-03c1-4fbf-b099-3e3668405cfc', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', postZero='false', force='false')
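The sequence in the log excerpt above is easier to follow when each "Run and protect" call is reduced to a timestamp, a verb and the domains involved. A rough Python sketch of such a summary; the regular expressions and the `summarize` helper are assumptions based only on the line format visible in this excerpt, not a vdsm tool:

```python
import re

CALL_RE = re.compile(r"Run and protect: (\w+)\((.*)\)\s*$")
SD_RE = re.compile(r"\bsdUUID='([^']+)'")
DST_RE = re.compile(r"\bdstSdUUID='([^']+)'")


def summarize(lines):
    """Return (timestamp, verb, sdUUID, dstSdUUID) for each call line."""
    calls = []
    for line in lines:
        match = CALL_RE.search(line)
        if not match:
            continue  # skips "Return response" lines and anything else
        verb, args = match.groups()
        timestamp = line.split("::")[2]
        sd = SD_RE.search(args)
        dst = DST_RE.search(args)
        calls.append((timestamp, verb,
                      sd.group(1) if sd else None,
                      dst.group(1) if dst else None))
    return calls


# The two deleteImage lines from the excerpt above.
log_lines = [
    "Thread-17922::INFO::2012-11-21 17:04:52,415::logUtils::37::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='d40978c8-3fab-483b-b786-2f1e1c5cf130', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', postZero='false', force='false')",
    "Thread-17939::INFO::2012-11-21 17:05:04,896::logUtils::37::dispatcher::(wrapper) Run and protect: deleteImage(sdUUID='8c0ef67f-03c1-4fbf-b099-3e3668405cfc', spUUID='edf0ee04-0cc2-4e13-877d-1e89541aea55', imgUUID='270835d7-b3bb-4e1c-a34d-f09d0538affd', postZero='false', force='false')",
]
calls = summarize(log_lines)
for call in calls:
    print(call)
```

Run over the two lines above, this shows the source-domain deleteImage (d40978c8-...) at 17:04:52, followed twelve seconds later by a deleteImage on the destination domain (8c0ef67f-...), which is the rollback that leaves the engine pointing at an image that no longer exists there.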

Comment 3 Daniel Erez 2012-12-31 07:27:52 UTC
patch merged:
http://gerrit.ovirt.org/#/c/10154/
Change-Id: Iadcffa5748b58b1af40535b0447487dde6c2d6cb

Comment 6 Dafna Ron 2013-01-20 17:20:57 UTC
tested on sf3 with vdsm-4.10.2-3.0.el6ev.x86_64

I live migrated two disks of the same VM and, after the move started, added disks to a second VM on the source domain so that it would run low on disk space.

we failed to delete volumes on one of the disks: 

2013-01-20 17:20:05,964 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-3-thread-40) [3e493079] START, DeleteImageGroupVDSCommand( storagePoolId = afcde1c5-6022-4077-ab06-2beed7e5e404, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = 8bcf7e0d-a418-4210-a79d-8a7888a26c5c, imageGroupId = d4e4d029-d9c5-47ee-8df9-06b70e2536f4, postZeros = false, forceDelete = false), log id: 5a354ebc
2013-01-20 17:20:12,316 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-40) [3e493079] Failed in DeleteImageGroupVDS method
2013-01-20 17:20:12,316 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-40) [3e493079] Error code CannotRemoveLogicalVolume and error message IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Cannot remove Logical Volume: ('8bcf7e0d-a418-4210-a79d-8a7888a26c5c', "('f4d19fed-171f-46d1-a403-8d9736a6c280', 'adcc64f2-f4d7-4e4e-8a96-ee90f592e217')")
2013-01-20 17:20:12,316 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (pool-3-thread-40) [3e493079] IrsBroker::Failed::DeleteImageGroupVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Cannot remove Logical Volume: ('8bcf7e0d-a418-4210-a79d-8a7888a26c5c', "('f4d19fed-171f-46d1-a403-8d9736a6c280', 'adcc64f2-f4d7-4e4e-8a96-ee90f592e217')")
2013-01-20 17:20:12,366 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-3-thread-40) [3e493079] FINISH, DeleteImageGroupVDSCommand, log id: 5a354ebc
2013-01-20 17:20:12,681 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-3-thread-40) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Cannot remove Logical Volume: ('8bcf7e0d-a418-4210-a79d-8a7888a26c5c', "('f4d19fed-171f-46d1-a403-8d9736a6c280', 'adcc64f2-f4d7-4e4e-8a96-ee90f592e217')")


We also have an exception in the logs:

2013-01-20 17:20:12,681 ERROR [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-3-thread-40) EntityAsyncTask::EndCommandAction [within thread]: EndAction for action type LiveMigrateDisk threw an exception: org.ovirt.engine.core.common.errors.VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Cannot remove Logical Volume: ('8bcf7e0d-a418-4210-a79d-8a7888a26c5c', "('f4d19fed-171f-46d1-a403-8d9736a6c280', 'adcc64f2-f4d7-4e4e-8a96-ee90f592e217')")
        at org.ovirt.engine.core.bll.VdsHandler.handleVdsResult(VdsHandler.java:168) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.AbstractSPMAsyncTaskHandler.compensate(AbstractSPMAsyncTaskHandler.java:51) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.revertPreviousHandlers(CommandBase.java:595) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.internalEndWithFailure(CommandBase.java:535) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.endActionInTransactionScope(CommandBase.java:472) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1465) [engine-bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:166) [engine-utils.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:108) [engine-utils.jar:]
        at org.ovirt.engine.core.bll.CommandBase.endAction(CommandBase.java:416) [engine-bll.jar:]
        at org.ovirt.engine.core.bll.Backend.endAction(Backend.java:376) [engine-bll.jar:]
        at sun.reflect.GeneratedMethodAccessor218.invoke(Unknown Source) [:1.7.0_09-icedtea]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_09-icedtea]
        at java.lang.reflect.Method.invoke(Method.java:601) [rt.jar:1.7.0_09-icedtea]
        at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72) [jboss-as-ee.jar:7.1.3.Final-redhat-4]
        at org.jboss.invocation.InterceptorContext.proceed(InterceptorContext.java:288) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.jboss.invocation.InterceptorContext$Invocation.proceed(InterceptorContext.java:374) [jboss-invocation.jar:1.1.1.Final-redhat-2]
        at org.ovirt.engine.core.utils.ThreadLocalSessionCleanerInterceptor.injectWebContextToThreadLocal(ThreadLocalSessionCleanerInterceptor.java:11) [engine-utils.jar:]


the UI is reporting the action as rolled back - moving back to devel.

Comment 7 Dafna Ron 2013-01-20 17:22:51 UTC
Created attachment 683755 [details]
logs and dbdump

Comment 8 Daniel Erez 2013-01-24 23:39:19 UTC
The issue described in comment #6 is during roll-back/cleanup phase (i.e. deleting the target image). The original bug fix is to prevent rollback on failure of source image deletion. Moving to ON_QA for verification.

Dafna, can you please open a separate bug on the issue described in the comment?
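To illustrate the distinction drawn in this comment: once the data has been synced to the destination, a failure to delete the source image should be recorded but should not undo the earlier steps. The following Python sketch is only a model of that idea. The engine is written in Java and this is not its code; the step names, the `StepFailed` exception and `run_steps` are all hypothetical.

```python
class StepFailed(Exception):
    """Raised by a step's action to signal failure (hypothetical)."""


def run_steps(steps):
    """Run (name, action, undo, rollback_on_failure) steps in order.

    A failing step with rollback_on_failure=True undoes every completed
    step in reverse order and stops. A failing step with
    rollback_on_failure=False is only logged and the flow continues.
    """
    completed, log = [], []
    for name, action, undo, rollback_on_failure in steps:
        try:
            action()
        except StepFailed:
            if not rollback_on_failure:
                log.append("failed-kept:" + name)
                continue
            for done_name, done_undo in reversed(completed):
                done_undo()
                log.append("undone:" + done_name)
            log.append("failed-rolled-back:" + name)
            break
        completed.append((name, undo))
        log.append("done:" + name)
    return log


def ok():
    pass


def fail():
    raise StepFailed()


steps = [
    ("clone_structure", ok, ok, True),
    ("sync_data", ok, ok, True),
    # Source deletion failing must not roll back the migration.
    ("delete_source", fail, ok, False),
]
log = run_steps(steps)
print(log)
```

With the last flag set to True instead, the same failure would undo sync_data and clone_structure, which corresponds to the behaviour originally reported in this bug.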

Comment 9 Dafna Ron 2013-02-06 13:55:48 UTC
I tested this scenario on vdsm-4.10.2-5.0.el6ev.x86_64 with libvirt-0.10.2-18.el6.x86_64 and qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64

we seem to be hitting multiple issues with this scenario with the same result - cannot run a vm after live storage migration. 

Two examples: one VM gets the error below after we run it.

Thread-76880::ERROR::2013-02-06 15:25:48,795::dispatcher::66::Storage.Dispatcher.Protect::(run) {'status': {'message': "Logical volume does not exist: ('04a91189-8741-4589-900b-3adbe0908d63/04512717-e9a9-4fdc-8c2c-a0adf845e09f',)", 'code': 610}}
Thread-76880::DEBUG::2013-02-06 15:25:48,797::vm::676::vm.Vm::(_startUnderlyingVm) vmId=`9f4dbd09-e734-496e-8ba7-64297082912e`::_ongoingCreations released
Thread-76880::ERROR::2013-02-06 15:25:48,800::vm::700::vm.Vm::(_startUnderlyingVm) vmId=`9f4dbd09-e734-496e-8ba7-64297082912e`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 662, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1441, in _run
    devices = self.buildConfDevices()
  File "/usr/share/vdsm/vm.py", line 499, in buildConfDevices
    self._normalizeVdsmImg(drv)
  File "/usr/share/vdsm/vm.py", line 406, in _normalizeVdsmImg
    drv['truesize'] = res['truesize']
KeyError: 'truesize'
Thread-76880::DEBUG::2013-02-06 15:25:48,883::vm::1047::vm.Vm::(setDownStatus) vmId=`9f4dbd09-e734-496e-8ba7-64297082912e`::Changed state to Down: 'truesize'
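The KeyError appears to follow directly from the dispatcher error on the first line of this excerpt: the size lookup returned an error status (code 610) with no 'truesize' key, and the caller read the key without checking the status. That reading is an inference from the two adjacent log lines, not something confirmed in this report. A hedged Python sketch of a status check; the function name and the shape of a successful response are assumptions, not vdsm code:

```python
def copy_truesize(drv, res):
    """Copy 'truesize' from a size-lookup response into drv.

    Raises RuntimeError with the storage error instead of a bare
    KeyError when the response carries a non-zero status code.
    """
    status = res.get("status", {})
    code = status.get("code", 0)
    if code != 0:
        raise RuntimeError(
            "volume size lookup failed (code %s): %s"
            % (code, status.get("message"))
        )
    drv["truesize"] = res["truesize"]
    return drv


# Error response copied from the dispatcher line above.
failed = {
    "status": {
        "message": "Logical volume does not exist: "
                   "('04a91189-8741-4589-900b-3adbe0908d63/"
                   "04512717-e9a9-4fdc-8c2c-a0adf845e09f',)",
        "code": 610,
    }
}
try:
    copy_truesize({}, failed)
except RuntimeError as exc:
    error_text = str(exc)
    print(error_text)
```

A check like this would not make the VM start, since the logical volume is still missing, but it would surface the storage error in place of "Changed state to Down: 'truesize'".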

here is the failure to delete (the storage domain is a different UUID) 

Thread-74494::ERROR::2013-02-06 14:44:08,816::dispatcher::66::Storage.Dispatcher.Protect::(run) {'status': {'message': 'Cannot remove Logical Volume: (\'e7d6614c-a33b-4e9d-82b5-34bfd12d390b\', "(\'c481dc85-6289-4215-bb66-3bcf03b00460\', \'04512717-e9a9-4fdc-8c2c-a0adf845e09f\')")', 'code': 551}}


Other VMs give a libvirt error about running VMs with a snapshot:

https://bugzilla.redhat.com/show_bug.cgi?id=903248

Since the main issue was not fixed (can't run the VM after live storage migration), and I think we might hit several more issues with this scenario, I suggest that we make this bug a tracker bug and start opening bugs for each underlying issue.

Comment 10 Dafna Ron 2013-02-06 13:56:48 UTC
Created attachment 693942 [details]
new logs from 3.2

Comment 14 Ayal Baron 2013-03-04 10:09:54 UTC
Dafna, any update on this?

Comment 16 Ayal Baron 2013-03-04 12:56:24 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > Dafna, any update on this?
> 
Discussed with Haim, bug was VERIFIED with the libvirt scratch build.

Comment 17 Haim 2013-03-04 13:41:44 UTC
We will verify this bug once an official libvirt build with the fix is released.
moving back to ON_QA.

Comment 18 Dafna Ron 2013-03-13 11:44:56 UTC
verified on sf10 with vdsm-4.10.2-11.0.el6ev.x86_64 and libvirt-0.10.2-18.el6_4.eblake.2.x86_64

Comment 19 Itamar Heim 2013-06-11 09:51:56 UTC
3.2 has been released

Comment 20 Itamar Heim 2013-06-11 09:52:01 UTC
3.2 has been released

Comment 21 Itamar Heim 2013-06-11 09:58:52 UTC
3.2 has been released