Created attachment 752099 [details]
logs

Description of problem:
We do not always clean up when LSM fails: although the engine rolls back, the VM images are left behind in the target domain. If we then try to re-move the images (live or offline), VDSM fails with an "image/volume already exists" error, which requires GSS involvement to resolve.

Version-Release number of selected component (if applicable):
sf17.1

How reproducible:
100%

Steps to Reproduce:
1. Create a VM from a template on iSCSI storage and run it on an HSM host.
2. Restart vdsm on the SPM while VmReplicateDiskFinishVDSCommand is running.
3. Shut down the VM.
4. Try to move the disk again, offline.

Actual results:
VDSM fails with a "volume already exists" error.

Expected results:
We should give the user a way to decide whether to call GSS or deal with the conflict themselves (one example I can think of is a "conflict found" wizard).

Additional info:
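To make the requested behavior concrete, the "conflict found" idea from the expected results could look roughly like the sketch below: before retrying a move, check whether a leftover destination volume exists and let the user choose cleanup instead of failing with an opaque error. All function names here are hypothetical stand-ins, not real engine or VDSM APIs.

```python
# Conceptual sketch only: volume_exists / delete_leftover / do_move /
# resolve_conflict are hypothetical callables standing in for storage
# operations and a user-facing "conflict found" dialog.

def move_disk(image_id, dst_domain, volume_exists, delete_leftover,
              do_move, resolve_conflict):
    """Move image_id to dst_domain, handling a stale leftover volume."""
    if volume_exists(image_id, dst_domain):
        # A previously failed LSM left an image behind on the target.
        choice = resolve_conflict(image_id, dst_domain)
        if choice == "cleanup":
            delete_leftover(image_id, dst_domain)
        else:
            # Let the user keep the leftover for inspection (e.g. for
            # GSS) instead of hitting "Volume already exists".
            return "aborted: leftover image kept for inspection"
    do_move(image_id, dst_domain)
    return "moved"
```

The point of the sketch is only that the conflict becomes an explicit user decision rather than a failed async task.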
(In reply to Dafna Ron from comment #0)
> Steps to Reproduce:
> 1. create a vm from template on iscsi storage and run it on hsm host
> 2. restart vdsm on spm when VmReplicateDiskFinishVDSCommand is run

I'm not sure how the vdsm restart influenced the flow, and unfortunately the attached vdsm logs are not related to this bug (different time or host; there is no restart in them). I can't imagine a reason why restarting vdsm on the SPM (after syncImageData finished) should affect VmReplicateDiskFinish running on another host.

What I can see instead in the engine logs is VmReplicateDiskStart failing (not related to any restart):

2013-05-23 11:43:09,397 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (pool-4-thread-50) [3a9a74d9] START, VmReplicateDiskStartVDSCommand(HostName = cougar02, HostId = a4af81ea-c847-4c9c-885e-cb763d5bf9ea, vmId=bf027b25-b9f1-4c42-bdfa-d7aca45f2405), log id: d890de2
...
2013-05-23 11:43:16,272 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) [3a9a74d9] Weird return value: StatusForXmlRpc [mCode=55, mMessage=Drive replication error]
2013-05-23 11:43:16,273 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) [3a9a74d9] Failed in VmReplicateDiskStartVDS method
2013-05-23 11:43:16,274 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) [3a9a74d9] Error code unexpected and error message VDSGenericException: VDSErrorException: Failed to VmReplicateDiskStartVDS, error = Drive replication error
2013-05-23 11:43:16,274 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) [3a9a74d9] Command org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=55, mMessage=Drive replication error]]

We don't have any detail on this, as vdsm.log does not contain anything related to the drive replica.
And after that, VmReplicateDiskFinish also fails, as expected:

2013-05-23 11:43:16,301 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (pool-4-thread-50) [3a9a74d9] START, VmReplicateDiskFinishVDSCommand(HostName = cougar02, HostId = a4af81ea-c847-4c9c-885e-cb763d5bf9ea, vmId=bf027b25-b9f1-4c42-bdfa-d7aca45f2405), log id: 1ee32150
2013-05-23 11:43:16,325 WARN  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) [3a9a74d9] Weird return value: StatusForXmlRpc [mCode=55, mMessage=Drive replication error]
2013-05-23 11:43:16,325 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) [3a9a74d9] Failed in VmReplicateDiskFinishVDS method
2013-05-23 11:43:16,325 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) [3a9a74d9] Error code unexpected and error message VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Drive replication error
2013-05-23 11:43:16,325 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-50) [3a9a74d9] Command org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=55, mMessage=Drive replication error]]

The next move fails because there was no cleanup:

2013-05-23 12:17:03,273 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.MoveImageGroupVDSCommand] (pool-4-thread-50) [435a81d9] START, MoveImageGroupVDSCommand( storagePoolId = 7fd33b43-a9f4-4eb7-a885-e9583a929ceb, ignoreFailoverLimit = false, compatabilityVersion = 3.2, storageDomainId = 81ef11d0-4c0c-47b4-8953-d61a6af442d8, imageGroupId = 1ea5b233-255a-45ef-958a-499892624c3b, dstDomainId = 38755249-4bb3-4841-bf5b-05f4a521514d, vmId = 00000000-0000-0000-0000-000000000000, op = Move, postZero = false, force = false), log id: 6f37e09e
2013-05-23 12:17:17,145 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-88)
BaseAsyncTask::LogEndTaskFailure: Task d40ece4f-b240-4872-abdf-565c8ecdf039 (Parent Command MoveOrCopyDisk, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
-- Result: cleanSuccess
-- Message: VDSGenericException: VDSErrorException: Failed in vdscommand to HSMGetAllTasksStatusesVDS, error = Volume already exists,
-- Exception: VDSGenericException: VDSErrorException: Failed in vdscommand to HSMGetAllTasksStatusesVDS, error = Volume already exists

I tested this behavior again on the current master (git hash 44ee6e6), failing with the "Drive replication error" on VmReplicateDiskStart, and the destination image is now cleaned up properly.

2013-10-11 15:10:12,940 INFO  [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-6-thread-49) [1fe559bd] Running command: LiveMigrateDiskCommand Task handler: VmReplicateDiskStartTaskHandler internal: false. Entities affected : ID: 3995b4f0-f13e-4650-a1db-0afccc7fdd5e Type: Disk, ID: 864df2d0-b022-435f-a9d1-b7ac0bd766bb Type: Storage
2013-10-11 15:10:12,942 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (pool-6-thread-49) [1fe559bd] START, VmReplicateDiskStartVDSCommand(HostName = vm-rhev1, HostId = 43bd4b0f-310d-4ba3-b357-5a08f19cd683, vmId=b6b7783e-971a-4607-9493-bfd474bcf796), log id: 7e926841
2013-10-11 15:10:12,946 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (pool-6-thread-49) [1fe559bd] Failed in VmReplicateDiskStartVDS method
2013-10-11 15:10:12,946 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (pool-6-thread-49) [1fe559bd] Error code replicaErr and error message VDSGenericException: VDSErrorException: Failed to VmReplicateDiskStartVDS, error = Drive replication error
2013-10-11 15:10:12,946 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand] (pool-6-thread-49) [1fe559bd] Command org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskStartVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=55, mMessage=Drive replication error]]
...
2013-10-11 15:10:12,950 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (pool-6-thread-49) [1fe559bd] START, VmReplicateDiskFinishVDSCommand(HostName = vm-rhev1, HostId = 43bd4b0f-310d-4ba3-b357-5a08f19cd683, vmId=b6b7783e-971a-4607-9493-bfd474bcf796), log id: 38659aca
2013-10-11 15:10:12,954 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (pool-6-thread-49) [1fe559bd] Failed in VmReplicateDiskFinishVDS method
2013-10-11 15:10:12,954 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (pool-6-thread-49) [1fe559bd] Error code replicaErr and error message VDSGenericException: VDSErrorException: Failed to VmReplicateDiskFinishVDS, error = Drive replication error
2013-10-11 15:10:12,954 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand] (pool-6-thread-49) [1fe559bd] Command org.ovirt.engine.core.vdsbroker.vdsbroker.VmReplicateDiskFinishVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=55, mMessage=Drive replication error]]
...
2013-10-11 15:10:12,956 ERROR [org.ovirt.engine.core.bll.lsm.LiveMigrateDiskCommand] (pool-6-thread-49) [1fe559bd] Reverting task deleteImage, handler: org.ovirt.engine.core.bll.lsm.CreateImagePlaceholderTaskHandler
2013-10-11 15:10:12,979 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-49) [1fe559bd] START, DeleteImageGroupVDSCommand( storagePoolId = 98da1408-948d-4cab-9a8b-418914be9f07, ignoreFailoverLimit = false, storageDomainId = 864df2d0-b022-435f-a9d1-b7ac0bd766bb, imageGroupId = 3995b4f0-f13e-4650-a1db-0afccc7fdd5e, postZeros = false, forceDelete = false), log id: 51e45ee0
2013-10-11 15:10:14,095 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-49) [1fe559bd] FINISH, DeleteImageGroupVDSCommand, log id: 51e45ee0

Moving to ON_QA.
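The fix visible in this log follows a standard compensation pattern: when a later task handler (VmReplicateDiskStart) fails, handlers that already completed (CreateImagePlaceholderTaskHandler) are reverted, which is what triggers DeleteImageGroupVDSCommand on the destination. A minimal sketch of that pattern, with a hypothetical framework (the handler names mirror the log, but the classes below are illustrative only):

```python
# Illustrative compensation/rollback sketch, not the real engine code:
# each handler pairs a forward action with a compensating revert, and
# on failure the completed handlers are reverted newest-first.

class TaskHandler:
    def __init__(self, name, execute, revert):
        self.name = name
        self.execute = execute  # forward action
        self.revert = revert    # compensating action

def run_with_rollback(handlers, log):
    done = []
    try:
        for h in handlers:
            log.append(f"execute {h.name}")
            h.execute()
            done.append(h)
    except Exception as exc:
        # Roll back everything that already succeeded, newest first.
        for h in reversed(done):
            log.append(f"revert {h.name}")
            h.revert()
        log.append(f"failed: {exc}")
        return False
    return True
```

Before the fix, the revert of the placeholder step was evidently skipped on this failure path, leaving the destination image in place and causing the later "Volume already exists" error.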
Verified on 3.4.0-0.5.beta1.el6. The destination image is cleaned up, and a move after the failure succeeded.
Closing as part of 3.4.0