Created attachment 737275 [details]
logs

Description of problem:
I have encountered this several times. When deleting VMs fails, or when storage problems occur while commands run on VMs that are linked to a template, the template image is left marked as shared and we cannot remove the template.
I am opening this on engine since I do not think that vdsm can actually do anything: as long as there are volumes depending on the template, it will remain open.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-15.0.el6ev.x86_64
sf13.1

How reproducible:
randomly

Steps to Reproduce:
1. create a pool of VMs
2. detach the VMs from the pool
3. remove the VMs
4. restart vdsm twice
5. try to remove the template

Actual results:
Cannot remove the template, although the UI does not show any volumes linked to it.

Expected results:
We should be able to remove the template.

Additional info:
logs
from looking in the logs -

>>>> on 14-4, a desktop vm based on template is being created, which means that the vm images aren't copies of the template images, but snapshots on top of it:

2013-04-14 17:12:58,702 INFO [org.ovirt.engine.core.bll.AddVmCommand] (pool-4-thread-37) [9144bd4] Running command: AddVmCommand internal: true. Entities affected : ID: 405e236e-22e2-41c6-bfc2-584141ea52aa Type: VdsGroups, ID: 18d866a4-f328-4a75-a17a-9e8562677f33 Type: VmTemplate, ID: 00000000-0000-0000-0000-000000000000 Type: Storage
2013-04-14 17:12:58,722 INFO [org.ovirt.engine.core.bll.AddVmCommand] (pool-4-thread-37) [9144bd4] Lock freed to object EngineLock [exclusiveLocks= key: iscsi-pool-9 value: VM_NAME , sharedLocks= ]
2013-04-14 17:12:58,722 INFO [org.ovirt.engine.core.vdsbroker.SetVmStatusVDSCommand] (pool-4-thread-37) [9144bd4] START, SetVmStatusVDSCommand( vmId = 808862b7-0999-4039-90e9-326eacaf6a3c, status = ImageLocked), log id: 39ac5445
2013-04-14 17:12:58,725 INFO [org.ovirt.engine.core.vdsbroker.SetVmStatusVDSCommand] (pool-4-thread-37) [9144bd4] FINISH, SetVmStatusVDSCommand, log id: 39ac5445
2013-04-14 17:12:58,727 INFO [org.ovirt.engine.core.bll.CreateSnapshotFromTemplateCommand] (pool-4-thread-37) [9144bd4] Running command: CreateSnapshotFromTemplateCommand internal: true. Entities affected : ID: aa6818fa-b374-4204-9ab7-6474205ac153 Type: Storage
2013-04-14 17:12:58,729 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (pool-4-thread-37) [9144bd4] START, GetImageInfoVDSCommand( storagePoolId = 9e35da73-9107-4558-beff-cd31c740f822, ignoreFailoverLimit = false, compatabilityVersion = null, storageDomainId = aa6818fa-b374-4204-9ab7-6474205ac153, imageGroupId = f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28, imageId = 3349ce41-93d1-4a05-852e-bb9f96187211), log id: 7308eb18
2013-04-14 17:13:00,288 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (pool-4-thread-37) [9144bd4] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.DiskImage@9b7ad800, log id: 7308eb18
2013-04-14 17:13:00,291 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (pool-4-thread-37) [9144bd4] START, CreateSnapshotVDSCommand( storagePoolId = 9e35da73-9107-4558-beff-cd31c740f822, ignoreFailoverLimit = false, compatabilityVersion = 3.0, storageDomainId = aa6818fa-b374-4204-9ab7-6474205ac153, imageGroupId = 30ddb87c-a702-470c-8fcf-b5a58f0eb18e, imageSizeInBytes = 1073741824, volumeFormat = COW, newImageId = 0ad497f7-527c-4a0c-aa10-7b2ea410a295, newImageDescription = , imageId = 3349ce41-93d1-4a05-852e-bb9f96187211, sourceImageGroupId = f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28), log id: 1a501d34
2013-04-14 17:13:00,516 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (pool-4-thread-37) [9144bd4] -- CreateSnapshotVDSCommand::ExecuteIrsBrokerCommand: calling 'createVolume' with two new parameters: description and UUID
2013-04-14 17:13:00,516 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CreateSnapshotVDSCommand] (pool-4-thread-37) [9144bd4] -- createVolume parameters: sdUUID=aa6818fa-b374-4204-9ab7-6474205ac153 spUUID=9e35da73-9107-4558-beff-cd31c740f822 imgGUID=30ddb87c-a702-470c-8fcf-b5a58f0eb18e size=1,073,741,824 bytes volFormat=COW volType=Sparse volUUID=0ad497f7-527c-4a0c-aa10-7b2ea410a295 descr= srcImgGUID=f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28 srcVolUUID=3349ce41-93d1-4a05-852e-bb9f96187211

>>>> later on, the template image is being copied to another domain.

2013-04-14 17:11:45,643 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (pool-4-thread-37) [1fb9053e] START, CopyImageVDSCommand( storagePoolId = 9e35da73-9107-4558-beff-cd31c740f822, ignoreFailoverLimit = false, compatabilityVersion = 3.0, storageDomainId = aa6818fa-b374-4204-9ab7-6474205ac153, imageGroupId = f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28, imageId = 3349ce41-93d1-4a05-852e-bb9f96187211, dstImageGroupId = f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28, vmId = 00000000-0000-0000-0000-000000000000, dstImageId = 3349ce41-93d1-4a05-852e-bb9f96187211, imageDescription = , dstStorageDomainId = 54c19ecf-62b3-4350-be2d-7865b116844d, copyVolumeType = SharedVol, volumeFormat = RAW, preallocate = Preallocated, postZero = false, force = false), log id: 45f1e7aa
2013-04-14 17:11:45,643 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (pool-4-thread-37) [1fb9053e] -- CopyImageVDSCommand::ExecuteIrsBrokerCommand: calling 'copyImage' with two new parameters: description and UUID
2013-04-14 17:11:45,643 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.CopyImageVDSCommand] (pool-4-thread-37) [1fb9053e] -- copyImage parameters: sdUUID=aa6818fa-b374-4204-9ab7-6474205ac153 spUUID=9e35da73-9107-4558-beff-cd31c740f822 vmGUID=00000000-0000-0000-0000-000000000000 srcImageGUID=f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28 srcVolUUID=3349ce41-93d1-4a05-852e-bb9f96187211 dstImageGUID=f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28 dstVolUUID=3349ce41-93d1-4a05-852e-bb9f96187211 descr=
2013-04-14 17:11:46,478 INFO [org.ovirt.engine.core.bll.LoginUserCommand] (ajp-/127.0.0.1:8702-6) Running command: LoginUserCommand internal: false.
2013-04-14 17:11:46,487 WARN [org.ovirt.engine.core.compat.backendcompat.PropertyInfo] (ajp-/127.0.0.1:8702-6) Unable to get value of property: glusterVolume for class org.ovirt.engine.core.bll.LoginUserCommand

Later on, we try to move the image that is based on the template image to another domain (the one that the template image was copied to). During the execution of moveImage in vdsm, the engine is restarted:

2013-04-14 17:27:29,037 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.MoveImageGroupVDSCommand] (pool-4-thread-40) [7409b0cc] START, MoveImageGroupVDSCommand( storagePoolId = 9e35da73-9107-4558-beff-cd31c740f822, ignoreFailoverLimit = false, compatabilityVersion = 3.2, storageDomainId = aa6818fa-b374-4204-9ab7-6474205ac153, imageGroupId = 30ddb87c-a702-470c-8fcf-b5a58f0eb18e, dstDomainId = 54c19ecf-62b3-4350-be2d-7865b116844d, vmId = 00000000-0000-0000-0000-000000000000, op = Move, postZero = false, force = false), log id: 42d0bc6b
2013-04-14 17:28:43,122 ERROR [org.ovirt.engine.core.bll.MoveOrCopyImageGroupCommand] (pool-4-thread-40) [7409b0cc] Command org.ovirt.engine.core.bll.MoveOrCopyImageGroupCommand throw Vdc Bll exception. With error message VdcBLLException: java.lang.reflect.UndeclaredThrowableException
2013-04-14 17:28:43,125 ERROR [org.ovirt.engine.core.bll.MoveOrCopyImageGroupCommand] (pool-4-thread-40) [7409b0cc] Transaction rolled-back for command: org.ovirt.engine.core.bll.MoveOrCopyImageGroupCommand.

So I can see from the engine log that the move operation reached vdsm, and later on we got an exception, so I can't know what did or didn't happen on the vdsm side (the vdsm log doesn't reach back that far; it only starts on 18-4, not 14-4).

Later on, we try to delete the template image and fail because of the shared volume.
2013-04-18 12:46:55,869 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-4-thread-45) [24af5525] START, DeleteImageGroupVDSCommand( storagePoolId = 9e35da73-9107-4558-beff-cd31c740f822, ignoreFailoverLimit = false, compatabilityVersion = 3.2, storageDomainId = 54c19ecf-62b3-4350-be2d-7865b116844d, imageGroupId = f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28, postZeros = false, forceDelete = false), log id: 1c67d0af
2013-04-18 12:46:56,889 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-45) [24af5525] Command DeleteImageGroupVDS execution failed. Exception: IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Shared Volume cannot be deleted: ("Cannot delete shared image f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28. volImgs: {'3349ce41-93d1-4a05-852e-bb9f96187211': ImgsPar(imgs=['f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28', '0e5aa786-d62d-4229-9e93-4c809fbcd0a5', 'b1965c2d-7cd3-46a9-ae20-3029a81e93b4', '24882ab0-10a9-450d-a68d-f1003517df7a', '7eb27728-8560-4006-a15b-e8c48106527e', '8faf5e94-c1c8-4780-b306-213e72e1074e'], parent='00000000-0000-0000-0000-000000000000')}",)
2013-04-18 12:46:56,889 INFO [org

After we split the move command into copy+delete, we would be able to attempt to delete those volumes. Regardless, this image should be represented as existing on two domains, so that if the delete fails we will know that we might have an orphaned copy.

Allon - I think that this can be postponed to 3.3; we can solve it that way or it will be somehow solved with the tasks changes that are being done.
Eduardo, as you looked over vdsm - let me know if you somehow disagree.

Regardless, we don't have the full relevant vdsm logs here, which isn't optimal.
> Allon - I think that this can be postponed to 3.3, we can solve it that way
> or it will be somehow solved with the tasks changes that are being done.
> Eduardo, as you looked over vdsm - let me know if you somehow disagree.

Agree.
In 3.3 we changed the move implementation (it is now split into copy and delete), so this should be solved now.
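To illustrate why splitting move into copy and delete helps, here is a minimal sketch, not the actual engine code. Every name in it (move_image, DeleteFailed, the locations map, the copy_image and delete_image callables) is hypothetical; only the domain and image IDs come from the logs in this bug. It shows the image being recorded on both domains until the delete succeeds, so a failed delete leaves a known leftover copy rather than an untracked one.

```python
class DeleteFailed(Exception):
    """Raised by the (hypothetical) delete step when the source cannot be removed."""


def move_image(image_id, src, dst, locations, copy_image, delete_image):
    """Move an image as two separate steps: copy to dst, then delete from src.

    `locations` maps an image id to the set of domains it is recorded on.
    Returns True if the source copy was removed, False if it was left behind.
    """
    copy_image(image_id, src, dst)
    # After the copy, the image is recorded on both domains.
    locations.setdefault(image_id, set()).add(dst)
    try:
        delete_image(image_id, src)
    except DeleteFailed:
        # The source copy stays recorded, so the delete can be retried later.
        return False
    locations[image_id].discard(src)
    return True


# IDs taken from the engine log above; the callables are stand-ins.
SRC = "aa6818fa-b374-4204-9ab7-6474205ac153"
DST = "54c19ecf-62b3-4350-be2d-7865b116844d"
IMG = "30ddb87c-a702-470c-8fcf-b5a58f0eb18e"


def noop(*args):
    pass


def failing_delete(image_id, domain):
    raise DeleteFailed(image_id)


locations = {IMG: {SRC}}
moved = move_image(IMG, SRC, DST, locations, noop, failing_delete)
```

With the failing delete, the image stays recorded on both SRC and DST, which is the "might have an orphaned copy" state mentioned earlier in this bug; a later retry with a working delete leaves it on DST only.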
Verified, tested on RHEVM 3.3 - IS15 environment:

Host OS: RHEL 6.5
RHEVM: rhevm-3.3.0-0.22.master.el6ev.noarch
PythonSDK: rhevm-sdk-python-3.3.0.14-1.el6ev.noarch
VDSM: vdsm-4.12.0-138.gitab256be.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-24.el6.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.402.el6.x86_64
SANLOCK: sanlock-2.8-1.el6.x86_64
Failed, tested on RHEVM 3.3 - IS20 environment:

Tested on FCP Data Centers

Host OS: RHEL 6.5
RHEVM: rhevm-3.3.0-0.28.beta1.el6ev.noarch
PythonSDK: rhevm-sdk-python-3.3.0.17-1.el6ev.noarch
VDSM: vdsm-4.13.0-0.5.beta1.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-29.el6.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.414.el6.x86_64
SANLOCK: sanlock-2.8-1.el6.x86_64

Logs attached

engine.log

2013-10-27 14:01:18,386 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-5-thread-49) Command DeleteImageGroupVDS execution failed. Exception: IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Shared Volume cannot be deleted: ("Cannot delete shared image eec2f3a2-6c17-4900-8896-6aedfacc5afb. volImgs: {'c5230d0e-2aeb-455c-8657-2d2b37ec6e9b': ImgsPar(imgs=['eec2f3a2-6c17-4900-8896-6aedfacc5afb', '3bd3164e-0ab9-446a-87f0-a959147d3d6c', '5aad1e4a-c73e-4b42-82b8-2587996c7136', '33e048ae-3f12-4bb4-b9c8-38b34bdf05b5', 'ea67478f-4672-46cd-a2e3-804ca82aa8c3'], parent='00000000-0000-0000-0000-000000000000')}",)
2013-10-27 14:01:18,386 ERROR [org.ovirt.engine.core.bll.RemoveTemplateSnapshotCommand] (pool-5-thread-49) Command org.ovirt.engine.core.bll.RemoveTemplateSnapshotCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IrsOperationFailedNoFailoverException: IRSGenericException: IRSErrorException: Shared Volume cannot be deleted: ("Cannot delete shared image eec2f3a2-6c17-4900-8896-6aedfacc5afb. volImgs: {'c5230d0e-2aeb-455c-8657-2d2b37ec6e9b': ImgsPar(imgs=['eec2f3a2-6c17-4900-8896-6aedfacc5afb', '3bd3164e-0ab9-446a-87f0-a959147d3d6c', '5aad1e4a-c73e-4b42-82b8-2587996c7136', '33e048ae-3f12-4bb4-b9c8-38b34bdf05b5', 'ea67478f-4672-46cd-a2e3-804ca82aa8c3'], parent='00000000-0000-0000-0000-000000000000')}",) (Failed with error CannotDeleteSharedVolume and code 223)
2013-10-27 14:01:18,388 ERROR [org.ovirt.engine.core.bll.RemoveTemplateSnapshotCommand] (pool-5-thread-49) Transaction rolled-back for command: org.ovirt.engine.core.bll.RemoveTemplateSnapshotCommand.
2013-10-27 14:01:18,457 ERROR [org.ovirt.engine.core.bll.RemoveAllVmTemplateImageTemplatesCommand] (pool-5-thread-49) Transaction rolled-back for command: org.ovirt.engine.core.bll.RemoveAllVmTemplateImageTemplatesCommand.

vdsm.log

Thread-104::ERROR::2013-10-27 12:01:16,776::task::850::TaskManager.Task::(_setError) Task=`176fdd5a-e04f-46f4-9488-0a8bf500706b`::Unexpected error
Thread-104::ERROR::2013-10-27 12:01:16,784::dispatcher::67::Storage.Dispatcher.Protect::(run) {'status': {'message': 'Shared Volume cannot be deleted: ("Cannot delete shared image eec2f3a2-6c17-4900-8896-6aedfacc5afb. volImgs: {\'c5230d0e-2aeb-455c-8657-2d2b37ec6e9b\': ImgsPar(imgs=[\'eec2f3a2-6c17-4900-8896-6aedfacc5afb\', \'3bd3164e-0ab9-446a-87f0-a959147d3d6c\', \'5aad1e4a-c73e-4b42-82b8-2587996c7136\', \'33e048ae-3f12-4bb4-b9c8-38b34bdf05b5\', \'ea67478f-4672-46cd-a2e3-804ca82aa8c3\'], parent=\'00000000-0000-0000-0000-000000000000\')}",)', 'code': 223}}
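For anyone triaging similar failures, the "Cannot delete shared image" message can be picked apart to see which images still sit on the template volume. The following is a rough sketch that relies only on the message layout visible in the logs above; the helper name parse_shared_image_error is made up for illustration and is not part of vdsm or the engine, and the layout may differ in other versions.

```python
import re

UUID_RE = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"


def parse_shared_image_error(message):
    """Return (template_image, shared_volume, dependent_images), or None.

    Assumes the message layout seen in this bug:
    Cannot delete shared image <img>. volImgs: {'<vol>': ImgsPar(imgs=[...], parent=...)}
    """
    # vdsm.log escapes the inner quotes (\'); normalise them before matching.
    text = message.replace("\\'", "'")
    head = re.search(r"Cannot delete shared image (%s)" % UUID_RE, text)
    vol = re.search(r"\{'(%s)': ImgsPar\(imgs=\[([^\]]*)\]" % UUID_RE, text)
    if not head or not vol:
        return None
    template_image = head.group(1)
    images = re.findall(UUID_RE, vol.group(2))
    # Everything in imgs other than the template image itself depends on it.
    dependents = [img for img in images if img != template_image]
    return template_image, vol.group(1), dependents


# Message copied from the 2013-10-27 engine.log entry above.
SAMPLE = (
    'Shared Volume cannot be deleted: ("Cannot delete shared image '
    "eec2f3a2-6c17-4900-8896-6aedfacc5afb. volImgs: "
    "{'c5230d0e-2aeb-455c-8657-2d2b37ec6e9b': ImgsPar(imgs=["
    "'eec2f3a2-6c17-4900-8896-6aedfacc5afb', "
    "'3bd3164e-0ab9-446a-87f0-a959147d3d6c', "
    "'5aad1e4a-c73e-4b42-82b8-2587996c7136', "
    "'33e048ae-3f12-4bb4-b9c8-38b34bdf05b5', "
    "'ea67478f-4672-46cd-a2e3-804ca82aa8c3'], "
    "parent='00000000-0000-0000-0000-000000000000')}\",)"
)

result = parse_shared_image_error(SAMPLE)
```

For the sample message, the result names eec2f3a2-... as the template image, c5230d0e-... as the shared volume, and the four remaining entries of imgs as the images that still depend on it.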
Created attachment 816505 [details] ## Logs rhevm, vdsm, libvirt, thread dump, superVdsm
Workaround: Delete all illegal disks and delete template again
The provided patch resolves the issue of attempting to delete a template disk that is used as a shared volume by a disk known to the engine. We can still reach a state in which a disk's volume is shared and deleting it fails because the disks based on it are unknown to the engine - this should be handled as part of 978975 or the dependent patch. Therefore moving to POST.
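The remaining case can be pictured as a set difference between the images that vdsm reports on the shared volume and the disks the engine has records for. This is only an illustrative sketch: unknown_dependents is a made-up helper, and the ENGINE_KNOWN set is invented for the example (this bug does not say which disks the engine knew about). The image IDs are the ones from the 2013-04-18 error earlier in this bug.

```python
def unknown_dependents(shared_images, template_image, engine_known):
    """Images on the shared volume, other than the template, that the engine has no record of."""
    return sorted(set(shared_images) - {template_image} - set(engine_known))


TEMPLATE = "f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28"

# imgs list from the "Cannot delete shared image" error of 2013-04-18.
SHARED = [
    "f64e0f9c-4d93-4d8b-b82f-b84e44ddfb28",
    "0e5aa786-d62d-4229-9e93-4c809fbcd0a5",
    "b1965c2d-7cd3-46a9-ae20-3029a81e93b4",
    "24882ab0-10a9-450d-a68d-f1003517df7a",
    "7eb27728-8560-4006-a15b-e8c48106527e",
    "8faf5e94-c1c8-4780-b306-213e72e1074e",
]

# ASSUMPTION: hypothetical engine-side records, chosen only for illustration.
ENGINE_KNOWN = {
    "0e5aa786-d62d-4229-9e93-4c809fbcd0a5",
    "b1965c2d-7cd3-46a9-ae20-3029a81e93b4",
}

leftover = unknown_dependents(SHARED, TEMPLATE, ENGINE_KNOWN)
```

Any image left in the result would keep the template volume shared even though nothing in the UI points at it, which matches the symptom in the original report.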
Verified using av6

Verification steps:
1. create a pool of VMs from template
2. detach the VMs from the pool
3. remove the VMs (remove disks)
4. restart vdsm twice
5. try to remove the template
*** Bug 1088959 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2014-0506.html
*** Bug 1088902 has been marked as a duplicate of this bug. ***