Description of problem:

When I move a disk (of a running VM) to a different storage pool, a live snapshot ('Auto-generated for Live Storage Migration') is created. This snapshot is not removed after the storage migration completes, and it cannot be removed from a running VM (probably because live snapshot deletion is not implemented yet?). When I stop the VM and then remove the snapshot, the removal fails and the snapshot state changes to "BROKEN".

I've done some additional testing, and the problem only occurs when a preallocated disk is used. One of the errors in vdsm.log is:

14:02:53,772::image::1164::Storage.Image::(merge) Auto shrink after merge failed
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 1162, in merge
    srcVol.shrinkToOptimalSize()
  File "/usr/share/vdsm/storage/blockVolume.py", line 320, in shrinkToOptimalSize
    qemuImg.FORMAT.QCOW2)
  File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 109, in check
    raise QImgError(rc, out, err, "unable to parse qemu-img check output")
QImgError: ecode=0, stdout=['No errors were found on the image.'], stderr=[], message=unable to parse qemu-img check output
Thread-20::ERROR::2013-11-08

I've looked at the parsing error, and the problem seems to be that the qemuImg.check() function expects a qemu-img version that outputs a string like "Image end offset: 262144". The only version I could find that does that is the one in the F19 virt-preview repository. Stock F19 qemu-img doesn't produce this output, and neither do the stock EL6 or qemu-img-rhev EL6 versions. The only function that calls qemuImg.check() is blockVolume.shrinkToOptimalSize(), which needs (and uses) the parsed offset to shrink the LV containing the qcow2 image.

Version-Release number of selected component (if applicable):
vdsm-4.12.1-4.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a preallocated disk (in my case on FC storage).
2. Start the VM.
3. Do a live migration of the preallocated disk to a different storage domain.
4. Shut down the VM.
5. Remove the 'Auto-generated for Live Storage Migration' snapshot.

Actual results:
Snapshot removal fails and the snapshot state changes to "BROKEN".

Expected results:
Snapshot removal succeeds.

Additional info:
Mailing list thread: http://lists.ovirt.org/pipermail/users/2013-November/017695.html
oVirt 3.3 on EL6 (CentOS 6).
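The version-dependent output mismatch described above can be sketched with a minimal example (the function name parse_check_output is hypothetical; the real parsing lives in vdsm's qemuImg.py):

```python
import re

# Hedged sketch of the parsing that qemuImg.check() appears to perform
# on `qemu-img check` output. Newer qemu-img builds (e.g. F19
# virt-preview) print a line such as "Image end offset: 262144";
# stock F19 and EL6 builds do not, which would explain check()
# raising QImgError even though qemu-img exits with rc 0.
OFFSET_RE = re.compile(r"Image end offset: (\d+)")

def parse_check_output(lines):
    """Return the image end offset in bytes, or None when this
    qemu-img version does not report one."""
    for line in lines:
        m = OFFSET_RE.match(line)
        if m:
            return int(m.group(1))
    return None

# virt-preview style output (offset line present):
new_output = ["No errors were found on the image.",
              "Image end offset: 262144"]
# stock F19 / EL6 style output (no offset line):
old_output = ["No errors were found on the image."]
```

With the stock output the offset simply is not there, so shrinkToOptimalSize() has nothing to work with; vdsm treats that as a parse failure rather than a missing feature.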
I've done some additional testing with just live snapshot creation and removal:

1. Create a VM with preallocated storage.
2. Start the VM.
3. Make a live snapshot.
4. Shut down the VM.
5. Try to remove the snapshot.

The result is the same "Failed to delete snapshot" error and a "BROKEN" snapshot. I'll attach the full vdsm log (forced logrotate before the test). In this log I do not see the qemu-img parse error. I do see the following errors:

d09b8ba0-659b-4d33-906b-2ae84931c29a::ERROR::2013-11-12 10:08:04,661::task::850::TaskManager.Task::(_setError) Task=`d09b8ba0-659b-4d33-906b-2ae84931c29a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
    sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
  File "/usr/share/vdsm/storage/image.py", line 1162, in merge
    srcVol.shrinkToOptimalSize()
  File "/usr/share/vdsm/storage/blockVolume.py", line 315, in shrinkToOptimalSize
    volParams = self.getVolumeParams()
  File "/usr/share/vdsm/storage/volume.py", line 1008, in getVolumeParams
    volParams['imgUUID'] = self.getImage()
  File "/usr/share/vdsm/storage/blockVolume.py", line 494, in getImage
    return self.getVolumeTag(TAG_PREFIX_IMAGE)
  File "/usr/share/vdsm/storage/blockVolume.py", line 464, in getVolumeTag
    return _getVolumeTag(self.sdUUID, self.volUUID, tagPrefix)
  File "/usr/share/vdsm/storage/blockVolume.py", line 662, in _getVolumeTag
    tags = lvm.getLV(sdUUID, volUUID).tags
  File "/usr/share/vdsm/storage/lvm.py", line 851, in getLV
    raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist: ('9d4e8a43-4851-42ff-a684-f3d802527cf7/_remove_me_gxRZPitO_3d8b0fa7-03f0-4c54-a79b-15a72922b3f1',)

...

Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 331, in teardownVolumePath
    res = self.irs.teardownImage(drive['domainID'],
  File "/usr/share/vdsm/vm.py", line 1344, in __getitem__
    raise KeyError(key)
KeyError: 'domainID'

d09b8ba0-659b-4d33-906b-2ae84931c29a::WARNING::2013-11-12 10:07:43,261::lvm::594::Storage.LVM::(getLv) lv: 3d8b0fa7-03f0-4c54-a79b-15a72922b3f1_MERGE not found in lvs vg: 9d4e8a43-4851-42ff-a684-f3d802527cf7 response

...

d09b8ba0-659b-4d33-906b-2ae84931c29a::ERROR::2013-11-12 10:08:04,661::task::850::TaskManager.Task::(_setError) Task=`d09b8ba0-659b-4d33-906b-2ae84931c29a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
    sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
  File "/usr/share/vdsm/storage/image.py", line 1162, in merge
    srcVol.shrinkToOptimalSize()
  File "/usr/share/vdsm/storage/blockVolume.py", line 315, in shrinkToOptimalSize
    volParams = self.getVolumeParams()
  File "/usr/share/vdsm/storage/volume.py", line 1008, in getVolumeParams
    volParams['imgUUID'] = self.getImage()
  File "/usr/share/vdsm/storage/blockVolume.py", line 494, in getImage
    return self.getVolumeTag(TAG_PREFIX_IMAGE)
  File "/usr/share/vdsm/storage/blockVolume.py", line 464, in getVolumeTag
    return _getVolumeTag(self.sdUUID, self.volUUID, tagPrefix)
  File "/usr/share/vdsm/storage/blockVolume.py", line 662, in _getVolumeTag
    tags = lvm.getLV(sdUUID, volUUID).tags
  File "/usr/share/vdsm/storage/lvm.py", line 851, in getLV
    raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist: ('9d4e8a43-4851-42ff-a684-f3d802527cf7/_remove_me_gxRZPitO_3d8b0fa7-03f0-4c54-a79b-15a72922b3f1',)
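The traceback above suggests the merge's shrink step reads the volume's LV tags via lvm.getLV() after the LV has already been renamed with the "_remove_me_" prefix and dropped from the VG. A minimal sketch of that lookup failure (hypothetical names, not vdsm's actual code):

```python
# Hedged sketch: get_lv_tags() stands in for vdsm's lvm.getLV(...).tags
# lookup, and the exception class stands in for
# se.LogicalVolumeDoesNotExistError.

class LogicalVolumeDoesNotExistError(Exception):
    """Stand-in for vdsm's se.LogicalVolumeDoesNotExistError."""

def get_lv_tags(vg_lvs, vg_name, lv_name):
    """Mimic lvm.getLV(...).tags: raise when the LV is not in the VG."""
    if lv_name not in vg_lvs:
        raise LogicalVolumeDoesNotExistError("%s/%s" % (vg_name, lv_name))
    return vg_lvs[lv_name]

# The state the log shows at merge time: the removal-pending LV is
# already gone from the VG, so any tag lookup on it must fail.
vg_name = "9d4e8a43-4851-42ff-a684-f3d802527cf7"
vg_lvs = {}
lv_name = "_remove_me_gxRZPitO_3d8b0fa7-03f0-4c54-a79b-15a72922b3f1"
```

So shrinkToOptimalSize() fails before it ever reaches the qemu-img step, which would explain why this log shows no parse error.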
Created attachment 822789 [details] 20131112 snapshot only vdsm.log
Created attachment 822790 [details] 20131112 snapshot only libvirtd.log
Might this be related to BZ 1015071 and BZ 1027464?
The patch at http://gerrit.ovirt.org/#/c/19983/ from BZ 1015071 seems to fix this issue.
This seems to be fixed in 3.3.1; after upgrading from 3.3.0 to 3.3.1, snapshot removal works.