Bug 1029069 - Live storage migration snapshot removal fails, probably due to unexpected qemu-img output
Live storage migration snapshot removal fails, probably due to unexpected qem...
Status: CLOSED CURRENTRELEASE
Product: oVirt
Classification: Community
Component: vdsm (Show other bugs)
3.3
x86_64 Linux
unspecified Severity medium
: ---
: 3.3.2
Assigned To: Federico Simoncelli
Haim
storage
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-11-11 10:32 EST by Sander
Modified: 2016-02-10 15:35 EST (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-27 10:00:00 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
20131112 snapshot only vdsm.log (4.55 MB, text/plain)
2013-11-12 04:19 EST, Sander
no flags Details
20131112 snapshot only libvirtd.log (3.04 MB, text/plain)
2013-11-12 04:20 EST, Sander
no flags Details

  None (edit)
Description Sander 2013-11-11 10:32:24 EST
Description of problem:

When I move a disk (of a running vm) to a different storage pool a live
snapshot ('Auto-generated for Live Storage Migration') is created.

This snapshot is not removed after the storage migration is completed and
cannot be removed from a running VM ( probably because live snapshot
deletion is not implemented yet?).

When I stop the VM and then remove the snapshot this fails and the snapshot
state changes to "BROKEN".

I've done some additional testing and the problem only occurs when a
preallocated disk is used.

One of the errors in vdsm.log is:

14:02:53,772::image::1164::Storage.Image::(merge) Auto shrink after
merge failed
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/image.py", line 1162, in merge
    srcVol.shrinkToOptimalSize()
  File "/usr/share/vdsm/storage/blockVolume.py", line 320, in
shrinkToOptimalSize
    qemuImg.FORMAT.QCOW2)
  File "/usr/lib64/python2.6/site-packages/vdsm/qemuImg.py", line 109, in check
    raise QImgError(rc, out, err, "unable to parse qemu-img check output")
QImgError: ecode=0, stdout=['No errors were found on the image.'],
stderr=[], message=unable to parse qemu-img check output
Thread-20::ERROR::2013-11-08

I've looked at the parsing error and the problem seems to be that the
qemuImg.check() function expects a qemu-img version that outputs a
string like: "Image end offset: 262144".

The only version I could find that does that is the one in the F19 virt-preview
repository. Stock F19 qemu-img doesn't have this output. Nor do the stock EL6
or qemu-img-rhev EL6 versions.

The only function that calls qemuImg.check is blockVolume.shrinkToOptimalSize()
this function needs (and uses) the parsed offset to shrink the LV containing the
qcow2 image.


Version-Release number of selected component (if applicable):
vdsm-4.12.1-4.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create VM with preallocated disk ( in my case on FC storage ).
2. Start VM
3. Do a live migration of the preallocated disk to a different storage domain.
4. Shut down VM
5. Remove the 'Auto-generated for Live Storage Migration' snapshot.

Actual results:
Snapshot removal fails and snapshot state changes to "BROKEN"

Expected results:
Snapshot removal succeeds.

Additional info:

Mailing list thread: http://lists.ovirt.org/pipermail/users/2013-November/017695.html
Ovirt 3.3 on EL6 (centos6).
Comment 1 Sander 2013-11-12 04:18:00 EST
I've done some additional testing with just the live snapshot creation and removal.

1. Create VM with preallocated storage
2. Start VM
3. Make live snapshot
4. Shutdown VM
5. Try to remove snapshot

The result is the same "Failed to delete snapshot" error and a "BROKEN" snapshot.

I'll attach the full vdsm log (forced logrotate before the test).
In this log I do not see the qemu-img parse error.
I do see the following errors:

d09b8ba0-659b-4d33-906b-2ae84931c29a::ERROR::2013-11-12 10:08:04,661::task::850::TaskManager.Task::(_setError) Task=`d09b8ba0-659b-4d33-906b-2ae84931c29a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
    sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
  File "/usr/share/vdsm/storage/image.py", line 1162, in merge
    srcVol.shrinkToOptimalSize()
  File "/usr/share/vdsm/storage/blockVolume.py", line 315, in shrinkToOptimalSize
    volParams = self.getVolumeParams()
  File "/usr/share/vdsm/storage/volume.py", line 1008, in getVolumeParams
    volParams['imgUUID'] = self.getImage()
  File "/usr/share/vdsm/storage/blockVolume.py", line 494, in getImage
    return self.getVolumeTag(TAG_PREFIX_IMAGE)
  File "/usr/share/vdsm/storage/blockVolume.py", line 464, in getVolumeTag
    return _getVolumeTag(self.sdUUID, self.volUUID, tagPrefix)
  File "/usr/share/vdsm/storage/blockVolume.py", line 662, in _getVolumeTag
    tags = lvm.getLV(sdUUID, volUUID).tags
  File "/usr/share/vdsm/storage/lvm.py", line 851, in getLV
    raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist: ('9d4e8a43-4851-42ff-a684-f3d802527cf7/_remove_me_gxRZPitO_3d8b0fa7-03f0-4c54-a79b-15a72922b3f1',)

...

Traceback (most recent call last):
  File "/usr/share/vdsm/clientIF.py", line 331, in teardownVolumePath
    res = self.irs.teardownImage(drive['domainID'],
  File "/usr/share/vdsm/vm.py", line 1344, in __getitem__
    raise KeyError(key)
KeyError: 'domainID'
d09b8ba0-659b-4d33-906b-2ae84931c29a::WARNING::2013-11-12 10:07:43,261::lvm::594::Storage.LVM::(getLv) lv: 3d8b0fa7-03f0-4c54-a79b-15a72922b3f1_MERGE not found in lvs vg: 9d4e8a43-4851-42ff-a684-f3d802527cf7 response

...

d09b8ba0-659b-4d33-906b-2ae84931c29a::ERROR::2013-11-12 10:08:04,661::task::850::TaskManager.Task::(_setError) Task=`d09b8ba0-659b-4d33-906b-2ae84931c29a`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 318, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
    sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
  File "/usr/share/vdsm/storage/image.py", line 1162, in merge
    srcVol.shrinkToOptimalSize()
  File "/usr/share/vdsm/storage/blockVolume.py", line 315, in shrinkToOptimalSize
    volParams = self.getVolumeParams()
  File "/usr/share/vdsm/storage/volume.py", line 1008, in getVolumeParams
    volParams['imgUUID'] = self.getImage()
  File "/usr/share/vdsm/storage/blockVolume.py", line 494, in getImage
    return self.getVolumeTag(TAG_PREFIX_IMAGE)
  File "/usr/share/vdsm/storage/blockVolume.py", line 464, in getVolumeTag
    return _getVolumeTag(self.sdUUID, self.volUUID, tagPrefix)
  File "/usr/share/vdsm/storage/blockVolume.py", line 662, in _getVolumeTag
    tags = lvm.getLV(sdUUID, volUUID).tags
  File "/usr/share/vdsm/storage/lvm.py", line 851, in getLV
    raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist: ('9d4e8a43-4851-42ff-a684-f3d802527cf7/_remove_me_gxRZPitO_3d8b0fa7-03f0-4c54-a79b-15a72922b3f1',)
Comment 2 Sander 2013-11-12 04:19:58 EST
Created attachment 822789 [details]
20131112 snapshot only vdsm.log
Comment 3 Sander 2013-11-12 04:20:54 EST
Created attachment 822790 [details]
20131112 snapshot only libvirtd.log
Comment 4 Sander 2013-11-12 05:15:42 EST
Might be related to BZ 1015071 and BZ 1027464 ?
Comment 5 Sander 2013-11-12 10:13:44 EST
The patch at http://gerrit.ovirt.org/#/c/19983/ from BZ 1015071 seems to fix this issue.
Comment 6 Sander 2013-11-27 10:00:00 EST
Seems like this one is fixed in 3.3.1, after upgrading from 3.3.0 to 3.3.1 snapshot removal works.

Note You need to log in before you can comment on or make changes to this bug.