Red Hat Bugzilla – Bug 891610
3.2 - engine: live snapshot fails due to race on multiple move of disks (live storage migration)
Last modified: 2016-02-10 13:12:33 EST
Description of problem:
The engine sends prepareVolume to the HSM while createVolume is still running on the SPM, so prepareVolume fails because the volume does not exist yet.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create 3 iSCSI storage domains.
2. Create a template with a 15GB thin-provisioned disk and an installed OS. Create 20 pool VMs on one domain and 2 more VMs as clones on a second domain.
3. Run the VMs on 2 hosts and have them write (opening Explorer in the VMs is enough; no heavy writing is needed).
4. From the VMs tab -> Disks, move each VM's disks to the third domain.
Some of the VMs fail to create the live snapshot because prepareVolume was sent to the HSM before createVolume finished creating the volume.
We should not send prepareVolume until we have confirmed that createVolume completed successfully.
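The required ordering (send prepareVolume only after the SPM's createVolume task has finished) can be illustrated with a small Python simulation. This is a hypothetical sketch, not the vdsm or oVirt engine API: `FakeStorage`, `FakeSpm`, `FakeHsm`, `create_volume`, `task_finished`, `prepare_volume` and `create_then_prepare` are all illustrative names, and the real engine tracks SPM tasks through its own task-polling mechanism.

```python
import threading
import time


class VolumeNotFound(Exception):
    """Raised by the simulated HSM when the volume does not exist yet."""


class FakeStorage:
    """Hypothetical shared storage: a volume is visible only once created."""

    def __init__(self):
        self._volumes = set()
        self._lock = threading.Lock()

    def add(self, vol_id):
        with self._lock:
            self._volumes.add(vol_id)

    def exists(self, vol_id):
        with self._lock:
            return vol_id in self._volumes


class FakeSpm:
    """Hypothetical SPM: create_volume runs asynchronously as a task."""

    def __init__(self, storage, delay=0.05):
        self._storage = storage
        self._delay = delay
        self._tasks = {}

    def create_volume(self, vol_id):
        done = threading.Event()

        def work():
            time.sleep(self._delay)  # simulate slow volume/metadata creation
            self._storage.add(vol_id)
            done.set()  # the task is marked finished only after the volume exists

        task_id = "task-" + vol_id
        self._tasks[task_id] = done
        threading.Thread(target=work, daemon=True).start()
        return task_id

    def task_finished(self, task_id):
        return self._tasks[task_id].is_set()


class FakeHsm:
    """Hypothetical HSM: prepare_volume fails if the volume is missing."""

    def __init__(self, storage):
        self._storage = storage

    def prepare_volume(self, vol_id):
        if not self._storage.exists(vol_id):
            raise VolumeNotFound(vol_id)
        return "prepared " + vol_id


def create_then_prepare(spm, hsm, vol_id, timeout=5.0, poll=0.01):
    """Poll the SPM task until it finishes, and only then prepare on the HSM."""
    task_id = spm.create_volume(vol_id)
    deadline = time.monotonic() + timeout
    while not spm.task_finished(task_id):
        if time.monotonic() > deadline:
            raise TimeoutError(task_id)
        time.sleep(poll)
    return hsm.prepare_volume(vol_id)


storage = FakeStorage()
spm, hsm = FakeSpm(storage), FakeHsm(storage)

# Racy ordering (as in this bug): prepare immediately after issuing create.
spm.create_volume("vol-a")
try:
    hsm.prepare_volume("vol-a")
    print("prepare happened to win the race")
except VolumeNotFound:
    print("race reproduced: volume not found on HSM")

# Fixed ordering: wait for the SPM task, then prepare.
print(create_then_prepare(spm, hsm, "vol-b"))
```

The design choice here is to gate prepareVolume on task completion rather than retrying prepareVolume until it succeeds; a retry loop could also hit a volume whose metadata is only partially written, which matches the "missing offset tag" error seen in the logs.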
Since many tasks run at the same time and we have already debugged this, here is the relevant info:
In the SPM log, createVolume runs on Thread-4416 (task 11b6930d-5f8c-435e-9de4-46c6cc34684f).
prepareVolume for the same action on the HSM runs on Thread-4275.
This is the error for the volume that failed:
VolumeMetadataReadError: Error while processing volume meta data: ('missing offset tag on volume 710f7022-29ae-40cd-9b41-3a8a22fd8cc1',)
This is the SnapshotVDSCommand for the VM:
2012-11-14 12:15:36,360 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) START, SnapshotVDSCommand(HostName = gold-vdsd, HostId = 2d81a26a-2c20-11e2-aeab-001a4a169741, vmId=3d393cd1-666e-4283-a2b6-ce99e74656f4), log id: 57fa069c
And here is the failure in the engine log:
2012-11-14 12:15:47,105 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) FINISH, SnapshotVDSCommand, log id: 57fa069c
2012-11-14 12:15:47,105 ERROR [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (pool-4-thread-23) Wasnt able to live snpashot due to error: VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, rolling back.
*** Bug 891609 has been marked as a duplicate of this bug. ***
Verified on sf10 with vdsm-4.10.2-11.0.el6ev.x86_64 and libvirt-0.10.2-18.el6_4.eblake.2.x86_64.
3.2 has been released