Bug 891610 - 3.2 - engine: live snapshot fails due to race on multiple move of disks (live storage migration)
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assigned To: Eduardo Warszawski
QA Contact: Dafna Ron
Whiteboard: storage
Duplicates: 891609
Depends On:
Blocks: 915537
Reported: 2013-01-03 07:08 EST by Yeela Kaplan
Modified: 2016-02-10 13:12 EST

Fixed In Version: vdsm-4.10.2-4.0
Doc Type: Bug Fix
Clone Of: 876558
Type: Bug
oVirt Team: Storage

External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 9358 | None | None | None | Never

Description Yeela Kaplan 2013-01-03 07:08:34 EST
Description of problem:

createVolume has not finished on the SPM when prepareVolume is sent to the HSM, so prepareVolume fails because the volume does not exist yet.

Version-Release number of selected component (if applicable):

si24.1

How reproducible:

100%

Steps to Reproduce:
1. Create 3 iSCSI storage domains.
2. Create a template with a 15 GB thin-provisioned disk and an OS installed, then create 20 pool VMs on one domain and 2 more VMs as clones on a second domain.
3. Run the VMs on 2 hosts and have them write to disk (opening Explorer in the VMs is enough; heavy writing is not needed).
4. From the VMs tab -> Disks, move each VM's disks to the third domain.
  
Actual results:

Some of the VMs fail to create the live snapshot because prepareVolume was sent to the HSM before createVolume had finished creating the volume.

Expected results:

prepareVolume should not be sent until createVolume is confirmed to have completed successfully.
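
The expected ordering can be illustrated with a small, self-contained simulation. This is only a sketch: `FakeStorage`, `wait_for_task`, `snapshot_racy` and `snapshot_ordered` are hypothetical names invented for the illustration, not vdsm or engine APIs, and the polling approach is an assumption rather than the fix merged in gerrit 9358.

```python
import threading
import time


class FakeStorage:
    """Hypothetical stand-in for the SPM/HSM pair; not the vdsm API."""

    def __init__(self, create_delay=0.05):
        self._volumes = set()
        self._tasks = {}
        self._lock = threading.Lock()
        self._create_delay = create_delay

    def create_volume(self, vol_id):
        """Start an asynchronous create and return a task ID immediately."""
        task_id = "task-" + vol_id
        with self._lock:
            self._tasks[task_id] = "running"

        def work():
            time.sleep(self._create_delay)  # simulated creation time
            with self._lock:
                self._volumes.add(vol_id)
                self._tasks[task_id] = "finished"

        threading.Thread(target=work, daemon=True).start()
        return task_id

    def task_status(self, task_id):
        with self._lock:
            return self._tasks[task_id]

    def prepare_volume(self, vol_id):
        """Fail if the volume has not been created yet."""
        with self._lock:
            if vol_id not in self._volumes:
                raise LookupError("volume %s does not exist yet" % vol_id)
        return "prepared"


def wait_for_task(storage, task_id, timeout=5.0, interval=0.01):
    """Poll until the task is finished; raise if the timeout expires."""
    deadline = time.monotonic() + timeout
    while storage.task_status(task_id) != "finished":
        if time.monotonic() >= deadline:
            raise TimeoutError("task %s did not finish" % task_id)
        time.sleep(interval)


def snapshot_racy(storage, vol_id):
    """Reproduces the bug: prepare is sent without waiting for the create."""
    storage.create_volume(vol_id)
    return storage.prepare_volume(vol_id)


def snapshot_ordered(storage, vol_id):
    """Expected behaviour: confirm the create task finished, then prepare."""
    task_id = storage.create_volume(vol_id)
    wait_for_task(storage, task_id)
    return storage.prepare_volume(vol_id)
```

In this model `snapshot_racy` raises `LookupError` whenever the create is still in flight, while `snapshot_ordered` only proceeds once the task reports `finished`.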

Additional info: logs

Many tasks were running at the same time; since we have already debugged this, here is the relevant information:

In the SPM log, createVolume runs on Thread-4416; the task ID is 11b6930d-5f8c-435e-9de4-46c6cc34684f.

prepareVolume for the same action on the HSM runs on Thread-4275.

This is the error for the volume that failed:

VolumeMetadataReadError: Error while processing volume meta data: ('missing offset tag on volume 710f7022-29ae-40cd-9b41-3a8a22fd8cc1',)

Engine log:

This is the SnapshotVDSCommand for the VM:

2012-11-14 12:15:36,360 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) START, SnapshotVDSCommand(HostName = gold-vdsd, HostId = 2d81a26a-2c20-11e2-aeab-001a4a169741, vmId=3d393cd1-666e-4283-a2b6-ce99e74656f4), log id: 57fa069c


And here is the failure in the engine log:

2012-11-14 12:15:47,105 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) FINISH, SnapshotVDSCommand, log id: 57fa069c
2012-11-14 12:15:47,105 ERROR [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (pool-4-thread-23) Wasnt able to live snpashot due to error: VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, rolling back.
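
When many tasks run concurrently, pairing START and FINISH lines by their log id and then filtering errors by thread name makes it easier to isolate one command. The following is a minimal sketch that assumes the engine log line layout shown above (timestamp, level, logger in brackets, thread in parentheses, message); `command_durations` and `errors_on_thread` are hypothetical helper names, not part of any oVirt tool.

```python
import re
from datetime import datetime

# Layout assumed from the engine log lines quoted in this report.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\] "
    r"\((?P<thread>[^)]+)\) (?P<msg>.*)$"
)
LOG_ID_RE = re.compile(r"log id: (?P<id>[0-9a-f]+)")


def parse_ts(text):
    """Parse an engine log timestamp such as '2012-11-14 12:15:36,360'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def command_durations(lines):
    """Map each log id to the seconds between its START and FINISH lines."""
    started = {}
    durations = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        id_match = LOG_ID_RE.search(m.group("msg"))
        if not id_match:
            continue
        log_id = id_match.group("id")
        ts = parse_ts(m.group("ts"))
        if m.group("msg").startswith("START"):
            started[log_id] = ts
        elif m.group("msg").startswith("FINISH") and log_id in started:
            durations[log_id] = (ts - started[log_id]).total_seconds()
    return durations


def errors_on_thread(lines, thread):
    """Return the messages of ERROR lines logged by the given thread."""
    found = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") == "ERROR" and m.group("thread") == thread:
            found.append(m.group("msg"))
    return found


# The three engine log lines quoted in this report.
SAMPLE = [
    "2012-11-14 12:15:36,360 INFO  "
    "[org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] "
    "(pool-4-thread-23) START, SnapshotVDSCommand(HostName = gold-vdsd, "
    "HostId = 2d81a26a-2c20-11e2-aeab-001a4a169741, "
    "vmId=3d393cd1-666e-4283-a2b6-ce99e74656f4), log id: 57fa069c",
    "2012-11-14 12:15:47,105 INFO  "
    "[org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] "
    "(pool-4-thread-23) FINISH, SnapshotVDSCommand, log id: 57fa069c",
    "2012-11-14 12:15:47,105 ERROR "
    "[org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] "
    "(pool-4-thread-23) Wasnt able to live snpashot due to error: "
    "VdcBLLException: VdcBLLException: "
    "org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: "
    "VDSGenericException: VDSErrorException: Failed to SnapshotVDS, "
    "error = Snapshot failed, rolling back.",
]

if __name__ == "__main__":
    print(command_durations(SAMPLE))
    print(errors_on_thread(SAMPLE, "pool-4-thread-23"))
```

Run against the lines above, the helpers pair log id 57fa069c across its START and FINISH entries and pick out the single ERROR line on pool-4-thread-23.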
Comment 1 Yeela Kaplan 2013-01-03 07:10:37 EST
http://gerrit.ovirt.org/#/c/9358/
Comment 4 Yeela Kaplan 2013-01-03 08:55:25 EST
*** Bug 891609 has been marked as a duplicate of this bug. ***
Comment 6 Dafna Ron 2013-03-13 08:43:47 EDT
Verified on sf10 with vdsm-4.10.2-11.0.el6ev.x86_64 and libvirt-0.10.2-18.el6_4.eblake.2.x86_64.
Comment 7 Itamar Heim 2013-06-11 05:26:14 EDT
3.2 has been released
Comment 8 Itamar Heim 2013-06-11 05:30:41 EDT
3.2 has been released
Comment 9 Itamar Heim 2013-06-11 05:46:14 EDT
3.2 has been released
