Bug 891610

Summary: 3.2 - engine: live snapshot fails due to race on multiple move of disks (live storage migration)
Product: Red Hat Enterprise Virtualization Manager Reporter: Yeela Kaplan <ykaplan>
Component: vdsm Assignee: Eduardo Warszawski <ewarszaw>
Status: CLOSED CURRENTRELEASE QA Contact: Dafna Ron <dron>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.2.0 CC: abaron, bazulay, cpelland, hateya, iheim, lpeer, oourfali, scohen, ykaul
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: vdsm-4.10.2-4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 876558 Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 915537    

Description Yeela Kaplan 2013-01-03 12:08:34 UTC
Description of problem:

createVolume has not finished on the SPM when prepareVolume is sent to the HSM, so prepareVolume fails because the volume does not exist yet.

Version-Release number of selected component (if applicable):

si24.1

How reproducible:

100%

Steps to Reproduce:
1. Create 3 iSCSI storage domains.
2. Create a template with a 15GB thin-provisioned disk and an installed OS, then create 20 pool VMs on one domain and 2 more VMs as clones on a second domain.
3. Run the VMs on 2 hosts and have them write (opening Explorer in the VMs is enough; no heavy writing is needed).
4. From the VMs tab -> Disks, move each VM's disks to the third domain.
  
Actual results:

Some of the VMs fail to create the live snapshot because prepareVolume was sent to the HSM before createVolume had finished creating the volume.

Expected results:

prepareVolume should not be sent until createVolume has been confirmed to complete successfully.
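The ordering described above can be illustrated with a minimal sketch. It does not use the real engine or vdsm code: `FakeStorage`, `create_volume`, `task_status`, `prepare_volume`, `wait_for_task` and the task states are all hypothetical stand-ins, with a background thread simulating the asynchronous SPM createVolume task. `snapshot_racy` reproduces the reported behavior, and `snapshot_ordered` waits for the create task to succeed before preparing the volume.

```python
import threading
import time


class VolumeNotReady(Exception):
    """Hypothetical error: a volume was prepared before it existed."""


class FakeStorage:
    """Hypothetical in-memory stand-in for the SPM and HSM verbs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._volumes = set()  # volumes whose creation finished successfully
        self._tasks = {}       # task id -> "running" | "success" | "failed"

    def create_volume(self, vol_id, delay=0.2, fail=False):
        """Start an asynchronous create (the 'SPM' side); return a task id."""
        task_id = "task-" + vol_id
        with self._lock:
            self._tasks[task_id] = "running"

        def worker():
            time.sleep(delay)  # simulates slow volume creation on the SPM
            with self._lock:
                if fail:
                    self._tasks[task_id] = "failed"
                else:
                    self._volumes.add(vol_id)
                    self._tasks[task_id] = "success"

        threading.Thread(target=worker, daemon=True).start()
        return task_id

    def task_status(self, task_id):
        with self._lock:
            return self._tasks[task_id]

    def prepare_volume(self, vol_id):
        """The 'HSM' side: fails if the volume does not exist yet."""
        with self._lock:
            if vol_id not in self._volumes:
                raise VolumeNotReady(vol_id)
        return "prepared " + vol_id


def wait_for_task(storage, task_id, timeout=5.0, interval=0.05):
    """Poll the task until it succeeds; raise if it fails or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = storage.task_status(task_id)
        if status == "success":
            return
        if status == "failed":
            raise RuntimeError("createVolume task %s failed" % task_id)
        time.sleep(interval)
    raise TimeoutError("task %s did not finish in %.1fs" % (task_id, timeout))


def snapshot_racy(storage, vol_id):
    """Reported behavior: prepare is sent without waiting for create."""
    storage.create_volume(vol_id)
    return storage.prepare_volume(vol_id)


def snapshot_ordered(storage, vol_id):
    """Expected behavior: confirm createVolume succeeded, then prepare."""
    task_id = storage.create_volume(vol_id)
    wait_for_task(storage, task_id)
    return storage.prepare_volume(vol_id)
```

In practice `snapshot_racy(FakeStorage(), "vol-a")` raises `VolumeNotReady` because the simulated 0.2 s creation has not finished yet. `snapshot_ordered(FakeStorage(), "vol-b")` returns `"prepared vol-b"`. If the create task fails, `wait_for_task` raises instead of letting prepareVolume run.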

Additional info: logs

Many tasks run at the same time, and this has already been debugged, so here is the relevant information:

In the SPM log, createVolume is on Thread-4416; its task is 11b6930d-5f8c-435e-9de4-46c6cc34684f.

The prepareVolume for the same action in the HSM log is on Thread-4275.

This is the error for the volume that failed:

VolumeMetadataReadError: Error while processing volume meta data: ('missing offset tag on volume 710f7022-29ae-40cd-9b41-3a8a22fd8cc1',)
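One plausible reading of this error is that the HSM looks up an offset tag on the volume that has not been written yet because createVolume is still running. The sketch below assumes that the metadata offset is stored as a tag on the volume and is written only as part of volume creation. The `MD_` prefix and the `get_meta_offset` helper are hypothetical and are not taken from the vdsm source.

```python
class VolumeMetadataReadError(Exception):
    """Named after the error in the HSM log (sketch only)."""


# Hypothetical tag prefix: this report does not say how the offset tag is named.
OFFSET_TAG_PREFIX = "MD_"


def get_meta_offset(vol_uuid, tags):
    """Return the metadata offset recorded in a volume's tags, or raise."""
    for tag in tags:
        if tag.startswith(OFFSET_TAG_PREFIX):
            return int(tag[len(OFFSET_TAG_PREFIX):])
    # Assumption: the offset tag is added during volume creation, so a
    # volume whose createVolume task is still running has no such tag yet.
    raise VolumeMetadataReadError(
        "missing offset tag on volume %s" % vol_uuid)
```

For a fully created volume with tags such as `["some_other_tag", "MD_5"]`, the helper returns `5`. For a volume still being created, with no offset tag, it raises the same kind of error seen above.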

Engine log:

This is the SnapshotVDSCommand for the VM:

2012-11-14 12:15:36,360 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) START, SnapshotVDSCommand(HostName = gold-vdsd, HostId = 2d81a26a-2c20-11e2-aeab-001a4a169741, vmId=3d393cd1-666e-4283-a2b6-ce99e74656f4), log id: 57fa069c


And here is the failure in the engine log:

2012-11-14 12:15:47,105 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) FINISH, SnapshotVDSCommand, log id: 57fa069c
2012-11-14 12:15:47,105 ERROR [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (pool-4-thread-23) Wasnt able to live snpashot due to error: VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, rolling back.

Comment 1 Yeela Kaplan 2013-01-03 12:10:37 UTC
http://gerrit.ovirt.org/#/c/9358/

Comment 4 Yeela Kaplan 2013-01-03 13:55:25 UTC
*** Bug 891609 has been marked as a duplicate of this bug. ***

Comment 6 Dafna Ron 2013-03-13 12:43:47 UTC
verified on sf10 with vdsm-4.10.2-11.0.el6ev.x86_64 and libvirt-0.10.2-18.el6_4.eblake.2.x86_64

Comment 7 Itamar Heim 2013-06-11 09:26:14 UTC
3.2 has been released
