Bug 891610 - 3.2 - engine: live snapshot fails due to race on multiple move of disks (live storage migration)
Summary: 3.2 - engine: live snapshot fails due to race on multiple move of disks (live...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Eduardo Warszawski
QA Contact: Dafna Ron
URL:
Whiteboard: storage
Duplicates: 891609
Depends On:
Blocks: 915537
 
Reported: 2013-01-03 12:08 UTC by Yeela Kaplan
Modified: 2016-02-10 18:12 UTC
CC List: 9 users

Fixed In Version: vdsm-4.10.2-4.0
Doc Type: Bug Fix
Doc Text:
Clone Of: 876558
Environment:
Last Closed:
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 9358 | 0 | None | None | None | Never

Description Yeela Kaplan 2013-01-03 12:08:34 UTC
Description of problem:

When the engine sends prepareVolume to the HSM, createVolume has not yet finished on the SPM, so prepareVolume fails because the volume does not exist yet.

Version-Release number of selected component (if applicable):

si24.1

How reproducible:

100%

Steps to Reproduce:
1. Create 3 iSCSI storage domains.
2. Create a template with a 15GB thin-provisioned disk and an installed OS, then create 20 pool VMs on one domain and 2 more VMs as clones on a second domain.
3. Run the VMs on 2 hosts and have them write to disk (opening Explorer in the VMs is enough; no heavy writing is needed).
4. From the VMs tab -> Disks, move each VM's disks to the third domain.
  
Actual results:

Some of the VMs fail to create the live snapshot because prepareVolume was sent to the HSM before createVolume finished creating the volume on the SPM.

Expected results:

prepareVolume should not be sent before confirming that createVolume completed successfully.
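The expected result amounts to an ordering rule: the SPM's createVolume task must report success before the HSM is asked to run prepareVolume. The minimal Python sketch below models that rule with a background thread standing in for the SPM task. Every name in it (HypotheticalTask, HypotheticalSPM, HypotheticalHSM, snapshot_racy, snapshot_safe, and the delay and fail parameters) is a hypothetical illustration, not the vdsm or engine API and not the change in gerrit 9358; the sleep only simulates slow volume creation.

```python
import threading
import time


class HypotheticalTask:
    """Handle for an asynchronous SPM task (hypothetical, not the vdsm API)."""

    def __init__(self):
        self.finished = threading.Event()  # set once the task ends, success or not
        self.succeeded = False             # True only if the volume was created


class HypotheticalSPM:
    """Stand-in for the SPM host: createVolume runs as a background task."""

    def __init__(self, storage):
        self._storage = storage  # shared set of volume IDs visible to every host

    def create_volume(self, vol_id, delay=0.2, fail=False):
        task = HypotheticalTask()

        def work():
            try:
                time.sleep(delay)  # simulate slow volume and metadata creation
                if fail:
                    raise OSError("simulated createVolume failure")
                self._storage.add(vol_id)  # volume becomes visible only now
                task.succeeded = True
            except OSError:
                task.succeeded = False
            finally:
                task.finished.set()

        threading.Thread(target=work, daemon=True).start()
        return task  # returned immediately, before the volume exists


class HypotheticalHSM:
    """Stand-in for the HSM host that runs the VM and prepares its volumes."""

    def __init__(self, storage):
        self._storage = storage

    def prepare_volume(self, vol_id):
        if vol_id not in self._storage:
            raise LookupError("volume %s does not exist yet" % vol_id)
        return "prepared %s" % vol_id


def snapshot_racy(spm, hsm, vol_id):
    """Reported behavior: prepare is sent without waiting for create."""
    spm.create_volume(vol_id)
    return hsm.prepare_volume(vol_id)


def snapshot_safe(spm, hsm, vol_id, timeout=5.0):
    """Expected behavior: confirm create succeeded, then prepare."""
    task = spm.create_volume(vol_id)
    if not task.finished.wait(timeout):
        raise TimeoutError("createVolume did not finish within %.1fs" % timeout)
    if not task.succeeded:
        raise RuntimeError("createVolume failed; not sending prepareVolume")
    return hsm.prepare_volume(vol_id)


if __name__ == "__main__":
    storage = set()
    spm, hsm = HypotheticalSPM(storage), HypotheticalHSM(storage)
    try:
        snapshot_racy(spm, hsm, "vol-1")
    except LookupError as err:
        print("racy:", err)  # usually fails: the task has not finished yet
    print(snapshot_safe(spm, hsm, "vol-2"))  # prepared vol-2
```

Waiting on the finished event alone would not satisfy the expected result, which asks for confirmation that createVolume completed successfully; that is why the sketch checks the separate succeeded flag before sending prepareVolume.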

Additional info: logs

Since many tasks run at the same time and this has already been debugged, here is the relevant info:

In the SPM log, createVolume is on Thread-4416, task 11b6930d-5f8c-435e-9de4-46c6cc34684f.

In the HSM log, prepareVolume for the same action is on Thread-4275.

This is the error for the volume that failed:

VolumeMetadataReadError: Error while processing volume meta data: ('missing offset tag on volume 710f7022-29ae-40cd-9b41-3a8a22fd8cc1',)

Engine log:

This is the SnapshotVDSCommand for the VM:

2012-11-14 12:15:36,360 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) START, SnapshotVDSCommand(HostName = gold-vdsd, HostId = 2d81a26a-2c20-11e2-aeab-001a4a169741, vmId=3d393cd1-666e-4283-a2b6-ce99e74656f4), log id: 57fa069c


And here is the failure in the engine log:

2012-11-14 12:15:47,105 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SnapshotVDSCommand] (pool-4-thread-23) FINISH, SnapshotVDSCommand, log id: 57fa069c
2012-11-14 12:15:47,105 ERROR [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (pool-4-thread-23) Wasnt able to live snpashot due to error: VdcBLLException: VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to SnapshotVDS, error = Snapshot failed, rolling back.

Comment 1 Yeela Kaplan 2013-01-03 12:10:37 UTC
http://gerrit.ovirt.org/#/c/9358/

Comment 4 Yeela Kaplan 2013-01-03 13:55:25 UTC
*** Bug 891609 has been marked as a duplicate of this bug. ***

Comment 6 Dafna Ron 2013-03-13 12:43:47 UTC
Verified on sf10 with vdsm-4.10.2-11.0.el6ev.x86_64 and libvirt-0.10.2-18.el6_4.eblake.2.x86_64

Comment 7 Itamar Heim 2013-06-11 09:26:14 UTC
3.2 has been released


