Bug 877978

Summary: engine: we are able to create live snapshot and suspend vm
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: ovirt-engine
Assignee: Maor <mlipchuk>
Status: CLOSED CURRENTRELEASE
QA Contact: Dafna Ron <dron>
Severity: high
Priority: high
Version: 3.1.0
CC: abaron, amureini, dyasny, hateya, iheim, italkohe, lpeer, Rhev-m-bugs, sgrinber, yeylon, ykaul
Target Milestone: ---
Keywords: ZStream
Target Release: 3.2.0
Hardware: x86_64
OS: Linux
Whiteboard: storage
Fixed In Version: sf5
Doc Type: Bug Fix
Doc Text:
Cause: A race between live snapshot creation and VM hibernation could produce undefined results. Fix: A hibernateVM lock is now taken on the VM so that the two operations cannot run concurrently.
Type: Bug
oVirt Team: Storage
Clones: 902484
Bug Blocks: 902484, 915537    
Attachments: logs

Description Dafna Ron 2012-11-19 11:07:37 UTC
Created attachment 647695 (logs)

Description of problem:

The engine does not lock the VM immediately, which allowed me to create a live snapshot and hibernate the VM at the same time.
When that happens, both actions appear to succeed, but an attempt to create a second snapshot fails with the following error in VDSM:

Thread-299::ERROR::2012-11-19 12:49:18,041::libvirtvm::1812::vm.Vm::(snapshot) vmId=`10796f9e-6b75-43fa-984a-bcdd4f6d562f`::The base volume doesn't exist: {'device': 'disk', 'domainID': '59022168-9350-41d1-9e6f-5ef87b6bcd42', 'volumeID': '9ddcecc2-5560-442a-9ac9-eb2be49e7112', 'imageID': 'c2a9831d-cb22-45a5-bffa-622d43d27d53'}

Version-Release number of selected component (if applicable):

si24.2
vdsm-cli-4.9.6-43.0.el6_3.noarch

How reproducible:

100%

Steps to Reproduce:
1. Create and run a VM.
2. Create a live snapshot, then hibernate the VM while the snapshot is in progress.
3. Run the VM again and try to create a second snapshot.
  
Actual results:

Both actions appear to succeed, but creating a second snapshot fails with an error from VDSM.

Expected results:

Hibernating the VM should fail once a snapshot has started.

Additional info: logs attached.


This is the VM:

[root@gold-vdsd ~]# vdsClient -s 0 list table
10796f9e-6b75-43fa-984a-bcdd4f6d562f   4297  RHEL_Clone           Up    

Please note that the VM was running on an HSM host, so the error is in the HSM log and not in the SPM log.

Comment 4 Maor 2013-01-21 17:13:46 UTC
The proposed fix is to add locks when hibernating a VM and when creating a snapshot.
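The locking approach described in this comment can be sketched roughly as follows. This is a minimal illustration, assuming a single in-memory registry of exclusive per-VM locks; the class and method names (`VmLockManager`, `try_acquire`, `release`) are hypothetical and are not the ovirt-engine locking API. The idea is that snapshot creation and hibernation compete for the same per-VM lock, so whichever operation arrives second is rejected instead of running concurrently.

```python
import threading


class VmLockManager:
    """Hypothetical registry of exclusive per-VM locks (illustration only,
    not the ovirt-engine API).

    At most one operation may hold a given VM's lock at a time. A second
    operation is rejected immediately instead of waiting, so a hibernate
    request that arrives during a live snapshot fails fast.
    """

    def __init__(self):
        self._guard = threading.Lock()  # protects _owners across threads
        self._owners = {}               # vm_id -> operation currently holding the lock

    def try_acquire(self, vm_id, operation):
        """Try to lock vm_id for operation.

        Returns (acquired, holder), where holder is the operation that owns
        the VM's lock after this call.
        """
        with self._guard:
            holder = self._owners.get(vm_id)
            if holder is not None:
                # Another operation (or another instance of this one) holds the VM.
                return False, holder
            self._owners[vm_id] = operation
            return True, operation

    def release(self, vm_id, operation):
        """Release vm_id's lock only if operation is the current holder."""
        with self._guard:
            if self._owners.get(vm_id) == operation:
                del self._owners[vm_id]


if __name__ == "__main__":
    locks = VmLockManager()
    vm_id = "10796f9e-6b75-43fa-984a-bcdd4f6d562f"  # VM id from the vdsClient output in the description
    print(locks.try_acquire(vm_id, "create_live_snapshot"))  # (True, 'create_live_snapshot')
    print(locks.try_acquire(vm_id, "hibernate"))             # (False, 'create_live_snapshot')
    locks.release(vm_id, "create_live_snapshot")
    print(locks.try_acquire(vm_id, "hibernate"))             # (True, 'hibernate')
```

In this sketch the second operation fails fast rather than queueing, which matches the expected result in the description (hibernation should fail once a snapshot has started). For the race to be closed, the lock would need to be held until the snapshot actually completes, not only while the command is being validated.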

I also encountered another issue: the VM was already hibernated when I created a snapshot, and when the VM was brought Up again after hibernation, its disks were still running on the original volume rather than on the newly created one.

I suspect the VDSM exception originates from that issue,
although when I tried to reproduce it, everything worked fine for me (except for the bug I encountered).

I think this bug should be split in two:
one for the race between snapshot creation and hibernation,
and the other for running a VM after a snapshot was created while it was suspended.
What do you think?

Comment 5 Ayal Baron 2013-01-21 18:30:26 UTC
(In reply to comment #4)
> The proposed fix was to add locks when hibernate a VM and when creating a
> snapshot.
> 
> There is another issue which I encountered, that when the VM was already at
> hibernate state I tried to create a snapshot, and when the VM was Up again
> after hibernation the VM disks were still running upon the original volume
> and not the new created one.
> 
> I suspect that the VDSM exception origin was from that issue,
> although I tried to reproduce it, but all worked fine for me (accept the bug
> I encountered).
> 
> I think that this bug should be split to two,
> one for the race between create snapshot and hibernate

Ack.

> and the other on run VM after a snapshot was created while it was suspended.
> What do you think?

Is this reproducible? If so, please file a bug.

Comment 9 Dafna Ron 2013-02-05 13:41:56 UTC
Verified that the lock exists in sf5.

Comment 11 Itamar Heim 2013-06-11 08:48:48 UTC
3.2 has been released