Description of problem:
In a RHEV-RHS environment, on cloning a VM from the snapshot of another VM, the new VM boots up fine, but the original VM from which the snapshot was taken is rendered unbootable. The failure message on boot-up is:
------------------------------------------------------
VM <VM name> is down. Exit message: unsupported configuration: non-primary video device must be type of 'qxl'.
Failed to run VM <VM name> on Host <Hypervisor name>.
------------------------------------------------------
The only way to rescue the original VM is to commit the snapshot. This makes the VM bootable again, but it results in the loss of all data created after the snapshot was taken.

* The issue is reproducible for a Live Snapshot and for a snapshot taken after VM shut-down.
* The issue is reproducible for a VM with a 'pre-allocated' disk and for a VM with a 'thin-provision' disk.
* The issue is reproducible when the image store Storage Domain is based on an RHS volume of type distribute-replicate, and on an RHS volume of type pure replicate.

Version-Release number of selected component (if applicable):
RHEVM: 3.2 (3.2.0-11.37.el6ev)
RHS: 2.0+ (glusterfs-server-3.3.0.11rhs-1.el6rhs.x86_64)
Hypervisors: RHEL 6.4 and RHEVH 6.4 with glusterfs-3.3.0.11rhs-1.el6.x86_64 and glusterfs-fuse-3.3.0.11rhs-1.el6.x86_64

How reproducible:
Always reproducible.

Steps to Reproduce:
1. Create a VM.
2. Seal the VM for cloning (remove unique identifiers such as hostname, MAC address, etc.).
3. Create a Live Snapshot, or shut down the VM and create a snapshot.
4. Clone a VM from the snapshot.
5. After the clone process completes, boot up the original VM. This fails with the message:
------------------------------------------------------
VM quizzac1 is down. Exit message: unsupported configuration: non-primary video device must be type of 'qxl'.
Failed to run VM quizzac1 on Host RHEVH6.4-rhs-gp-srv15.
------------------------------------------------------
6. Boot up the cloned VM. It boots up okay.
7. Attempt to rescue the original VM:
   a) If the snapshot is deleted from the original VM, it is still not bootable and shows the same error messages. The VM is now irrecoverably lost!
   b) Switch on the 'Preview Mode' of the snapshot of the original VM. On running the VM in preview mode, the VM boots up okay, but as expected it has no data from the period after the snapshot was taken.
   c) Shut down the VM from 'Preview Mode' and choose 'Undo Preview', so as to restore from the snapshot. The VM is then not bootable, showing the same error messages.
   d) Switch on the 'Preview Mode' of the snapshot of the original VM again. Confirm it boots okay, then shut it down. Then choose 'Commit Preview' of the snapshot. The original VM is bootable again, but the data created after the snapshot was taken is now lost forever!

Actual results:
After a VM is cloned from the snapshot of another VM, the original VM is corrupted, and is recoverable only with the loss of the data created after the snapshot was taken.

Expected results:
After a VM is cloned from the snapshot of another VM, the original VM should boot up to its state before shut-down.

Additional info:
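For context on the exit message: libvirt raises "non-primary video device must be type of 'qxl'" when a domain definition carries more than one <video> device and a non-primary one is not of type 'qxl'. The sketch below is an illustration only; the XML is an assumed sample of such a definition, not captured from the failing VM.

```shell
# Assumed sample of a domain fragment with two non-qxl video devices,
# the kind of configuration libvirt rejects with the error above.
cat > /tmp/video-devices.xml <<'EOF'
<devices>
  <video>
    <model type='cirrus' vram='9216' heads='1'/>
  </video>
  <video>
    <model type='cirrus' vram='9216' heads='1'/>
  </video>
</devices>
EOF
# On a live hypervisor you would inspect the real definition with:
#   virsh dumpxml <VM name> | grep -A2 '<video>'
grep -c '<video>' /tmp/video-devices.xml
```

If the count is greater than 1 and the extra device is not qxl, the VM fails validation at start-up, which matches the boot failure seen here.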
I have tried reproducing the issue on a RHEV Data Center with a pure NFS Storage Domain used as the image store, so as to determine whether the scope of the bug is limited to RHS volumes. The result is that the issue *is always* reproducible when using a pure NFS Storage Domain, in the same way as when an RHS-backed Storage Domain is used. So it looks like this may be caused by a regression in RHEV. The issue description remains valid whether a pure NFS share or an RHS volume is used for the Storage Domain.

Versions used for the current test:
RHEVM: 3.2 (3.2.0-11.37.el6ev)
Hypervisor: RHEL 6.4
Storage Domain: NFS share from a RHEL 6.4 system

P.S. I can provide any additional information needed, and assist with any data collection and testing, if required.

- rejy (rmc)
I played around with this bug some more, and managed to narrow down the environmental factors leading to the issue, and the steps for getting back the original VM.

Factors leading to the issue:
------------------------
The issue is related to the Console Protocol of the original VM. It appears to occur only if the Console Protocol 'VNC' is chosen for the VM of which the snapshot is taken. When that VM is shut down and a VM is cloned off the snapshot, the original VM refuses to boot up, with the message:

VM <VM name> is down. Exit message: unsupported configuration: non-primary video device must be type of 'qxl'.

One way to get the original VM back is to preview and commit, on the original VM, the same snapshot used to clone the other VM. But that results in the loss of all data created after the snapshot was taken.

Discovered now! Steps to get back the original VM with data intact:
------------------------------------------------------------------
After a VM is cloned off the snapshot, and while the original VM remains shut down, change the Console Protocol to 'Spice' and click 'OK' to save. Then you may start up the original VM straight away, or edit the Console Protocol back to 'VNC', save, and then start up the original VM. Either way, the original VM boots up fine, with all data intact, including from the period after the snapshot was taken.

Now I am not sure if this is a regression, or if the issue was always there and we never hit it, because there is a chance we may not have used the combination of VNC console protocol, VM snapshot, and cloning a VM!

Hope this helps in weeding out the cause of the issue. :-)

Cheers!
- rejy (rmc)
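As a rough sketch of what the UI workaround above does, toggling the Console Protocol amounts to updating the VM's display type. The endpoint, payload shape, and credentials below are assumptions based on the oVirt 3.x REST API, not taken from this bug report; treat it as a hypothetical equivalent of the 'Edit VM' dialog, not a documented fix.

```shell
# Assumed request body for switching a VM's console protocol to Spice
# (hypothetical payload modeled on the oVirt 3.x REST API).
BODY='<vm><display><type>spice</type></display></vm>'
echo "$BODY"
# Against a live RHEV-M you would send something like:
#   curl -k -u admin@internal:PASSWORD -X PUT \
#        -H 'Content-Type: application/xml' -d "$BODY" \
#        'https://RHEVM-FQDN/api/vms/<vm-id>'
```

Repeating the same request with `vnc` in place of `spice` would correspond to switching the protocol back, as described above.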
Just to clarify the 'Factors leading to the issue' part of comment 7: the issue occurs only if 'VNC' is the Console Protocol of the original VM *at the time of* creation of the snapshot to be used for VM cloning. Even if the original VM's Console Protocol is later changed to 'Spice', the issue will still occur as soon as that specific snapshot is used to clone a VM.
merged u/s: 58720c1faf1d95f4e869b662fcbe2b3bd9f02889
Fixed, 3.3/is12. Both VMs, the original and the one cloned from the snapshot, are working OK.
This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore'.)

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format, please refer to:
https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0038.html