Description of problem:
In a RHEV-RHS environment, on cloning a VM from the snapshot of another VM, the new VM boots up fine, but the original VM from which the snapshot was taken is rendered unbootable. The failure message on boot-up is:
------------------------------------------------------
VM <VM name> is down. Exit message: unsupported configuration: non-primary video device must be type of 'qxl'.
Failed to run VM <VM name> on Host <Hypervisor name>.
------------------------------------------------------
The only way to rescue the original VM is to commit the snapshot. This makes the VM bootable again, but it results in the loss of all data created after the snapshot was taken.

* The issue is reproducible for a Live Snapshot and for a snapshot taken after VM shut-down.
* The issue is reproducible for a VM with a 'pre-allocated' disk and for a VM with a 'thin-provision' disk.
* The issue is reproducible when the image store Storage Domain is based on an RHS volume of type distribute-replicate, and on an RHS volume of type pure replicate.

Version-Release number of selected component (if applicable):
RHEVM: 3.2 (3.2.0-11.37.el6ev)
RHS: 2.0+ (glusterfs-server-3.3.0.11rhs-1.el6rhs.x86_64)
Hypervisors: RHEL 6.4 and RHEVH 6.4 with glusterfs-3.3.0.11rhs-1.el6.x86_64 and glusterfs-fuse-3.3.0.11rhs-1.el6.x86_64

How reproducible:
Always reproducible.

Steps to Reproduce:
1. Create a VM.
2. Seal the VM for cloning (remove unique identifiers such as hostname, MAC address, etc.).
3. Create a Live Snapshot, or shut down the VM and create a snapshot.
4. Clone a VM from the snapshot.
5. After the clone process completes, boot up the original VM. This fails with the message:
------------------------------------------------------
VM quizzac1 is down. Exit message: unsupported configuration: non-primary video device must be type of 'qxl'.
Failed to run VM quizzac1 on Host RHEVH6.4-rhs-gp-srv15.
------------------------------------------------------
6. Boot up the cloned VM. It boots up okay.
7. Attempt to rescue the original VM:
   a) If the snapshot is deleted from the original VM, it is still not bootable and shows the same error messages. The VM is now irrecoverably lost!
   b) Switch on the 'Preview Mode' of the snapshot of the original VM. On running the VM in preview mode, the VM boots up okay, but as expected it has no data from the period after the snapshot was taken.
   c) Shut down the VM from 'Preview Mode' and choose 'Undo Preview', so as to restore from the snapshot. The VM is then not bootable, showing the same error messages.
   d) Switch on the 'Preview Mode' of the snapshot of the original VM again. Confirm it boots okay, then shut it down. Then choose 'Commit Preview' of the snapshot. The original VM is bootable again, but the data created after the snapshot was taken is now lost forever!

Actual results:
After a VM is cloned from the snapshot of another VM, the original VM is corrupted, and is recoverable only with the loss of the data created after the snapshot was taken.

Expected results:
After a VM is cloned from the snapshot of another VM, the original VM should boot up to its state before shut-down.

Additional info:
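For context on the exit message: libvirt raises "non-primary video device must be type of 'qxl'" when a domain definition carries more than one <video> device and a non-primary one is not of type 'qxl'. The sketch below is an illustration only; the XML is an assumed sample of such a definition, not captured from the failing VM.

```shell
# Assumed sample of a domain fragment with two non-qxl video devices,
# the kind of configuration libvirt rejects with the error above.
cat > /tmp/video-devices.xml <<'EOF'
<devices>
  <video>
    <model type='cirrus' vram='9216' heads='1'/>
  </video>
  <video>
    <model type='cirrus' vram='9216' heads='1'/>
  </video>
</devices>
EOF
# On a live hypervisor you would inspect the real definition with:
#   virsh dumpxml <VM name> | grep -A2 '<video>'
grep -c '<video>' /tmp/video-devices.xml
```

If the count is greater than 1 and the extra device is not qxl, the VM fails validation at start-up, which matches the boot failure seen here.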
I have tried reproducing the issue on a RHEV Data Center with a pure NFS Storage Domain used as the image store, so as to determine whether the scope of the bug is limited to RHS volumes. The result is that the issue *is always* reproducible when using a pure NFS Storage Domain, in the same way as when an RHS-backed Storage Domain is used. So it looks like this may be caused by a regression in RHEV. The issue description remains valid whether a pure NFS share or an RHS volume is used for the Storage Domain.

Versions used for the current test:
RHEVM: 3.2 (3.2.0-11.37.el6ev)
Hypervisor: RHEL 6.4
Storage Domain: NFS share from a RHEL 6.4 system

P.S. I can provide any additional information needed, and assist with any data collection and testing, if required.

- rejy (rmc)
I played around with this bug some more, and managed to narrow down the environmental factors leading to the issue, and the steps for getting back the original VM.

Factors leading to the issue:
------------------------
The issue is related to the Console Protocol of the original VM. It appears to occur only if the Console Protocol 'VNC' is chosen for the VM of which the snapshot is taken. When that VM is shut down and a VM is cloned off the snapshot, the original VM refuses to boot up, with the message:

VM <VM name> is down. Exit message: unsupported configuration: non-primary video device must be type of 'qxl'.

One way to get the original VM back is to preview and commit, on the original VM, the same snapshot used to clone the other VM. But that results in the loss of all data created after the snapshot was taken.

Discovered now! Steps to get back the original VM with data intact:
------------------------------------------------------------------
After a VM is cloned off the snapshot, and while the original VM remains shut down, change the Console Protocol to 'Spice' and click 'OK' to save. Then you may start up the original VM straight away, or edit the Console Protocol back to 'VNC', save, and then start up the original VM. Either way, the original VM boots up fine, with all data intact, including from the period after the snapshot was taken.

Now I am not sure if this is a regression, or if the issue was always there and we never hit it, because there is a chance we may not have used the combination of VNC console protocol, VM snapshot, and cloning a VM!

Hope this helps in weeding out the cause of the issue. :-)

Cheers!
- rejy (rmc)
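As a rough sketch of what the UI workaround above does, toggling the Console Protocol amounts to updating the VM's display type. The endpoint, payload shape, and credentials below are assumptions based on the oVirt 3.x REST API, not taken from this bug report; treat it as a hypothetical equivalent of the 'Edit VM' dialog, not a documented fix.

```shell
# Assumed request body for switching a VM's console protocol to Spice
# (hypothetical payload modeled on the oVirt 3.x REST API).
BODY='<vm><display><type>spice</type></display></vm>'
echo "$BODY"
# Against a live RHEV-M you would send something like:
#   curl -k -u admin@internal:PASSWORD -X PUT \
#        -H 'Content-Type: application/xml' -d "$BODY" \
#        'https://RHEVM-FQDN/api/vms/<vm-id>'
```

Repeating the same request with `vnc` in place of `spice` would correspond to switching the protocol back, as described above.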
Just to clarify the 'Factors leading to the issue' part of comment 7: the issue occurs only if 'VNC' is the Console Protocol of the original VM *at the time of* creation of the snapshot to be used for VM cloning. Even if the original VM's Console Protocol is later changed to 'Spice', the issue will still occur as soon as that specific snapshot is used to clone a VM.
merged u/s: 58720c1faf1d95f4e869b662fcbe2b3bd9f02889
Fixed, 3.3/is12. Both VMs, the original and the one cloned from the snapshot, are working OK.
This bug is currently attached to errata RHEA-2013:15231. If this change is not to be documented in the text for this errata, please either remove it from the errata, set the requires_doc_text flag to minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise, to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore'.)

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format, please refer to:
https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes

Thanks in advance.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0038.html