Red Hat Bugzilla – Bug 1266973
Cannot start or revert VM with failed stateless snapshot
Last modified: 2016-05-23 05:34:45 EDT
Description of problem:
I had a stateless VM running when a problem came up with its storage domain. I manually stopped the VM and now it still contains a stateless snapshot. The snapshot is preventing the VM from running and the snapshot cannot be removed by any normal means.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Run VM in stateless mode
2. Cause a problem in the storage domain so that the VM transitions to paused.
3. Manually stop the VM. Observe that the stateless snapshot remains.
4. Attempt to start the VM; observe that the VM stays stopped and that the oVirt engine.log contains a NullPointerException.
Actual results:
The VM with the leftover stateless snapshot cannot be started, and the snapshot cannot be removed from either the web UI or the CLI shell.

Expected results:
Either the snapshot should be removed automatically, or the admin should be allowed to remove it manually.
I have engine.log stack trace for the NullPointerException if needed.
This may be the same issue as bug 1072375, but I cannot tell for sure. In any case, I would very much appreciate at least a workaround for the VM I currently cannot start.
Yes, please provide all the logs.
Created attachment 1106050
a segment of engine.log during the failed restoration
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.
I followed the reproduction steps but did not manage to reproduce the issue.
Generally speaking, I don't think this case justifies a mechanism for removing the stateless snapshot manually - removal should happen automatically whenever a stateless VM goes down, or when a VM powers up with a stateless snapshot that we previously failed to remove.
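To make the intended cleanup policy concrete, here is a minimal sketch of the two automatic removal paths described above. This is illustrative pseudocode in Python, not actual oVirt engine code; the class and method names are assumptions.

```python
STATELESS = "STATELESS"

class Vm:
    """Hypothetical model of a VM's snapshot list (illustrative only)."""

    def __init__(self):
        self.snapshots = []

    def run_stateless(self):
        # Starting in stateless mode creates a stateless snapshot.
        self.snapshots.append(STATELESS)

    def on_down(self):
        # Normal path: remove the stateless snapshot when the VM goes down.
        self._remove_stateless()

    def on_power_up(self):
        # Recovery path: a stale stateless snapshot left over from a
        # failed shutdown is removed before the VM is started.
        self._remove_stateless()

    def _remove_stateless(self):
        self.snapshots = [s for s in self.snapshots if s != STATELESS]

vm = Vm()
vm.run_stateless()
# Simulate a shutdown where removal failed: the snapshot is still present,
# so the next power-up cleans it up before starting.
vm.on_power_up()
print(vm.snapshots)  # []
```

The point of the second path is exactly the reporter's scenario: even if shutdown-time removal fails, the VM should still be startable afterwards.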
In this particular case, the stateless snapshot removal fails because the database is in an inconsistent state: it contains a device entry for the disk, but the disk itself does not exist.
I wonder how this could happen - I don't see a way to reach this state. Could it be that the disk was manually removed from the database in order to recover from the problem with the storage domain (and what was that problem)? In any case, without further information or logs that could explain how the disk was removed, we cannot make any further progress on this.
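The inconsistency described above can be characterized as a referential check: find disk devices whose referenced disk no longer exists. The following is a hedged sketch of that check; the function and the data shapes are hypothetical and do not reflect the actual oVirt engine schema.

```python
def find_orphaned_disk_devices(vm_devices, existing_disks):
    """Return device ids whose referenced disk is missing.

    vm_devices: iterable of (device_id, disk_id) pairs for disk-type devices
    existing_disks: iterable of disk ids that actually exist in the database
    (Names are illustrative, not the real oVirt tables/columns.)
    """
    existing = set(existing_disks)
    return [device_id
            for device_id, disk_id in vm_devices
            if disk_id not in existing]

# Example: device "dev-2" references disk "disk-B", which is gone -
# the state snapshot removal trips over.
devices = [("dev-1", "disk-A"), ("dev-2", "disk-B")]
disks = ["disk-A"]
print(find_orphaned_disk_devices(devices, disks))  # ['dev-2']
```

An admin suspecting this state could run the analogous query against the engine database before attempting snapshot removal, though the exact table names would need to be taken from the installed schema.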
So I'm closing this, as it couldn't be reproduced with the reported reproduction steps. Clearly we reached a state we shouldn't get into, so if there is additional information about how that happened, feel free to reopen.