Bug 1266973 - Cannot start or revert VM with failed stateless snapshot
Status: CLOSED WORKSFORME
Product: ovirt-engine
Classification: oVirt
Component: General
Version: 3.5.1.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.0.0-beta
Assigned To: Arik
Reported: 2015-09-28 13:24 EDT by Brian Sipos
Modified: 2016-05-23 05:34 EDT
Doc Type: Bug Fix
Last Closed: 2016-05-23 05:34:45 EDT
Type: Bug
oVirt Team: Virt
tjelinek: ovirt‑4.0.0?
mgoldboi: testing_plan_complete?
rule-engine: planning_ack?
rule-engine: devel_ack?
rule-engine: testing_ack?


Attachments
a segment of engine.log during the failed restoration (37.00 KB, text/plain)
2015-12-15 09:47 EST, Brian Sipos
Description Brian Sipos 2015-09-28 13:24:13 EDT
Description of problem:
I had a stateless VM running when a problem came up with its storage domain. I manually stopped the VM and now it still contains a stateless snapshot. The snapshot is preventing the VM from running and the snapshot cannot be removed by any normal means.

Version-Release number of selected component (if applicable):
3.5.1.1

How reproducible:
Unknown

Steps to Reproduce:
1. Run a VM in stateless mode.
2. Cause a problem in the storage domain so that the VM transitions to paused.
3. Manually stop the VM. Observe that the stateless snapshot remains.
4. Attempt to start the VM. Observe that the VM stays stopped and that the oVirt engine.log contains a NullPointerException.

Actual results:
Cannot start VM with snapshot, cannot remove snapshot.

Expected results:
Either the snapshot should be automatically removed, or admin should be allowed to remove it manually.

Additional info:
The snapshot cannot be removed from either the web UI or the CLI shell.
I have engine.log stack trace for the NullPointerException if needed.
Comment 1 Brian Sipos 2015-09-28 13:25:59 EDT
This may be the same issue as #1072375 but I cannot tell for sure. I am very much interested in at least a workaround for the non-startable VM I currently have though.
Comment 2 Tomas Jelinek 2015-12-15 09:13:55 EST
Yes, please provide all the logs.
Comment 3 Brian Sipos 2015-12-15 09:47 EST
Created attachment 1106050
a segment of engine.log during the failed restoration
Comment 4 Sandro Bonazzola 2016-05-02 05:58:39 EDT
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.
Comment 5 Arik 2016-05-23 05:34:45 EDT
I followed the reproduction steps and didn't manage to reproduce it.

Generally speaking, I don't think that this case justifies a mechanism to remove the stateless snapshot manually. Removal should be done automatically by the system whenever a stateless VM goes down, or when it powers up with a stateless snapshot that we didn't manage to remove before.

In this particular case, the stateless snapshot removal fails because we reach an inconsistent state in the database: on the one hand, we have a device for the disk and on the other hand the disk itself doesn't exist.

I wonder how this could happen. I don't see a way to get to this state. Could it be that the disk was manually removed from the database in order to recover from the problem with the storage domain (what was the problem)? Anyway, without further information or logs that could explain how the disk was removed, we cannot make any further progress with this.

So I'm closing it, as it couldn't be reproduced with the reported reproduction steps. Obviously we reached a state we shouldn't get to, so if there is additional information about that, feel free to reopen.
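
For readers trying to confirm the inconsistency described in comment 5 (a device record for a disk whose disk record no longer exists), the check can be sketched as an anti-join. This is only an illustration under stated assumptions: the tables `vm_devices` and `disks`, their columns, and the in-memory SQLite database are hypothetical stand-ins and are not the real ovirt-engine schema or database.

```python
import sqlite3


def find_orphaned_disk_devices(conn):
    """Return (vm_id, device_id) pairs for disk devices with no matching disk row.

    Table and column names are hypothetical, chosen only for this sketch.
    """
    return conn.execute(
        """
        SELECT d.vm_id, d.device_id
        FROM vm_devices AS d
        LEFT JOIN disks AS k ON k.disk_id = d.device_id
        WHERE d.type = 'disk' AND k.disk_id IS NULL
        ORDER BY d.vm_id, d.device_id
        """
    ).fetchall()


def build_demo_db():
    """Create a tiny in-memory database containing one orphaned disk device."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE disks (disk_id TEXT PRIMARY KEY);
        CREATE TABLE vm_devices (vm_id TEXT, device_id TEXT, type TEXT);
        INSERT INTO disks VALUES ('disk-a');
        -- vm-1 is consistent: its device points at an existing disk.
        INSERT INTO vm_devices VALUES ('vm-1', 'disk-a', 'disk');
        -- vm-2 is inconsistent: the device exists, the disk does not.
        INSERT INTO vm_devices VALUES ('vm-2', 'disk-b', 'disk');
        -- Non-disk devices are ignored by the check.
        INSERT INTO vm_devices VALUES ('vm-2', 'nic-1', 'interface');
        """
    )
    return conn


if __name__ == "__main__":
    for vm_id, device_id in find_orphaned_disk_devices(build_demo_db()):
        print(f"VM {vm_id} has a device for missing disk {device_id}")
```

A non-empty result from a check of this kind would point to the state described above, in which snapshot removal has a device to process but no disk behind it.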
