Bug 1266973

Summary: Cannot start or revert VM with failed stateless snapshot
Product: [oVirt] ovirt-engine
Reporter: Brian Sipos <BSipos>
Component: General
Assignee: Arik <ahadas>
Status: CLOSED WORKSFORME
QA Contact:
Severity: medium
Docs Contact:
Priority: high
Version: 3.5.1.1
CC: amureini, BSipos, bugs, mgoldboi, tjelinek
Target Milestone: ovirt-4.0.0-beta
Flags: tjelinek: ovirt-4.0.0?, mgoldboi: testing_plan_complete?, rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-23 09:34:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: a segment of engine.log during the failed restoration (Flags: none)

Description Brian Sipos 2015-09-28 17:24:13 UTC
Description of problem:
I had a stateless VM running when a problem came up with its storage domain. I manually stopped the VM and now it still contains a stateless snapshot. The snapshot is preventing the VM from running and the snapshot cannot be removed by any normal means.

Version-Release number of selected component (if applicable):
3.5.1.1

How reproducible:
Unknown

Steps to Reproduce:
1. Run VM in stateless mode
2. Cause problem in storage domain so that VM transitions to paused.
3. Manually stop the VM. Observe that the stateless snapshot remains.
4. Attempt to start the VM. Observe that the VM stays stopped and that the oVirt engine.log contains a NullPointerException.

Actual results:
Cannot start VM with snapshot, cannot remove snapshot.

Expected results:
Either the snapshot should be automatically removed, or admin should be allowed to remove it manually.

Additional info:
The snapshot cannot be removed from either the web UI or the CLI shell.
I have engine.log stack trace for the NullPointerException if needed.

Comment 1 Brian Sipos 2015-09-28 17:25:59 UTC
This may be the same issue as #1072375 but I cannot tell for sure. I am very much interested in at least a workaround for the non-startable VM I currently have though.

Comment 2 Tomas Jelinek 2015-12-15 14:13:55 UTC
Yes, please provide all the logs.

Comment 3 Brian Sipos 2015-12-15 14:47:31 UTC
Created attachment 1106050
a segment of engine.log during the failed restoration

Comment 4 Sandro Bonazzola 2016-05-02 09:58:39 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 5 Arik 2016-05-23 09:34:45 UTC
I followed the reproduction steps and didn't manage to reproduce it.

Generally speaking, I don't think that this case justifies a mechanism to remove the stateless snapshot manually: the system should remove it automatically whenever a stateless VM goes down, or when it powers up with a stateless snapshot that we didn't manage to remove before.
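
To make the intended behaviour concrete, here is a minimal sketch of the two cleanup points described above. It is only an illustration under stated assumptions: `Vm`, `remove_stateless_snapshot`, `on_vm_down` and `run_stateless` are hypothetical names, not ovirt-engine classes or commands, and a snapshot is modelled as nothing more than a type string.

```python
from dataclasses import dataclass, field


@dataclass
class Vm:
    """Hypothetical in-memory stand-in for a VM record (not an ovirt-engine class)."""
    name: str
    snapshots: list = field(default_factory=list)  # snapshot type names, e.g. "STATELESS"
    status: str = "DOWN"


def remove_stateless_snapshot(vm: Vm) -> bool:
    """Drop any leftover stateless snapshot; return True if one was present."""
    had_snapshot = "STATELESS" in vm.snapshots
    vm.snapshots = [s for s in vm.snapshots if s != "STATELESS"]
    return had_snapshot


def on_vm_down(vm: Vm) -> None:
    """Normal path: clean up as soon as the stateless VM goes down."""
    vm.status = "DOWN"
    remove_stateless_snapshot(vm)


def run_stateless(vm: Vm) -> None:
    """Retry path: clear a snapshot left by an earlier failed cleanup, then start."""
    remove_stateless_snapshot(vm)
    vm.snapshots.append("STATELESS")
    vm.status = "UP"
```

In this model a VM that still carries a stale stateless snapshot can be started again, because `run_stateless` retries the removal first; no separate admin action is involved.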

In this particular case, the stateless snapshot removal fails because we reach an inconsistent state in the database: on the one hand, we have a device for the disk and on the other hand the disk itself doesn't exist.
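
As a rough illustration of that inconsistency, the sketch below flags device entries that point at a disk with no matching disk record. The record layout (`vm_id`, `device_id`, `type` keys and the sample ids) is an assumption made up for this example and does not reflect the actual engine database schema.

```python
def find_orphaned_disk_devices(devices, disks):
    """Return disk-type device records whose disk id has no matching disk record.

    Both inputs are hypothetical: `devices` is a list of dicts with
    "vm_id", "device_id" and "type" keys; `disks` is a collection of disk ids.
    """
    known_disks = set(disks)
    return [
        d for d in devices
        if d["type"] == "disk" and d["device_id"] not in known_disks
    ]


# Made-up sample data: disk-b has a device entry but no disk record.
devices = [
    {"vm_id": "vm-1", "device_id": "disk-a", "type": "disk"},
    {"vm_id": "vm-1", "device_id": "disk-b", "type": "disk"},
    {"vm_id": "vm-1", "device_id": "nic-a", "type": "interface"},
]
disks = ["disk-a"]

orphans = find_orphaned_disk_devices(devices, disks)
print([d["device_id"] for d in orphans])  # prints ['disk-b']
```

A non-empty result corresponds to the state described here: a device exists for a disk that is itself missing.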

I wonder how this could happen. I don't see a way to get to this state. Could it be that the disk was manually removed from the database in order to recover from the problem with the storage domain (what was the problem)? Anyway, without further information or logs that could explain how the disk was removed, we cannot make any further progress with this.

So I'm closing it, as it couldn't be reproduced with the reported reproduction steps. Obviously we reached a state we shouldn't get to, so if there is additional information about that, feel free to reopen.