Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1152529

Summary: Problem with vm snapshots and disk
Product: [oVirt] ovirt-engine
Reporter: Shirly Radco <sradco>
Component: Frontend.WebAdmin
Assignee: Shmuel Melamud <smelamud>
Status: CLOSED WORKSFORME
QA Contact: meital avital <mavital>
Severity: high
Docs Contact:
Priority: medium
Version: ---
CC: bugs, ecohen, gklein, istein, lsurette, mgoldboi, michal.skrivanek, rbalakri, Rhev-m-bugs, smelamud, sradco, yeylon
Target Milestone: ovirt-4.0.0-alpha
Flags: ylavi: ovirt-4.0.0?, rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: virt
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-03 16:43:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                   Flags
screenshot1                   none
screenshot2                   none
screenshot3                   none
screenshot4                   none
screenshot5                   none
engine logs for 2013-10-14    none

Description Shirly Radco 2014-10-14 11:16:53 UTC
Description of problem:

The snapshots tab shows the same snapshot "Active VM before the preview" several times. Snapshots cannot be deleted, and it is not possible to go back to earlier snapshots.
The Disks tab shows the same storage several times.


Version-Release number of selected component (if applicable):
3.4.2-0.2.el6ev

How reproducible:
Not sure how to recreate.

Steps to Reproduce:
1.
2.
3.

Actual results:
The "Active VM before the preview" snapshot and the disk are each listed several times.

Expected results:
Should show "Active VM" one time.
Should show the disk one time.


Additional info:
See attached screenshots.

Comment 1 Shirly Radco 2014-10-14 11:17:52 UTC
Created attachment 946840 [details]
screenshot1

Comment 2 Shirly Radco 2014-10-14 11:18:27 UTC
Created attachment 946842 [details]
screenshot2

Comment 3 Shirly Radco 2014-10-14 11:19:05 UTC
Created attachment 946843 [details]
screenshot3

Comment 4 Shirly Radco 2014-10-14 11:19:58 UTC
Created attachment 946844 [details]
screenshot4

Comment 5 Shirly Radco 2014-10-14 11:22:28 UTC
Created attachment 946845 [details]
screenshot5

Comment 6 Omer Frenkel 2014-10-15 06:23:48 UTC
Can you please explain what you did to get to this?
Also, please attach engine + SPM logs.

Comment 7 Shirly Radco 2014-10-19 05:37:00 UTC
(In reply to Omer Frenkel from comment #6)
> Can you please explain what you did to get to this?
> Also, please attach engine + SPM logs.

This happened to me on the production environment of the eng lab.
First I shut down the VM and tried to go back to a previous snapshot, using "preview" and then commit, but it created multiple "Active VM before the preview" snapshots. I tried this several times and tried to delete the previous snapshots, but with no success.

Comment 8 Omer Frenkel 2014-10-20 11:26:59 UTC
any chance for the logs?

Comment 9 Shirly Radco 2014-10-23 08:40:35 UTC
Created attachment 949743 [details]
engine logs for 2013-10-14

Comment 10 Omer Frenkel 2014-10-28 15:02:47 UTC
Looks like an issue with rollback/compensation of RestoreAllSnapshotsCommand
that causes duplicate entries in the snapshots table.

I see in the log that DeleteImageGroupVDSCommand fails:

2014-10-14 08:53:36,974 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (ajp-/127.0.0.1:8702-14) [502dbcc9] IrsBroker::Failed::DeleteImageGroupVDS due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to DeleteImageGroupVDS, error = Storage domain does not exist

This means the snapshot had a memory volume, which was on a storage domain that was already deleted...

Comment 11 Michal Skrivanek 2015-06-05 13:19:51 UTC
still want to consider this "soon"

Comment 12 Omer Frenkel 2015-07-21 15:18:27 UTC
Shmuel, is this a duplicate of bug 1236061?

Comment 13 Shmuel Melamud 2015-07-22 14:44:27 UTC
I don't think so. In bug 1236061 the problem appears when an exception is thrown in CreateAllSnapshotsFromVmCommand. I don't see anything similar in the log here. Here the problem appeared after running RestoreAllSnapshotsCommand. I've taken a look at it and I see that the command itself is transactional, but compensation is used somewhere. If an exception is thrown, this may cause a snapshot that should be deleted to appear twice: first as the result of the transaction rollback, and second as the result of compensation for the same removal. But this is just a guess.
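
The guess above can be illustrated with a toy model. This is a sketch in Python rather than the engine's Java, and `SnapshotTable` and `remove_with_double_undo` are hypothetical names, not oVirt code. It only shows how undoing the same removal twice, once through a rollback and once through compensation, would leave a duplicate row:

```python
# Toy model only: these names are hypothetical and are NOT oVirt engine code.


class SnapshotTable:
    """In-memory stand-in for the snapshots table."""

    def __init__(self, rows):
        self.rows = list(rows)


def remove_with_double_undo(table, snapshot, fail):
    """Remove `snapshot`; on failure, undo the removal through two mechanisms."""
    saved_state = list(table.rows)  # state a transaction rollback would restore
    compensation_log = []           # removals that compensation would replay

    table.rows.remove(snapshot)
    compensation_log.append(snapshot)

    if fail:
        # Undo 1: the transaction rollback restores the pre-removal state.
        table.rows = list(saved_state)
        # Undo 2: compensation re-inserts the same removed row.
        for removed in compensation_log:
            table.rows.append(removed)
    return table.rows


table = SnapshotTable(["Active VM", "Active VM before the preview"])
rows = remove_with_double_undo(table, "Active VM before the preview", fail=True)
print(rows.count("Active VM before the preview"))
```

If both undo paths run for the same removal, the row count for the snapshot ends up at two instead of one, which matches the symptom in the screenshots. Whether the engine actually takes both paths is not confirmed by the log.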

Comment 14 Ilanit Stein 2015-08-20 08:52:16 UTC
Shmuel,

Can you please provide steps to reproduce?

Thanks,
Ilanit.

Comment 15 Michal Skrivanek 2015-09-14 08:26:43 UTC
(In reply to Omer Frenkel from comment #10)
> Looks like an issue with rollback/compensation of RestoreAllSnapshotsCommand
> that causes duplicate entries in the snapshots table.
> 
> I see in the log that DeleteImageGroupVDSCommand fails:
> 
> 2014-10-14 08:53:36,974 ERROR
> [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand]
> (ajp-/127.0.0.1:8702-14) [502dbcc9] IrsBroker::Failed::DeleteImageGroupVDS
> due to: IRSErrorException: IRSGenericException: IRSErrorException: Failed to
> DeleteImageGroupVDS, error = Storage domain does not exist
> 
> This means the snapshot had a memory volume, which was on a storage domain
> that was already deleted...

In this case, shouldn't we ignore the error and just continue instead of rolling back?

Comment 16 Omer Frenkel 2015-09-16 15:17:42 UTC
Lowering priority as we don't have a clear reproducer for this (Shmuel couldn't reproduce it so far).
From the logs, it seems that when a storage issue occurs there is a problem with rollback.
We should consider pushing to 4.0.

Comment 18 Shmuel Melamud 2015-09-24 18:22:33 UTC
It is unclear where the broken snapshot comes from. The log doesn't give an answer. The error condition in RestoreAllSnapshotsCommand appears only as a result of that; it is not the cause of the problem.

In this situation, if we cannot find where the problem originally lies, we can at least ignore the StorageDomainDoesNotExist error in DeleteImage. It looks logical and safe, and will allow removing the broken snapshot without modifying the DB directly.

See the patch in Gerrit: https://gerrit.ovirt.org/46706
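
The idea of ignoring this one error can be sketched as follows. This is an illustrative Python toy, not the content of the Gerrit change and not engine code: `delete_image`, `vdsm_delete` and `db_records` are hypothetical, and only the error name is taken from the discussion above.

```python
# Illustrative sketch only; names other than the error are hypothetical.


class StorageDomainDoesNotExist(Exception):
    """Stand-in for the 'Storage domain does not exist' failure."""


def delete_image(image_id, vdsm_delete, db_records):
    """Delete an image on storage, then drop its DB record.

    Only the 'storage domain does not exist' error is swallowed; any other
    failure propagates and leaves the DB record untouched.
    """
    try:
        vdsm_delete(image_id)
    except StorageDomainDoesNotExist:
        # The domain is gone, so there is nothing left on storage to delete.
        pass
    db_records.discard(image_id)


def domain_is_gone(image_id):
    raise StorageDomainDoesNotExist(image_id)


records = {"img-1", "img-2"}
delete_image("img-1", domain_is_gone, records)
print(sorted(records))
```

Catching only the one specific error keeps other storage failures visible, so an unrelated problem still stops the removal.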

Comment 19 Shmuel Melamud 2015-10-12 12:58:30 UTC
After some research, I don't think this is a good solution anymore.

I've created a VM with a disk on a separate storage domain and then manually cleaned up the storage directory. The storage domain went down and I detached it from the data center. After that I tried to remove the VM and got an error message about the absence of the storage domain.

It would be good, from my point of view, to give the user the possibility to remove the disk in such a situation. But simply ignoring this error from RemoveImageCommand doesn't help; we need to remove this check from RemoveImageCommand.canDoAction(). The logic will be: if RemoveImageCommand is called to remove an image, but the storage domain is not available, delete just the record from the DB and don't execute the VDSM action. But this logic is bad: if the storage domain is just temporarily unavailable and the user executed RemoveImageCommand without knowing about it, this will leave the image orphaned on the storage.

It would be more correct to check whether the storage domain is completely detached from the DC or even no longer known to the engine. In the regular scenario, detaching causes all links to this storage domain to be deleted. If there is a record in the DB pointing to an image on such a storage domain, it means an error occurred. In such a case we can safely allow RemoveImage to remove just the record in the DB.
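
The distinction proposed above can be written down as a small decision function. This is a Python sketch under stated assumptions, not engine code: the `DomainState` and `RemoveAction` enums and `plan_image_removal` are hypothetical, and treating a temporarily unavailable domain as "refuse" is an assumption that the existing check stays in place for that case.

```python
# Sketch of the proposed decision; all names here are hypothetical.
from enum import Enum


class DomainState(Enum):
    ACTIVE = "active"            # attached and reachable
    UNAVAILABLE = "unavailable"  # attached, but temporarily down
    DETACHED = "detached"        # completely detached from the DC
    UNKNOWN = "unknown"          # no longer known to the engine


class RemoveAction(Enum):
    VDSM_AND_DB = "delete on storage, then delete the DB record"
    DB_ONLY = "delete only the DB record"
    REFUSE = "fail validation and change nothing"


def plan_image_removal(state):
    """Choose how an image removal should proceed for a given domain state."""
    if state is DomainState.ACTIVE:
        return RemoveAction.VDSM_AND_DB
    if state in (DomainState.DETACHED, DomainState.UNKNOWN):
        # A leftover record pointing at such a domain is already an error,
        # so removing just the record cannot orphan anything reachable.
        return RemoveAction.DB_ONLY
    # Temporarily unavailable: a DB-only delete would orphan the image.
    return RemoveAction.REFUSE


for state in DomainState:
    print(state.name, "->", plan_image_removal(state).name)
```

Keeping the temporarily unavailable case separate is what avoids the orphaned-image problem described above.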

Michal, Omer, what do you think?

Comment 20 Shmuel Melamud 2015-10-20 18:31:09 UTC
In the case I've described above, it is possible to remove the VM using the 'Destroy' command from the right-click menu of the VM. This command removes the VM and all links to its disk images, even when the images themselves are not accessible.

If the situation described in the bug is similar, the user can also use 'Destroy' to remove the dysfunctional VM.

Shirly, can you give some additional information about the VM?

1. Content of 'Storage' tab.
2. Content of 'Disk' subtab of the VM.
3. Name of the storage domain where each of the VM's disks is located.

Is it possible to run the VM? Are the disks accessible?

Comment 21 Red Hat Bugzilla Rules Engine 2015-11-16 14:11:10 UTC
This bug is flagged for 3.6, yet the milestone is for 4.0 version, therefore the milestone has been reset.
Please set the correct milestone or add the flag.

Comment 22 Shirly Radco 2015-11-17 09:06:36 UTC
(In reply to Shmuel Melamud from comment #20)
> In the case I've described above, it is possible to remove the VM using
> the 'Destroy' command from the right-click menu of the VM. This command
> removes the VM and all links to its disk images, even when the images
> themselves are not accessible.
> 
> If the situation described in the bug is similar, the user can also use
> 'Destroy' to remove the dysfunctional VM.
> 
> Shirly, can you give some additional information about the VM?
> 
> 1. Content of 'Storage' tab.
> 2. Content of 'Disk' subtab of the VM.
> 3. Name of the storage domain where each of the VM's disks is located.
> 
> Is it possible to run the VM? Are the disks accessible?

The VM was deleted from nott4, so I can't give you more details. Sorry.

Comment 23 Michal Skrivanek 2015-11-23 13:55:59 UTC
Pending closure if there is no news in ~14 days.

Comment 24 Shmuel Melamud 2016-01-03 16:43:29 UTC
No news for over a month, closing.