Bug 1372163 - [RFE] Warn user about VMs that have pending snapshot removal retries
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.2
Hardware: x86_64 Linux
Priority: high
Severity: urgent
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assigned To: Ala Hino
QA Contact: meital avital
Keywords: FutureFeature, ZStream
Depends On:
Blocks: 1505244 1533061
Reported: 2016-09-01 01:25 EDT by Germano Veit Michel
Modified: 2018-05-15 13:40 EDT
CC List: 19 users

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
When live or cold merge fails, snapshot disks may be left in an illegal state. If VMs with illegal snapshot disks are shut down, they will not re-start. VMs with illegal snapshot disks are now marked with an exclamation mark and a warning message not to shut them down.
Story Points: ---
Clone Of:
Clones: 1533061
Environment:
Last Closed: 2018-05-15 13:38:43 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
ratamir: testing_plan_complete-


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3071431 | None | None | None | 2017-06-06 19:20 EDT
oVirt gerrit | 80113 | master | MERGED | core: Display an alert if the VM has illegal images | 2017-08-08 15:00 EDT
oVirt gerrit | 80114 | master | MERGED | core: Add VM search by has_illegal_images | 2017-08-08 15:01 EDT
oVirt gerrit | 80228 | master | MERGED | Add hasIllegalImages property to VM | 2017-08-14 02:01 EDT
oVirt gerrit | 80229 | master | MERGED | core: Expose the VM hasIllegalImages property to REST-API | 2017-08-23 03:51 EDT
oVirt gerrit | 80638 | master | MERGED | core: Enhance the VM has illegal images alert | 2017-08-24 03:59 EDT
oVirt gerrit | 86054 | model_4.1 | MERGED | Add hasIllegalImages property to VM | 2018-01-09 10:36 EST
oVirt gerrit | 86217 | master | ABANDONED | restapi: Update to model 4.1.41 | 2018-01-11 05:45 EST
oVirt gerrit | 86218 | ovirt-engine-4.1 | MERGED | restapi: Update to model 4.1.41 | 2018-01-11 08:20 EST
Red Hat Product Errata | RHEA-2018:1488 | None | None | None | 2018-05-15 13:40 EDT

Description Germano Veit Michel 2016-09-01 01:25:15 EDT
Description of problem:

Since the later 3.6 releases, we have had recovery flows that can fix failed snapshot removals. Consider this case:

- A snapshot may fail to remove
- An image is left with illegal state in SD metadata
- User does not try again (e.g., the removal might be part of a backup script)
- Months later someone shuts down the VM

The VM will fail to come up again due to the illegal image in the SD. It's very easy to fix, but it's not obvious, and the consequence might be downtime for production VMs. A simple retry fixes the problem but this is not shown to the user.

Request:
- For VMs with failed snapshots, put a mark on the VM (something similar to the cluster upgrade icon?)
AND/OR
- Add a confirmation dialog when shutting down the VM explaining the situation

1. Proposed title of this feature request

Mark VMs that have a pending snapshot removal retry in the UI.

2. Who is the customer behind the request?

Red Hat - GSS
Comment 3 Germano Veit Michel 2016-09-01 19:33:12 EDT
See also BZ 1332038
Comment 11 Yaniv Kaul 2017-01-19 03:58:19 EST
Tal - please assign this to someone to assess risk and complexity for the 2nd use case:
"Add a confirmation dialog when shutting down the VM explaining the situation" - I'd argue that any task should have such a warning, but we also need one for failed tasks.
Comment 12 Tal Nisan 2017-01-22 08:14:47 EST
Idan, please have a look. We need to understand whether we can gather all the info about the failed snapshot upon deactivating, pass that indication to the UI, and from there add a confirmation dialog if needed.
Comment 13 Idan Shaby 2017-01-23 11:15:03 EST
We need to differentiate between two cases:
1. The active snapshot is in an illegal state - in this case, indeed the vm can't be restarted.
2. A snapshot which is not the active one (an internal snapshot) is in an illegal state - in this case, the vm *can* be restarted.

To me, in these two cases respectively, it makes sense to:
1. Add a popup when shutting down the VM explaining the situation.
2. Mark the vm in the ui (without a popup).

But this is just my opinion about this.
What do you guys think, before I assess the risk and complexity of it?
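
The two cases above can be sketched as a small decision function. This is only an illustration, assuming a hypothetical `Image` record with `active` and `illegal` flags and a hypothetical `UiAction` enum; none of these names come from ovirt-engine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class UiAction(Enum):
    # Hypothetical outcomes, named for this sketch only.
    NONE = "none"
    MARK_VM = "mark_vm"                          # case 2: mark the VM, no popup
    CONFIRM_ON_SHUTDOWN = "confirm_on_shutdown"  # case 1: popup on shutdown


@dataclass(frozen=True)
class Image:
    # Hypothetical record; the field names are illustrative only.
    active: bool   # True if the image belongs to the active snapshot
    illegal: bool  # True if the image is flagged as illegal


def ui_action_for(images: Iterable[Image]) -> UiAction:
    """Return the strongest warning that the VM's images call for."""
    action = UiAction.NONE
    for image in images:
        if not image.illegal:
            continue
        if image.active:
            # Case 1: the VM cannot be restarted, so warn before shutdown.
            return UiAction.CONFIRM_ON_SHUTDOWN
        # Case 2: an internal snapshot is illegal; the VM can still restart.
        action = UiAction.MARK_VM
    return action
```

An illegal active image takes precedence over illegal internal images, since it is the only case in which the VM cannot be restarted.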
Comment 14 Marina 2017-01-23 16:06:36 EST
Idan,
Can you assess both please?
Comment 15 Idan Shaby 2017-01-29 07:04:48 EST
Sure, the risk is quite low as most of the changes should be made in the ui.
We will also need to add a query to check for existing illegal snapshots in a given vm, and call it at the end of cold and live merge, and on run vm.
I guess that it will take something like ~ two weeks to complete.
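
A minimal sketch of the kind of query described above, run against an in-memory SQLite database. The `snapshot_images` table, its columns, and the `ILLEGAL` status string are assumptions made up for this example and do not reflect the actual engine schema.

```python
import sqlite3

# Hypothetical status value; the real engine may encode this differently.
ILLEGAL = "ILLEGAL"


def has_illegal_images(conn: sqlite3.Connection, vm_id: str) -> bool:
    """Return True if any image recorded for the VM is marked illegal."""
    row = conn.execute(
        "SELECT EXISTS("
        "SELECT 1 FROM snapshot_images WHERE vm_id = ? AND status = ?)",
        (vm_id, ILLEGAL),
    ).fetchone()
    return bool(row[0])


def demo_connection() -> sqlite3.Connection:
    """Build a throwaway database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snapshot_images (vm_id TEXT, image_id TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO snapshot_images VALUES (?, ?, ?)",
        [
            ("vm-1", "img-a", "OK"),
            ("vm-1", "img-b", ILLEGAL),
            ("vm-2", "img-c", "OK"),
        ],
    )
    return conn
```

In the flow described in the comment, a check like this would be called at the end of cold merge and live merge, and again when running the VM.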
Comment 18 Allon Mureinik 2017-06-29 12:11:50 EDT
I'd like to revisit the discussion here.
If removing a snapshot fails, the chain as a whole is most certainly valid, and the VM can be run - except during a short, minimal pivot operation.
But marking the VM in some state can fail exactly like a pivot can fail, so you wouldn't be adding any resilience here, just adding another point of failure.

Can you describe a simple usecase and how this RFE will solve it?
I feel as though I might be missing something.
Comment 28 Allon Mureinik 2017-08-03 03:51:46 EDT
I see a couple of patches added here.
Just to keep the scope clear, the required patches should include
- Changes to the business entity / database / daos
- adding this new field to the search mechanism
- exposing it via the REST API
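
As a rough illustration of the last two items, a client could search for affected VMs through the REST API. The sketch below only builds the request URL. The search term `has_illegal_images` is taken from the gerrit patch title, but the exact query syntax and the engine host name are assumptions.

```python
from urllib.parse import quote


def vm_search_url(engine_base: str, search: str) -> str:
    """Build a VM search URL for the engine's REST API.

    `engine_base` is a hypothetical engine address supplied by the caller.
    """
    base = engine_base.rstrip("/")
    # Percent-encode the whole search expression, including '='.
    return f"{base}/ovirt-engine/api/vms?search={quote(search, safe='')}"


# Assumed search expression; verify it against the engine's search syntax.
url = vm_search_url("https://engine.example.com", "has_illegal_images=true")
```

The resulting URL would be fetched with an authenticated GET request; authentication and response parsing are left out of this sketch.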
Comment 29 Ala Hino 2017-08-08 04:18:59 EDT
A note to QE guys:

As implementing this RFE includes changes to the 'vms' view, please verify the performance of the system when verifying this.
Comment 30 Allon Mureinik 2017-08-08 15:02:42 EDT
(In reply to Allon Mureinik from comment #28)
> I see a couple of patches added here.
> Just to keep the scope clear, the required patches should include
> - Changes to the business entity / database / daos
Merged.

> - adding this new field to the search mechanism
Merged.

> - exposing it via the REST API
Pending review on this one. Once it's merged, the bug should be moved to MODIFIED.
Comment 31 Yaniv Lavi 2017-08-13 06:26:36 EDT
Proposed scope by engineering:
1. Cold merge should be usable as a recovery mechanism for live merge (bz#1384321 - targeted to 4.2)
2. Add some GUI indication that the chain contains illegal images FROM THE ENGINE'S DATABASE - a couple of days, need UXD's help.

This bug targets #2. Is that acceptable to CEE instead of the RFE request listed in the summary?
Comment 36 Allon Mureinik 2017-09-14 08:54:23 EDT
Ala, can you add some doctext to this please?
Comment 41 Lilach Zitnitski 2017-11-14 04:15:00 EST
--------------------------------------
Tested with the following code:
----------------------------------------
ovirt-engine-4.2.0-0.0.master.20171112130303.git8bc889c.el7.centos.noarch
vdsm-4.20.6-62.gitd3023e4.el7.centos.x86_64

Tested with the following scenario:

Steps to Reproduce:
1. Create a VM with disks and start the VM
2. Start live merge
3. Terminate the merge and cancel the live merge

Actual results:
An exclamation mark is shown next to the vm's name and when hovering over it a warning is shown about snapshots with illegal disks

Expected results:

Moving to VERIFIED!
Comment 46 errata-xmlrpc 2018-05-15 13:38:43 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488
