Bug 1649129 - The VM has snapshot(s) with disk(s) in illegal status
Summary: The VM has snapshot(s) with disk(s) in illegal status
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.2.6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.3.5
Target Release: ---
Assignee: Ahmad Khiet
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-11-13 00:31 UTC by Toni Feric
Modified: 2020-05-19 12:17 UTC
CC List: 3 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2019-06-23 09:10:28 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.3+


Attachments
engine.log (2018-11-12) (125.09 KB, application/x-gzip)
2018-11-13 00:31 UTC, Toni Feric

Description Toni Feric 2018-11-13 00:31:43 UTC
Created attachment 1505024 [details]
engine.log (2018-11-12)

Description of problem:
After deleting a snapshot, the VM does not start anymore.
oVirt is displaying an exclamation mark next to the VM with the message "The VM has snapshot(s) with disk(s) in illegal status. You may not be able to start the VM before successfully retrying the snapshot delete."
Before deleting the snapshot, the VM was running fine.

Version-Release number of selected component (if applicable):
Version 4.2.6.4-1.el7

The snapshot that was deleted was rather old.
Running a "Live Merge" took a long time; it merged and removed 3 out of the 4 disks successfully, but for the biggest disk it eventually timed out and aborted, leaving the whole VM in an inconsistent state.
The VM now refuses to start, even though the boot disk actually appears to be fine.

After shutting down the VM, the offline Merge also failed, but this time immediately.
Cloning the VM to another VM does not work.

Maybe the "Live Merge" process gave up too early? Perhaps the merge would have completed correctly, without leaving the disk in this state, if the process had simply been allowed to continue?

Is there a way to rescue the affected disk?

Please see the engine.log

Comment 1 Sandro Bonazzola 2019-01-28 09:34:14 UTC
This bug has not been marked as blocker for oVirt 4.3.0.
Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.

Comment 2 Ahmad Khiet 2019-05-28 08:45:23 UTC

The failure in the engine log does not provide enough information about why VDSM failed to process the command.

I need VDSM logs to better understand the timeout/connection issue.
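
For reference: the VDSM logs are collected on the hypervisor host that ran the live merge. A minimal sketch, assuming the default log location:

     # on the host that ran the merge (default VDSM log path)
     $ less /var/log/vdsm/vdsm.log
     # older, rotated logs, in case the merge happened a while ago
     $ ls /var/log/vdsm/vdsm.log.*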


"Is there a way to rescue the affected disk?"

Yes, there are two approaches you can try.
BUT: create a backup before attempting either of them.


1. If the goal is to delete the illegal image, you can delete it manually.
To get the image id, use the following command:

     vdsm-tool dump-volume-chains

This command lists the disks together with their status, i.e. whether each one is legal or illegal (see the example output at the end of this comment).
Then delete the disk with that id.

2. You can change the status of the disk image from illegal back to legal, and then start the VM.

Use:
    vdsm-tool dump-volume-chains

to list the images and get the image id.

Then create a JSON file with the right information and run the command below to update the volume (see the sketch at the end of this comment for how to read back the generation value).

$ cat update.json
  {
      "job_id": "<job_uuid>",
      "vol_info": {
          "sd_id": "<sd_id>",
          "img_id": "<img_id>",
          "vol_id": "<vol_id>",
          "generation": "<vol_gen>"
      },
      "legality": "LEGAL"
  }

  $ vdsm-client SDM update_volume -f update.json
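
For illustration, here is roughly what the two lookups look like. The storage domain UUID, the output layout and the ILLEGAL marker below are illustrative assumptions, not taken from this bug:

     # run on the SPM host; pass the UUID of the storage domain holding the disk
     $ vdsm-tool dump-volume-chains <sd_id>

     # example output shape - the volume left behind by the failed merge is the
     # one reported as ILLEGAL:
     #   image:    <img_id>
     #             - <parent_vol_id>
     #               status: OK, voltype: INTERNAL, legality: LEGAL, ...
     #             - <vol_id>
     #               status: OK, voltype: LEAF, legality: ILLEGAL, ...

To fill in <vol_gen> (and to double-check the current legality) before writing update.json, the volume info can be read back first; a sketch, assuming the standard Volume getInfo verb (all UUIDs are placeholders to fill in from your environment):

     $ vdsm-client Volume getInfo storagepoolID=<sp_id> \
                                   storagedomainID=<sd_id> \
                                   imageID=<img_id> \
                                   volumeID=<vol_id>
     # the JSON reply includes "generation" and "legality"; use that generation
     # value as <vol_gen> in update.json above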

Comment 3 Tal Nisan 2019-06-23 09:10:28 UTC
Needinfo has been pending for a while now; closing until new information is provided.

