Bug 1304761 - Failed to delete disk snapshot - becomes illegal
Status: CLOSED CURRENTRELEASE
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 3.6.0.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-3.6.6
Target Release: 3.6.6.1
Assigned To: Ala Hino
QA Contact: Aharon Canan
Docs Contact:
Depends On:
Blocks:
Reported: 2016-02-04 09:33 EST by Kevin Alon Goldblatt
Modified: 2017-07-27 08:14 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-05-30 06:56:19 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑3.6.z+
ylavi: planning_ack+
tnisan: devel_ack+
acanan: testing_ack+


Attachments
engine, vdsm, server logs (911.04 KB, application/x-gzip)
2016-02-04 09:44 EST, Kevin Alon Goldblatt
logs (8.21 MB, application/x-tar)
2016-05-15 09:17 EDT, Ori Gofen


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 56472 master MERGED core: Live Merge: Improve Live Merge Recovery Mechanism 2016-04-28 05:01 EDT
oVirt gerrit 56534 master ABANDONED core: Live Merge: Validate snapshot legality 2016-05-22 07:26 EDT
oVirt gerrit 56734 ovirt-engine-3.6 MERGED core: Live Merge: Improve Live Merge Recovery Mechanism 2016-04-28 06:22 EDT

Description Kevin Alon Goldblatt 2016-02-04 09:33:31 EST
Description of problem:
Deleting a Disk snapshot fails and the disk becomes illegal

Version-Release number of selected component (if applicable):


How reproducible:
Happened once

Steps to Reproduce:
1. Create a VM with 2 preallocated block disks
2. Create a snapshot "snapshot1"
3. Start the VM
4. In the Storage tab, open Snapshot Disks, select one of the disks in the list, and press Delete. The displayed message reads "Disk 'vm6_Disk1' from Snapshot(s) 'snapshot1' of VM 'vm6' deletion has been completed (User: admin@internal)." However, a few minutes later a failure message appears: "Failed to complete deletion of Disk 'vm6_Disk1' from snapshot(s) 'snapshot1' of VM 'vm6'". The snapshot disk is now displayed as 'Illegal'.
Deleting the second snapshot disk was successful.


Actual results:
The snapshot disk is not deleted and displayed as illegal

Expected results:
The snapshot disk should be deleted

Additional info:
Comment 1 Kevin Alon Goldblatt 2016-02-04 09:44 EST
Created attachment 1121122 [details]
engine, vdsm, server logs

Adding logs
Comment 2 Yaniv Kaul 2016-02-05 18:03:59 EST
Kevin 
- anything in the logs?
- How reproducible is it?
Comment 3 Allon Mureinik 2016-02-08 07:30:34 EST
The relevant part of the engine log:

2016-02-04 11:57:39,565 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (pool-5-thread-1) [2002d948] FINISH, GetVolumeInfoVDSCommand, log id: 4da04bd3
2016-02-04 11:57:39,565 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-1) [2002d948] The following images were not removed: [2a7ac358-b2db-4e3b-a54c-72a589ea23df]
2016-02-04 11:57:44,470 INFO  [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-76) [57b97b01] Waiting on Live Merge child commands to complete
2016-02-04 11:57:48,480 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step 'DESTROY_IMAGE_CHECK'
2016-02-04 11:57:49,548 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-22) [11a340cf] Merging of snapshot '56b3ff90-0ac2-41a3-bf43-dd50e52eed28' images 'b8ad02b5-557b-487d-bde3-7aaab295f518'..'2a7ac358-b2db-4e3b-a54c-72a589ea23df' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2016-02-04 11:57:49,745 INFO  [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-22) [11a340cf] All Live Merge child commands have completed, status 'FAILED'
2016-02-04 11:57:50,773 ERROR [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand] (DefaultQuartzScheduler_Worker-12) [8e98c54] Ending command 'org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand' with failure.

So the engine is doing the right thing by marking a snapshot we aren't sure about as ILLEGAL.
The question is what happened to that snapshot on VDSM's side. I can't see anything obvious in the logs, so we'll need to investigate.
Ala, please take the lead on this.
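The key evidence above is the RemoveSnapshotSingleDiskLiveCommand error, which names the snapshot and the range of images that were marked illegal. When triaging repeated failures like this one, it can help to pull those IDs out of engine.log mechanically. The following is a minimal Python sketch, assuming the message keeps the exact wording quoted in this bug (other engine versions may phrase it differently); the function name failed_merges is hypothetical and not part of any oVirt tooling.

```python
import re

# Modeled on the RemoveSnapshotSingleDiskLiveCommand error quoted in this bug.
# Assumption: the wording and the 36-character UUID format are stable.
MERGE_FAILED = re.compile(
    r"Merging of snapshot '(?P<snapshot>[0-9a-f-]{36})' images "
    r"'(?P<base>[0-9a-f-]{36})'\.\.'(?P<top>[0-9a-f-]{36})' failed"
)


def failed_merges(lines):
    """Yield (timestamp, snapshot_id, base_image_id, top_image_id) per failed merge.

    `lines` is any iterable of engine.log lines, e.g. an open file object.
    """
    for line in lines:
        match = MERGE_FAILED.search(line)
        if match:
            # engine.log lines start with 'YYYY-MM-DD HH:MM:SS,mmm' (23 chars).
            timestamp = line[:23]
            yield (
                timestamp,
                match.group("snapshot"),
                match.group("base"),
                match.group("top"),
            )
```

Passing an open engine.log file object yields one tuple per failed merge, which makes it easy to see whether the same snapshot keeps failing across retries.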
Comment 4 Allon Mureinik 2016-02-08 07:32:16 EST
Kevin, what versions are you using?
Comment 5 Ala Hino 2016-02-08 14:48:12 EST
(In reply to Allon Mureinik from comment #3)
> Ala, please take lead on this.

Ack
Comment 6 Ala Hino 2016-02-10 03:17:16 EST
Kevin, 

Could you please take a look at:
https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13

and see whether this is the same use case?
Comment 7 Kevin Alon Goldblatt 2016-02-15 09:17:54 EST
(In reply to Allon Mureinik from comment #4)
> Kevin, what versions are you using?

rhevm-3.6.3-0.1.el6.noarch
vdsm-4.17.20-0.el7ev
Comment 8 Kevin Alon Goldblatt 2016-02-15 09:22:25 EST
(In reply to Ala Hino from comment #6)
> Kevin, 
> 
> Could you please take a look at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13
> 
> and see whether this is the same use case?

In this use case I have 1 snapshot and I am deleting 1 of the 2 snapshot disks from the live snapshot.
In the use case from https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13 they have 3 snapshots and are deleting the entire middle snapshot.
Both seem to report the same error: "[org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step 'DESTROY_IMAGE_CHECK'"
Comment 9 Ala Hino 2016-02-15 11:06:50 EST
Kevin,

It doesn't really matter how many snapshots there are and which one is deleted.

Can you try running the script and see whether it fixes this issue?
Comment 10 Allon Mureinik 2016-03-03 09:58:07 EST
If live merge fails, the volume should indeed become illegal. For now, we have a manual script to fix it (see bug 1302215). Moving forwards, Live Merge should be re-entrant, which is what we're working on now.
Comment 11 Kevin Alon Goldblatt 2016-03-14 09:52:45 EDT
(In reply to Ala Hino from comment #9)
> Kevin,
> 
> It doesn't really matter how many snapshots there are and which one is
> deleted.
> 
> Can you try running the script and see whether it fixes this issue?

Hi,

Which script should I run, and where do I get it?
Comment 12 Ala Hino 2016-03-14 09:55:52 EDT
Here: bug 1308501
Comment 13 Kevin Alon Goldblatt 2016-03-14 10:30:07 EDT
(In reply to Yaniv Kaul from comment #2)
> Kevin 
> - anything in the logs?
 See comment 3 by amerino

> - How reproducible is it?
 See comment 1:

 How reproducible:
 Happened once
Comment 14 Allon Mureinik 2016-03-28 10:06:39 EDT
Pushing out to 3.6.6 so as not to risk 3.6.5.
Comment 16 Allon Mureinik 2016-05-01 15:13:34 EDT
Ala, patch https://gerrit.ovirt.org/#/c/56734/ on the 3.6 branch seems to address this. Are we waiting for anything else? If so - please explain what. If not, please move this BZ to MODIFIED.
Comment 17 Ala Hino 2016-05-01 15:40:24 EDT
Moved to MODIFIED
Comment 18 Ori Gofen 2016-05-15 09:17 EDT
Created attachment 1157662 [details]
logs

Failed QA:
"2016-05-15 15:48:15,543 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-88) [746a0d1b] Merging of snapshot '5e30fd23-bf1c-45cf-ac6e-0c77215380ab' images '2eeb962a-9463-4586-a3e7-07de14059be1'..'4d93eb5d-2fe0-4966-9364-40f7582350c9' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation."

engine - rhevm-3.6.6-0.1.el6.noarch
hypervisor - vdsm-4.17.28-0.el7ev.noarch

steps:
As described above
Comment 19 Ala Hino 2016-05-15 16:19:06 EDT
Ori,

Can you verify that you are using the correct build?

Looking at the engine log file, I don't see messages I expected to see when merge fails.

When live merge failed the first time, did you try running live merge again, and what happened?
Comment 20 Ori Gofen 2016-05-16 07:51:23 EDT
Yes, double checked :)
Comment 21 Ala Hino 2016-05-16 11:55:53 EDT
Ori,

Note that in the scenario we tried together, where one of the disks was unplugged (not active), the behavior was as expected: the volume of the unplugged disk should be ILLEGAL.

Try the same scenario, but with both disks active. In this case, both volumes must be removed when deleting the snapshot.
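Comments 21 and 22 boil down to a per-disk rule for what QA should expect after deleting a snapshot from a running VM: an active (plugged) disk's snapshot volume is merged and removed, while an unplugged disk's volume is left ILLEGAL. As a hypothetical checklist helper (not engine code; the names Outcome and expected_outcomes are made up for illustration), that rule can be written down as:

```python
from enum import Enum


class Outcome(Enum):
    REMOVED = "removed"  # snapshot volume merged and deleted
    ILLEGAL = "illegal"  # merge not performed; volume left marked ILLEGAL


def expected_outcomes(disks):
    """Map disk name -> expected state of its snapshot volume after deletion.

    `disks` maps each disk name to True if the disk is plugged (active)
    and False if it is unplugged. This encodes only the two cases
    discussed in this bug; it does not model engine internals.
    """
    return {
        name: Outcome.REMOVED if active else Outcome.ILLEGAL
        for name, active in disks.items()
    }
```

With both disks active, every entry should be REMOVED; any ILLEGAL volume in that case points to a real merge failure rather than expected behavior.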
Comment 22 Ori Gofen 2016-05-17 08:03:44 EDT
Well, in that case I will mark this issue as verified, since the above scenario is covered. Note that if one or more of a snapshot's disks is deactivated, the snapshot will become illegal, as you commented above.
Comment 23 Allon Mureinik 2016-05-22 07:43:38 EDT
Ala, is there anything we need to document here, or is it documented elsewhere?
Please either provide the doctext, or comment with the BZ tracking the doc text and set requires-doctext-.
Comment 24 Ala Hino 2016-05-22 07:47:44 EDT
BZ 1323629 documents the behavior
