Bug 1304761

Summary: Failed to delete disk snapshot - becomes illegal
Product: [oVirt] ovirt-engine
Component: BLL.Storage
Reporter: Kevin Alon Goldblatt <kgoldbla>
Assignee: Ala Hino <ahino>
QA Contact: Aharon Canan <acanan>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: 3.6.0.3
CC: acanan, ahino, amureini, bazil89, bugs, kgoldbla, sbonazzo, tnisan, ylavi
Target Milestone: ovirt-3.6.6
Target Release: 3.6.6.1
Flags: rule-engine: ovirt-3.6.z+, ylavi: planning_ack+, tnisan: devel_ack+, acanan: testing_ack+
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2016-05-30 10:56:19 UTC
oVirt Team: Storage
Attachments:
engine, vdsm, server logs
logs

Description Kevin Alon Goldblatt 2016-02-04 14:33:31 UTC
Description of problem:
Deleting a disk snapshot fails and the disk becomes illegal.

Version-Release number of selected component (if applicable):


How reproducible:
Happened once

Steps to Reproduce:
1. Create a VM with 2 preallocated block disks.
2. Create a snapshot "snapshot1".
3. Start the VM.
4. In the Storage tab -> Snapshot Disks, select one of the disks in the list and press Delete. The message displayed reads "Disk 'vm6_Disk1' from Snapshot(s) 'snapshot1' of VM 'vm6' deletion has been completed (User: admin@internal)." However, a few minutes later a failure message appears: "Failed to complete deletion of Disk 'vm6_Disk1' from snapshot(s) 'snapshot1' of VM 'vm6'". The snapshot disk is now displayed as 'Illegal'.
Deleting the second snapshot disk succeeded.


Actual results:
The snapshot disk is not deleted and is displayed as Illegal.

Expected results:
The snapshot disk should be deleted

Additional info:

Comment 1 Kevin Alon Goldblatt 2016-02-04 14:44:08 UTC
Created attachment 1121122 [details]
engine, vdsm, server logs

Adding logs

Comment 2 Yaniv Kaul 2016-02-05 23:03:59 UTC
Kevin 
- anything in the logs?
- How reproducible is it?

Comment 3 Allon Mureinik 2016-02-08 12:30:34 UTC
The relevant part of the engine log:

2016-02-04 11:57:39,565 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (pool-5-thread-1) [2002d948] FINISH, GetVolumeInfoVDSCommand, log id: 4da04bd3
2016-02-04 11:57:39,565 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-1) [2002d948] The following images were not removed: [2a7ac358-b2db-4e3b-a54c-72a589ea23df]
2016-02-04 11:57:44,470 INFO  [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-76) [57b97b01] Waiting on Live Merge child commands to complete
2016-02-04 11:57:48,480 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step 'DESTROY_IMAGE_CHECK'
2016-02-04 11:57:49,548 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-22) [11a340cf] Merging of snapshot '56b3ff90-0ac2-41a3-bf43-dd50e52eed28' images 'b8ad02b5-557b-487d-bde3-7aaab295f518'..'2a7ac358-b2db-4e3b-a54c-72a589ea23df' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2016-02-04 11:57:49,745 INFO  [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-22) [11a340cf] All Live Merge child commands have completed, status 'FAILED'
2016-02-04 11:57:50,773 ERROR [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand] (DefaultQuartzScheduler_Worker-12) [8e98c54] Ending command 'org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand' with failure.

So the engine is doing the right thing, marking a snapshot we aren't sure about as ILLEGAL.
The question is what happened to that snapshot on VDSM's side. I can't see anything obvious in the logs, so we'll need to investigate.
Ala, please take lead on this.
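Because the failure only shows up by correlating several engine.log entries, a small helper can pull the affected IDs out of a log. The following is a minimal Python sketch, not part of oVirt, that assumes the message text matches the lines quoted above (it may differ between engine versions); the function name find_failed_merges and the dictionary keys are hypothetical.

```python
import re

# Patterns modeled on the engine.log messages quoted in comment 3.
# Assumption: other engine versions may word these messages differently.
NOT_REMOVED = re.compile(r"The following images were not removed: \[([^\]]*)\]")
MERGE_FAILED = re.compile(
    r"Merging of snapshot '([0-9a-f-]+)' images '([0-9a-f-]+)'\.\.'([0-9a-f-]+)' failed"
)


def find_failed_merges(lines):
    """Return (image IDs reported as not removed, list of failed merges)."""
    not_removed = []
    failed = []
    for line in lines:
        m = NOT_REMOVED.search(line)
        if m:
            # The ID list is bracketed and may hold several comma-separated IDs.
            not_removed.extend(i.strip() for i in m.group(1).split(",") if i.strip())
        m = MERGE_FAILED.search(line)
        if m:
            # Keep the two image IDs in the order the message prints them.
            failed.append({
                "snapshot": m.group(1),
                "first_image": m.group(2),
                "second_image": m.group(3),
            })
    return not_removed, failed
```

Used on an engine.log file, e.g. `find_failed_merges(open("engine.log"))`, it lists the images that the engine would then have marked illegal, which is the starting point for checking what happened on the VDSM side.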

Comment 4 Allon Mureinik 2016-02-08 12:32:16 UTC
Kevin, what versions are you using?

Comment 5 Ala Hino 2016-02-08 19:48:12 UTC
(In reply to Allon Mureinik from comment #3)

Ack

Comment 6 Ala Hino 2016-02-10 08:17:16 UTC
Kevin, 

Could you please take a look at:
https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13

and see whether this is the same use case?

Comment 7 Kevin Alon Goldblatt 2016-02-15 14:17:54 UTC
(In reply to Allon Mureinik from comment #4)
> Kevin, what versions are you using?

rhevm-3.6.3-0.1.el6.noarch
vdsm-4.17.20-0.el7ev

Comment 8 Kevin Alon Goldblatt 2016-02-15 14:22:25 UTC
(In reply to Ala Hino from comment #6)
> Kevin, 
> 
> Could you please take a look at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13
> 
> and see whether this is the same use case?

In this use case I have 1 snapshot and I am deleting 1 of the 2 snapshot disks from the live snapshot.
In the use case from https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13 they have 3 snapshots and are deleting the entire middle snapshot.
Both seem to report the same error, namely: "[org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step 'DESTROY_IMAGE_CHECK'"

Comment 9 Ala Hino 2016-02-15 16:06:50 UTC
Kevin,

It doesn't really matter how many snapshots there are and which one is deleted.

Can you try running the script and see whether it fixes this issue?

Comment 10 Allon Mureinik 2016-03-03 14:58:07 UTC
If live merge fails, the volume should indeed become illegal. For now, we have a manual script to fix it (see bug 1302215). Going forward, Live Merge should be re-entrant, which is what we're working on now.

Comment 11 Kevin Alon Goldblatt 2016-03-14 13:52:45 UTC
(In reply to Ala Hino from comment #9)
> Kevin,
> 
> It doesn't really matter how many snapshots there are and which one is
> deleted.
> 
> Can you try running the script and see whether it fixes this issue?

Hi,

Which script should I run, and where do I get it?

Comment 12 Ala Hino 2016-03-14 13:55:52 UTC
Here: bug 1308501

Comment 13 Kevin Alon Goldblatt 2016-03-14 14:30:07 UTC
(In reply to Yaniv Kaul from comment #2)
> Kevin 
> - anything in the logs?
 See comment 3 by Allon Mureinik

> - How reproducible is it?
 See the description:

 How reproducible:
 Happened once

Comment 14 Allon Mureinik 2016-03-28 14:06:39 UTC
Pushing out to 3.6.6 so as not to risk 3.6.5.

Comment 16 Allon Mureinik 2016-05-01 19:13:34 UTC
Ala, patch https://gerrit.ovirt.org/#/c/56734/ on the 3.6 branch seems to address this. Are we waiting for anything else? If so - please explain what. If not, please move this BZ to MODIFIED.

Comment 17 Ala Hino 2016-05-01 19:40:24 UTC
Moved to MODIFIED

Comment 18 Ori Gofen 2016-05-15 13:17:14 UTC
Created attachment 1157662 [details]
logs

Failed QA:
"2016-05-15 15:48:15,543 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-88) [746a0d1b] Merging of snapshot '5e30fd23-bf1c-45cf-ac6e-0c77215380ab' images '2eeb962a-9463-4586-a3e7-07de14059be1'..'4d93eb5d-2fe0-4966-9364-40f7582350c9' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation."

engine - rhevm-3.6.6-0.1.el6.noarch
hypervisor - vdsm-4.17.28-0.el7ev.noarch

steps:
As described above

Comment 19 Ala Hino 2016-05-15 20:19:06 UTC
Ori,

Can you verify that you are using the correct build?

Looking at the engine log file, I don't see the messages I would expect to see when a merge fails.

When live merge failed the first time, did you try live merge again, and what happened?

Comment 20 Ori Gofen 2016-05-16 11:51:23 UTC
Yes, double checked :)

Comment 21 Ala Hino 2016-05-16 15:55:53 UTC
Ori,

Note that in the scenario we tried together, where one of the disks was unplugged (not active), the behavior was as expected: the volume of the unplugged disk should be ILLEGAL.

Try the same scenario, but with both disks active. In this case, both volumes must be removed when deleting the snapshot.

Comment 22 Ori Gofen 2016-05-17 12:03:44 UTC
Well, in that case I will mark this issue as verified, since the above scenario is covered. Note that if one or more of a snapshot's disks is deactivated, the snapshot will become illegal, as you commented above.

Comment 23 Allon Mureinik 2016-05-22 11:43:38 UTC
Ala, is there anything we need to document here, or is it documented elsewhere?
Please either provide the doctext, or comment with the BZ tracking the doc text and set requires-doctext-.

Comment 24 Ala Hino 2016-05-22 11:47:44 UTC
BZ 1323629 documents the behavior