Description of problem:
Deleting a disk snapshot fails and the disk becomes illegal.

Version-Release number of selected component (if applicable):

How reproducible:
Happened once

Steps to Reproduce:
1. Create a VM with 2 preallocated block disks
2. Create a snapshot "snapshot1"
3. Start the VM
4. Via the Storage tab -> Snapshot Disks -> select one of the disks in the list and press Delete

>>>> The message displayed reads "Disk 'vm6_Disk1' from Snapshot(s) 'snapshot1' of VM 'vm6' deletion has been completed (User: admin@internal)." BUT a few minutes later a failure message appears: "Failed to complete deletion of Disk 'vm6_Disk1' from snapshot(s) 'snapshot1' of VM 'vm6'". The snapshot disk is now displayed as 'Illegal'. Deleting the second snapshot disk was successful.

Actual results:
The snapshot disk is not deleted and is displayed as illegal.

Expected results:
The snapshot disk should be deleted.

Additional info:
Created attachment 1121122 [details]
engine, vdsm, server logs

Adding logs
Kevin
- anything in the logs?
- How reproducible is it?
The relevant part of the engine log:

2016-02-04 11:57:39,565 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (pool-5-thread-1) [2002d948] FINISH, GetVolumeInfoVDSCommand, log id: 4da04bd3
2016-02-04 11:57:39,565 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-1) [2002d948] The following images were not removed: [2a7ac358-b2db-4e3b-a54c-72a589ea23df]
2016-02-04 11:57:44,470 INFO  [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-76) [57b97b01] Waiting on Live Merge child commands to complete
2016-02-04 11:57:48,480 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step 'DESTROY_IMAGE_CHECK'
2016-02-04 11:57:49,548 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-22) [11a340cf] Merging of snapshot '56b3ff90-0ac2-41a3-bf43-dd50e52eed28' images 'b8ad02b5-557b-487d-bde3-7aaab295f518'..'2a7ac358-b2db-4e3b-a54c-72a589ea23df' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2016-02-04 11:57:49,745 INFO  [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-22) [11a340cf] All Live Merge child commands have completed, status 'FAILED'
2016-02-04 11:57:50,773 ERROR [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand] (DefaultQuartzScheduler_Worker-12) [8e98c54] Ending command 'org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand' with failure.

So the engine is doing the right thing, marking a snapshot we aren't sure about as ILLEGAL. The question is what happened to that snapshot on VDSM's side. I can't see anything obvious from the logs, so we'll need to investigate. Ala, please take the lead on this.
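For anyone triaging a similar failure, the volume UUIDs the engine failed to remove can be pulled straight out of engine.log. A minimal sketch (the helper and regex are mine; only the log line itself is taken from this bug):

```python
import re

# Engine log line copied verbatim from the excerpt above.
LINE = ("2016-02-04 11:57:39,565 ERROR "
        "[org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-1) "
        "[2002d948] The following images were not removed: "
        "[2a7ac358-b2db-4e3b-a54c-72a589ea23df]")

def not_removed_images(line):
    """Return the list of image UUIDs from a DestroyImageCheckCommand error line."""
    m = re.search(r"The following images were not removed: \[([^\]]+)\]", line)
    return m.group(1).split(", ") if m else []

print(not_removed_images(LINE))
# → ['2a7ac358-b2db-4e3b-a54c-72a589ea23df']
```

Running this over the whole log (one call per line) gives the set of volumes to check on the VDSM side.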
Kevin, what versions are you using?
(In reply to Allon Mureinik from comment #3)
> The relevant part of of the engine log:
> [...]
> Ala, please take lead on this.

Ack
Kevin,

Could you please take a look at:
https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13

and see whether this is the same use case?
(In reply to Allon Mureinik from comment #4)
> Kevin, what versions are you using?

rhevm-3.6.3-0.1.el6.noarch
vdsm-4.17.20-0.el7ev
(In reply to Ala Hino from comment #6)
> Kevin,
>
> Could you please take a look at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13
>
> and see whether this is the same use case?

In this use case I have 1 snapshot and am deleting 1 of the 2 'snapshot disks' from the live snapshot. In the use case from https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13 they have 3 snapshots and are deleting the entire middle snapshot.

Both seem to report the same error, namely: "[org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step 'DESTROY_IMAGE_CHECK'"
Kevin, It doesn't really matter how many snapshots there are and which one is deleted. Can you try running the script and see whether it fixes this issue?
If live merge fails, the volume should indeed become illegal. For now, we have a manual script to fix it (see bug 1302215). Moving forward, Live Merge should be re-entrant, which is what we're working on now.
(In reply to Ala Hino from comment #9)
> Kevin,
>
> It doesn't really matter how many snapshots there are and which one is
> deleted.
>
> Can you try running the script and see whether it fixes this issue?

Hi,

What script is it that I should run? Where do I get it from?
Here: bug 1308501
(In reply to Yaniv Kaul from comment #2)
> Kevin
> - anything in the logs?

See comment 3 by amerino

> - How reproducible is it?

See comment 1: How reproducible: Happened once
Pushing out to 3.6.6 so as not to risk 3.6.5
Ala, patch https://gerrit.ovirt.org/#/c/56734/ on the 3.6 branch seems to address this. Are we waiting for anything else? If so - please explain what. If not, please move this BZ to MODIFIED.
Moved to MODIFIED
Created attachment 1157662 [details]
logs

Failed QA:

"2016-05-15 15:48:15,543 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-88) [746a0d1b] Merging of snapshot '5e30fd23-bf1c-45cf-ac6e-0c77215380ab' images '2eeb962a-9463-4586-a3e7-07de14059be1'..'4d93eb5d-2fe0-4966-9364-40f7582350c9' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation."

engine - rhevm-3.6.6-0.1.el6.noarch
hypervisor - vdsm-4.17.28-0.el7ev.noarch

Steps: as described above
Ori,

Can you verify that you are using the correct build? Looking at the engine log file, I don't see the messages I expected to see when merge fails.

When live merge failed for the first time, did you try live merge again and see what happens?
Yes, double checked :)
Ori,

Note that in the scenario we tried together, where one of the disks was unplugged (not active), the behavior was as expected - the volume of the unplugged disk should be ILLEGAL.

Try the same scenario while both disks are active. In that case, both volumes must be removed when deleting the snapshot.
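To summarize the expected per-disk outcome described above as a tiny sketch (the function name and return values are illustrative shorthand, not engine code):

```python
# Expected outcome for each disk's snapshot volume when the snapshot is
# deleted, per the behavior described in this comment: live merge only
# processes plugged (active) disks; an unplugged disk's volume is left
# ILLEGAL until the merge is retried.
def expected_volume_outcome(disk_plugged: bool) -> str:
    return "removed" if disk_plugged else "ILLEGAL"

print([expected_volume_outcome(p) for p in (True, False)])
# → ['removed', 'ILLEGAL']
```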
Well, in that case I will mark this issue as verified, as the above scenario is covered. Note that in case one or more of a snapshot's disks is deactivated, the snapshot will become illegal, as you commented above.
Ala, is there anything we need to document here, or is it documented elsewhere? Please either provide the doc text, or comment with the BZ tracking the doc text and set requires-doctext-.
BZ 1323629 documents the behavior