Bug 1304761
Summary: Failed to delete disk snapshot - becomes illegal
Product: [oVirt] ovirt-engine
Component: BLL.Storage
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Version: 3.6.0.3
Target Milestone: ovirt-3.6.6
Target Release: 3.6.6.1
Hardware: Unspecified
OS: Unspecified
Reporter: Kevin Alon Goldblatt <kgoldbla>
Assignee: Ala Hino <ahino>
QA Contact: Aharon Canan <acanan>
CC: acanan, ahino, amureini, bazil89, bugs, kgoldbla, sbonazzo, tnisan, ylavi
Flags: rule-engine: ovirt-3.6.z+, ylavi: planning_ack+, tnisan: devel_ack+, acanan: testing_ack+
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2016-05-30 10:56:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Storage
Cloudforms Team: ---

Description
Kevin Alon Goldblatt
2016-02-04 14:33:31 UTC
Created attachment 1121122 [details]
engine, vdsm, server logs
Adding logs

Kevin
- anything in the logs?
- How reproducible is it?

The relevant part of the engine log:

2016-02-04 11:57:39,565 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (pool-5-thread-1) [2002d948] FINISH, GetVolumeInfoVDSCommand, log id: 4da04bd3
2016-02-04 11:57:39,565 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-1) [2002d948] The following images were not removed: [2a7ac358-b2db-4e3b-a54c-72a589ea23df]
2016-02-04 11:57:44,470 INFO [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-76) [57b97b01] Waiting on Live Merge child commands to complete
2016-02-04 11:57:48,480 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step 'DESTROY_IMAGE_CHECK'
2016-02-04 11:57:49,548 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-22) [11a340cf] Merging of snapshot '56b3ff90-0ac2-41a3-bf43-dd50e52eed28' images 'b8ad02b5-557b-487d-bde3-7aaab295f518'..'2a7ac358-b2db-4e3b-a54c-72a589ea23df' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
2016-02-04 11:57:49,745 INFO [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-22) [11a340cf] All Live Merge child commands have completed, status 'FAILED'
2016-02-04 11:57:50,773 ERROR [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand] (DefaultQuartzScheduler_Worker-12) [8e98c54] Ending command 'org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand' with failure.

So the engine is doing the right thing, marking a snapshot we aren't sure about as ILLEGAL.
Question is, what happened to that snapshot on VDSM's side. I can't see anything obvious from the logs, so we'll need to investigate.
Ala, please take lead on this.

Kevin, what versions are you using?

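When triaging an excerpt like the one above, it can help to pull out only the ERROR entries and group them by the bracketed ID that follows the thread name. The following is a minimal sketch, not part of oVirt: the line layout is inferred solely from the lines quoted in this bug, treating the bracketed value as a correlation ID is an assumption, and the names `LOG_LINE`, `EngineLogEntry`, `parse_line` and `errors_by_correlation_id` are hypothetical.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional

# Layout assumed from the excerpt in this bug (not from oVirt documentation):
# <date> <time>,<ms> <LEVEL> [<logger>] (<thread>) [<id>] <message>
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"\[(?P<correlation_id>[^\]]*)\]\s+"
    r"(?P<message>.*)$"
)


class EngineLogEntry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    thread: str
    correlation_id: str  # assumed meaning of the second bracketed field
    message: str


def parse_line(line: str) -> Optional[EngineLogEntry]:
    """Return a parsed entry, or None if the line does not fit the layout."""
    match = LOG_LINE.match(line.strip())
    return EngineLogEntry(**match.groupdict()) if match else None


def errors_by_correlation_id(lines: Iterable[str]) -> Dict[str, List[EngineLogEntry]]:
    """Collect ERROR entries, keyed by the bracketed ID."""
    grouped: Dict[str, List[EngineLogEntry]] = {}
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == "ERROR":
            grouped.setdefault(entry.correlation_id, []).append(entry)
    return grouped


# Lines copied from the engine log excerpt in this bug.
SAMPLE_LINES = [
    "2016-02-04 11:57:39,565 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] "
    "(pool-5-thread-1) [2002d948] FINISH, GetVolumeInfoVDSCommand, log id: 4da04bd3",
    "2016-02-04 11:57:39,565 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] "
    "(pool-5-thread-1) [2002d948] The following images were not removed: "
    "[2a7ac358-b2db-4e3b-a54c-72a589ea23df]",
    "2016-02-04 11:57:48,480 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] "
    "(DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step "
    "'DESTROY_IMAGE_CHECK'",
    "2016-02-04 11:57:50,773 ERROR [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand] "
    "(DefaultQuartzScheduler_Worker-12) [8e98c54] Ending command "
    "'org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand' with failure.",
]

if __name__ == "__main__":
    for cid, entries in errors_by_correlation_id(SAMPLE_LINES).items():
        for entry in entries:
            print(cid, entry.timestamp, entry.logger.rsplit(".", 1)[-1], entry.message)
```

Lines that do not match the assumed layout (stack traces, wrapped continuations) are simply skipped, so the sketch is only suitable for a first pass over a log.
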
(In reply to Allon Mureinik from comment #3)
> The relevant part of the engine log:
>
> 2016-02-04 11:57:39,565 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVolumeInfoVDSCommand] (pool-5-thread-1) [2002d948] FINISH, GetVolumeInfoVDSCommand, log id: 4da04bd3
> 2016-02-04 11:57:39,565 ERROR [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-1) [2002d948] The following images were not removed: [2a7ac358-b2db-4e3b-a54c-72a589ea23df]
> 2016-02-04 11:57:44,470 INFO [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-76) [57b97b01] Waiting on Live Merge child commands to complete
> 2016-02-04 11:57:48,480 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step 'DESTROY_IMAGE_CHECK'
> 2016-02-04 11:57:49,548 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-22) [11a340cf] Merging of snapshot '56b3ff90-0ac2-41a3-bf43-dd50e52eed28' images 'b8ad02b5-557b-487d-bde3-7aaab295f518'..'2a7ac358-b2db-4e3b-a54c-72a589ea23df' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation.
> 2016-02-04 11:57:49,745 INFO [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommandCallback] (DefaultQuartzScheduler_Worker-22) [11a340cf] All Live Merge child commands have completed, status 'FAILED'
> 2016-02-04 11:57:50,773 ERROR [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand] (DefaultQuartzScheduler_Worker-12) [8e98c54] Ending command 'org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand' with failure.
>
> So the engine is doing the right thing, marking a snapshot we aren't sure about as ILLEGAL.
> Question is, what happened to that snapshot on VDSM's side. I can't see anything obvious from the logs, so we'll need to investigate.
> Ala, please take lead on this.

Ack

Kevin,

Could you please take a look at:
https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13

and see whether this is the same use case?

(In reply to Allon Mureinik from comment #4)
> Kevin, what versions are you using?

rhevm-3.6.3-0.1.el6.noarch
vdsm-4.17.20-0.el7ev

(In reply to Ala Hino from comment #6)
> Kevin,
>
> Could you please take a look at:
> https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13
>
> and see whether this is the same use case?

In this use case I have 1 snapshot and I am deleting 1 of the 2 'snapshot disks' from the live snapshot.
In the use case from https://bugzilla.redhat.com/show_bug.cgi?id=1302215#c13 they have 3 snapshots and are deleting the entire middle snapshot.

They both seem to be reporting the same error, namely:
"[org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-24) [11a340cf] Failed child command status for step 'DESTROY_IMAGE_CHECK'"

Kevin,

It doesn't really matter how many snapshots there are and which one is deleted.

Can you try running the script and see whether it fixes this issue?

If live merge fails, the volume should indeed become illegal. For now, we have a manual script to fix it (see bug 1302215).
Moving forwards, Live Merge should be re-entrant, which is what we're working on now.

(In reply to Ala Hino from comment #9)
> Kevin,
>
> It doesn't really matter how many snapshots there are and which one is
> deleted.
>
> Can you try running the script and see whether it fixes this issue?

Hi,

What script is it that I should run? Where do I get it from?

Here: bug 1308501

(In reply to Yaniv Kaul from comment #2)
> Kevin
> - anything in the logs?

See comment 3 by amureini

> - How reproducible is it?

See comment 1: How reproducible: Happened once

Pushing out to 3.6.6 so as not to risk 3.6.5

Ala, patch https://gerrit.ovirt.org/#/c/56734/ on the 3.6 branch seems to address this. Are we waiting for anything else?
If so - please explain what. If not, please move this BZ to MODIFIED.

Moved to MODIFIED

Created attachment 1157662 [details]
logs
Failed qa:
"2016-05-15 15:48:15,543 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskLiveCommand] (DefaultQuartzScheduler_Worker-88) [746a0d1b] Merging of snapshot '5e30fd23-bf1c-45cf-ac6e-0c77215380ab' images '2eeb962a-9463-4586-a3e7-07de14059be1'..'4d93eb5d-2fe0-4966-9364-40f7582350c9' failed. Images have been marked illegal and can no longer be previewed or reverted to. Please retry Live Merge on the snapshot to complete the operation."
engine - rhevm-3.6.6-0.1.el6.noarch
hypervisor - vdsm-4.17.28-0.el7ev.noarch
steps:
As described above
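
To compare a failure like the one quoted in this QA comment with the earlier occurrence, the snapshot and image IDs can be extracted from the message text. This is a rough sketch that assumes the message keeps the exact wording seen in this bug; `MERGE_FAILURE` and `parse_merge_failure` are hypothetical names, and the sketch does not assume which of the two image IDs is the base or the top volume.

```python
import re
from typing import Dict, Optional

# Lower-case hexadecimal UUID, as it appears in the quoted engine log lines.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Wording copied from the "Merging of snapshot ... failed" message in this bug.
MERGE_FAILURE = re.compile(
    rf"Merging of snapshot '(?P<snapshot>{UUID})' images "
    rf"'(?P<first_image>{UUID})'\.\.'(?P<second_image>{UUID})' failed"
)


def parse_merge_failure(message: str) -> Optional[Dict[str, str]]:
    """Return the snapshot and image IDs, or None if the wording differs."""
    match = MERGE_FAILURE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    message = (
        "Merging of snapshot '5e30fd23-bf1c-45cf-ac6e-0c77215380ab' images "
        "'2eeb962a-9463-4586-a3e7-07de14059be1'..'4d93eb5d-2fe0-4966-9364-40f7582350c9' "
        "failed. Images have been marked illegal and can no longer be previewed or reverted to."
    )
    print(parse_merge_failure(message))
```
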

Ori,

Can you verify that you are using the correct build? Looking at the engine log file, I don't see the messages I expected to see when merge fails.
When live merge failed for the first time, did you try live merge again to see what happens?

Yes, double checked :)

Ori,

Note that in the scenario we tried together, where one of the disks was unplugged (not active), the behavior was as expected - the volume of the unplugged disk should be ILLEGAL.
Try the same scenario but while both disks are active. In this case, both volumes must be removed when deleting the snapshot.

Well, in that case I will mark this issue as verified, as the above scenario is covered. Note that if one or more of a snapshot's disks is deactivated, the snapshot will become illegal, as you commented above.

Ala, is there anything we need to document here, or is it documented elsewhere? Please either provide the doctext, or comment with the BZ tracking the doc text and set requires-doctext-.

BZ 1323629 documents the behavior