Bug 1306367
Summary: | Live Merge - recovery flow not working properly when DESTROY_IMAGE_CHECK command fails | ||
---|---|---|---|
Product: | [oVirt] ovirt-engine | Reporter: | Greg Padgett <gpadgett> |
Component: | BLL.Storage | Assignee: | Greg Padgett <gpadgett> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Elad <ebenahar> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.6.2 | CC: | acanan, ahino, amureini, bugs, gklein, gpadgett, sbonazzo, tnisan |
Target Milestone: | ovirt-3.6.3 | Flags: | rule-engine:
ovirt-3.6.z+
rule-engine: planning_ack+ tnisan: devel_ack+ rule-engine: testing_ack+ |
Target Release: | 3.6.3.2 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-03-10 12:49:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Greg Padgett
2016-02-10 16:55:14 UTC
(In reply to Greg Padgett from comment #0) > 2. It looks like something changed the way the database records are marked > in the failure case in general, and the images are no longer being marked > illegal. Consequently, the Live Merge code won't think it has ever > attempted a merge on the image. Even worse, other operations such as > snapshot preview may attempt to use a partially-merged snapshot. If it > appears to work and the user commits the preview, the vm may end up with > corrupt data. > > Version-Release number of selected component (if applicable): > 3.6.2 (for problem #1 above; still investigating problem #2) The second problem does not apply to the 3.6 branch. Bug tickets must have version flags set prior to targeting them to a release. Please ask maintainer to set the correct version flags and only then set the target milestone. Hi Greg, can you please provide us a way to reproduce the bug? Hi Elad, Basically, to reproduce this issue we have to fake the result of getVolumeInfo after invoking DESTROY_IMAGE. This can done by stopping vdsm/shutdown the host after you see in the log that DESTROY_IMAGE completed. Thanks, Tested accordingly: Rebooted the hypervisor right after I saw the following in engine.log: 2016-03-02 07:36:53,715 INFO [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-5) [2e486e8c] Running command: DestroyImageCheckCommand internal: true. 2016-03-02 07:36:55,906 INFO [org.ovirt.engine.core.bll.DestroyImageCheckCommand] (pool-5-thread-5) [2e486e8c] Requested images were successfully removed Live merge operation was marked as failed after few minutes: 2016-03-02 07:41:47,747 INFO [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-68) [7162a4dd] All Live Merge child commands have complete d, status 'FAILED' 2016-03-02 07:41:48,792 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (DefaultQuartzScheduler_Worker-9) [7162a4dd] Ending command 'org.ovirt.engine.core.bll.RemoveSnap shotCommand' with failure. 2016-03-02 07:41:48,869 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-9) [7162a4dd] Correlation ID: 7162a4dd, Job I D: ba5075ea-36b0-4e79-8218-549e02412b95, Call Stack: null, Custom Event ID: -1, Message: Failed to delete snapshot '1' for VM 'test1'. Using: rhevm-3.6.3.3-0.1.el6.noarch vdsm-4.17.23-0.el7ev.noarch Greg, please confirm that the test was done properly and the results are the expected so I'll verify, thanks. Elad, The results seem to be as expected. Can you run the manual recovery script (found in bug 1308501) and see whether it fixes the database? Thanks Ala, Executed the script and the snapshot got removed successfully. Based on this and on comment #5, moving to VERIFIED. |