Bug 1410428

Summary: Failure to cold-merge a in-between snapshot
Product: [oVirt] ovirt-engine Reporter: Carlos Mestre González <cmestreg>
Component: BLL.StorageAssignee: Ala Hino <ahino>
Status: CLOSED CURRENTRELEASE QA Contact: Raz Tamir <ratamir>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.1.0CC: bugs, cmestreg, gklein, ratamir
Target Milestone: ovirt-4.1.0-betaKeywords: Regression
Target Release: 4.1.0.2Flags: gklein: ovirt-4.1+
gklein: blocker+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-01 14:53:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
vdsm and engine logs none

Description Carlos Mestre González 2017-01-05 13:19:48 UTC
Created attachment 1237631 [details]
vdsm and engine logs

Description of problem:
Fails to cold merge a in-between  snapshot

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.3.beta2.el7.noarch
vdsm-4.19.1-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with multiple disks (1 boot + 4 disks) on an ISCSI Domain.
2. Do it 3 times: Start the vm, write a file on the boot disk, shutdown the vm and create a snapshot (snap #0, snap #1, snap #2)
3. Remove the middle snapshot (snap #1)

Actual results:
Operation remove snapshot fails


Expected results:
Operation succeeds and the snapshot is removed.

Additional info:
This works in last 4.0.X build.

2017-01-05 12:34:03,843+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartz
Scheduler10) [54dd3262] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' is in pre-polling  peri
od and should not be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:07,447+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallba
ck] (DefaultQuartzScheduler2) [71ee4a1c] Command 'ColdMergeSnapshotSingleDisk' (id: '4340142b-
4f13-4aac-94ae-1e0170ae73c6') waiting on child command id: 'a902b529-9fe5-40a3-b73e-a4946dd393
91' type:'FinalizeMerge' to complete
2017-01-05 12:34:09,460+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCa
llback] (DefaultQuartzScheduler2) [54dd3262] Command 'RemoveSnapshot' (id: '50be4d9a-4732-4767
-a458-e44c2d4f8c71') waiting on child command id: '4340142b-4f13-4aac-94ae-1e0170ae73c6' type:
'ColdMergeSnapshotSingleDisk' to complete
2017-01-05 12:34:13,843+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartz
Scheduler6) [71ee4a1c] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' has passed pre-polling p
eriod time and should be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:13,843+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartz
Scheduler6) [71ee4a1c] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' has passed pre-polling p
eriod time and should be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:13,843+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler6) [71ee4a1c] Polling and updating Async Tasks: 2 tasks, 1 tasks to poll now
2017-01-05 12:34:13,843+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler6) [71ee4a1c] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:15,185+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler6) [71ee4a1c] Failed in 'HSMGetAllTasksStatusesVDS' method
2017-01-05 12:34:15,196+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [71ee4a1c] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command failed: cmd=['/usr/bin/qemu-img', 'rebase', '-t', 'none', '-T', 'none', '-u', '-f', 'qcow2', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/95d55627-4010-4ae0-ae99-f79bbb90836f', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f'], ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': No such file or directory"], message=None
2017-01-05 12:34:15,197+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler6) [71ee4a1c] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:15,197+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler6) [71ee4a1c] SPMAsyncTask::PollTask: Polling task '203b21e4-ce7e-4903-9c12-f7736c6a76f4' (Parent Command 'FinalizeMerge', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2017-01-05 12:34:15,203+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler6) [71ee4a1c] BaseAsyncTask::logEndTaskFailure: Task '203b21e4-ce7e-4903-9c12-f7736c6a76f4' (Parent Command 'FinalizeMerge', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = cmd=['/usr/bin/qemu-img', 'rebase', '-t', 'none', '-T', 'none', '-u', '-f', 'qcow2', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/95d55627-4010-4ae0-ae99-f79bbb90836f', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f'], ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': No such file or directory"], message=None, code = 100',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = cmd=['/usr/bin/qemu-img', 'rebase', '-t', 'none', '-T', 'none', '-u', '-f', 'qcow2', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/95d55627-4010-4ae0-ae99-f79bbb90836f', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f'], ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': No such file or directory"], message=None, code = 100'

I'll add logs.

Comment 1 Carlos Mestre González 2017-01-05 13:38:31 UTC
Could be the same or same causes as https://bugzilla.redhat.com/show_bug.cgi?id=1410428 thought the errors are different.

Comment 2 Yaniv Kaul 2017-01-06 20:19:39 UTC
Severity? 
Regression?

Comment 3 Raz Tamir 2017-01-07 17:43:41 UTC
This bug might be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1408977

Comment 4 Ala Hino 2017-01-09 10:51:28 UTC
(In reply to Raz Tamir from comment #3)
> This bug might be the same as
> https://bugzilla.redhat.com/show_bug.cgi?id=1408977

This is a different issue than BZ 1408977. BZ 1408977 happens when cloning a VM from a template and it fails at prepareMerge step. This one occurs when the distance (number of volumes) between the merged volume and the leaf is greater than one. As an example, consider this chain: base -> s1 -> s2 -> s3 -> leaf. When merging base, s1 or s2, we will hit this issue. However, merging s3 works fine. This BZ fails at finalizeMerge step.

Comment 5 Raz Tamir 2017-01-22 18:30:34 UTC
Verified on ovirt-engine-4.1.0.3-0.0.master.20170122091652.gitc6fc2c2.el7.centos

Followed the steps to reproduce and the snapshot removal works