Bug 1410428 - Failure to cold-merge a in-between snapshot
Summary: Failure to cold-merge a in-between snapshot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ovirt-4.1.0-beta
: 4.1.0.2
Assignee: Ala Hino
QA Contact: Raz Tamir
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-01-05 13:19 UTC by Carlos Mestre González
Modified: 2017-02-01 14:53 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-01 14:53:35 UTC
oVirt Team: Storage
Embargoed:
gklein: ovirt-4.1+
gklein: blocker+


Attachments (Terms of Use)
vdsm and engine logs (1.33 MB, application/x-gzip)
2017-01-05 13:19 UTC, Carlos Mestre González
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 70055 0 master MERGED Cold Merge: Prepare top's volume child 2017-01-17 14:21:07 UTC
oVirt gerrit 70663 0 ovirt-4.1 MERGED Cold Merge: Prepare top's volume child 2017-01-17 14:40:51 UTC

Description Carlos Mestre González 2017-01-05 13:19:48 UTC
Created attachment 1237631 [details]
vdsm and engine logs

Description of problem:
Fails to cold merge a in-between  snapshot

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0-0.3.beta2.el7.noarch
vdsm-4.19.1-1.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with multiple disks (1 boot + 4 disks) on an ISCSI Domain.
2. Do it 3 times: Start the vm, write a file on the boot disk, shutdown the vm and create a snapshot (snap #0, snap #1, snap #2)
3. Remove the middle snapshot (snap #1)

Actual results:
Operation remove snapshot fails


Expected results:
Operation succeeds and the snapshot is removed.

Additional info:
This works in last 4.0.X build.

2017-01-05 12:34:03,843+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartz
Scheduler10) [54dd3262] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' is in pre-polling  peri
od and should not be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:07,447+02 INFO  [org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallba
ck] (DefaultQuartzScheduler2) [71ee4a1c] Command 'ColdMergeSnapshotSingleDisk' (id: '4340142b-
4f13-4aac-94ae-1e0170ae73c6') waiting on child command id: 'a902b529-9fe5-40a3-b73e-a4946dd393
91' type:'FinalizeMerge' to complete
2017-01-05 12:34:09,460+02 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCa
llback] (DefaultQuartzScheduler2) [54dd3262] Command 'RemoveSnapshot' (id: '50be4d9a-4732-4767
-a458-e44c2d4f8c71') waiting on child command id: '4340142b-4f13-4aac-94ae-1e0170ae73c6' type:
'ColdMergeSnapshotSingleDisk' to complete
2017-01-05 12:34:13,843+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartz
Scheduler6) [71ee4a1c] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' has passed pre-polling p
eriod time and should be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:13,843+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartz
Scheduler6) [71ee4a1c] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' has passed pre-polling p
eriod time and should be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:13,843+02 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (DefaultQuartzScheduler6) [71ee4a1c] Polling and updating Async Tasks: 2 tasks, 1 tasks to poll now
2017-01-05 12:34:13,843+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler6) [71ee4a1c] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:15,185+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetAllTasksStatusesVDSCommand] (DefaultQuartzScheduler6) [71ee4a1c] Failed in 'HSMGetAllTasksStatusesVDS' method
2017-01-05 12:34:15,196+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler6) [71ee4a1c] Correlation ID: null, Call Stack: null, Custom Event ID: -1, Message: VDSM host_mixed_1 command failed: cmd=['/usr/bin/qemu-img', 'rebase', '-t', 'none', '-T', 'none', '-u', '-f', 'qcow2', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/95d55627-4010-4ae0-ae99-f79bbb90836f', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f'], ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': No such file or directory"], message=None
2017-01-05 12:34:15,197+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler6) [71ee4a1c] Task id '203b21e4-ce7e-4903-9c12-f7736c6a76f4' has passed pre-polling period time and should be polled. Pre-polling period is 60000 millis. 
2017-01-05 12:34:15,197+02 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler6) [71ee4a1c] SPMAsyncTask::PollTask: Polling task '203b21e4-ce7e-4903-9c12-f7736c6a76f4' (Parent Command 'FinalizeMerge', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') returned status 'finished', result 'cleanSuccess'.
2017-01-05 12:34:15,203+02 ERROR [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (DefaultQuartzScheduler6) [71ee4a1c] BaseAsyncTask::logEndTaskFailure: Task '203b21e4-ce7e-4903-9c12-f7736c6a76f4' (Parent Command 'FinalizeMerge', Parameters Type 'org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters') ended with failure:
-- Result: 'cleanSuccess'
-- Message: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = cmd=['/usr/bin/qemu-img', 'rebase', '-t', 'none', '-T', 'none', '-u', '-f', 'qcow2', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/95d55627-4010-4ae0-ae99-f79bbb90836f', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f'], ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': No such file or directory"], message=None, code = 100',
-- Exception: 'VDSGenericException: VDSErrorException: Failed to HSMGetAllTasksStatusesVDS, error = cmd=['/usr/bin/qemu-img', 'rebase', '-t', 'none', '-T', 'none', '-u', '-f', 'qcow2', '-F', 'qcow2', '-b', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/95d55627-4010-4ae0-ae99-f79bbb90836f', '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f'], ecode=1, stdout=[], stderr=["qemu-img: Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': Could not open '/rhev/data-center/mnt/blockSD/98a1756e-d9ad-44cb-8284-c95befa95945/images/8b4279bf-8771-476c-a250-30fec4d3156d/3f630dd4-366a-4413-9727-233afe75bf6f': No such file or directory"], message=None, code = 100'

I'll add logs.

Comment 1 Carlos Mestre González 2017-01-05 13:38:31 UTC
Could be the same or same causes as https://bugzilla.redhat.com/show_bug.cgi?id=1410428 thought the errors are different.

Comment 2 Yaniv Kaul 2017-01-06 20:19:39 UTC
Severity? 
Regression?

Comment 3 Raz Tamir 2017-01-07 17:43:41 UTC
This bug might be the same as https://bugzilla.redhat.com/show_bug.cgi?id=1408977

Comment 4 Ala Hino 2017-01-09 10:51:28 UTC
(In reply to Raz Tamir from comment #3)
> This bug might be the same as
> https://bugzilla.redhat.com/show_bug.cgi?id=1408977

This is a different issue than BZ 1408977. BZ 1408977 happens when cloning a VM from a template and it fails at prepareMerge step. This one occurs when the distance (number of volumes) between the merged volume and the leaf is greater than one. As an example, consider this chain: base -> s1 -> s2 -> s3 -> leaf. When merging base, s1 or s2, we will hit this issue. However, merging s3 works fine. This BZ fails at finalizeMerge step.

Comment 5 Raz Tamir 2017-01-22 18:30:34 UTC
Verified on ovirt-engine-4.1.0.3-0.0.master.20170122091652.gitc6fc2c2.el7.centos

Followed the steps to reproduce and the snapshot removal works


Note You need to log in before you can comment on or make changes to this bug.