Bug 1134866 - Cold merge of snapshot hangs and leaves snapshot disks in Locked state
Summary: Cold merge of snapshot hangs and leaves snapshot disks in Locked state
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: oVirt
Classification: Retired
Component: ovirt-engine-core
Version: 3.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.5.0
Assignee: Daniel Erez
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard: storage
Duplicates: 1134434
Depends On:
Blocks:
Reported: 2014-08-28 11:35 UTC by Kevin Alon Goldblatt
Modified: 2016-02-10 16:56 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2014-10-17 12:34:51 UTC
oVirt Team: Storage
Embargoed:


Attachments
engine vdsm and server logs (4.16 MB, application/octet-stream)
2014-08-28 11:35 UTC, Kevin Alon Goldblatt
NEW engine vdsm and server logs (993.22 KB, application/x-gzip)
2014-09-29 14:11 UTC, Kevin Alon Goldblatt
engine vdsm and server logs (852.37 KB, application/x-gzip)
2014-10-05 20:39 UTC, Kevin Alon Goldblatt


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 32173 0 None None None Never
oVirt gerrit 33603 0 master MERGED core: RemoveDiskSnapshots - verify image existence Never
oVirt gerrit 33618 0 ovirt-engine-3.5 MERGED core: RemoveDiskSnapshots - verify image existence Never

Description Kevin Alon Goldblatt 2014-08-28 11:35:10 UTC
Created attachment 931888
engine vdsm and server logs

Description of problem:
Deleting 2 snapshot disks from separate snapshots of the same VM disk results in
the removal of one of the snapshot disks and a failed cold merge of the second disk.

Version-Release number of selected component (if applicable):
ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch
vdsm-4.16.1-6.gita4a4614.el6.x86_64

How reproducible:
All the time

Steps to Reproduce:
1. Create a VM with 5 disks (3 thin and 2 SCSI preallocated) and take the first snapshot.
2. Install an OS on one disk and start the VM.
3. Write 1 GB of data with dd to the thin disk and create a second snapshot.
4. Add 2 disks and take a third snapshot.
5. Select 3 thin snapshot disks (2 of them are from the same VM disk) and remove them. >>> The separate snapshot disk is deleted. HOWEVER, the snapshot disks from the same VM disk fail to merge.

Actual results:
The snapshot-disks from the same VM disk fail to cold merge

Expected results:
The cold merge should succeed

Additional info:
FROM ENGINE LOG>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

2014-08-28 09:50:23,180 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.MergeSnapshotsVDSCommand] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] FINISH, MergeSnapshotsVDSCommand, log id: c0d7d55
2014-08-28 09:50:23,199 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command c1d058e7-331d-445b-9618-13881dd613cb
2014-08-28 09:50:23,200 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] CommandMultiAsyncTasks::AttachTask: Attaching task d3a05373-0f5c-402b-87d9-39340581741e to command c1d058e7-331d-445b-9618-13881dd613cb.
2014-08-28 09:50:23,212 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] Adding task d3a05373-0f5c-402b-87d9-39340581741e (Parent Command RemoveDiskSnapshots, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..

REQUEST FOR DELETE STARTS HERE BUT NEVER COMPLETES------------------

2014-08-28 09:50:23,232 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] Correlation ID: 637cf1cc, Job ID: a0fb6bcc-4f2a-497f-ac24-e5803f23748a, Call Stack: null, Custom Event ID: -1, Message: Disk 'vm11_Disk1' from Snapshot(s) 'vm11_snap2, vm11_snap3' of VM 'vm11' deletion was initiated by admin.
2014-08-28 09:50:23,233 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] BaseAsyncTask::startPollingTask: Starting to poll task d3a05373-0f5c-402b-87d9-39340581741e.
2014-08-28 09:50:23,234 INFO  [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] Lock freed to object EngineLock [exclusiveLocks= key: af408184-4924-4324-a977-ce8664b6f67a value: DISK
, sharedLocks= ]

Comment 1 Allon Mureinik 2014-08-31 13:48:58 UTC
Daniel, doesn't http://gerrit.ovirt.org/#/c/32173/ fix this one too?

Comment 2 Daniel Erez 2014-08-31 13:59:19 UTC
Yes, according to the logs ([1], [2]), the issue seems similar to bug 1134382.
Moving to MODIFIED.

[1] A network error occurred:
2014-08-27 10:31:46,572 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-75) [25e52e79] Host nott-vds1 is not responding. It will stay in Connecting state for a grace period of $160 seconds and after that an attempt to fence the host will be issued.
2014-08-27 10:31:46,577 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-75) [25e52e79] Failure to refresh Vds runtime info: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused

[2] Consequently, snapshot removal failed:
2014-08-27 10:34:49,578 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-8-thread-47) [within thread]: endAction for action type RemoveDiskSnapshots threw an exception.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.RemoveDiskSnapshotTaskHandler.endWithFailure(RemoveDiskSnapshotTaskHandler.java:103) [bll.jar:]
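
The patches linked to this bug are titled "core: RemoveDiskSnapshots - verify image existence". As a rough illustration of that kind of guard, here is a minimal Python sketch. It is not the engine code (which is Java and is not shown in this report); the function name, the dictionary layout and the "OK"/"LOCKED" status strings are all hypothetical. The idea is only that a failure handler which looks up an image should check that the lookup returned something before using it, so that one missing image does not abort the handler and leave the remaining disks locked.

```python
def unlock_existing_images(image_ids, images_by_id):
    """Unlock every image that still exists; return the IDs that were missing.

    Hypothetical illustration only: `images_by_id` stands in for whatever
    lookup the real handler performs, and "OK"/"LOCKED" are assumed statuses.
    """
    missing = []
    for image_id in image_ids:
        image = images_by_id.get(image_id)
        if image is None:
            # The image is already gone: record it and carry on instead of
            # dereferencing a null result.
            missing.append(image_id)
            continue
        image["status"] = "OK"
    return missing
```

With this shape, a caller can log the missing IDs and still be sure that every image that does exist has been unlocked.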

Comment 3 Kevin Alon Goldblatt 2014-09-29 14:11:50 UTC
Created attachment 942334
NEW engine vdsm and server logs

Comment 4 Kevin Alon Goldblatt 2014-09-29 14:28:43 UTC
Checked with:
rhevm-3.5.0-0.13.beta.el6ev.noarch
vdsm-4.16.5-2.el6ev.x86_64

I reproduced this bug again as follows: Moving to REOPEN!

Created a VM with 4 disks (2 preallocated and 2 thin)
Created snapshot s1.
Added 2 additional disks (1 preallocated and 1 thin)
Created snapshot s2.
From Storage domain (block storage) select 3 snapshot disks for deletion (2 from snapshot s1 and 1 from snapshot s2)
The 2 snapshot disks from snapshot s1 are successfully deleted.
The 1 snapshot disk from snapshot s2 is not deleted and remains LOCKED

From the engine log:

-----------------------------------------
THE DELETE REQUEST>>>>>>>>>

2014-09-29 15:47:18,968 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-1) [23d5d67a] Correlation ID: 124616ce, Job ID: 651b0f61-6fea-4fd1-ba47-ad4cb0b899aa, Call Stack: null, Custom Event ID: -1, Message: Disk 'vm1_Disk4' from Snapshot(s) 'vm1_s1, vm1_s2(6 disks)' of VM 'vm1' deletion was initiated by admin.


THE CORRELATION ID CONTINUES WITH>>>>>>>>>
2014-09-29 15:49:21,975 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-34) [4bc728f8] Correlation ID: 124616ce, Call Stack: null, Custom Event ID: -1, Message: Unrecognized audit log type has been used.
2014-09-29 15:49:21,975 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-7-thread-34) [4bc728f8] BaseAsyncTask::startPollingTask: Starting to poll task cd15549e-dc92-4add-9ec0-561148656f89.
2014-09-29 15:49:21,981 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-34) [4bc728f8] Correlation ID: 124616ce, Call Stack: null, Custom Event ID: -1, Message: Unrecognized audit log type has been used.
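
For anyone who wants to follow a correlation ID through the attached engine log the same way, here is a minimal Python sketch. The line layout it expects (timestamp, level, logger in square brackets, thread in parentheses, optional bracketed flow ID, then the message) is an assumption inferred from the excerpts in this report, not an official format specification, and the function names are made up for illustration.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts in this report:
# "<date> <time>,<ms> <LEVEL>  [<logger>] (<thread>) [<flow id>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<logger>[^\]]+)\]\s+"
    r"\((?P<thread>[^)]+)\)\s+"
    r"(?:\[(?P<flow>[^\]]*)\]\s+)?"
    r"(?P<message>.*)$"
)
CORRELATION_RE = re.compile(r"Correlation ID: (?P<cid>[0-9a-f]+)")


def parse_line(line):
    """Split one engine log line into fields, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return record


def events_for_correlation(lines, correlation_id):
    """Return the parsed records whose message carries the given correlation ID."""
    events = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # continuation lines such as stack frames are skipped
        found = CORRELATION_RE.search(record["message"])
        if found and found.group("cid") == correlation_id:
            events.append(record)
    return events
```

Calling events_for_correlation(open("engine.log"), "124616ce") would return the audit log records for this delete request in file order; the difference between the timestamps of consecutive records shows how long the flow was idle between them.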

Comment 5 Kevin Alon Goldblatt 2014-10-05 20:36:07 UTC
Please indicate in which release this was fixed. Previously I checked and reopened this BZ with version 3.5 vt4.

Comment 6 Kevin Alon Goldblatt 2014-10-05 20:39:59 UTC
Created attachment 944104
engine vdsm and server logs

added new logs.

Comment 7 Daniel Erez 2014-10-06 06:04:27 UTC
(In reply to Kevin Alon Goldblatt from comment #5)
> Please indicate in which release this was fixed. Previously I checked and
> reopened this BZ with version 3.5 vt4.

It's not included in vt4; it should be available in a following build.

Comment 8 Daniel Erez 2014-10-06 06:04:43 UTC
*** Bug 1134434 has been marked as a duplicate of this bug. ***

Comment 9 Kevin Alon Goldblatt 2014-10-12 12:05:24 UTC
Moving this BZ to verify. I ran the same scenario and none of the disks remained in a locked state. However, the snapshot disk failed to delete. I will submit a new BZ for this.

Comment 10 Sandro Bonazzola 2014-10-17 12:34:51 UTC
oVirt 3.5 has been released and should include the fix for this issue.

