
Bug 1134866

Summary: Cold merge of snapshot hangs and leaves snapshot disks in Locked state
Product: [Retired] oVirt
Reporter: Kevin Alon Goldblatt <kgoldbla>
Component: ovirt-engine-core
Assignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.5
CC: acanan, amureini, derez, ecohen, gklein, iheim, rbalakri, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-10-17 12:34:51 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: engine vdsm and server logs; Flags: none
Description: NEW engine vdsm and server logs; Flags: none
Description: engine vdsm and server logs; Flags: none

Description Kevin Alon Goldblatt 2014-08-28 11:35:10 UTC
Created attachment 931888
engine vdsm and server logs

Description of problem:
Deleting two snapshot disks that belong to separate snapshots of the same VM disk results in
the removal of one snapshot disk and a failed cold merge of the second.

Version-Release number of selected component (if applicable):
ovirt-engine-3.5.0-0.0.master.20140821064931.gitb794d66.el6.noarch
vdsm-4.16.1-6.gita4a4614.el6.x86_64

How reproducible:
All the time

Steps to Reproduce:
1. Create a VM with 5 disks (3 thin and 2 SCSI preallocated) and take the first snapshot.
2. Install an OS on one disk and start the VM.
3. Write 1 GB of data with dd to the thin disk and create a second snapshot.
4. Add 2 disks and take a third snapshot.
5. Select 3 thin snapshot disks (2 of them from the same VM disk) and remove them. The separate snapshot disk is deleted. HOWEVER, the snapshot disks from the same VM disk fail to merge.

Actual results:
The snapshot-disks from the same VM disk fail to cold merge

Expected results:
The cold merge should succeed

Additional info:
FROM ENGINE LOG>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

2014-08-28 09:50:23,180 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.MergeSnapshotsVDSCommand] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] FINISH, MergeSnapshotsVDSCommand, log id: c0d7d55
2014-08-28 09:50:23,199 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command c1d058e7-331d-445b-9618-13881dd613cb
2014-08-28 09:50:23,200 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] CommandMultiAsyncTasks::AttachTask: Attaching task d3a05373-0f5c-402b-87d9-39340581741e to command c1d058e7-331d-445b-9618-13881dd613cb.
2014-08-28 09:50:23,212 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] Adding task d3a05373-0f5c-402b-87d9-39340581741e (Parent Command RemoveDiskSnapshots, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..

REQUEST FOR DELETE STARTS HERE BUT NEVER COMPLETES------------------

2014-08-28 09:50:23,232 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] Correlation ID: 637cf1cc, Job ID: a0fb6bcc-4f2a-497f-ac24-e5803f23748a, Call Stack: null, Custom Event ID: -1, Message: Disk 'vm11_Disk1' from Snapshot(s) 'vm11_snap2, vm11_snap3' of VM 'vm11' deletion was initiated by admin.
2014-08-28 09:50:23,233 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] BaseAsyncTask::startPollingTask: Starting to poll task d3a05373-0f5c-402b-87d9-39340581741e.
2014-08-28 09:50:23,234 INFO  [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] Lock freed to object EngineLock [exclusiveLocks= key: af408184-4924-4324-a977-ce8664b6f67a value: DISK
, sharedLocks= ]
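
To see where the trail of a polled task goes cold in a large engine.log, the task ID can be followed mechanically. The following is only a rough sketch, not part of the attached logs or of oVirt itself: the function name `last_mention_of_polled_tasks` and the regular expressions are assumptions based solely on the line formats quoted above.

```python
import re

# UUID shape as it appears in the engine log excerpts above.
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
# Assumed marker for the start of polling, taken from the quoted log line.
POLL_RE = re.compile(r"startPollingTask: Starting to poll task (" + UUID + r")")
# Leading timestamp, e.g. "2014-08-28 09:50:23,233".
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def last_mention_of_polled_tasks(lines):
    """Map each polled task ID to the timestamp of the last line mentioning it."""
    lines = list(lines)
    # First pass: collect every task ID for which polling was started.
    polled = {m.group(1) for m in map(POLL_RE.search, lines) if m}
    # Second pass: remember the latest timestamped line that names each task.
    last_seen = {}
    for line in lines:
        stamp = TIMESTAMP_RE.match(line)
        if not stamp:
            continue  # continuation line without its own timestamp
        for task_id in polled:
            if task_id in line:
                last_seen[task_id] = stamp.group(1)
    return last_seen


# Three lines copied from the excerpt above (the last one is cut at its line break).
SAMPLE_LINES = [
    "2014-08-28 09:50:23,200 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] CommandMultiAsyncTasks::AttachTask: Attaching task d3a05373-0f5c-402b-87d9-39340581741e to command c1d058e7-331d-445b-9618-13881dd613cb.",
    "2014-08-28 09:50:23,233 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] BaseAsyncTask::startPollingTask: Starting to poll task d3a05373-0f5c-402b-87d9-39340581741e.",
    "2014-08-28 09:50:23,234 INFO  [org.ovirt.engine.core.bll.RemoveDiskSnapshotsCommand] (org.ovirt.thread.pool-8-thread-2) [1961ac0f] Lock freed to object EngineLock [exclusiveLocks= key: af408184-4924-4324-a977-ce8664b6f67a value: DISK",
]

if __name__ == "__main__":
    for task_id, stamp in last_mention_of_polled_tasks(SAMPLE_LINES).items():
        print(task_id, "last mentioned at", stamp)
```

Run against a full log, a task whose last mention is its own startPollingTask line would be a candidate for the "never completes" case described here.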

Comment 1 Allon Mureinik 2014-08-31 13:48:58 UTC
Daniel, doesn't http://gerrit.ovirt.org/#/c/32173/ fix this one too?

Comment 2 Daniel Erez 2014-08-31 13:59:19 UTC
Yes, according to the logs ([1], [2]), the issue seems similar to bug 1134382.
Moving to MODIFIED.

[1] A network error occurred:
2014-08-27 10:31:46,572 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-75) [25e52e79] Host nott-vds1 is not responding. It will stay in Connecting state for a grace period of $160 seconds and after that an attempt to fence the host will be issued.
2014-08-27 10:31:46,577 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-75) [25e52e79] Failure to refresh Vds runtime info: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.ConnectException: Connection refused

[2] Consequently, snapshot removal failed:
2014-08-27 10:34:49,578 ERROR [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (org.ovirt.thread.pool-8-thread-47) [within thread]: endAction for action type RemoveDiskSnapshots threw an exception.: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.RemoveDiskSnapshotTaskHandler.endWithFailure(RemoveDiskSnapshotTaskHandler.java:103) [bll.jar:]

Comment 3 Kevin Alon Goldblatt 2014-09-29 14:11:50 UTC
Created attachment 942334
NEW engine vdsm and server logs

Comment 4 Kevin Alon Goldblatt 2014-09-29 14:28:43 UTC
Checked with:
rhevm-3.5.0-0.13.beta.el6ev.noarch
vdsm-4.16.5-2.el6ev.x86_64

I reproduced this bug again as follows: Moving to REOPEN!

Created a VM with 4 disks (2 preallocated and 2 thin)
Created snapshot s1.
Added 2 additional disks (1 preallocated and 1 thin)
Created snapshot s2.
From the storage domain (block storage), selected 3 snapshot disks for deletion (2 from snapshot s1 and 1 from snapshot s2).
The 2 snapshot disks from snapshot s1 are successfully deleted.
The 1 snapshot disk from snapshot s2 is not deleted and remains LOCKED.

From the engine log:

-----------------------------------------
THE DELETE REQUEST>>>>>>>>>

2014-09-29 15:47:18,968 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-1) [23d5d67a] Correlation ID: 124616ce, Job ID: 651b0f61-6fea-4fd1-ba47-ad4cb0b899aa, Call Stack: null, Custom Event ID: -1, Message: Disk 'vm1_Disk4' from Snapshot(s) 'vm1_s1, vm1_s2(6 disks)' of VM 'vm1' deletion was initiated by admin.


THE CORRELATION ID CONTINUES WITH>>>>>>>>>
2014-09-29 15:49:21,975 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-34) [4bc728f8] Correlation ID: 124616ce, Call Stack: null, Custom Event ID: -1, Message: Unrecognized audit log type has been used.
2014-09-29 15:49:21,975 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-7-thread-34) [4bc728f8] BaseAsyncTask::startPollingTask: Starting to poll task cd15549e-dc92-4add-9ec0-561148656f89.
2014-09-29 15:49:21,981 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-34) [4bc728f8] Correlation ID: 124616ce, Call Stack: null, Custom Event ID: -1, Message: Unrecognized audit log type has been used.
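
Following a correlation ID by hand gets tedious in a long log, so here is a small sketch of how the lines above could be filtered and timed. It is an illustration only: the helper names `events_for_correlation_id` and `elapsed_seconds` are made up for this example, and the parsing assumes the `Correlation ID: <hex>` field and the 23-character timestamp prefix seen in the quoted lines.

```python
import re
from datetime import datetime

# Assumed field format, taken from the AuditLogDirector lines quoted above.
CORRELATION_RE = re.compile(r"Correlation ID: ([0-9a-f]+)")
# Timestamp prefix such as "2014-09-29 15:47:18,968" (23 characters).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def events_for_correlation_id(lines, correlation_id):
    """Return (timestamp, line) pairs for lines carrying the given correlation ID."""
    events = []
    for line in lines:
        match = CORRELATION_RE.search(line)
        if match and match.group(1) == correlation_id:
            stamp = datetime.strptime(line[:23], TIMESTAMP_FORMAT)
            events.append((stamp, line))
    return events


def elapsed_seconds(events):
    """Seconds between the earliest and latest event, or 0.0 if there are none."""
    if not events:
        return 0.0
    stamps = [stamp for stamp, _ in events]
    return (max(stamps) - min(stamps)).total_seconds()


# The four log lines from this comment, copied as-is.
SAMPLE_LINES = [
    "2014-09-29 15:47:18,968 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-1) [23d5d67a] Correlation ID: 124616ce, Job ID: 651b0f61-6fea-4fd1-ba47-ad4cb0b899aa, Call Stack: null, Custom Event ID: -1, Message: Disk 'vm1_Disk4' from Snapshot(s) 'vm1_s1, vm1_s2(6 disks)' of VM 'vm1' deletion was initiated by admin.",
    "2014-09-29 15:49:21,975 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-34) [4bc728f8] Correlation ID: 124616ce, Call Stack: null, Custom Event ID: -1, Message: Unrecognized audit log type has been used.",
    "2014-09-29 15:49:21,975 INFO  [org.ovirt.engine.core.bll.tasks.SPMAsyncTask] (org.ovirt.thread.pool-7-thread-34) [4bc728f8] BaseAsyncTask::startPollingTask: Starting to poll task cd15549e-dc92-4add-9ec0-561148656f89.",
    "2014-09-29 15:49:21,981 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-7-thread-34) [4bc728f8] Correlation ID: 124616ce, Call Stack: null, Custom Event ID: -1, Message: Unrecognized audit log type has been used.",
]

if __name__ == "__main__":
    events = events_for_correlation_id(SAMPLE_LINES, "124616ce")
    print(len(events), "events spanning", elapsed_seconds(events), "seconds")
```

Note that the startPollingTask line carries no `Correlation ID:` field, so a filter like this would miss it; matching on the bracketed log ID (here `4bc728f8`) as well may be needed to catch such lines.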

Comment 5 Kevin Alon Goldblatt 2014-10-05 20:36:07 UTC
Please indicate in which release this was fixed. Previously I checked and reopened this BZ with version 3.5 vt4.

Comment 6 Kevin Alon Goldblatt 2014-10-05 20:39:59 UTC
Created attachment 944104
engine vdsm and server logs

Added new logs.

Comment 7 Daniel Erez 2014-10-06 06:04:27 UTC
(In reply to Kevin Alon Goldblatt from comment #5)
> Please indicate in which release this was fixed. Previously I checked and
> reopened this BZ with version 3.5 vt4.

It's not included in vt4; it should be available in a following build.

Comment 8 Daniel Erez 2014-10-06 06:04:43 UTC
*** Bug 1134434 has been marked as a duplicate of this bug. ***

Comment 9 Kevin Alon Goldblatt 2014-10-12 12:05:24 UTC
Moving this BZ to VERIFIED. I ran the same scenario and none of the disks remained in a locked state. However, the snapshot disk failed to delete; I will submit a new BZ for that.

Comment 10 Sandro Bonazzola 2014-10-17 12:34:51 UTC
oVirt 3.5 has been released and should include the fix for this issue.