Bug 1157222

Summary: Deleting all snapshot disks during a vdsm restart causes some of the snapshot disks to stay in locked status
Product: Red Hat Enterprise Virtualization Manager
Reporter: lkuchlan <lkuchlan>
Component: ovirt-engine
Assignee: Daniel Erez <derez>
Status: CLOSED CURRENTRELEASE
QA Contact: lkuchlan <lkuchlan>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.5.0
CC: acanan, amureini, derez, ecohen, gklein, gpadgett, iheim, lkuchlan, lpeer, lsurette, rbalakri, Rhev-m-bugs, rnori, scohen, tnisan, yeylon
Target Milestone: ---
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: storage
Fixed In Version: org.ovirt.engine-root-3.5.0-25
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: logs, Flags: none
Description: logs and image, Flags: none

Description lkuchlan 2014-10-26 10:22:22 UTC
Created attachment 950767
logs

Description of problem:
Deleting all snapshot disks during a vdsm restart causes some of the snapshot disks to stay in locked status

Version-Release number of selected component (if applicable):
3.5 vt5 

How reproducible:
100%

Steps to Reproduce:
1. Add a VM with 3 disks (using an NFS domain), create a snapshot
2. Add another 2 disks and create another snapshot
3. Select all volumes from the snapshot overview and remove them
4. During the removal process, restart the vdsm service

Actual results:
Some of the snapshot volumes remain in locked status 

Expected results:
The snapshot volumes should be deleted; if the deletion does not take place, the volumes should not remain locked.

Comment 1 Tal Nisan 2014-10-28 10:35:22 UTC
Daniel, you handled something similar, isn't this already solved?

Comment 2 Daniel Erez 2014-10-28 11:45:57 UTC
(In reply to Tal Nisan from comment #1)
> Daniel, you handled something similar, isn't this already solved?

Yes, should already be solved. Added the relevant oVirt gerrit external tracker.

Comment 3 lkuchlan 2014-11-25 13:11:42 UTC
Tested using RHEVM 3.5 vt11.
Most of the snapshot volumes still remain in locked status.

Comment 4 Daniel Erez 2014-11-26 19:17:04 UTC
Hi Liron,

Can you please attach the recent logs from the engine and vdsm?

Comment 5 lkuchlan 2014-11-29 20:49:53 UTC
Created attachment 962810
logs and image

Please find attached the logs

Comment 6 Daniel Erez 2014-11-30 11:28:12 UTC
Hi Ravi,

Looking at the engine log, several MergeSnapshotsVDSCommand calls hit a connectivity error [1] when sent to VDSM (as the service had been stopped), which resulted in a transaction roll-back of RemoveSnapshotSingleDiskCommand [2].

I.e. upon failure, only 'CommandBase -> rollback' was executed. Can we invoke the compensation mechanism in this flow so the disk gets properly unlocked?
Or should we just manually override the 'rollback' method?
The flow: RemoveSnapshotSingleDiskCommand is invoked as an internal command by RemoveDiskSnapshotTaskHandler, which is called by RemoveDiskSnapshotsCommand.

[1]
2014-11-29 21:12:26,019 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (org.ovirt.thread.pool-7-thread-8) [38273561] ERROR, MergeSnapshotsVDSCommand( storagePoolId = 00000002-0002-0002-0002-0000000000b8, ignoreFailoverLimit = false, storageDomainId = 0f599def-1b9b-45be-83bc-f4271ce26569, imageGroupId = beaddb35-ab14-4589-aaee-40feaed2d573, imageId = e030be66-68a5-4afa-a0df-bb5b7465cc70, imageId2 = 7b75eda2-3013-4c4a-a9ab-127c3b3b0bd8, vmId = 00000000-0000-0000-0000-000000000000, postZero = false), exception: XmlRpcRunTimeException: Connection issues during send request, log id: 4e0a56ae: org.ovirt.engine.core.vdsbroker.xmlrpc.XmlRpcRunTimeException: Connection issues during send request
	at org.ovirt.engine.core.vdsbroker.jsonrpc.FutureMap.<init>(FutureMap.java:70) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.jsonrpc.JsonRpcIIrsServer.mergeSnapshots(JsonRpcIIrsServer.java:157) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.irsbroker.MergeSnapshotsVDSCommand.executeIrsBrokerCommand(MergeSnapshotsVDSCommand.java:15) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand.executeVDSCommand(IrsBrokerCommand.java:156) [vdsbroker.jar:]
	at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56) [vdsbroker.jar:]
	at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31) [dal.jar:]
	at org.ovirt.engine.core.vdsbroker.ResourceManager.runVdsCommand(ResourceManager.java:418) [vdsbroker.jar:]
	at org.ovirt.engine.core.bll.VDSBrokerFrontendImpl.RunVdsCommand(VDSBrokerFrontendImpl.java:33) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runVdsCommand(CommandBase.java:2042) [bll.jar:]
	at org.ovirt.engine.core.bll.storage.StorageHandlingCommandBase.runVdsCommand(StorageHandlingCommandBase.java:639) [bll.jar:]
	at org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand.mergeSnapshots(RemoveSnapshotSingleDiskCommand.java:50) [bll.jar:]
	at org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand.executeCommand(RemoveSnapshotSingleDiskCommand.java:37) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1179) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1318) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1943) [bll.jar:]

[2]
2014-11-29 21:12:26,054 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand] (org.ovirt.thread.pool-7-thread-8) [38273561] Command org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.irsbroker.IRSProtocolException: IRSProtocolException:  (Failed with error ENGINE and code 5001)
2014-11-29 21:12:26,080 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand] (org.ovirt.thread.pool-7-thread-8) [38273561] Transaction rolled-back for command: org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand.

Comment 7 Ravi Nori 2014-12-02 03:09:26 UTC
In my testing, both compensate and rollback were invoked on RemoveSnapshotSingleDiskCommand. The only option was to override rollback and unlock the disk. Submitted a patch.
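
For readers following the thread, here is a rough Java sketch of the idea (override rollback so the disk is unlocked when the merge call fails). It is not the submitted patch and does not use the real ovirt-engine classes: ImageStore, Command, RemoveSnapshotDiskSketch and the "img-1" id are all hypothetical stand-ins, and the merge request to VDSM is simulated with a Runnable that throws.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: hypothetical stand-ins, not the real ovirt-engine classes. */
class RollbackUnlockSketch {

    enum ImageStatus { OK, LOCKED }

    /** Hypothetical in-memory replacement for the engine's image status storage. */
    static class ImageStore {
        private final Map<String, ImageStatus> statuses = new HashMap<>();

        void setStatus(String imageId, ImageStatus status) { statuses.put(imageId, status); }
        ImageStatus getStatus(String imageId) { return statuses.get(imageId); }
        void remove(String imageId) { statuses.remove(imageId); }
    }

    /** Hypothetical base command: runs execute() and calls rollback() on failure. */
    abstract static class Command {
        boolean run() {
            try {
                execute();
                return true;
            } catch (RuntimeException e) {
                rollback();
                return false;
            }
        }

        abstract void execute();

        /** Default rollback does nothing extra, so a locked image would stay locked. */
        void rollback() { }
    }

    /** Assumes the parent flow locked the image before this command runs. */
    static class RemoveSnapshotDiskSketch extends Command {
        private final ImageStore store;
        private final String imageId;
        private final Runnable mergeCall; // stands in for the merge request sent to VDSM

        RemoveSnapshotDiskSketch(ImageStore store, String imageId, Runnable mergeCall) {
            this.store = store;
            this.imageId = imageId;
            this.mergeCall = mergeCall;
        }

        @Override
        void execute() {
            mergeCall.run();       // may throw, e.g. when VDSM is restarting
            store.remove(imageId); // on success the snapshot image is gone
        }

        @Override
        void rollback() {
            super.rollback();
            // The overridden part: release the lock instead of leaving it behind.
            store.setStatus(imageId, ImageStatus.OK);
        }
    }

    public static void main(String[] args) {
        ImageStore store = new ImageStore();
        store.setStatus("img-1", ImageStatus.LOCKED);

        // Simulate the connectivity error seen in the engine log.
        Command cmd = new RemoveSnapshotDiskSketch(store, "img-1",
                () -> { throw new RuntimeException("Connection issues during send request"); });

        boolean succeeded = cmd.run();
        System.out.println("succeeded=" + succeeded + ", status=" + store.getStatus("img-1"));
    }
}
```

With the default rollback in the hypothetical Command base class, the image in this sketch would remain LOCKED after the failed merge, which mirrors the behaviour reported in this bug.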

Comment 9 lkuchlan 2014-12-16 08:35:08 UTC
Tested using RHEVM 3.5 vt13.3

Comment 10 Allon Mureinik 2015-02-16 19:13:17 UTC
RHEV-M 3.5.0 has been released, closing this bug.