Bug 1160336

Summary: Live Merge of snapshots with nfs disks fails when Restarting vdsm during the delete operation
Product: Red Hat Enterprise Virtualization Manager Reporter: Kevin Alon Goldblatt <kgoldbla>
Component: ovirt-engineAssignee: Adam Litke <alitke>
Status: CLOSED DUPLICATE QA Contact:
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 3.5.0CC: acanan, amureini, ecohen, gklein, iheim, kgoldbla, lpeer, lsurette, rbalakri, Rhev-m-bugs, tnisan, yeylon
Target Milestone: ---   
Target Release: 3.5.1   
Hardware: x86_64   
OS: Unspecified   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-11-04 18:36:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Storage RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
engine vdsm and server logs none

Description Kevin Alon Goldblatt 2014-11-04 15:07:15 UTC
Created attachment 953651 [details]
engine vdsm and server logs

Description of problem:
Restaring the vdsm service during a Live merge operation causes the operation to fail

Description of problem:
On a LIVE VM with 4 snapshots, successfully deleted 2nd and 3rd snapshots but failed to delete the first snapshot created

Version-Release number of selected component (if applicable):
rhevm-3.5.0-0.17.beta.el6ev.noarch
vdsm-4.16.7.2-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Create a vm with 4 disks (2 thin provisioned and 2 preallocated nfs disks)
2. Created snapshots 1, 2, 3 and 4 
3. Select to delete snapshot 2 and during the delete and live merge operation I restarted the vdsm >>> Operation fails

Actual results:
Failed to delete snapshot

Expected results:
Delete snapshot operation should resume and complete after vdsm restarts

Additional info:
From engine.log:
-------------------------
2014-11-03 18:29:29,570 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (ajp-/127.0.0.1:8702-9) Correlation ID: 4480deef, Job ID: 3440f909-c147-4ae4-abc1-7504e42ed28f, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'snap2' deletion for VM 'nfs4' was initiated by admin.
2014-11-03 18:29:29,582 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-3) [2dc9f734] START, DeleteImageGroupVDSCommand( storagePoolId = b2da04d8-13d3-4a4c-9e91-e6ee493c4fdd, ignoreFailoverLimit = false, storageDomainId = b473691a-e6f6-4c44-81cb-6bee354608fd, imageGroupId = f959c800-81e5-4578-ab34-ecca4e34b59b, postZeros = false, forceDelete = false), log id: 75df3a82
2014-11-03 18:29:30,490 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-3) [2dc9f734] FINISH, DeleteImageGroupVDSCommand, log id: 75df3a82
2014-11-03 18:29:30,492 INFO  [org.ovirt.engine.core.bll.tasks.CommandAsyncTask] (pool-6-thread-3) [2dc9f734] CommandAsyncTask::Adding CommandMultiAsyncTasks object for command 45a52e0f-6662-4a77-8bad-ef8975fbf6c9
2014-11-03 18:29:30,492 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (pool-6-thread-3) [2dc9f734] CommandMultiAsyncTasks::AttachTask: Attaching task 8cdc508e-4382-4d27-8a83-8cf219c2234e to command 45a52e0f-6662-4a77-8bad-ef8975fbf6c9.
2014-11-03 18:29:30,497 INFO  [org.ovirt.engine.core.bll.tasks.AsyncTaskManager] (pool-6-thread-3) [2dc9f734] Adding task 8cdc508e-4382-4d27-8a83-8cf219c2234e (Parent Command RemoveMemoryVolumes, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters), polling hasn't started yet..
2014-11-03 18:29:30,503 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-3) [2dc9f734] START, DeleteImageGroupVDSCommand( storagePoolId = b2da04d8-13d3-4a4c-9e91-e6ee493c4fdd, ignoreFailoverLimit = false, storageDomainId = b473691a-e6f6-4c44-81cb-6bee354608fd, imageGroupId = 41b8b33f-219f-4f10-bd57-79252dfa061a, postZeros = false, forceDelete = false), log id: 7e1de327
2014-11-03 18:29:31,337 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand] (pool-6-thread-3) [2dc9f734] FINISH, DeleteImageGroupVDSCommand, log id: 7e1de327
2014-11-03 18:29:31,338 INFO  [org.ovirt.engine.core.bll.CommandMultiAsyncTasks] (pool-6-thread-3) [2dc9f734] CommandMultiAsyncTasks::AttachTask: Attaching task 7cb80223-e46c-4060-b008-2b106e53e375 to command 45a52e0f-6662-4a77-8bad-ef8975fbf6c9.
.
.
2014-11-03 18:32:42,163 INFO  [org.ovirt.engine.core.bll.RemoveSnapshotCommandCallback] (DefaultQuartzScheduler_Worker-19) [4480deef] All Live Merge child commands have completed, status FAILED
.
.
2014-11-03 18:32:52,233 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand] (DefaultQuartzScheduler_Worker-30) [4480deef] Ending command with failure: org.ovirt.engine.core.bll.RemoveSnapshotCommand
2014-11-03 18:32:52,260 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-30) [4480deef] Correlation ID: 4480deef, Call Stack: null, Custom Event ID: -1, Message: Failed to delete snapshot 'snap2' for VM 'nfs4'


From vdsm.log
--------------------
Thread-6112::INFO::2014-11-03 18:29:52,824::vm::5741::vm.Vm::(merge) vmId=`1b74d641-debf-4fe7-9288-ef31225141df`::Starting merge with jobUUID='a0e0ae74-ec74-4a15-8554-04fd07f70eb9'




See logs

Comment 1 Adam Litke 2014-11-04 18:36:09 UTC
Looking at the logs I can see that some of the disks were on block devices again.

Thread-6113::DEBUG::2014-11-03 18:29:52,497::vm::5790::vm.Vm::(_diskXMLGetVolumeChainInfo) vmId=`1b74d641-debf-4fe7-9288-ef31225141df`::<allocation/> missing from backing chain for block device vdb
Thread-6113::ERROR::2014-11-03 18:29:52,500::vm::5690::vm.Vm::(merge) vmId=`1b74d641-debf-4fe7-9288-ef31225141df`::merge: libvirt does not support volume chain monitoring.  Unable to perform live merge.


I would recommend retrying your test using only nfs disks until there is a resolution for 1155583.

*** This bug has been marked as a duplicate of bug 1155583 ***