Description of problem:
A live merge failed with "Failed to live merge, still in volume chain", but I have not been able to find the reason why. The engine log just says "Failed to live merge, still in volume chain". The vdsm log indicates that the merge completed and that the subsequent image synchronization completed; no errors were reported in these sequences. It does indicate that the image that should have been deleted is still in the chain. There are no libvirtd logs, and the qemu log for the VM doesn't indicate any error/message.

Version-Release number of selected component (if applicable):
RHEV 3.5.1
RHEV-H 7.1 20150512.1.el7ev
vdsm-4.16.13.1-1.el7
libvirt-1.2.8-16.el7_1.2
qemu-kvm-rhev-2.1.2-23.el7_1.3

How reproducible:
Not reproducible on demand, however the customer has subsequently encountered multiple live merge failures.

Steps to Reproduce:
1.
2.
3.

Actual results:
Live merge fails with a specific error, but there is no indication in the logs of how to determine why.

Expected results:
Appropriate error messages logged.

Additional info:
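For anyone triaging a similar failure, the qemu backing chain on the host can be checked directly against the volume the merge was supposed to remove. A minimal sketch (assumptions: run as root on the RHEV-H host with qemu-img in PATH; the volume path is a placeholder passed as an argument, not taken from this case):

# check_chain.py - print the backing chain of a disk's top volume so the
# allegedly-merged volume UUID can be looked for in the output.
import subprocess
import sys

# e.g. /rhev/data-center/<pool>/<domain>/images/<image>/<top-volume>
top_volume = sys.argv[1]
print(subprocess.check_output(
    ['qemu-img', 'info', '--backing-chain', top_volume]))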
There are a bunch of fixes between 3.5.1 and 3.5.8, so the issue, if it still exists, should be dramatically mitigated. Pushing out to 3.6.6 so as not to risk 3.6.5.
Ala,
Do we know how to reproduce/verify?

Aharon.
Ala, patch https://gerrit.ovirt.org/#/c/56734/ on the 3.6 branch seems to address this. Are we waiting for anything else? If so - please explain what. If not, please take this BZ through the cloning process.
(In reply to Allon Mureinik from comment #19)
> Ala, patch https://gerrit.ovirt.org/#/c/56734/ on the 3.6 branch seems to
> address this. Are we waiting for anything else? If so - please explain what.
> If not, please take this BZ through the cloning process

The target milestone is still 3.6.6 - not sure what the status is.
This BZ is fixed.

Currently, we are not able to clone this BZ due to an infra issue. Will update BZ info once issue resolved.
(In reply to Ala Hino from comment #21)
> This BZ is fixed.
>
> Currently, we are not able to clone this BZ due to an infra issue. Will
> update BZ info once issue resolved.

Then do it manually.
This bug has been cloned to the z-stream bug #1333304
Tested with the following code:
------------------------------------------
rhevm-4.0.0.2-0.1.el7ev.noarch
vdsm-4.18.1-11.gita92976e.el7ev.x86_64

Tested with the following scenario:
-------------------------------------
1. Created VM with 3 disks: 1 iscsi - preallocated, 1 iscsi - thin and 1 nfs - thin
2. Created snapshots sbu1, sbu2 and sbu3 while VM is down
3. Started the VM and created su1, su2 and su3
4. Selected sbu2 and tried deleting

>>>>> The operation failed with "Failed to delete snapshot 'sbu2' for VM 'vm1'." The vdsm log displays the error "Volume does not exist".

Engine.log:
-----------------
2016-06-05 14:11:54,499 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [] Correlation ID: 69d26cca, Job ID: d1762475-b0e4-44c6-9d15-4d36f2b1a4e0, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'sbu2' deletion for VM 'vm1' was initiated by admin@internal-authz.
.
.
.
2016-06-05 14:14:09,898 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (DefaultQuartzScheduler2) [6067c86] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandParameters:{runAsync='true', storagePoolId='3f76ceea-84c5-4893-8c7d-3bf2d8eea69e', ignoreFailoverLimit='false', storageDomainId='7590d3ff-797a-4042-90f8-4094e6d01665', imageGroupId='181ba273-4150-4ad0-a209-95184cd1fbd4', imageId='3c27fae4-6ac7-436e-8302-e165632e5108'}), log id: 135a9ad8
2016-06-05 14:14:10,928 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (DefaultQuartzScheduler2) [6067c86] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@7786d8dc, log id: 135a9ad8
2016-06-05 14:14:11,206 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler2) [6067c86] transaction rolled back
2016-06-05 14:14:11,206 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler2) [6067c86] transaction rolled back
2016-06-05 14:14:11,206 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler2) [6067c86] Error invoking callback method 'onFailed' for 'FAILED' command 'c2b114ea-5b53-4686-86c1-175c97ad19a8'
2016-06-05 14:14:11,206 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler2) [6067c86] Exception: java.lang.NullPointerException
        at org.ovirt.engine.core.utils.ovf.OvfWriter.buildDisk(OvfWriter.java:204) [utils.jar:]
        at org.ovirt.engine.core.utils.ovf.OvfManager.buildOvf(OvfManager.java:118) [utils.jar:]
        at org.ovirt.engine.core.utils.ovf.OvfManager.exportVm(OvfManager.java:29) [utils.jar:]
        at org.ovirt.engine.core.bll.storage.disk.image.ImagesHandler.prepareSnapshotConfigWithAlternateImage(ImagesHandler.java:829) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.lambda$updateVmConfigurationForImageChange$1(RemoveSnapshotSingleDiskLiveCommand.java:394) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [utils.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.updateVmConfigurationForImageChange(RemoveSnapshotSingleDiskLiveCommand.java:392) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.handleBackwardLiveMerge(RemoveSnapshotSingleDiskLiveCommand.java:382) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.lambda$syncDbRecords$0(RemoveSnapshotSingleDiskLiveCommand.java:303) [bll.jar:]
        at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [utils.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.syncDbRecords(RemoveSnapshotSingleDiskLiveCommand.java:291) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.onFailed(RemoveSnapshotSingleDiskLiveCommand.java:447) [bll.jar:]
        at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback.onFailed(RemoveSnapshotSingleDiskLiveCommandCallback.java:18) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.endCallback(CommandCallbacksPoller.java:56) [bll.jar:]
        at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) [bll.jar:]
        at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source) [:1.8.0_91]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_91]
        at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_91]
        at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_91]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_91]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_91]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_91]
2016-06-05 14:14:12,249 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [6067c86] Command 'RemoveSnapshot' id: 'fec75b49-40a7-4439-8dde-d9db3bb221b2' child commands '[b77126ce-f696-4bdc-a0f6-b411841c5ad4, 24b9ea03-2be1-4f0a-a1ab-3ba54c209f58, c2b114ea-5b53-4686-86c1-175c97ad19a8]' executions were completed, status 'FAILED'
2016-06-05 14:14:13,520 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (DefaultQuartzScheduler10) [6067c86] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand' with failure.
2016-06-05 14:14:13,724 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler10) [6067c86] Correlation ID: 69d26cca, Job ID: d1762475-b0e4-44c6-9d15-4d36f2b1a4e0, Call Stack: null, Custom Event ID: -1, Message: Failed to delete snapshot 'sbu2' for VM 'vm1'.
VDSM log:
--------------------------------
Thread-430::DEBUG::2016-06-05 14:13:30,758::resourceManager::241::Storage.ResourceManager.Request::(grant) ResName=`Storage.7590d3ff-797a-4042-90f8-4094e6d01665`ReqID=`8e5f76d1-4683-46d5-b823-43ad815e8165`::Granted request
Thread-430::DEBUG::2016-06-05 14:13:30,758::task::829::Storage.TaskManager.Task::(resourceAcquired) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::_resourcesAcquired: Storage.7590d3ff-797a-4042-90f8-4094e6d01665 (shared)
Thread-430::DEBUG::2016-06-05 14:13:30,758::task::995::Storage.TaskManager.Task::(_decref) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::ref 1 aborting False
Thread-430::DEBUG::2016-06-05 14:13:30,759::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3514f0c5a516000c8|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 7590d3ff-797a-4042-90f8-4094e6d01665 (cwd None)
Thread-430::DEBUG::2016-06-05 14:13:30,961::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
Thread-430::DEBUG::2016-06-05 14:13:30,963::lvm::463::Storage.LVM::(_reloadlvs) lvs reloaded
Thread-430::WARNING::2016-06-05 14:13:30,963::lvm::609::Storage.LVM::(getLv) lv: 9f680fdb-fe8a-462c-9977-7aec183ec6af not found in lvs vg: 7590d3ff-797a-4042-90f8-4094e6d01665 response
Thread-430::ERROR::2016-06-05 14:13:30,963::task::868::Storage.TaskManager.Task::(_setError) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 3039, in getVolumeInfo
    volUUID=volUUID).getInfo()
  File "/usr/share/vdsm/storage/sd.py", line 619, in produceVolume
    volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 370, in __init__
    manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
  File "/usr/share/vdsm/storage/blockVolume.py", line 72, in __init__
    volUUID)
  File "/usr/share/vdsm/storage/volume.py", line 80, in __init__
    self.validate()
  File "/usr/share/vdsm/storage/blockVolume.py", line 156, in validate
    raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: ('9f680fdb-fe8a-462c-9977-7aec183ec6af',)
Thread-430::DEBUG::2016-06-05 14:13:30,989::task::887::Storage.TaskManager.Task::(_run) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Task._run: 18cc50f9-e374-46bb-9389-e48f192628c8 ('7590d3ff-797a-4042-90f8-4094e6d01665', '3f76ceea-84c5-4893-8c7d-3bf2d8eea69e', '9c3f4477-4041-4c4f-8add-b547db25c839', '9f680fdb-fe8a-462c-9977-7aec183ec6af') {} failed - stopping task
Thread-430::DEBUG::2016-06-05 14:13:30,989::task::1248::Storage.TaskManager.Task::(stop) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::stopping in state preparing (force False)
Thread-430::DEBUG::2016-06-05 14:13:30,989::task::995::Storage.TaskManager.Task::(_decref) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::ref 1 aborting True
Thread-430::INFO::2016-06-05 14:13:30,989::task::1173::Storage.TaskManager.Task::(prepare) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::aborting: Task is aborted: 'Volume does not exist' - code 201
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::1178::Storage.TaskManager.Task::(prepare) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Prepare: aborted: Volume does not exist
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::995::Storage.TaskManager.Task::(_decref) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::ref 0 aborting True
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::930::Storage.TaskManager.Task::(_doAbort) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Task._doAbort: force False
Thread-430::DEBUG::2016-06-05 14:13:30,990::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::597::Storage.TaskManager.Task::(_updateState) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::moving from state preparing -> state aborting
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::552::Storage.TaskManager.Task::(__state_aborting) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::_aborting: recover policy none
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::597::Storage.TaskManager.Task::(_updateState) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::moving from state aborting -> state failed
Thread-430::DEBUG::2016-06-05 14:13:30,991::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.7590d3ff-797a-4042-90f8-4094e6d01665': < ResourceRef 'Storage.7590d3ff-797a-4042-90f8-4094e6d01665', isValid: 'True' obj: 'None'>}
Thread-430::DEBUG::2016-06-05 14:13:30,991::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
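The failure above is a VolumeDoesNotExist on a block storage domain, so one quick check is whether the LV backing that volume is still present in the storage domain VG. A minimal sketch, assuming root on the host with the stock LVM tools (the VG and LV UUIDs are the ones from the log above):

# lv_check.py - check whether the volume vdsm reported missing still
# has a backing LV in the storage domain VG.
import subprocess

vg = '7590d3ff-797a-4042-90f8-4094e6d01665'  # storage domain UUID (from the log)
lv = '9f680fdb-fe8a-462c-9977-7aec183ec6af'  # volume vdsm could not find

# 'lvs VG/LV' exits non-zero when the LV does not exist, matching the
# VolumeDoesNotExist error above.
rc = subprocess.call(['lvs', '%s/%s' % (vg, lv)])
print('LV present' if rc == 0 else 'LV missing (lvs rc=%d)' % rc)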
Tested with the following code:
------------------------------------------
rhevm-4.0.0.4-0.1.el7ev.noarch
vdsm-4.18.2-0.el7ev.x86_64

Tested with the following scenario:
-------------------------------------
1. Created VM with 3 disks: 1 iscsi - preallocated, 1 iscsi - thin and 1 nfs - thin
2. Created snapshots sd1, sd2, sd3 and sd4 while VM is down
3. Deleted sd2 successfully
4. Deleted sd3 successfully
5. Started the VM
6. Created snapshot su1
7. Created snapshot su2
8. Created snapshot su3
9. Created snapshot su4
10. Deleted snapshot sd1 successfully
11. Deleted snapshot su2 successfully

Moving to VERIFIED!
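For reference, the snapshot create/delete flow above can also be driven through the REST API. A minimal sketch using ovirt-engine-sdk-python v4 (the engine URL, credentials and VM name are placeholders, and polling for each operation to finish is omitted for brevity):

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

# Placeholder connection details - adjust to the environment.
connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='secret',
    insecure=True,
)

vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=vm1')[0]
snapshots_service = vms_service.vm_service(vm.id).snapshots_service()

# Create a snapshot (as in steps 6-9 above); waiting for the snapshot
# to leave the LOCKED state is omitted here.
snap = snapshots_service.add(types.Snapshot(description='su2'))

# Delete it while the VM is running - this exercises the live merge
# path covered by this bug (step 11 above).
snapshots_service.snapshot_service(snap.id).remove()

connection.close()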
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1743.html