Bug 1301083
| Summary: | Live merge fails with "Failed to live merge, still in volume chain" | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Gordon Watson <gwatson> | |
| Component: | ovirt-engine | Assignee: | Ala Hino <ahino> | |
| Status: | CLOSED ERRATA | QA Contact: | Kevin Alon Goldblatt <kgoldbla> | |
| Severity: | high | Docs Contact: | ||
| Priority: | high | |||
| Version: | 3.5.1 | CC: | acanan, ahino, amureini, bazulay, gklein, gveitmic, inetkach, lsurette, mkalinin, ratamir, rbalakri, Rhev-m-bugs, sapandit, srevivo, tnisan, ycui, ykaul, ylavi | |
| Target Milestone: | ovirt-4.0.0-beta | |||
| Target Release: | 4.0.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | ovirt-engine-4.0.0_beta1 | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1333302 1333304 (view as bug list) | Environment: | ||
| Last Closed: | 2016-08-23 20:31:32 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1247379 | |||
| Bug Blocks: | 1333302, 1333304 | |||
|
Description
Gordon Watson
2016-01-22 14:37:37 UTC
There are a bunch of fixes since 3.5.1 to 3.5.8, so the issue, if it still exists, should be dramatically mitigated. Pushing out to 3.6.6 as to not risk 3.6.5 Ala, Do we know how to reproduce/verify? Aharon. Ala, patch https://gerrit.ovirt.org/#/c/56734/ on the 3.6 branch seems to address this. Are we waiting for anything else? If so - please explain what. If not, please take this BZ through the cloning process (In reply to Allon Mureinik from comment #19) > Ala, patch https://gerrit.ovirt.org/#/c/56734/ on the 3.6 branch seems to > address this. Are we waiting for anything else? If so - please explain what. > If not, please take this BZ through the cloning process The target milestone is still 3.6.6 - not sure what the status is. This BZ is fixed. Currently, we are not able to clone this BZ due to an infra issue. Will update BZ info once issue resolved. (In reply to Ala Hino from comment #21) > This BZ is fixed. > > Currently, we are not able to clone this BZ due to an infra issue. Will > update BZ info once issue resolved. Then do it manually. This bug has been cloned to the z-stream bug #1333304 Test the with the following code:
------------------------------------------
rhevm-4.0.0.2-0.1.el7ev.noarch
vdsm-4.18.1-11.gita92976e.el7ev.x86_64
Tested with the following scenario:
-------------------------------------
1.Created Vm with 3 disks, 1 iscsi - preallocated, 1 iscsi - thin and 1 nfs - thin
2. Created snapshots sbu1, sbu2 and sbu3 while VM is dowm
3. Started the VM and created su1, su2 and su3
4. Selected snbu2 and tried deleting >>>>> The operation failed with "Failed to delete snapshot 'sbu2' for VM 'vm1'."
vdsm log displays error Volume does not exist"
Engine.log:
-----------------
2016-06-05 14:11:54,499 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-5) [] Correlation ID: 69d26cca, Job ID: d1762475-b0e4-44c6-9d15-4d36f2b1a4e0, Call Stack: null, Custom Event ID: -1, Message: Snapshot 'sbu2' deletion for VM 'vm1' was initiated by admin@internal-authz.
.
.
.
2016-06-05 14:14:09,898 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (DefaultQuartzScheduler2) [6067c86] START, GetImageInfoVDSCommand( GetImageInfoVDSCommandPar
ameters:{runAsync='true', storagePoolId='3f76ceea-84c5-4893-8c7d-3bf2d8eea69e', ignoreFailoverLimit='false', storageDomainId='7590d3ff-797a-4042-90f8-4094e6d01665', imageGroupId='181ba273-4
150-4ad0-a209-95184cd1fbd4', imageId='3c27fae4-6ac7-436e-8302-e165632e5108'}), log id: 135a9ad8
2016-06-05 14:14:10,928 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand] (DefaultQuartzScheduler2) [6067c86] FINISH, GetImageInfoVDSCommand, return: org.ovirt.engine
.core.common.businessentities.storage.DiskImage@7786d8dc, log id: 135a9ad8
2016-06-05 14:14:11,206 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler2) [6067c86] transaction rolled back
2016-06-05 14:14:11,206 INFO [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler2) [6067c86] transaction rolled back
2016-06-05 14:14:11,206 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler2) [6067c86] Error invoking callback method 'onFailed' for 'FAILED' command 'c2
b114ea-5b53-4686-86c1-175c97ad19a8'
2016-06-05 14:14:11,206 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (DefaultQuartzScheduler2) [6067c86] Exception: java.lang.NullPointerException
at org.ovirt.engine.core.utils.ovf.OvfWriter.buildDisk(OvfWriter.java:204) [utils.jar:]
at org.ovirt.engine.core.utils.ovf.OvfManager.buildOvf(OvfManager.java:118) [utils.jar:]
at org.ovirt.engine.core.utils.ovf.OvfManager.exportVm(OvfManager.java:29) [utils.jar:]
at org.ovirt.engine.core.bll.storage.disk.image.ImagesHandler.prepareSnapshotConfigWithAlternateImage(ImagesHandler.java:829) [bll.jar:]
at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.lambda$updateVmConfigurationForImageChange$1(RemoveSnapshotSingleDiskLiveCommand.java:394) [bll.jar:]
at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [utils.jar:]
at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.updateVmConfigurationForImageChange(RemoveSnapshotSingleDiskLiveCommand.java:392) [bll.jar:]
at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.handleBackwardLiveMerge(RemoveSnapshotSingleDiskLiveCommand.java:382) [bll.jar:]
at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.lambda$syncDbRecords$0(RemoveSnapshotSingleDiskLiveCommand.java:303) [bll.jar:]
at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:204) [utils.jar:]
at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.syncDbRecords(RemoveSnapshotSingleDiskLiveCommand.java:291) [bll.jar:]
at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommand.onFailed(RemoveSnapshotSingleDiskLiveCommand.java:447) [bll.jar:]
at org.ovirt.engine.core.bll.snapshots.RemoveSnapshotSingleDiskLiveCommandCallback.onFailed(RemoveSnapshotSingleDiskLiveCommandCallback.java:18) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.endCallback(CommandCallbacksPoller.java:56) [bll.jar:]
at org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller.invokeCallbackMethods(CommandCallbacksPoller.java:109) [bll.jar:]
at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source) [:1.8.0_91]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_91]
at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_91]
at org.ovirt.engine.core.utils.timer.JobWrapper.invokeMethod(JobWrapper.java:77) [scheduler.jar:]
at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:51) [scheduler.jar:]
at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_91]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_91]
2016-06-05 14:14:12,249 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (DefaultQuartzScheduler8) [6067c86] Command 'RemoveSnapshot' id: 'fec75b49-40a7-4439-8dde-
d9db3bb221b2' child commands '[b77126ce-f696-4bdc-a0f6-b411841c5ad4, 24b9ea03-2be1-4f0a-a1ab-3ba54c209f58, c2b114ea-5b53-4686-86c1-175c97ad19a8]' executions were completed, status 'FAILED'
2016-06-05 14:14:13,520 ERROR [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (DefaultQuartzScheduler10) [6067c86] Ending command 'org.ovirt.engine.core.bll.snapshots.RemoveSnap
shotCommand' with failure.
2016-06-05 14:14:13,724 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler10) [6067c86] Correlation ID: 69d26cca, Job ID: d1762475-b0e4-44c6-9d15-4d36f2b1a4e0, Call Stack: null, Custom Event ID: -1, Message: Failed to delete snapshot 'sbu2' for VM 'vm1'.
VDSM log:
--------------------------------
Thread-430::DEBUG::2016-06-05 14:13:30,758::resourceManager::241::Storage.ResourceManager.Request::(grant) ResName=`Storage.7590d3ff-797a-4042-90f8-4094e6d01665`ReqID=`8e5f76d1-4683-46d5-b823-43ad815e8165`::Granted request
Thread-430::DEBUG::2016-06-05 14:13:30,758::task::829::Storage.TaskManager.Task::(resourceAcquired) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::_resourcesAcquired: Storage.7590d3ff-797a-4042-90f8-4094e6d01665 (shared)
Thread-430::DEBUG::2016-06-05 14:13:30,758::task::995::Storage.TaskManager.Task::(_decref) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::ref 1 aborting False
Thread-430::DEBUG::2016-06-05 14:13:30,759::lvm::288::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-3 /usr/bin/sudo -n /usr/sbin/lvm lvs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/3514f0c5a516000c8|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,vg_name,attr,size,seg_start_pe,devices,tags 7590d3ff-797a-4042-90f8-4094e6d01665 (cwd None)
Thread-430::DEBUG::2016-06-05 14:13:30,961::lvm::288::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
Thread-430::DEBUG::2016-06-05 14:13:30,963::lvm::463::Storage.LVM::(_reloadlvs) lvs reloaded
Thread-430::WARNING::2016-06-05 14:13:30,963::lvm::609::Storage.LVM::(getLv) lv: 9f680fdb-fe8a-462c-9977-7aec183ec6af not found in lvs vg: 7590d3ff-797a-4042-90f8-4094e6d01665 response
Thread-430::ERROR::2016-06-05 14:13:30,963::task::868::Storage.TaskManager.Task::(_setError) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 875, in _run
return fn(*args, **kargs)
File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 3039, in getVolumeInfo
volUUID=volUUID).getInfo()
File "/usr/share/vdsm/storage/sd.py", line 619, in produceVolume
volUUID)
File "/usr/share/vdsm/storage/blockVolume.py", line 370, in __init__
manifest = self.manifestClass(repoPath, sdUUID, imgUUID, volUUID)
File "/usr/share/vdsm/storage/blockVolume.py", line 72, in __init__
volUUID)
File "/usr/share/vdsm/storage/volume.py", line 80, in __init__
self.validate()
File "/usr/share/vdsm/storage/blockVolume.py", line 156, in validate
raise se.VolumeDoesNotExist(self.volUUID)
VolumeDoesNotExist: Volume does not exist: ('9f680fdb-fe8a-462c-9977-7aec183ec6af',)
Thread-430::DEBUG::2016-06-05 14:13:30,989::task::887::Storage.TaskManager.Task::(_run) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Task._run: 18cc50f9-e374-46bb-9389-e48f192628c8 ('7590d3ff-797a-4042-90f8-4094e6d01665', '3f76ceea-84c5-4893-8c7d-3bf2d8eea69e', '9c3f4477-4041-4c4f-8add-b547db25c839', '9f680fdb-fe8a-462c-9977-7aec183ec6af') {} failed - stopping task
Thread-430::DEBUG::2016-06-05 14:13:30,989::task::1248::Storage.TaskManager.Task::(stop) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::stopping in state preparing (force False)
Thread-430::DEBUG::2016-06-05 14:13:30,989::task::995::Storage.TaskManager.Task::(_decref) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::ref 1 aborting True
Thread-430::INFO::2016-06-05 14:13:30,989::task::1173::Storage.TaskManager.Task::(prepare) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::aborting: Task is aborted: 'Volume does not exist' - code 201
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::1178::Storage.TaskManager.Task::(prepare) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Prepare: aborted: Volume does not exist
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::995::Storage.TaskManager.Task::(_decref) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::ref 0 aborting True
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::930::Storage.TaskManager.Task::(_doAbort) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::Task._doAbort: force False
Thread-430::DEBUG::2016-06-05 14:13:30,990::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::597::Storage.TaskManager.Task::(_updateState) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::moving from state preparing -> state aborting
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::552::Storage.TaskManager.Task::(__state_aborting) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::_aborting: recover policy none
Thread-430::DEBUG::2016-06-05 14:13:30,990::task::597::Storage.TaskManager.Task::(_updateState) Task=`18cc50f9-e374-46bb-9389-e48f192628c8`::moving from state aborting -> state failed
Thread-430::DEBUG::2016-06-05 14:13:30,991::resourceManager::952::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {'Storage.7590d3ff-797a-4042-90f8-4094e6d01665': < ResourceRef 'Storage.7590d3ff-797a-4042-90f8-4094e6d01665', isValid: 'True' obj: 'None'>}
Thread-430::DEBUG::2016-06-05 14:13:30,991::resourceManager::989::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Tested with the following code: ------------------------------------------ rhevm-4.0.0.4-0.1.el7ev.noarch vdsm-4.18.2-0.el7ev.x86_64 Tested with the following scenario: ------------------------------------- 1.Created Vm with 3 disks, 1 iscsi - preallocated, 1 iscsi - thin and 1 nfs - thin 2. Created snapshots sd1, sd2, sd3 and sd4 while VM is dowm 3. Deleted sd2 successfully 4. Deleted sd3 successfully 5. Started the VM 6. Created snapshot su1 7. Created snapshot su2 8. Created snapshot su3 9. Created snapshot su4 10. Deleted snapshot sd1 successfully 11. Deleted snapshot su2 successfully Moving to VERIFIED! Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-1743.html |