Description of problem:

If the leaf image is ILLEGAL, the second attempt of the cold merge will get stuck in the prepareMerge task.

A cold merge was done between images 80fe00ce and f23a1be1, and the leaf image 80fe00ce was ILLEGAL.

===
2018-08-20 14:00:20,080+0530 INFO (jsonrpc/4) [vdsm.api] START prepareMerge(spUUID=u'0aa2db2e-9bf5-11e8-8497-001a4a17015d', subchainInfo={u'img_id': u'4b31cffe-d931-4a82-bc2c-aaf808c87084', u'sd_id': u'c21260fa-bc27-4bd3-9841-99da4e2d0402', u'top_id': u'80fe00ce-9417-4dd9-ba6f-d4a1ed721a21', u'base_id': u'f23a1be1-d4bf-4fcb-8e24-7af156c4464d'}) from=::ffff:10.65.177.57,41952, flow_id=5215c2c0-9e13-4ee7-b660-0e715bf57084, task_id=1c5f4c02-f293-4b28-9324-1dd0ca728f08 (api:46)
===

The base volume f23a1be1 was prepared successfully.

===
2018-08-20 15:14:11,149+0530 INFO (jsonrpc/6) [vdsm.api] START prepareMerge(spUUID=u'0aa2db2e-9bf5-11e8-8497-001a4a17015d', subchainInfo={u'img_id': u'4b31cffe-d931-4a82-bc2c-aaf808c87084', u'sd_id': u'c21260fa-bc27-4bd3-9841-99da4e2d0402', u'top_id': u'80fe00ce-9417-4dd9-ba6f-d4a1ed721a21', u'base_id': u'f23a1be1-d4bf-4fcb-8e24-7af156c4464d'}) from=::ffff:10.65.177.57,41952, flow_id=c98cf9f2-374d-403e-90f5-05ba52869b48, task_id=5ae42759-3445-416a-9caf-938a4a561a7f (api:46)
2018-08-20 15:14:12,040+0530 INFO (tasks/0) [storage.VolumeManifest] Volume: preparing volume c21260fa-bc27-4bd3-9841-99da4e2d0402/f23a1be1-d4bf-4fcb-8e24-7af156c4464d (volume:566)
2018-08-20 15:14:12,062+0530 INFO (tasks/0) [storage.LVM] Activating lvs: vg=c21260fa-bc27-4bd3-9841-99da4e2d0402 lvs=['f23a1be1-d4bf-4fcb-8e24-7af156c4464d'] (lvm:1294)
===

Preparation of the leaf image then failed because it is illegal.
===
2018-08-20 15:14:14,514+0530 INFO (tasks/0) [storage.VolumeManifest] Volume: preparing volume c21260fa-bc27-4bd3-9841-99da4e2d0402/80fe00ce-9417-4dd9-ba6f-d4a1ed721a21 (volume:566)
2018-08-20 15:14:14,545+0530 ERROR (tasks/0) [storage.TaskManager.Task] (Task='5ae42759-3445-416a-9caf-938a4a561a7f') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1869, in prepareMerge
    merge.prepare(subchainInfo)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/merge.py", line 179, in prepare
    with subchain.prepare():
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/merge.py", line 158, in prepare
    vol.prepare(rw=rw, justme=True)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 571, in prepare
    raise se.prepareIllegalVolumeError(self.volUUID)
prepareIllegalVolumeError: Cannot prepare illegal volume: ('80fe00ce-9417-4dd9-ba6f-d4a1ed721a21',)
===

However, Image.teardown was not called, so the "resource" was not released. The next merge attempt will therefore get stuck at "preparing volume".
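The failure mode above can be sketched in isolation. This is a minimal illustration (not vdsm code; the `Volume` and `prepare_chain` names are hypothetical) of why a chain prepare must tear down the volumes it has already prepared when a later volume in the chain fails to prepare, otherwise the underlying resource stays held:

```python
# Sketch of the bug pattern: preparing a volume chain where the leaf is
# ILLEGAL. The fix pattern is a try/finally that tears down every volume
# that was successfully prepared, even when a later prepare() raises.
from contextlib import contextmanager


class Volume(object):
    def __init__(self, name, legal=True):
        self.name = name
        self.legal = legal
        self.prepared = False

    def prepare(self):
        # Mirrors volume.py raising prepareIllegalVolumeError.
        if not self.legal:
            raise RuntimeError("Cannot prepare illegal volume: %r" % self.name)
        self.prepared = True

    def teardown(self):
        self.prepared = False


@contextmanager
def prepare_chain(volumes):
    prepared = []
    try:
        for vol in volumes:
            vol.prepare()          # may raise for an ILLEGAL volume
            prepared.append(vol)
        yield prepared
    finally:
        # Release everything we acquired, in reverse order, even when
        # prepare() failed partway through the chain. Without this, the
        # base volume stays "prepared" and blocks the next attempt.
        for vol in reversed(prepared):
            vol.teardown()


base = Volume("f23a1be1", legal=True)
top = Volume("80fe00ce", legal=False)   # the ILLEGAL leaf image
try:
    with prepare_chain([base, top]):
        pass
except RuntimeError:
    pass

print(base.prepared)  # False: the base volume was released despite the error
```

With the `finally` block removed, `base.prepared` would remain `True` after the failed attempt, which is the analogue of the unreleased resource described above.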
===
2018-08-20 15:14:36,808+0530 INFO (jsonrpc/1) [vdsm.api] START prepareMerge(spUUID=u'0aa2db2e-9bf5-11e8-8497-001a4a17015d', subchainInfo={u'img_id': u'4b31cffe-d931-4a82-bc2c-aaf808c87084', u'sd_id': u'c21260fa-bc27-4bd3-9841-99da4e2d0402', u'top_id': u'80fe00ce-9417-4dd9-ba6f-d4a1ed721a21', u'base_id': u'f23a1be1-d4bf-4fcb-8e24-7af156c4464d'}) from=::ffff:10.65.177.57,41952, flow_id=10aa522d-22a3-47a4-b09d-c8112b4eaa60, task_id=a15324c1-8f3d-4f10-a360-4f108ca8dd39 (api:46)
2018-08-20 15:14:38,686+0530 INFO (tasks/0) [storage.ThreadPool.WorkerThread] START task a15324c1-8f3d-4f10-a360-4f108ca8dd39 (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7f9020439560>>, args=None) (threadPool:208)
2018-08-20 15:14:39,325+0530 INFO (tasks/0) [storage.VolumeManifest] Volume: preparing volume c21260fa-bc27-4bd3-9841-99da4e2d0402/f23a1be1-d4bf-4fcb-8e24-7af156c4464d (volume:566)
===

===
vdsm-client Host getAllTasksInfo
{
    "a15324c1-8f3d-4f10-a360-4f108ca8dd39": {
        "verb": "prepareMerge",
        "id": "a15324c1-8f3d-4f10-a360-4f108ca8dd39"
    }
}
===

The gdb output shows the stuck thread waiting to acquire a lock.
===
f=f@entry=Frame 0x7f9038189210, for file /usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py, line 234, in wait (self=<Request(_namespace='03_lvm_c21260fa-bc27-4bd3-9841-99da4e2d0402', _doneEvent=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<Condition(release=<instancemethod at remote 0x7f90203a91e0>, _Condition__lock=<Lock at remote 0x7f90380e11b8>, acquire=<instancemethod at remote 0x7f9020532140>, _Condition__cond=<Cond at remote 0x7f9038086950>) at remote 0x7f9038086c90>) at remote 0x7f90381c8510>, _lockType='exclusive', _name='f23a1be1-d4bf-4fcb-8e24-7af156c4464d', reqID='2f905e40-c803-494d-adc8-2cb84d658527', _callback=<function at remote 0x7f905c0d6c80>, _isActive=True, _log=<SimpleLogAdapter(logger=<Logger(name='storage.ResourceManager.Request', parent=<Logger(name='storage.ResourceManager', parent=<Logger(name='storage', parent=<RootLogger(name='root', parent=None, handlers=[<SysLogHandler(socket=<_socketobject at remote 0x7f9071ac19f0>, level=30, lock=<RLock at remote 0x7f906...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040

f=f@entry=Frame 0x7f90040536d0, for file /usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py, line 1025, in acquireResource (namespace='03_lvm_c21260fa-bc27-4bd3-9841-99da4e2d0402', name='f23a1be1-d4bf-4fcb-8e24-7af156c4464d', lockType='exclusive', timeout=None), throwflag=throwflag@entry=0)

f=f@entry=Frame 0x7f8fe8011c60, for file /usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py, line 376, in llPrepare (self=<BlockVolumeManifest(imgUUID='4b31cffe-d931-4a82-bc2c-aaf808c87084', voltype='INTERNAL', _imagePath='/rhev/data-center/mnt/blockSD/c21260fa-bc27-4bd3-9841-99da4e2d0402/images/4b31cffe-d931-4a82-bc2c-aaf808c87084', _volumePath='/rhev/data-center/mnt/blockSD/c21260fa-bc27-4bd3-9841-99da4e2d0402/images/4b31cffe-d931-4a82-bc2c-aaf808c87084/f23a1be1-d4bf-4fcb-8e24-7af156c4464d', repoPath='/rhev/data-center/mnt/blockSD', sdUUID='c21260fa-bc27-4bd3-9841-99da4e2d0402', volUUID='f23a1be1-d4bf-4fcb-8e24-7af156c4464d', lvmActivationNamespace='03_lvm_c21260fa-bc27-4bd3-9841-99da4e2d0402') at remote 0x7f9038240e50>, rw=True, setrw=False, access='exclusive'), throwflag=throwflag@entry=0)
===

It is also not possible to stop this task; stopTask fails with the error below.

===
2018-08-20 16:54:16,128+0530 INFO (jsonrpc/5) [vdsm.api] START stopTask(taskID=u'a15324c1-8f3d-4f10-a360-4f108ca8dd39', spUUID=None, options=None) from=::1,51912, task_id=20da88ef-fa32-4b30-823d-bb368a6d4a71 (api:46)
2018-08-20 16:54:16,128+0530 INFO (jsonrpc/5) [vdsm.api] FINISH stopTask error=Task is aborted: u'a15324c1-8f3d-4f10-a360-4f108ca8dd39' - code 411 from=::1,51912, task_id=20da88ef-fa32-4b30-823d-bb368a6d4a71 (api:50)
===

A "Force" option is not available in vdsm-client; it was requested in bug 1592180.

Version-Release number of selected component (if applicable):
vdsm-4.20.27.2-1.el7ev.x86_64
rhvm-4.2.5.2-0.1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Make the leaf image of the image ILLEGAL in the storage domain metadata (the customer's leaf image became ILLEGAL because of bug 1598594).
2. Do a cold merge twice.
3. The second cold merge gets stuck in prepareMerge.

Actual results:
The merge task gets stuck in vdsm.

Expected results:
The merge should not create a stuck task in vdsm.

Additional info:
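The gdb frames above show the second attempt blocked in acquireResource with timeout=None on the LVM activation namespace. A resource left acquired by the failed first attempt behaves like a lock that is never released, so an exclusive acquire with no timeout blocks forever. A minimal illustration (hypothetical names, modeled with a plain threading.Lock rather than vdsm's resourceManager):

```python
# Model of the stuck second prepareMerge: the first attempt acquires the
# resource and then fails before the teardown path that would release it.
import threading

lvm_activation_lock = threading.Lock()

# First prepareMerge attempt: acquires the resource, then the task fails
# with prepareIllegalVolumeError without releasing it.
lvm_activation_lock.acquire()

# Second attempt: vdsm waits with timeout=None, i.e. forever. We probe
# with a short timeout here just to show the resource is still held.
acquired = lvm_activation_lock.acquire(timeout=0.1)
print(acquired)  # False: the resource from the failed attempt is still held
```

With timeout=None, that second acquire never returns, which matches the stuck task visible in getAllTasksInfo.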
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Found non-acked flags: '{'rhevm-4.2.z': '?'}', ] For more info please contact: rhv-devops
1. Created a VM with a disk and 2 snapshots.

2. Set snapshot 1 to illegal:
engine=# update images set imagestatus = '4' where image_guid = '9888c17d-27dc-494f-8600-6e11c345bae5';
UPDATE 1

3. Tried to delete snapshot 1 -> failed as expected:
2019-01-31 16:45:40,589+02 WARN [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-82) [2311d2ba-d0e7-4c56-9aa1-2d3dc8a339da] Validation of action 'RemoveSnapshot' failed for user admin@internal-authz. Reasons: VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE,ACTION_TYPE_FAILED_DISKS_ILLEGAL,$diskAliases test1_Disk1

4. Set snapshot 1 to legal and set snapshot 2 to illegal:
engine=# update images set imagestatus = '4' where image_guid = 'b5500e18-905e-4168-b9dd-4f40a619ff9c';
UPDATE 1

5. Tried to delete snapshot 2 -> failed as expected:
2019-01-31 16:47:26,666+02 WARN [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-79) [9809a1a7-be5b-4719-a6a3-c0832c1d91ba] Validation of action 'RemoveSnapshot' failed for user admin@internal-authz. Reasons: VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE,ACTION_TYPE_FAILED_DISKS_ILLEGAL,$diskAliases test1_Disk1

=========
Used:
ovirt-engine-4.3.0-0.8.master.20190121201726.gite3ad1b2.el7.noarch
vdsm-4.30.6-17.gitcfd81b7.el7.x86_64
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:1085