Bug 1619233 - prepareMerge task is stuck when executing a cold merge on illegal image
Summary: prepareMerge task is stuck when executing a cold merge on illegal image
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.5
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.3.0
Target Release: 4.3.0
Assignee: Eyal Shenitzky
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks: 1631341
 
Reported: 2018-08-20 11:42 UTC by nijin ashok
Modified: 2021-12-10 17:03 UTC
CC List: 8 users

Fixed In Version: ovirt-engine-4.3.0_alpha
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1631341
Environment:
Last Closed: 2019-05-08 12:38:08 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3570711 0 None None None 2018-08-21 02:47:13 UTC
Red Hat Product Errata RHEA-2019:1085 0 None None None 2019-05-08 12:38:29 UTC
oVirt gerrit 93940 0 None MERGED core: validation destination snapshot images are legal before cold merge 2021-02-08 16:44:48 UTC
oVirt gerrit 94033 0 None MERGED core: validation destination snapshot images are legal before cold merge 2021-02-08 16:44:48 UTC

Description nijin ashok 2018-08-20 11:42:06 UTC
Description of problem:

If the leaf image is ILLEGAL, the second attempt at a cold merge gets stuck in the prepareMerge task.

A cold merge was performed between images 80fe00ce (top) and f23a1be1 (base); the leaf image 80fe00ce was ILLEGAL.

===
2018-08-20 14:00:20,080+0530 INFO  (jsonrpc/4) [vdsm.api] START prepareMerge(spUUID=u'0aa2db2e-9bf5-11e8-8497-001a4a17015d', subchainInfo={u'img_id': u'4b31cffe-d931-4a82-bc2c-aaf808c87084', u'sd_id': u'c21260fa-bc27-4bd3-9841-99da4e2d0402', u'top_id': u'80fe00ce-9417-4dd9-ba6f-d4a1ed721a21', u'base_id': u'f23a1be1-d4bf-4fcb-8e24-7af156c4464d'}) from=::ffff:10.65.177.57,41952, flow_id=5215c2c0-9e13-4ee7-b660-0e715bf57084, task_id=1c5f4c02-f293-4b28-9324-1dd0ca728f08 (api:46)
===

The base volume f23a1be1 was prepared successfully.


===
2018-08-20 15:14:11,149+0530 INFO  (jsonrpc/6) [vdsm.api] START prepareMerge(spUUID=u'0aa2db2e-9bf5-11e8-8497-001a4a17015d', subchainInfo={u'img_id': u'4b31cffe-d931-4a82-bc2c-aaf808c87084', u'sd_id': u'c21260fa-bc27-4bd3-9841-99da4e2d0402', u'top_id': u'80fe00ce-9417-4dd9-ba6f-d4a1ed721a21', u'base_id': u'f23a1be1-d4bf-4fcb-8e24-7af156c4464d'}) from=::ffff:10.65.177.57,41952, flow_id=c98cf9f2-374d-403e-90f5-05ba52869b48, task_id=5ae42759-3445-416a-9caf-938a4a561a7f (api:46)

2018-08-20 15:14:12,040+0530 INFO  (tasks/0) [storage.VolumeManifest] Volume: preparing volume c21260fa-bc27-4bd3-9841-99da4e2d0402/f23a1be1-d4bf-4fcb-8e24-7af156c4464d (volume:566)
2018-08-20 15:14:12,062+0530 INFO  (tasks/0) [storage.LVM] Activating lvs: vg=c21260fa-bc27-4bd3-9841-99da4e2d0402 lvs=['f23a1be1-d4bf-4fcb-8e24-7af156c4464d'] (lvm:1294)
===

Preparation of the leaf image failed since it is illegal.

===
2018-08-20 15:14:14,514+0530 INFO  (tasks/0) [storage.VolumeManifest] Volume: preparing volume c21260fa-bc27-4bd3-9841-99da4e2d0402/80fe00ce-9417-4dd9-ba6f-d4a1ed721a21 (volume:566)
2018-08-20 15:14:14,545+0530 ERROR (tasks/0) [storage.TaskManager.Task] (Task='5ae42759-3445-416a-9caf-938a4a561a7f') Unexpected error (task:875)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 882, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/task.py", line 336, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
    return method(self, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/sp.py", line 1869, in prepareMerge
    merge.prepare(subchainInfo)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/merge.py", line 179, in prepare
    with subchain.prepare():
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/merge.py", line 158, in prepare
    vol.prepare(rw=rw, justme=True)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/volume.py", line 571, in prepare
    raise se.prepareIllegalVolumeError(self.volUUID)
prepareIllegalVolumeError: Cannot prepare illegal volume: ('80fe00ce-9417-4dd9-ba6f-d4a1ed721a21',)
===

However, Image.teardown was not called, and hence the resource (the exclusive lock taken for the base volume) was not released.

Now the next merge attempt will get stuck at "preparing volume".

===
2018-08-20 15:14:36,808+0530 INFO  (jsonrpc/1) [vdsm.api] START prepareMerge(spUUID=u'0aa2db2e-9bf5-11e8-8497-001a4a17015d', subchainInfo={u'img_id': u'4b31cffe-d931-4a82-bc2c-aaf808c87084', u'sd_id': u'c21260fa-bc27-4bd3-9841-99da4e2d0402', u'top_id': u'80fe00ce-9417-4dd9-ba6f-d4a1ed721a21', u'base_id': u'f23a1be1-d4bf-4fcb-8e24-7af156c4464d'}) from=::ffff:10.65.177.57,41952, flow_id=10aa522d-22a3-47a4-b09d-c8112b4eaa60, task_id=a15324c1-8f3d-4f10-a360-4f108ca8dd39 (api:46)

2018-08-20 15:14:38,686+0530 INFO  (tasks/0) [storage.ThreadPool.WorkerThread] START task a15324c1-8f3d-4f10-a360-4f108ca8dd39 (cmd=<bound method Task.commit of <vdsm.storage.task.Task instance at 0x7f9020439560>>, args=None) (threadPool:208)
2018-08-20 15:14:39,325+0530 INFO  (tasks/0) [storage.VolumeManifest] Volume: preparing volume c21260fa-bc27-4bd3-9841-99da4e2d0402/f23a1be1-d4bf-4fcb-8e24-7af156c4464d (volume:566)
===


===
vdsm-client Host getAllTasksInfo
{
    "a15324c1-8f3d-4f10-a360-4f108ca8dd39": {
        "verb": "prepareMerge", 
        "id": "a15324c1-8f3d-4f10-a360-4f108ca8dd39"
    }
}
===

The gdb backtrace shows this stuck thread waiting to acquire the lock.

===

    f=f@entry=Frame 0x7f9038189210, for file /usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py, line 234, in wait (self=<Request(_namespace='03_lvm_c21260fa-bc27-4bd3-9841-99da4e2d0402', _doneEvent=<_Event(_Verbose__verbose=False, _Event__flag=False, _Event__cond=<Condition(release=<instancemethod at remote 0x7f90203a91e0>, _Condition__lock=<Lock at remote 0x7f90380e11b8>, acquire=<instancemethod at remote 0x7f9020532140>, _Condition__cond=<Cond at remote 0x7f9038086950>) at remote 0x7f9038086c90>) at remote 0x7f90381c8510>, _lockType='exclusive', _name='f23a1be1-d4bf-4fcb-8e24-7af156c4464d', reqID='2f905e40-c803-494d-adc8-2cb84d658527', _callback=<function at remote 0x7f905c0d6c80>, _isActive=True, _log=<SimpleLogAdapter(logger=<Logger(name='storage.ResourceManager.Request', parent=<Logger(name='storage.ResourceManager', parent=<Logger(name='storage', parent=<RootLogger(name='root', parent=None, handlers=[<SysLogHandler(socket=<_socketobject at remote 0x7f9071ac19f0>, level=30, lock=<RLock at remote 0x7f906...(truncated), throwflag=throwflag@entry=0) at /usr/src/debug/Python-2.7.5/Python/ceval.c:3040


    f=f@entry=Frame 0x7f90040536d0, for file /usr/lib/python2.7/site-packages/vdsm/storage/resourceManager.py, line 1025, in acquireResource (namespace='03_lvm_c21260fa-bc27-4bd3-9841-99da4e2d0402', name='f23a1be1-d4bf-4fcb-8e24-7af156c4464d', lockType='exclusive', timeout=None), throwflag=throwflag@entry=0)

    f=f@entry=Frame 0x7f8fe8011c60, for file /usr/lib/python2.7/site-packages/vdsm/storage/blockVolume.py, line 376, in llPrepare (self=<BlockVolumeManifest(imgUUID='4b31cffe-d931-4a82-bc2c-aaf808c87084', voltype='INTERNAL', _imagePath='/rhev/data-center/mnt/blockSD/c21260fa-bc27-4bd3-9841-99da4e2d0402/images/4b31cffe-d931-4a82-bc2c-aaf808c87084', _volumePath='/rhev/data-center/mnt/blockSD/c21260fa-bc27-4bd3-9841-99da4e2d0402/images/4b31cffe-d931-4a82-bc2c-aaf808c87084/f23a1be1-d4bf-4fcb-8e24-7af156c4464d', repoPath='/rhev/data-center/mnt/blockSD', sdUUID='c21260fa-bc27-4bd3-9841-99da4e2d0402', volUUID='f23a1be1-d4bf-4fcb-8e24-7af156c4464d', lvmActivationNamespace='03_lvm_c21260fa-bc27-4bd3-9841-99da4e2d0402') at remote 0x7f9038240e50>, rw=True, setrw=False, access='exclusive'), throwflag=throwflag@entry=0)
====

It is not possible to stop this task either, because stopTask fails with the error below.

===
2018-08-20 16:54:16,128+0530 INFO  (jsonrpc/5) [vdsm.api] START stopTask(taskID=u'a15324c1-8f3d-4f10-a360-4f108ca8dd39', spUUID=None, options=None) from=::1,51912, task_id=20da88ef-fa32-4b30-823d-bb368a6d4a71 (api:46)
2018-08-20 16:54:16,128+0530 INFO  (jsonrpc/5) [vdsm.api] FINISH stopTask error=Task is aborted: u'a15324c1-8f3d-4f10-a360-4f108ca8dd39' - code 411 from=::1,51912, task_id=20da88ef-fa32-4b30-823d-bb368a6d4a71 (api:50)
===

"Force" option is not available in vdsm-client which has requested in bug 1592180

Version-Release number of selected component (if applicable):

vdsm-4.20.27.2-1.el7ev.x86_64
rhvm-4.2.5.2-0.1.el7ev.noarch

How reproducible:

100%

Steps to Reproduce:

1. Mark the leaf image of the image chain ILLEGAL in the storage domain metadata (the customer's leaf image became ILLEGAL because of bug 1598594).

2. Perform a cold merge twice.
 
3. On the second cold merge, the prepareMerge task gets stuck.

Actual results:

The merge task gets stuck in vdsm.

Expected results:

The merge should not create a stuck task in vdsm.

Additional info:

Comment 2 RHV bug bot 2018-09-19 12:15:59 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.2.z': '?'}', ]

For more info please contact: rhv-devops

Comment 5 Elad 2019-01-31 14:50:41 UTC
1. Created a VM with a disk and 2 snapshots

2. Set snapshot 1 to illegal:
engine=# update images set imagestatus = '4' where image_guid = '9888c17d-27dc-494f-8600-6e11c345bae5';
UPDATE 1

3. Tried to delete snapshot 1 -> Failed as expected:
2019-01-31 16:45:40,589+02 WARN  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-82) [2311d2ba-d0e7-4c56-9aa1-2d3dc8a339da] Validation of action 'RemoveSnapshot' failed for user admin@internal-authz. Reasons: VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE,ACTION_TYPE_FAILED_DISKS_ILLEGAL,$diskAliases test1_Disk1

4. Set snapshot 1 to legal and set snapshot 2 to illegal:
engine=# update images set imagestatus = '4' where image_guid = 'b5500e18-905e-4168-b9dd-4f40a619ff9c';
UPDATE 1

5. Tried to delete snapshot 2 -> Failed as expected:
2019-01-31 16:47:26,666+02 WARN  [org.ovirt.engine.core.bll.snapshots.RemoveSnapshotCommand] (default task-79) [9809a1a7-be5b-4719-a6a3-c0832c1d91ba] Validation of action 'RemoveSnapshot' failed for user admin@internal-authz. Reasons: VAR__TYPE__SNAPSHOT,VAR__ACTION__REMOVE,ACTION_TYPE_FAILED_DISKS_ILLEGAL,$diskAliases test1_Disk1




=========
Used:
ovirt-engine-4.3.0-0.8.master.20190121201726.gite3ad1b2.el7.noarch
vdsm-4.30.6-17.gitcfd81b7.el7.x86_64

Comment 7 errata-xmlrpc 2019-05-08 12:38:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085

Comment 8 Daniel Gur 2019-08-28 13:12:49 UTC
sync2jira

Comment 9 Daniel Gur 2019-08-28 13:17:01 UTC
sync2jira

