Bug 1461295 - [downstream clone - 4.1.4] [RFE] Provide a way to correlate each 'run and protect' thread to its task
[downstream clone - 4.1.4] [RFE] Provide a way to correlate each 'run and pro...
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm (Show other bugs)
4.1.0
Unspecified Unspecified
unspecified Severity high
: ovirt-4.1.4
: ---
Assigned To: Nir Soffer
Lucie Leistnerova
: FutureFeature, ZStream
Depends On: 1425596
Blocks:
  Show dependency treegraph
 
Reported: 2017-06-14 03:20 EDT by rhev-integ
Modified: 2017-07-27 14:03 EDT (History)
17 users (show)

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of: 1425596
Environment:
Last Closed: 2017-07-27 14:03:45 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
lsvaty: testing_plan_complete-


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 77456 master MERGED api: Show task id in storage api logs 2017-06-14 03:21 EDT
oVirt gerrit 77893 ovirt-4.1 MERGED api: Show task id in storage api logs 2017-07-04 08:23 EDT

  None (edit)
Description rhev-integ 2017-06-14 03:20:34 EDT
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1425596 +++
======================================================================

Today, there is no way to correlate directly (without guessing and hoping it is your lucky day) from 'run and protect' thread to the task opened to fulfill the request. Unless you go to engine.log and try correlating the command from there.
If engine.log is missing or system has high traffic of events going on and the threads in the pool are being reused frequently, the likelihood of correlating the thread to its task successfully decreases. 

Example of current behavior:
~~~
INFO  (jsonrpc/6) [dispatcher] Run and protect: create_lease(lease={u'sd_id': u'd4ad5a78-1d75-457b-8caf-400d2d7917a6', u'lease_id': u'7eb2c25b-1bc8-4016-a9a5-73c6a69d2fa2'})

INFO  (tasks/2) [storage.ThreadPool.WorkerThread] START task a6a2107d-63b0-42b3-ac60-4ebbb1d1e3a9 (cmd=<bound method Task.commit of <storage.task.Task instance at 0x31afa70>>, args=None)

INFO  (tasks/2) [storage.xlease] Adding lease '7eb2c25b-1bc8-4016-a9a5-73c6a69d2fa2' in lockspace u'd4ad5a78-1d75-457b-8caf-400d2d7917a6' (xlease:548)
~~~

Expected behavior:
Add a correlation ID to be present in both thread and task. This sounds like the best option to me.

Or somehow add the task information in run and protect message or the other way around. But the problem with other way around is that jsonrpc threads seem to be reused from a pool of threads and they do not really have a unique identifier as the task does.

(Originally by Marina Kalinin)
Comment 3 rhev-integ 2017-06-14 03:20:48 EDT
All storage verbs run in a task, and the task id is available via vars.task.id.
We just need to add the task id in the logger creating the "Run and protect"
messages.

Can be something like:

INFO  (jsonrpc/6) [dispatcher] Run and protect: create_lease(lease={u'sd_id': u'd4ad5a78-1d75-457b-8caf-400d2d7917a6', u'lease_id': u'7eb2c25b-1bc8-4016-a9a5-73c6a69d2fa2'}) (task=a6a2107d-63b0-42b3-ac60-4ebbb1d1e3a9)
...
INFO  (tasks/2) [storage.ThreadPool.WorkerThread] START task a6a2107d-63b0-42b3-ac60-4ebbb1d1e3a9 (cmd=<bound method Task.commit of <storage.task.Task instance at 0x31afa70>>, args=None)
...
INFO  (jsonrpc/6) [dispatcher] Run and protect: create_lease, Return response: None)
(task=a6a2107d-63b0-42b3-ac60-4ebbb1d1e3a9)

Adding correction id can be nice, but we don't have it now in vdsm, it was dropped
when we moved to jsonrpc.

(Originally by Nir Soffer)
Comment 4 rhev-integ 2017-06-14 03:20:53 EDT
This is trivial using the new logging infrastructure introduced in 4.1.3.

(Originally by Nir Soffer)
Comment 6 rhev-integ 2017-07-07 08:24:50 EDT
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'v4.19.21' doesn't contain patch 'https://gerrit.ovirt.org/77893']
gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.21

For more info please contact: rhv-devops@redhat.com
Comment 7 Nir Soffer 2017-07-07 08:35:49 EDT
Francesco, this patch is merged for some time, why it is not included in 4.19.21?
Comment 8 Allon Mureinik 2017-07-10 08:16:55 EDT
This is most definitely included in the tag:

amureini@mureinik ~/src/git/vdsm [master] $ git log -n1 160e48faa528e4d664a9388d285626cae7dbaab0
commit 160e48faa528e4d664a9388d285626cae7dbaab0
Author: Nir Soffer <nsoffer@redhat.com>
Date:   Sun May 28 19:08:44 2017 +0300

    api: Show task id in storage api logs
    
    Following storage asynchronous task (e.g. Lease.create) is hard; you
    must find the api START log, and then locate the task thread by some of
    the arguments (e.g. lease_id). However all storage api calls run in a
    task, and the task id is known during the api call.
    
    This patch adds the task id to the api logs, making it easier to follow
    the task.
    
    Here is an example api call:
    
        2017-05-28 19:32:28,094+0300 INFO  (jsonrpc/0) [vdsm.api] START deleteImage(
        sdUUID=u'aed577ea-d1ca-4ebe-af80-f852c7ce59bb', spUUID=u'6c99f4e5-8588-46f5-a818-e11151c1d19c',
        imgUUID=u'14018222-70d7-493c-b1f8-f509eca3c1ae', postZero=u'true', force=u'false', discard=False)
        from=::ffff:10.35.0.111,47418, task_id=ef3b0a5c-420e-4865-bef7-b79be89291dd (api:46)
    
    Using the task_id, we can easily locate the worker thread handling this
    request:
    
        2017-05-28 19:32:28,204+0300 INFO  (tasks/6) [storage.ThreadPool.WorkerThread]
        START task ef3b0a5c-420e-4865-bef7-b79be89291dd ...
    
    Change-Id: Iaca3c357ef78391673d69f57b5abe32a362ba9a1
    Bug-Url: https://bugzilla.redhat.com/1461295
    Signed-off-by: Nir Soffer <nsoffer@redhat.com>
amureini@mureinik ~/src/git/vdsm [master] $ git tag --contains 160e48faa528e4d664a9388d285626cae7dbaab0
v4.19.21
Comment 11 Francesco Romani 2017-07-10 09:53:34 EDT
(In reply to Allon Mureinik from comment #8)
> This is most definitely included in the tag:

Yep, just like Allon said (thanks!). Clearing NEEDINFO.
Comment 12 Lucie Leistnerova 2017-07-18 11:14:57 EDT
I'm not able to reproduce the 'Run and protect' action on my hosts. Please provide the steps. Thank you.
Comment 13 Lucie Leistnerova 2017-07-20 08:09:26 EDT
Every vdsm.api START/FINISH call has its task_id. This format replaced the 'Run and protect' prefix in patch https://gerrit.ovirt.org/#/c/76771/

So the start/finish of asynchronous action (e.g create a disk) can be correlate to 'START task' action.

verified in vdsm-4.19.22-1.el7ev.x86_64
Comment 15 errata-xmlrpc 2017-07-27 14:03:45 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1815

Note You need to log in before you can comment on or make changes to this bug.