Bug 1339045 - [RFE] Provide a way to correlate live migration flow from engine.log to vdsm.log
Summary: [RFE] Provide a way to correlate live migration flow from engine.log to vdsm.log
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: RFEs
Version: 3.6.5
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: ---
Assignee: Moran Goldboim
QA Contact: Gil Klein
URL:
Whiteboard:
Depends On:
Blocks: 902971 1425219
TreeView+ depends on / blocked
 
Reported: 2016-05-24 03:22 UTC by Marina Kalinin
Modified: 2019-11-14 14:51 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-23 17:56:51 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Marina Kalinin 2016-05-24 03:22:00 UTC
Right now, there is no way to correlate between vm live migration flows from engine.log to vdsm.log. And it is very inconvenient analyzing customer cases with vm migration issues.

For example:
engine.log:
~~~
2016-05-20 14:24:57,183 INFO  [org.ovirt.engine.core.bll.MigrateVmCommand] (org.ovirt.thread.pool-7-thread-36) [1fa81e9a] Running command: MigrateVmCommand internal: false. Entities affected :  ID: b29a85df-9125-4502-950e-b3d98641c2b2 Type: VMAction group MIGRATE_VM with role type USER
~~~
So we have correlation id, however, searching vdsm.logs with that value returns nothing.
After searching by VM UUID and migration key word, it is possible to find multiple threads related, but no direct correlation. And it makes it very hard processing.

vdsm.log on source:
~~~
vdsm.log.21.xz:jsonrpc.Executor/3::DEBUG::2016-05-20 17:58:35,516::__init__::503::jsonrpc.JsonRpcServer::(_serveRequest) Calling 'VM.migrate' in bridge with {u'params': {u'tunneled': u'false', u'dstqemu': u'10.10.10.10', u'src': u'source_host', u'dst': u'dest_host:54321', u'vmId': u'b29a85df-9125-4502-950e-b3d98641c2b2', u'abortOnError': u'true', u'method': u'online'}, u'vmID': u'b29a85df-9125-4502-950e-b3d98641c2b2'}
..
Thread-1023795::DEBUG::2016-05-20 17:58:35,547::migration::148::virt.vm::(_setupVdsConnection) vmId=`b29a85df-9125-4502-950e-b3d98641c2b2`::Initiating connection with destination
~~~

If there was some unique denominator to search vdsm logs for, could be much more convenient. Right now migration flow is spread over multiple threads and without unique identifier, it is like shooting in the dark.

Comment 1 Michal Skrivanek 2016-05-24 07:58:20 UTC
generally you're right, but the current logs really should be enough in most cases.
Assuming time is in sync (and that is a requirement we raise an alarm for when not fulfilled) there is only a handful of migration calls for that specific VM. It takes minutes for migration to fail and reissue a new one for the same VM. So "starting with engine.log, getting the timestamp, grepping for migrate verb in vdsm with that VM id" should work good enough, IMHO.

We also have the correlation id on engine side, slowly went out of use (not sure if it's usable at all anymore). We can use that, but it means yet another parameter for migration for no other good reason. 
Dan, what do you think?

Comment 2 Dan Kenigsberg 2016-05-24 10:53:48 UTC
I would love to see Engine's Correlation ID passed to Vdsm and logged in each and every verb. I have a vague memory that we had that with xmlrpc, and used html headers to carry it out of band.

Piotr, I believe that this can be easily introduced in jsonrpc as a special argument named _correlation_id that is logged by jsonrpc, and is not passed to any verb. What do you think?

Comment 3 Piotr Kliczewski 2016-05-24 11:03:32 UTC
The change is not complex and could be introduced but sometime ago there was a decision not to provide it. This RFE is duplicate of [1].


[1] https://bugzilla.redhat.com/1112120

Comment 4 Michal Skrivanek 2016-05-24 11:10:53 UTC
Hi Marina, do you want to reopen the discussion in bug 1112120? It is indeed about the same thing, so I would close this one as a duplicate.

Comment 5 Marina Kalinin 2016-05-24 15:26:47 UTC
I would be happy to bring correlation id back. (I never supported the original decision).
In some flows, it is possible to track flows between engine and vdsm by task id, but this is only for storage async tasks. So - insufficient.
Going through 100 vdsm logs and searching by entity id is a pain, even though if it is time limited. Going by unique correlation id would make support job significantly easier.

Comment 6 Robert McSwain 2016-05-26 15:36:47 UTC
+1 to this. Anything that makes sorting through the vdsm.log files faster and more efficiently is a major help to us in CEE. I'm with mku that bringing the correlation ID back would be an excellent solution.

Comment 7 Michal Skrivanek 2016-06-17 11:44:49 UTC
bug 1112120 reopened, let's move discussion back there

*** This bug has been marked as a duplicate of bug 1112120 ***

Comment 8 Marina Kalinin 2016-06-20 02:17:00 UTC
Since the duplicate bug is closed, reopening this RFE.
It should be open for tracking and making sure we are not missing it in the next release.

Comment 9 Michal Skrivanek 2016-06-20 09:00:29 UTC
Please see more details in bug 1112120

Comment 11 Marina Kalinin 2017-02-13 22:34:11 UTC
Michal,

Are there any changes in the recent logging mechanism, that maybe this bug is already fixed or no?

Comment 14 Yaniv Kaul 2017-06-07 19:08:25 UTC
Piotr, the correlation ID work is helping here?

Comment 15 Piotr Kliczewski 2017-06-08 13:40:41 UTC
We could add correlation id to the logs in live migration flow but with correlation id implementation we have it is easy to correlate engine and vdsm logs now. Please see more details in bug 1112120.

Comment 20 Yaniv Kaul 2018-04-22 06:20:03 UTC
I think the request has been satisfied?

Comment 21 Marina Kalinin 2018-04-23 17:56:51 UTC
Yes.
Looks beautiful.
Gordon, thanks for checking this for me.

Comment 22 Franta Kust 2019-05-16 13:07:41 UTC
BZ<2>Jira Resync


Note You need to log in before you can comment on or make changes to this bug.