Bug 950199

Summary: Correlation ids are being re-used and don't appear to match up correctly in the engine and vdsm logs.
Product: Red Hat Enterprise Virtualization Manager Reporter: Gordon Watson <gwatson>
Component: ovirt-engineAssignee: Nobody's working on this, feel free to take it <nobody>
Status: CLOSED DUPLICATE QA Contact:
Severity: low Docs Contact:
Priority: low    
Version: 3.1.3CC: acathrow, dyasny, iheim, kroberts, lpeer, Rhev-m-bugs, sgrinber, yeylon, ykaul, yzaslavs
Target Milestone: ---   
Target Release: 3.2.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: infra
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-04-24 07:02:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Gordon Watson 2013-04-09 20:14:46 UTC
Description of problem:

While analysing a customer's problem, I tried to use a correlation id in the engine logs to find relevant entries in the vdsm logs. What I was found was;

1) The correlation id appears to have been re-used.
2) Specific entries in the engine logs with a specific correlation id do not match anything appropiate in the vdsm logs, however other entries with this correlation id do exist in the vdsm logs.

See below for an example.


Version-Release number of selected component (if applicable):

RHEV-M:  rhevm-3.1.0-50.el6 (3.1.3)

Hosts:

There are four RHEL6.4 hosts in a cluster.

- Host-A;

Mar 11 13:19:29 Installed: vdsm-4.10.2-1.6.el6.x86_64
Mar 15 11:39:21 Updated: vdsm-4.10.2-1.8.el6ev.x86_64
Mar 28 09:32:50 Updated: libvirt-0.10.2-18.el6_4.2.x86_64


- Host-B;

Mar 11 13:14:28 Installed: vdsm-4.10.2-1.6.el6.x86_64
Mar 11 12:23:06 Updated: libvirt-0.10.2-18.el6.x86_64
Mar 15 11:54:21 Updated: vdsm-4.10.2-1.8.el6ev.x86_64


- Host-C;

current = vdsm-4.10.2-1.8.el6ev.x86_64
Mar 14 03:43:16 Updated: libvirt-0.10.2-18.el6.x86_64


- Host-D;

current = vdsm-4.10.2-1.8.el6ev.x86_64
Mar 22 13:54:59 Updated: libvirt-0.10.2-18.el6_4.2.x86_64


How reproducible:

Not.

Steps to Reproduce:
1. n/a
2.
3.
 
 
Actual results:

An example of this was when looking in the engine logs for a snapshot problem related to VM with id '18ed3b02-0a8f-4f25-87df-ce16922f01c3'. The engine log contains relevant entries with correlation id (69567332) on 2013-03-27, as shown immediately below, but none of the vdsm logs do. The vdsm logs contain entries with this correlation id for other dates for other activities, e.g. on Host-A there is an entry for 'call vmGetMigrationStatus with ('3fbd3711-05fe-47cb-aa44-1a300398760f',)". This is a different VM than the one that the snapshot was being created for. 


-  engine.log;

2013-03-27 16:21:29,362 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-84) [69567332] BaseAsyncTask::LogEndTaskFailure: Task 93a60605-f562-40de-95ce-c64faf8c21c4 (Parent Command CreateAllSnapshotsFromVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
2013-03-27 16:21:29,362 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-84) [69567332] SPMAsyncTask::PollTask: Polling task 4b676118-dba7-40fa-9496-4e30cc117208 (Parent Command CreateAllSnapshotsFromVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2013-03-27 16:21:29,368 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-84) [69567332] BaseAsyncTask::LogEndTaskFailure: Task 4b676118-dba7-40fa-9496-4e30cc117208 (Parent Command CreateAllSnapshotsFromVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
2013-03-27 16:21:29,368 INFO  [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-84) [69567332] SPMAsyncTask::PollTask: Polling task 985b9f5d-9dfb-489f-af00-f5256dced657 (Parent Command CreateAllSnapshotsFromVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) returned status finished, result 'cleanSuccess'.
2013-03-27 16:21:29,374 ERROR [org.ovirt.engine.core.bll.SPMAsyncTask] (QuartzScheduler_Worker-84) [69567332] BaseAsyncTask::LogEndTaskFailure: Task 985b9f5d-9dfb-489f-af00-f5256dced657 (Parent Command CreateAllSnapshotsFromVm, Parameters Type org.ovirt.engine.core.common.asynctasks.AsyncTaskParameters) ended with failure:
2013-03-27 16:21:29,374 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-84) [69567332] EntityAsyncTask::EndAction: Ending action for 5 tasks (entity ID: 18ed3b02-0a8f-4f25-87df-ce16922f01c3): calling EndAction for action type CreateAllSnapshotsFromVm.
2013-03-27 16:21:29,374 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (pool-3-thread-44) [69567332] EntityAsyncTask::EndCommandAction [within thread] context: Attempting to EndAction CreateAllSnapshotsFromVm, executionIndex: 0
2013-03-27 16:21:29,398 ERROR [org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand] (pool-3-thread-44) [69567332] Ending command with failure: org.ovirt.engine.core.bll.CreateAllSnapshotsFromVmCommand
2013-03-27 16:21:29,374 INFO  [org.ovirt.engine.core.bll.EntityAsyncTask] (QuartzScheduler_Worker-84) [69567332] EntityAsyncTask::EndActionIfNecessary: All tasks of entity 18ed3b02-0a8f-4f25-87df-ce16922f01c3 has ended -> executing EndAction



-  Host-A;

$ grep 69567332 vdsm.log*
vdsm.log:Thread-762100::DEBUG::2013-03-28 09:25:12,361::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('3fbd3711-05fe-47cb-aa44-1a300398760f',) {} flowID [69567332]
vdsm.log:Thread-762101::DEBUG::2013-03-28 09:25:12,408::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetMigrationStatus with ('3fbd3711-05fe-47cb-aa44-1a300398760f',) {} flowID [69567332]
vdsm.log:Thread-762515::DEBUG::2013-03-28 09:36:16,367::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call getCapabilities with () {} flowID [69567332]
vdsm.log.36:Thread-346098::DEBUG::2013-03-22 04:04:31,273::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('96f27e3e-e9d9-4e99-8804-315921aaea71',) {} flowID [69567332]
vdsm.log.42:Thread-213938::DEBUG::2013-03-19 14:41:44,418::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('8e07e1bd-0591-4f21-8dc6-8f41abb77d16',) {} flowID [69567332]
vdsm.log.42:Thread-214135::DEBUG::2013-03-19 14:45:49,151::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('8e07e1bd-0591-4f21-8dc6-8f41abb77d16',) {} flowID [69567332]
vdsm.log.43:Thread-201692::DEBUG::2013-03-19 09:11:35,985::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('3fbd3711-05fe-47cb-aa44-1a300398760f',) {} flowID [69567332]
vdsm.log.43:Thread-204145::DEBUG::2013-03-19 10:19:22,128::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('3048f649-3208-4c8e-8f77-b88b813edc5d',) {} flowID [69567332]
vdsm.log.43:Thread-210333::DEBUG::2013-03-19 13:03:08,389::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('8e07e1bd-0591-4f21-8dc6-8f41abb77d16',) {} flowID [69567332]
vdsm.log.44:Thread-165176::DEBUG::2013-03-18 16:01:52,318::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('6a7a25be-007a-4a41-b226-2814039a20f6',) {} flowID [69567332]
vdsm.log.45:Thread-156135::DEBUG::2013-03-18 11:56:27,479::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('6a7a25be-007a-4a41-b226-2814039a20f6',) {} flowID [69567332]
vdsm.log.45:Thread-158840::DEBUG::2013-03-18 13:09:43,272::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('8e07e1bd-0591-4f21-8dc6-8f41abb77d16',) {} flowID [69567332]
vdsm.log.45:Thread-160346::DEBUG::2013-03-18 13:48:57,638::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('cbfb1a77-8acf-4eb5-9ce1-7b088a1379aa',) {} flowID [69567332]



-  Host-B;

$ grep 69567332 vdsm.log
Thread-665089::DEBUG::2013-03-28 09:48:53,879::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmCreate with ({'custom': {'device_29febd85-8d77-49ab-af64-1d4a6ef58233device_0290795c-e504-497e-8369-7c3237eb8e09device_c8afd480-1c6d-4bef-9b81-1b0cbf625f99': 'VmDevice {vmId=22a2c712-49e9-4efa-9b2b-d5971a1a6727, deviceId=c8afd480-1c6d-4bef-9b81-1b0cbf625f99, device=unix, type=channel, bootOrder=0, specParams={}, address={port=1, bus=0, controller=0, type=virtio-serial}, managed=false, plugged=true, readOnly=false, alias=channel0}', 'device_29febd85-8d77-49ab-af64-1d4a6ef58233device_0290795c-e504-497e-8369-7c3237eb8e09device_c8afd480-1c6d-4bef-9b81-1b0cbf625f99device_edf01b84-3ad8-4d99-b0ea-282dd8c5ee84device_75045c31-0aab-477d-843e-8e42c19c7f18': 'VmDevice {vmId=22a2c712-49e9-4efa-9b2b-d5971a1a6727, deviceId=75045c31-0aab-477d-843e-8e42c19c7f18, device=spicevmc, type=channel, bootOrder=0, specParams={}, address={port=3, bus=0, controller=0, type=virtio-serial}, managed=false, plugged=true, readOnly=false, alias=channel2}', 'device_29febd85-8d77-49ab-af64-1d4a6ef58233device_0290795c-e504-497e-8369-7c3237eb8e09': 'VmDevice {vmId=22a2c712-49e9-4efa-9b2b-d5971a1a6727, deviceId=0290795c-e504-497e-8369-7c3237eb8e09, device=virtio-serial, type=controller, bootOrder=0, specParams={}, address={bus=0x00, domain=0x0000, type=pci, slot=0x04, function=0x0}, managed=false, plugged=true, readOnly=false, alias=virtio-serial0}', 'device_29febd85-8d77-49ab-af64-1d4a6ef58233': 'VmDevice {vmId=22a2c712-49e9-4efa-9b2b-d5971a1a6727, deviceId=29febd85-8d77-49ab-af64-1d4a6ef58233, device=ide, type=controller, bootOrder=0, specParams={}, address={bus=0x00, domain=0x0000, type=pci, slot=0x01, function=0x1}, managed=false, plugged=true, readOnly=false, alias=ide0}', 'device_29febd85-8d77-49ab-af64-1d4a6ef58233device_0290795c-e504-497e-8369-7c3237eb8e09device_c8afd480-1c6d-4bef-9b81-1b0cbf625f99device_edf01b84-3ad8-4d99-b0ea-282dd8c5ee84': 'VmDevice {vmId=22a2c712-49e9-4efa-9b2b-d5971a1a6727, deviceId=edf01b84-3ad8-4d99-b0ea-282dd8c5ee84, device=unix, type=channel, bootOrder=0, specParams={}, address={port=2, bus=0, controller=0, type=virtio-serial}, managed=false, plugged=true, readOnly=false, alias=channel1}'}, 'keyboardLayout': 'en-us', 'kvmEnable': 'true', 'pitReinjection': 'false', 'acpiEnable': 'true', 'emulatedMachine': 'rhel6.3.0', 'vmId': '22a2c712-49e9-4efa-9b2b-d5971a1a6727', 'devices': [{'device': 'qxl', 'specParams': {'vram': '65536'}, 'type': 'video', 'deviceId': '11d3ce8b-e43b-433a-8a96-82abfc5db44c', 'address': {'bus': '0x00', ' slot': '0x02', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'}}, {'index': '2', 'iface': 'ide', 'address': {' controller': '0', ' target': '0', 'unit': '0', ' bus': '1', ' type': 'drive'}, 'specParams': {'path': ''}, 'readonly': 'true', 'deviceId': 'ed2eb0c1-e850-411c-b11a-5faee5a85059', 'path': '', 'device': 'cdrom', 'shared': 'false', 'type': 'disk'}, {'index': 0, 'iface': 'virtio', 'format': 'cow', 'bootOrder': '1', 'poolID': '6de1c3c1-9e75-4c61-8bc6-b8a59472f6d9', 'volumeID': '0a120f01-685e-4703-9dcc-139b735c6bbc', 'imageID': 'e439941c-6d42-488d-bde0-52f3bc9bb11f', 'specParams': {}, 'readonly': 'false', 'domainID': '1cfd5b71-6d87-4e89-ae9c-9f4cf06330bd', 'optional': 'false', 'deviceId': 'e439941c-6d42-488d-bde0-52f3bc9bb11f', 'address': {'bus': '0x00', ' slot': '0x05', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'}, 'device': 'disk', 'shared': 'false', 'propagateErrors': 'off', 'type': 'disk'}, {'nicModel': 'pv', 'macAddr': '00:1a:4a:a8:10:1f', 'network': 'rhevm', 'filter': 'vdsm-no-mac-spoofing', 'specParams': {}, 'deviceId': 'b04e0f30-7560-47f3-b410-ed5be20ccbae', 'address': {'bus': '0x00', ' slot': '0x03', ' domain': '0x0000', ' type': 'pci', ' function': '0x0'}, 'device': 'bridge', 'type': 'interface'}, {'device': 'memballoon', 'specParams': {'model': 'virtio'}, 'type': 'balloon', 'deviceId': '3bce38f4-7e0f-4563-b494-5775e12bfa0e'}], 'smp': '2', 'vmType': 'kvm', 'timeOffset': '-2', 'memSize': 1024, 'spiceSslCipherSuite': 'DEFAULT', 'cpuType': 'Nehalem', 'spiceSecureChannels': 'smain,sinputs,scursor,splayback,srecord,sdisplay,susbredir,ssmartcard', 'smpCoresPerSocket': '1', 'vmName': 'xxxxxx', 'display': 'qxl', 'transparentHugePages': 'true', 'nice': '0'},) {} flowID [69567332]

$ xzgrep 69567332 vdsm.log*xz
vdsm.log.16.xz:Thread-370015::DEBUG::2013-03-22 15:42:54,003::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('3048f649-3208-4c8e-8f77-b88b813edc5d',) {} flowID [69567332]
vdsm.log.16.xz:Thread-370016::DEBUG::2013-03-22 15:42:54,022::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('5680f631-5ce4-4e98-8e8d-bc4ff64be588',) {} flowID [69567332]
vdsm.log.16.xz:Thread-370860::DEBUG::2013-03-22 16:02:26,039::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('3048f649-3208-4c8e-8f77-b88b813edc5d',) {} flowID [69567332]
vdsm.log.16.xz:Thread-370924::DEBUG::2013-03-22 16:03:27,131::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('3048f649-3208-4c8e-8f77-b88b813edc5d',) {} flowID [69567332]
vdsm.log.16.xz:Thread-370954::DEBUG::2013-03-22 16:03:58,087::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('3048f649-3208-4c8e-8f77-b88b813edc5d',) {} flowID [69567332]
vdsm.log.16.xz:Thread-371797::DEBUG::2013-03-22 16:22:46,434::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('22a2c712-49e9-4efa-9b2b-d5971a1a6727',) {} flowID [69567332]
vdsm.log.25.xz:Thread-212800::DEBUG::2013-03-19 14:36:08,105::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('e7a281ee-1f65-4332-8746-3652d6490d66',) {} flowID [69567332]
vdsm.log.25.xz:Thread-212830::DEBUG::2013-03-19 14:36:39,374::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('e7a281ee-1f65-4332-8746-3652d6490d66',) {} flowID [69567332]
vdsm.log.27.xz:Thread-166542::DEBUG::2013-03-18 16:53:22,082::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('3fbd3711-05fe-47cb-aa44-1a300398760f',) {} flowID [69567332]
vdsm.log.28.xz:Thread-155942::DEBUG::2013-03-18 12:07:09,401::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('5680f631-5ce4-4e98-8e8d-bc4ff64be588',) {} flowID [69567332]
vdsm.log.28.xz:Thread-162027::DEBUG::2013-03-18 14:52:58,918::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('22a2c712-49e9-4efa-9b2b-d5971a1a6727',) {} flowID [69567332]



-  Host-C;

$ grep 69567332 vdsm.log
$

$ xzgrep 69567332 vdsm.log*xz
vdsm.log.27.xz:Thread-415968::DEBUG::2013-03-21 12:05:58,654::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('9252666e-2987-4357-8546-a8e1240c7de3',) {} flowID [69567332]
vdsm.log.28.xz:Thread-411311::DEBUG::2013-03-21 10:28:38,175::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('9252666e-2987-4357-8546-a8e1240c7de3',) {} flowID [69567332]



-  Host-D;

$ grep 69567332 vdsm.log
Thread-144940::DEBUG::2013-03-28 09:48:53,454::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('22a2c712-49e9-4efa-9b2b-d5971a1a6727',) {} flowID [69567332]
Thread-144941::DEBUG::2013-03-28 09:48:53,469::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmDestroy with ('22a2c712-49e9-4efa-9b2b-d5971a1a6727',) {} flowID [69567332]

$ xzgrep 69567332 vdsm.log*xz
vdsm.log.4.xz:Thread-41095::DEBUG::2013-03-26 09:49:50,168::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('5680f631-5ce4-4e98-8e8d-bc4ff64be588',) {} flowID [69567332]
vdsm.log.5.xz:Thread-1706::DEBUG::2013-03-25 15:42:04,502::BindingXMLRPC::903::vds::(wrapper) client [<ip_addr>]::call vmGetStats with ('8e07e1bd-0591-4f21-8dc6-8f41abb77d16',) {} flowID [69567332]



Expected results:

I was expecting to be able to find appropriate "errors" in the vdsm logs containing the correlation id found in the engine log.


Additional info:

Comment 3 Itamar Heim 2013-04-14 06:41:27 UTC
bug 844335 may be relevant here

Comment 4 Itamar Heim 2013-04-14 06:42:40 UTC
*** Bug 950192 has been marked as a duplicate of this bug. ***

Comment 8 Yair Zaslavsky 2013-04-22 10:34:18 UTC
From what I saw - Thre are two cases of re-use -
a. re-use is due to the fact that the child commands of CreateAllSnapshotsFromVmCommand exist.
There's no problem with that.
b. Re-use in getVmStatis (vdsm side)

Can we get engine.log and vdsm.log?

In reply to comment #3 -
Yes, this bug is relevant.