Bug 1720977 - [logging] limit getStats
Summary: [logging] limit getStats
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.3.0
Hardware: x86_64
OS: Linux
medium
low
Target Milestone: ovirt-4.4.0
: ---
Assignee: Marcin Sobczyk
QA Contact: Guilherme Santos
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-06-17 03:28 UTC by Germano Veit Michel
Modified: 2020-08-04 13:27 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-08-04 13:27:06 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2020:3246 0 None None None 2020-08-04 13:27:38 UTC
oVirt gerrit 103110 0 'None' MERGED API: limit 'getStats' output in logs 2020-06-05 07:59:21 UTC

Description Germano Veit Michel 2019-06-17 03:28:39 UTC
Description of problem:

The following is logged in full every 15s:

2019-06-17 13:19:37,543+1000 INFO  (jsonrpc/6) [api.host] FINISH getStats return={'status': {'message': 'Done', 'code': 0}, 'info': {'cpuStatistics': {'1': {'cpuUser': '0.86', 'nodeIndex': 0, 'cpuSys': '0.53', 'cpuIdle': '98.61'}, '0': {'cpuUser': '0.27', 'nodeIndex': 0, 'cpuSys': '0.13', 'cpuIdle': '99.60'}, '3': {'cpuUser': '0.73', 'nodeIndex': 0, 'cpuSys': '0.13', 'cpuIdle': '99.14'}, '2': {'cpuUser': '0.67', 'nodeIndex': 0, 'cpuSys': '0.33', 'cpuIdle': '99.00'}}, 'numaNodeMemFree': {'0': {'memPercent': 54, 'memFree': '1820'}}, 'memShared': 0, 'thpState': 'always', 'ksmMergeAcrossNodes': True, 'vmCount': 1, 'memUsed': '36', 'storageDomains': {u'e839d116-dc89-467e-a458-178706b6d581': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.00108635', 'lastCheck': '0.2', 'valid': True}, u'fc8bd719-2e0f-4c09-938e-de56265a019c': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000966798', 'lastCheck': '0.3', 'valid': True}, u'6e608093-de3e-43b0-92d5-5da642d9e526': {'code': 0, 'actual': True, 'version': 5, 'acquired': True, 'delay': '0.00106591', 'lastCheck': '0.5', 'valid': True}, u'c0339a3b-3bc0-41d0-bc04-a199762bbcd2': {'code': 0, 'actual': True, 'version': 0, 'acquired': True, 'delay': '0.000904442', 'lastCheck': '0.3', 'valid': True}}, 'incomingVmMigrations': 0, 'network': {'ovirtmgmt': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'ovirtmgmt', 'tx': '1172847558', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1560741577.532669, 'rx': '80380098197', 'state': 'up'}, 'lo': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'lo', 'tx': '138822983', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1560741577.532669, 'rx': '138822983', 'state': 'up'}, 'ovs-system': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'ovs-system', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1560741577.532669, 'rx': '0', 'state': 'down'}, 'vnet0': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'vnet0', 'tx': '408472', 'txDropped': '0', 'duplex': 'full', 'sampleTime': 1560741577.532669, 'rx': '36644', 'state': 'up'}, ';vdsmdummy;': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': ';vdsmdummy;', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1560741577.532669, 'rx': '0', 'state': 'down'}, 'br-int': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '3', 'name': 'br-int', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1560741577.532669, 'rx': '0', 'state': 'down'}, 'genev_sys_6081': {'rxErrors': '0', 'txErrors': '8', 'speed': '1000', 'rxDropped': '0', 'name': 'genev_sys_6081', 'tx': '0', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1560741577.532669, 'rx': '0', 'state': 'up'}, 'eth0': {'rxErrors': '0', 'txErrors': '0', 'speed': '1000', 'rxDropped': '0', 'name': 'eth0', 'tx': '1172920444', 'txDropped': '0', 'duplex': 'unknown', 'sampleTime': 1560741577.532669, 'rx': '80481365790', 'state': 'up'}}, 'txDropped': '3', 'anonHugePages': '798', 'ksmPages': 100, 'elapsedTime': '250139.71', 'cpuLoad': '0.03', 'cpuSys': '0.28', 'diskStats': {'/var/log': {'free': '95208'}, '/var/run/vdsm/': {'free': '1893'}, '/tmp': {'free': '95208'}}, 'cpuUserVdsmd': '0.53', 'netConfigDirty': 'False', 'memCommitted': 2048, 'ksmState': False, 'vmMigrating': 0, 'ksmCpu': 0, 'memAvailable': 2808, 'bootTime': '1559868049', 'haStats': {'active': False, 'configured': False, 'score': 0, 'localMaintenance': False, 'globalMaintenance': False}, 'momStatus': 'active', 'multipathHealth': {}, 'rxDropped': '0', 'outgoingVmMigrations': 0, 'swapTotal': 4095, 'swapFree': 4095, 'hugepages': defaultdict(<type 'dict'>, {2048: {'resv_hugepages': 0, 'free_hugepages': 0, 'nr_overcommit_hugepages': 0, 'surplus_hugepages': 0, 'vm.free_hugepages': 0, 'nr_hugepages': 0, 'nr_hugepages_mempolicy': 0}}), 'dateTime': '2019-06-17T03:19:37 GMT', 'cpuUser': '0.63', 'memFree': 2552, 'cpuIdle': '99.09', 'vmActive': 1, 'v2vJobs': {}, 'cpuSysVdsmd': '0.20'}} from=::ffff:10.64.24.30,57500 (api:54)

I don't think this data needs to be logged on every call, it could always be supressed or sent to DEBUG level, or logged every N calls like some other apis. 
Also hypervisors usually have other tools capturing performance metrics (i.e. systat, pcp...), which can be more detailed than 15s and easier to parse and work with. A good chunk ends up in the DWH/metrics anyway if we need to refer to the past with data provided by VDSM.

It produces a considerable number of lines in VDSM logs:
# grep getStats /var/log/vdsm/vdsm.log | wc -l
3132

Version-Release number of selected component (if applicable):
vdsm-4.30.13-4.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Observe VM logs of an activated host.

Comment 1 Marcin Sobczyk 2019-06-27 09:22:34 UTC
This log line is created by a generic part of code - it handles logging results of all the calls to VDSM's API - there is no easy way to filter them out. Lowering the logging level is not really an option - API calls results' are very useful when debugging issues with your host. AFAIK these logs are also parsed by external entities so shortening the output is also out. To sum up - despite the amount of data the log lines produce, I think we should stay with that we have right now.

Comment 2 Germano Veit Michel 2019-06-27 22:26:41 UTC
Isn't this created by the same generic code?
2019-06-26 18:01:03,781-0400 INFO  (jsonrpc/2) [api.host] FINISH getAllVmStats return={'status': {'message': 'Done', 'code': 0}, 'statsList': (suppressed)}

What external entities parse this?

If it cannot be done feel free to close it.

Comment 3 Marcin Sobczyk 2019-07-18 06:36:18 UTC
One learns something new every day :) Turns out we have a class in VDSM that allows us to wrap logged objects to suppress the output. I can try to implement a solution, that would only include the 'info' part of 'getStatus' call when logging with DEBUG level.

> What external entities parse this?

I had a chat once with Shirly Radco about rsyslog crawling over our log files.

Shirly, do you find change like this acceptable?

Comment 4 Shirly Radco 2019-08-14 13:46:14 UTC
I dont see an issue with this change

Comment 5 Michal Skrivanek 2019-09-11 13:29:20 UTC
meh...not sure really...if the only thing you get is the vdsm.log this is the only information we have about the host's condition. We did suppress the vm stats because they are really excessive, but we at least log how many vms are being returned (on engine side)

Comment 6 Germano Veit Michel 2019-09-11 22:18:30 UTC
(In reply to Michal Skrivanek from comment #5)
> meh...not sure really...if the only thing you get is the vdsm.log this is
Unlikely to get just vdsm.log. Worst case scenario we ask for a tar of /var/log/, which is likely to include sar.
I don't find these stats useful, they are too basic.

For some reason gerrit was not attached, change is already merged.

Comment 7 RHV bug bot 2019-10-22 17:25:47 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 8 RHV bug bot 2019-10-22 17:39:38 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 9 RHV bug bot 2019-10-22 17:46:49 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 10 RHV bug bot 2019-10-22 18:02:38 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 11 RHV bug bot 2019-11-19 11:52:58 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 12 RHV bug bot 2019-11-19 12:03:01 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 13 RHV bug bot 2019-12-13 13:17:31 UTC
WARN: Bug status (ON_QA) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (ON_QA) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 14 RHV bug bot 2019-12-20 17:46:40 UTC
WARN: Bug status (ON_QA) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (ON_QA) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 15 Guilherme Santos 2020-01-03 21:04:27 UTC
Verified on:
vdsm-4.40.0-164.git38a19bb.el8ev.x86_64
ovirt-host-4.4.0-0.2.alpha.el8ev.x86_64

Steps:
1. Check logs on /var/log/vdsm/vdsm.log

Results:
getStats INFO log suppressed:
2020-01-03 16:33:46,773+0100 INFO  (jsonrpc/5) [api.host] FINISH getStats return={'status': {'code': 0, 'message': 'Done'}, 'info': (suppressed)} from=::ffff:10.37.137.216,52406 (api:54)

Comment 16 RHV bug bot 2020-01-08 14:50:08 UTC
WARN: Bug status (VERIFIED) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (VERIFIED) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 17 RHV bug bot 2020-01-08 15:20:27 UTC
WARN: Bug status (VERIFIED) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (VERIFIED) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 18 RHV bug bot 2020-01-24 19:51:52 UTC
WARN: Bug status (VERIFIED) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops: Bug status (VERIFIED) wasn't changed but the folowing should be fixed:

[Found non-acked flags: '{}', ]

For more info please contact: rhv-devops

Comment 21 errata-xmlrpc 2020-08-04 13:27:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246


Note You need to log in before you can comment on or make changes to this bug.