Bug 1437883 - RHV Manager sometimes shows a wrong value of CPU usage of host
Summary: RHV Manager sometimes shows a wrong value of CPU usage of host
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: ---
Assignee: Andrej Krejcir
QA Contact: Liran Rotenberg
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-03-31 12:11 UTC by Nirav Dave
Modified: 2021-06-10 12:08 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-15 17:41:54 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3302741 | 0 | None | None | None | 2017-12-28 08:34:57 UTC
Red Hat Product Errata | RHEA-2018:1488 | 0 | None | None | None | 2018-05-15 17:43:01 UTC
oVirt gerrit | 76138 | 0 | master | MERGED | stats: Avoid possible division by zero in CPU monitoring | 2021-02-15 17:17:25 UTC

Description Nirav Dave 2017-03-31 12:11:42 UTC
Description of problem:

The CPU stats in the notifier logs on RHEV-M show a value beyond the threshold of 100%.


Version-Release number of selected component (if applicable):

It is RHEV-M 4.0.0


How reproducible:
It is not consistent, but it sometimes appears under heavy load.

Actual results:
2017-03-28 11:56:27.106+02    | Used CPU of host <host_name> [2147483647%] exceeded defined threshold [90%].

Expected results:
The CPU load value should not exceed 100% and should never show such a large number.

Additional info:
VDSM on the host appears to send the correct value, so this may be a calculation issue on the ovirt-engine side.

Comment 2 Nirav Dave 2017-03-31 18:13:52 UTC
Hello,

Please let me know if more information is needed for the description. I only have the notifier logs, which show a %load beyond the threshold limit (100%).

Comment 4 Nirav Dave 2017-04-06 12:46:06 UTC
Hello,

I have shared the required logs internally.

Thanks,
Nirav Dave

Comment 5 Nirav Dave 2017-04-06 12:50:29 UTC
(In reply to Nirav Dave from comment #4)
> Hello,
> 
> I have shared required logs internally. 
> 
> Thanks,
> Nirav Dave

I have shared them with Yaniv Kaul.

Comment 7 Martin Sivák 2017-04-19 10:08:27 UTC
Andrej, please verify we still have the code in 4.1. I suspect there is a division by zero and negative infinity somewhere.
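
A minimal sketch of the failure mode suspected here, not the actual VDSM or engine code: a CPU percentage computed as a counter delta divided by elapsed time fails or explodes when two samples are taken at (almost) the same instant. The `Sample` layout, both function names, and the choice to return 0.0 for a non-positive interval are assumptions made for illustration.

```python
from typing import NamedTuple


class Sample(NamedTuple):
    """Hypothetical CPU sample: wall-clock time plus cumulative busy time."""
    timestamp: float  # seconds
    busy: float       # cumulative CPU-seconds spent busy


def cpu_percent_unguarded(prev: Sample, curr: Sample) -> float:
    # Naive rate: raises ZeroDivisionError when the timestamps are equal
    # and returns an absurdly large value when they are almost equal.
    return (curr.busy - prev.busy) / (curr.timestamp - prev.timestamp) * 100.0


def cpu_percent_guarded(prev: Sample, curr: Sample) -> float:
    interval = curr.timestamp - prev.timestamp
    if interval <= 0:
        # No time elapsed, or the clock went backwards: report no usage
        # instead of dividing. This fallback value is an assumption.
        return 0.0
    percent = (curr.busy - prev.busy) / interval * 100.0
    # Clamp so timer jitter cannot push the result outside 0..100.
    return min(100.0, max(0.0, percent))
```

With a guard like this, a degenerate pair of samples yields a bounded value rather than an exception or a number in the billions.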

Comment 8 Andrej Krejcir 2017-04-27 12:12:46 UTC
I have not found any calculations with cpu usage in the engine. It just stores the values and displays them, no division.

Probably VDSM sometimes sent large values of 'cpuUser' or 'cpuSys'.

Comment 9 Nirav Dave 2017-04-27 12:20:15 UTC
Hi Andrej,

Thanks for the update. 

If VDSM is sending such large values, can we add a check in the oVirt code to catch the invalid values, perhaps by raising an out-of-range exception, so that we know when VDSM sends them?

Thanks,
Nirav Dave

Comment 11 Andrej Krejcir 2017-05-03 12:50:32 UTC
The vdsStats file contains only the current state of the host statistics.
VDSM logs at DEBUG level from the time the bug occurred would help
show whether VDSM really sends wrong data.

Comment 17 Andrej Krejcir 2017-08-09 13:41:32 UTC
The VDSM returns incorrect values for 'cpuUser', 'cpuSys' and 'cpuIdle':

- at 2017-07-06 11:09:45,127 in vdsm.log.7:
  - cpuUser = -1259180371.05
  - cpuSys  = -1259180370.17
  - cpuIdle =  2518360841.23

- at 2017-07-06 16:00:09,363 in vdsm.log.3: 
  - cpuUser = 2924878764.12
  - cpuSys  = 2924878764.12
  - cpuIdle = 0.00

It is probably a division by zero, which is fixed by the patch.
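
The numbers above can be cross-checked with a little arithmetic. In both samples cpuIdle matches max(0, 100 - cpuUser - cpuSys) to within rounding, and in the second sample cpuUser + cpuSys is far beyond 2147483647, the largest 32-bit signed integer and the exact value seen in the notifier log. The sketch below reproduces this. The idle formula is inferred from the reported numbers, and `to_java_int` mimics Java's double-to-int narrowing conversion; whether the engine performs such a conversion on this code path is an assumption, not something this bug confirms.

```python
import math

INT_MAX = 2**31 - 1  # 2147483647, the value seen in the notifier log
INT_MIN = -2**31


def idle_from(user: float, system: float) -> float:
    # Relationship inferred from the reported values, not from VDSM source.
    return max(0.0, 100.0 - user - system)


def to_java_int(value: float) -> int:
    """Mimic Java's (int) cast of a double: NaN -> 0, else truncate and saturate."""
    if math.isnan(value):
        return 0
    if value >= INT_MAX:
        return INT_MAX
    if value <= INT_MIN:
        return INT_MIN
    return int(value)  # truncates toward zero


# Samples reported in this comment: (cpuUser, cpuSys, cpuIdle)
samples = [
    (-1259180371.05, -1259180370.17, 2518360841.23),
    (2924878764.12, 2924878764.12, 0.00),
]

for user, system, idle in samples:
    print(f"derived idle={idle_from(user, system):.2f} "
          f"reported idle={idle:.2f} "
          f"saturated usage={to_java_int(user + system)}")
```

If the engine does narrow the combined value to an integer, any sufficiently large input would be reported as exactly 2147483647%, which would explain why the log shows that particular number.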

Comment 21 rhev-integ 2017-11-02 13:37:50 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No relevant external trackers attached]

For more info please contact: rhv-devops

Comment 22 rhev-integ 2017-11-02 21:08:07 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Project 'vdsm'/Component 'ovirt-engine' mismatch]

For more info please contact: rhv-devops

Comment 26 Liran Rotenberg 2017-11-29 13:27:24 UTC
Verified on:
4.2.0-0.5.master.el7

Steps of verification:
1. Load up the host CPU.
2. Collect the VDSM log and check the cpuUser, cpuSys and cpuIdle values.
3. See that the values are between 0 and 100.

I ran the test and checked the values in the log and in the output of vdsm-client Host getStats. The check was repeated with three different values of the VDSM host_sample_stats_interval setting:
the default (15), 0, and 45.

The CPU values were in the range of 0 to 100.
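
The range check in step 3 could be automated roughly as follows. This is a sketch that assumes the statistics have been saved as JSON (for example from `vdsm-client Host getStats`) and contain the cpuUser, cpuSys and cpuIdle fields named above. The helper name and the sample payload are hypothetical, and the values are passed through `float()` so that both numbers and numeric strings are accepted.

```python
import json

CPU_KEYS = ("cpuUser", "cpuSys", "cpuIdle")


def out_of_range_cpu_stats(stats: dict) -> dict:
    """Return the CPU fields whose values fall outside 0..100."""
    bad = {}
    for key in CPU_KEYS:
        value = float(stats[key])  # accepts numbers or numeric strings
        if not 0.0 <= value <= 100.0:  # NaN also fails this comparison
            bad[key] = value
    return bad


# Hypothetical excerpt of saved getStats output.
raw = '{"cpuUser": "12.50", "cpuSys": "3.25", "cpuIdle": "84.25"}'
print(out_of_range_cpu_stats(json.loads(raw)))
```

An empty result means all three fields are within range; running the same check on the values from comment 17 would flag them.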

Comment 30 RHV bug bot 2017-12-18 17:04:49 UTC
INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed:

[Project 'vdsm'/Component 'ovirt-engine' mismatch]

For more info please contact: rhv-devops

Comment 33 errata-xmlrpc 2018-05-15 17:41:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 34 Franta Kust 2019-05-16 13:07:46 UTC
BZ<2>Jira Resync

