Bug 2006970 - Percent CPU frequently goes above 100
Summary: Percent CPU frequently goes above 100
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: collectd-libpod-stats
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z2
Target Release: 16.2 (Train on RHEL 8.4)
Assignee: Nobody
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-22 18:21 UTC by Paul Leimer
Modified: 2022-03-23 22:12 UTC (History)

Fixed In Version: collectd-libpod-stats-1.0.4-1.el8ost
Doc Type: Bug Fix
Doc Text:
In cases where high CPU use was monitored in a multi-core system, the calculation for CPU use was inaccurate. With this update, the calculation of CPU use in a multi-core scenario is now accurate. The latest STF dashboards have been adjusted to incorporate this update.
Clone Of:
Environment:
Last Closed: 2022-03-23 22:11:40 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | infrawatch collectd-libpod-stats pull 5 | 0 | None | open | Account for counter rollover | 2021-10-06 18:06:08 UTC
Github | infrawatch dashboards pull 40 | 0 | None | open | Limit CPU percent panels in cloud view to 100% | 2021-09-30 16:52:38 UTC
Red Hat Issue Tracker | OSP-9828 | 0 | None | None | None | 2021-11-15 12:43:50 UTC
Red Hat Product Errata | RHBA-2022:1001 | 0 | None | None | None | 2022-03-23 22:12:05 UTC

Description Paul Leimer 2021-09-22 18:21:14 UTC
Description of problem:

Percent CPU frequently goes above 100%, often reaching levels far beyond 1000000%. Typically this happens within a very short time interval and shows up as a peak in the graph.

Steps to Reproduce:

Monitor a cloud for a couple of days, then view the STF cloud dashboards. The peaks are visible in the CPU percent panels.

Comment 3 Paul Leimer 2021-09-30 17:13:03 UTC
I consider the dashboard adjustment in GH pull #40 a workaround that mitigates the effects of this bug in dashboards. A patch to libpod-stats must still be completed.

Comment 4 Paul Leimer 2021-10-01 14:39:17 UTC
Reproduce this bug locally:

1. Launch collectd with the libpod-stats plugin loaded. Be sure that collectd is writing to a data store like Prometheus so that metrics can be graphed
2. Start another container that was not previously running
3. CPU percentage calculations for the container in step 2 will spike

Comment 5 Paul Leimer 2021-10-06 18:06:09 UTC
After further evaluation, the above process does not necessarily reproduce the bug. Rather, it demonstrates usage above 100% when multiple cores are working hard at once, in which case >100% is expected behavior.
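
To illustrate why values above 100% are expected on multi-core hosts, here is a minimal Python sketch. It assumes CPU usage is a cumulative counter in nanoseconds summed across all cores, which is how cgroup accounting typically reports it. The function and parameter names are hypothetical and are not taken from libpod-stats.

```python
def cpu_percent(cpu_ns_prev, cpu_ns_now, wall_ns_prev, wall_ns_now, cores=1):
    """Percent CPU over one sampling interval.

    cores=1 gives the raw figure, which can exceed 100 on multi-core hosts.
    Passing the real core count scales the result to a 0-100 range.
    """
    wall_delta = wall_ns_now - wall_ns_prev
    if wall_delta <= 0:
        raise ValueError("wall-clock interval must be positive")
    cpu_delta = cpu_ns_now - cpu_ns_prev
    return 100.0 * cpu_delta / (wall_delta * cores)


# Four cores fully busy for one second: 4 s of CPU time in 1 s of wall time.
ONE_SECOND_NS = 1_000_000_000
raw = cpu_percent(0, 4 * ONE_SECOND_NS, 0, ONE_SECOND_NS)
normalized = cpu_percent(0, 4 * ONE_SECOND_NS, 0, ONE_SECOND_NS, cores=4)
print(raw, normalized)
```

Under these assumptions the raw figure is 400% and the per-core normalized figure is 100%, so a reading of a few hundred percent is not by itself a sign of the bug.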

The real bug is suspected to come from using unsigned integers to calculate the difference in CPU utilization between two points in time. If a counter resets, the later sample is smaller than the earlier one, so the subtraction in the numerator wraps around and yields an unsigned integer with its high-order bits set (a result of two's-complement arithmetic). This has been fixed upstream in the attached PR.
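
The suspected failure mode can be sketched as follows. Python integers do not overflow, so this example emulates 64-bit unsigned subtraction with a mask. The counter width, the function names, and the choice to skip the sample after a reset are all assumptions made for illustration and may differ from the upstream fix.

```python
UINT64_MASK = (1 << 64) - 1


def unsigned_delta(prev, now):
    """Subtract as a 64-bit unsigned integer would: the result wraps modulo 2**64."""
    return (now - prev) & UINT64_MASK


def safe_delta(prev, now):
    """Return the counter delta, or None if the counter went backwards (e.g. a reset)."""
    if now < prev:
        return None
    return now - prev


prev_sample = 5_000_000_000  # cumulative CPU ns before the reset (made-up value)
now_sample = 1_000_000       # counter restarted from near zero (made-up value)

wrapped = unsigned_delta(prev_sample, now_sample)
guarded = safe_delta(prev_sample, now_sample)
print(wrapped)   # enormous value close to 2**64
print(guarded)   # None: the caller can drop this sample
```

Dividing the wrapped delta by a short wall-clock interval produces the kind of brief, enormous spike described in this report, while the guarded version lets the caller discard the sample instead.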

Comment 16 errata-xmlrpc 2022-03-23 22:11:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.2), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1001

