Bug 1910006
Summary: | Accounting of steal time as CPU usage | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | wvoesch |
Component: | Monitoring | Assignee: | Jayapriya Pai <janantha> |
Status: | CLOSED ERRATA | QA Contact: | hongyan li <hongyli> |
Severity: | low | Docs Contact: | |
Priority: | low | ||
Version: | 4.7 | CC: | alegrand, alklein, anpicker, brueckner, danijel.soldo, danili, dgrisonn, erooth, Holger.Wolf, hongyli, janantha, juzhao, jwiedman, kakkoyun, lcosic, pkrupa, spasquie, tstaudt |
Target Milestone: | --- | Keywords: | EasyFix |
Target Release: | 4.7.0 | Flags: | janantha: needinfo- |
Hardware: | s390x | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-26 17:35:21 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1878766 | ||
Bug Blocks: | 1903544 |
Description
wvoesch
2020-12-22 10:03:40 UTC
> Observe that the steal time will be counted as CPU usage of node A.

Where do you observe this? After including this change in kube-prometheus [1] and propagating it to cluster-monitoring-operator [2], we no longer treat steal time as part of CPU usage (result of [3]). Also, if you are using the `instance:node_cpu:rate:sum` recording rule, CPU usage is counted as `node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}`, averaged over the last 3 minutes.

[1]: https://github.com/prometheus-operator/kube-prometheus/commit/87ddb30a41253dce66bde0006634f30817ccb07a
[2]: https://github.com/openshift/cluster-monitoring-operator/pull/993
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1878766

Hi Pawel,
I observe this in the WebUI overview for a particular node: https://console-openshift-console.apps.<cluster-name>.<domain>/k8s/cluster/nodes/<worker>.<cluster-name>.<domain>
The version in which I observed this was: 4.7.0-0.nightly-s390x-2020-12-15-081322

Hi Jayapriya, could you please specify which information you need?

I have no s390x machine, so I can't test. I tried to test on AWS: I deployed an app that needs 7 CPUs on a node with 4 CPUs. All the pods are running and use up all 4 CPUs, but I didn't see CPU steal time on any of the other nodes. Waiting for wvoesch to verify.

Wolfgang, can you help check this on s390x machines? We don't have the platform, and the issue does not happen on AWS/GCP.

Making Jinqi's request un-private, as Wolfgang is a Partner Engineer and cannot see private comment(s).

Tested with 4.8.0-0.nightly-2021-06-10-071057; steal time is removed from CPU usage:

    - expr: sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[3m])) BY (instance)
      record: instance:node_cpu:rate:sum

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.21 bug fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2762
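
To make the effect of the fixed recording rule concrete, here is a minimal Python sketch of the arithmetic behind `sum(rate(node_cpu_seconds_total{mode!="idle",mode!="iowait",mode!="steal"}[3m]))` for a single instance. It is an illustration only, not Prometheus code: the function name `cpu_usage_rate` and the sample counter values are hypothetical, and it uses a plain two-point difference that ignores the counter-reset handling and extrapolation that the real `rate()` performs.

```python
# Modes excluded by the recording rule quoted in this bug.
EXCLUDED_MODES = {"idle", "iowait", "steal"}


def cpu_usage_rate(earlier, later, window_seconds, excluded=EXCLUDED_MODES):
    """Sum the per-second increase of CPU counters, skipping excluded modes.

    earlier, later: dicts mapping (cpu, mode) -> cumulative CPU seconds,
    i.e. two snapshots of node_cpu_seconds_total for one instance.
    """
    total = 0.0
    for (cpu, mode), end_value in later.items():
        if mode in excluded:
            continue
        start_value = earlier.get((cpu, mode))
        if start_value is None:
            # Series was not present at the start of the window; skip it.
            continue
        total += (end_value - start_value) / window_seconds
    return total


# Hypothetical snapshots taken 180 seconds (3 minutes) apart on one CPU.
SAMPLE_EARLIER = {
    ("0", "user"): 0.0,
    ("0", "system"): 0.0,
    ("0", "steal"): 0.0,
    ("0", "idle"): 0.0,
}
SAMPLE_LATER = {
    ("0", "user"): 90.0,
    ("0", "system"): 18.0,
    ("0", "steal"): 54.0,
    ("0", "idle"): 18.0,
}

if __name__ == "__main__":
    fixed = cpu_usage_rate(SAMPLE_EARLIER, SAMPLE_LATER, 180)
    # Old behaviour for comparison: steal is not excluded.
    with_steal = cpu_usage_rate(
        SAMPLE_EARLIER, SAMPLE_LATER, 180, excluded={"idle", "iowait"}
    )
    print(f"usage without steal: {fixed:.2f} cores")
    print(f"usage with steal:    {with_steal:.2f} cores")
```

With these made-up numbers, the steal seconds account for the whole difference between the two results, which matches the symptom reported here: on a node whose hypervisor withholds CPU time, counting steal inflates the reported usage.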