Bug 1953105
| Summary: | RHCOS system components registered a 3.5x increase in CPU use over an e2e run before and after 4/9 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | RHCOS | Assignee: | Timothée Ravier <travier> |
| Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.8 | CC: | dornelas, jligon, keyoung, lucab, miabbott, mrussell, nstielau, travier, wking |
| Target Milestone: | --- | ||
| Target Release: | 4.8.0 | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-07-27 23:03:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Clayton Coleman
2021-04-23 21:45:19 UTC
Of note, it appears the CPU not charged to system.slice or kubepods.slice on a node went from about 79.8 millicores to 89.6 millicores
(query: `(sum by (id) (container_cpu_usage_seconds_total{container="",id="/"}) - scalar(sum by (id) (container_cpu_usage_seconds_total{container="",id="/system.slice"})) - scalar(sum by (id) (container_cpu_usage_seconds_total{container="",id="/kubepods.slice"}))) / 60 / 60 / 6`).
Is that pure kernel CPU? Is a 10% steady-state regression in the kernel something we should notice (there's a good chance that whatever udisks or auditd are doing is related, of course)?
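The query subtracts the two top-level slices from the root cgroup's cumulative CPU-seconds and divides the remainder by 60 × 60 × 6 = 21,600 seconds, which turns it into an average core count over a six-hour window. A minimal sketch of that arithmetic, assuming the three counter values have already been read out as CPU-seconds; `unattributed_millicores` and `percent_change` are hypothetical helper names:

```python
# Sketch of the arithmetic in the PromQL query above (not an official tool).
# Assumption: inputs are cumulative CPU-seconds for the root cgroup "/" and
# for its system.slice and kubepods.slice children, read over the same window.

WINDOW_SECONDS = 60 * 60 * 6  # the query's "/ 60 / 60 / 6", i.e. 21,600 s


def unattributed_millicores(root_s: float, system_slice_s: float,
                            kubepods_slice_s: float,
                            window_s: float = WINDOW_SECONDS) -> float:
    """CPU not charged to either slice, averaged over the window, in millicores."""
    cpu_seconds = root_s - system_slice_s - kubepods_slice_s
    return cpu_seconds / window_s * 1000.0


def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0
```

For the figures quoted above, `percent_change(79.8, 89.6)` is about 12.3, slightly more than the rounded 10% in the question.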
> Top users after (4/23)
>
> {id="/system.slice/udisks2.service"}
> 0.23363467572083332
For reference, this is from `udisks2-2.9.0-6.el8`.
The service unit is new with the RHEL 8.4 content: I don't see the package in earlier RHCOS images based on 8.3 content, so it wasn't running at all in the "before" measurement.
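That distinction matters when reading the before/after numbers: a unit with no "before" series contributes its whole "after" usage as new load, rather than as growth within an existing service. A minimal sketch of such a comparison, assuming per-unit CPU values keyed by the cgroup `id` label as in the quoted query output; `compare_units` is a hypothetical helper:

```python
from typing import Dict, List, Tuple


def compare_units(before: Dict[str, float],
                  after: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Rank units by CPU growth, flagging units that had no "before" value.

    Returns (unit_id, delta, kind) tuples sorted by delta, largest first.
    For a "new" unit the delta is its entire "after" value.
    """
    rows: List[Tuple[str, float, str]] = []
    for unit, cpu_after in after.items():
        if unit not in before:
            # Unit did not exist in the earlier image: all of its usage is new.
            rows.append((unit, cpu_after, "new"))
        else:
            rows.append((unit, cpu_after - before[unit], "changed"))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

With only the value quoted above, `compare_units({}, {"/system.slice/udisks2.service": 0.23363467572083332})` returns a single row tagged `"new"`.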
So far I haven't found an explicit reference to it in RHCOS manifests, so I think this is coming in as a transitive dependency of some other RPMs.
It seems to have first appeared in `48.84.202104131710-0`.
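One way to chase the transitive-dependency suspicion is a breadth-first search from the packages the manifests list explicitly to `udisks2`, following "requires" edges. A minimal sketch, assuming the edges have already been extracted from RPM metadata into a plain dict; `dependency_path` and the package names in the usage note are hypothetical:

```python
from collections import deque
from typing import Dict, List, Optional, Set


def dependency_path(requires: Dict[str, List[str]], roots: Set[str],
                    target: str) -> Optional[List[str]]:
    """Breadth-first search from explicitly listed packages to `target`.

    `requires` maps a package to the packages it depends on (hypothetical,
    pre-extracted data). Returns the shortest chain root -> ... -> target,
    or None if `target` is not reachable from any root.
    """
    parent: Dict[str, Optional[str]] = {root: None for root in roots}
    queue = deque(sorted(roots))
    while queue:
        pkg = queue.popleft()
        if pkg == target:
            # Walk parent links back to the root that pulled `target` in.
            path: List[str] = []
            node: Optional[str] = pkg
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for dep in requires.get(pkg, []):
            if dep not in parent:
                parent[dep] = pkg
                queue.append(dep)
    return None
```

For example, `dependency_path({"pkg-a": ["pkg-b"], "pkg-b": ["udisks2"]}, {"pkg-a"}, "udisks2")` returns `["pkg-a", "pkg-b", "udisks2"]`, naming the explicitly listed package that drags `udisks2` in. BFS yields the shortest such chain; in a real package set there may be several.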
This will be included in the next openshift/os bump.
`udisks2` was removed from RHCOS 4.8 in `48.84.202105071421-0`. Newer builds of RHCOS 4.8 have been included in the OCP 4.8 nightly payloads, so moving this to MODIFIED.
Verified on 4.8.0-0.nightly-2021-05-21-200728: `udisks2` is no longer present in RHCOS `48.84.202105211054-0`.
```
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-21-200728   True        False         10m     Cluster version is 4.8.0-0.nightly-2021-05-21-200728
$ oc get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ci-ln-d7rw6f2-f76d1-9wg8c-master-0         Ready    master   28m   v1.21.0-rc.0+c656d63
ci-ln-d7rw6f2-f76d1-9wg8c-master-1         Ready    master   28m   v1.21.0-rc.0+c656d63
ci-ln-d7rw6f2-f76d1-9wg8c-master-2         Ready    master   28m   v1.21.0-rc.0+c656d63
ci-ln-d7rw6f2-f76d1-9wg8c-worker-b-7hgrx   Ready    worker   21m   v1.21.0-rc.0+c656d63
ci-ln-d7rw6f2-f76d1-9wg8c-worker-c-vmrbc   Ready    worker   21m   v1.21.0-rc.0+c656d63
ci-ln-d7rw6f2-f76d1-9wg8c-worker-d-j5l8b   Ready    worker   21m   v1.21.0-rc.0+c656d63
$ oc debug node/ci-ln-d7rw6f2-f76d1-9wg8c-worker-b-7hgrx
Starting pod/ci-ln-d7rw6f2-f76d1-9wg8c-worker-b-7hgrx-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# rpm -qa | grep udisk
sh-4.4# rpm-ostree status
State: idle
Deployments:
* pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f620068b78e684b615ac01c5b79d6043bee9727644b1a976d45ae023d49fa850
              CustomOrigin: Managed by machine-config-operator
                   Version: 48.84.202105211054-0 (2021-05-21T10:58:00Z)
  ostree://92ede04b462bc884de5562062fb45e06d803754cbaa466e3a2d34b4ee5e9634b
                   Version: 48.84.202105190318-0 (2021-05-19T03:22:10Z)
sh-4.4# exit
exit
sh-4.2# exit
exit
Removing debug pod ...
```
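The manual check in the transcript (no `udisk` match from `rpm -qa`, and the booted `*` deployment at `48.84.202105211054-0`) can be scripted once the command outputs are captured as text. A minimal sketch, assuming RHCOS versions follow the `<ocp>.<rhel>.<timestamp>-<serial>` layout seen in this report and taking `48.84.202105071421-0` (the build that dropped `udisks2`) as the threshold; all function names are hypothetical:

```python
import re
from typing import Optional, Tuple

# Build that removed udisks2, per this report.
FIX_BUILD = "48.84.202105071421-0"

# A deployment header line: optional "*" (booted marker), then an origin URL.
DEPLOYMENT_HEADER = re.compile(r"^\*?\s*\S+://")


def booted_version(status_text: str) -> Optional[str]:
    """Return the Version of the '*'-marked deployment in `rpm-ostree status` text."""
    in_booted = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if DEPLOYMENT_HEADER.match(line):
            in_booted = line.startswith("*")
        elif in_booted and line.startswith("Version:"):
            return line.split()[1]
    return None


def build_key(version: str) -> Tuple[int, int, int, int]:
    """Split e.g. '48.84.202105211054-0' into comparable integers.

    Assumption: the layout is <ocp>.<rhel>.<timestamp>-<serial>.
    """
    main, serial = version.rsplit("-", 1)
    ocp, rhel, stamp = main.split(".")
    return int(ocp), int(rhel), int(stamp), int(serial)


def has_udisk(rpm_qa_output: str) -> bool:
    """Mirror `rpm -qa | grep udisk`: any installed package name containing 'udisk'."""
    return any("udisk" in line for line in rpm_qa_output.splitlines())


def fix_verified(status_text: str, rpm_qa_output: str) -> bool:
    """True if the booted build is at or after FIX_BUILD and no udisk package is installed."""
    version = booted_version(status_text)
    if version is None:
        return False
    return build_key(version) >= build_key(FIX_BUILD) and not has_udisk(rpm_qa_output)
```

Comparing parsed integer tuples, rather than raw strings, keeps the ordering correct even if a field's width ever changed.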
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:2438