Bug 1953105
Summary: | RHCOS system components registered a 3.5x increase in CPU use over an e2e run before and after 4/9 | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
Component: | RHCOS | Assignee: | Timothée Ravier <travier> |
Status: | CLOSED ERRATA | QA Contact: | Michael Nguyen <mnguyen> |
Severity: | high | Docs Contact: | |
Priority: | medium | | |
Version: | 4.8 | CC: | dornelas, jligon, keyoung, lucab, miabbott, mrussell, nstielau, travier, wking |
Target Milestone: | --- | | |
Target Release: | 4.8.0 | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | No Doc Update |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-07-27 23:03:30 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Clayton Coleman
2021-04-23 21:45:19 UTC
Of note, it appears the CPU not charged to system.slice or kubepods.slice on a node went from about 79.8 millicore to 89.6 millicore. The query is:

    (sum by (id) (container_cpu_usage_seconds_total{container="",id="/"}) - scalar(sum by (id) (container_cpu_usage_seconds_total{container="",id="/system.slice"})) - scalar(sum by (id) (container_cpu_usage_seconds_total{container="",id="/kubepods.slice"}))) / 60 / 60 / 6

Is that pure kernel CPU? Is a 10% regression in steady state in the kernel something we should notice? (There is a good chance that whatever udisks or auditd are doing is related, of course.)

> Top users after (4/23)
>
> {id="/system.slice/udisks2.service"}
> 0.23363467572083332
For reference, this is from `udisks2-2.9.0-6.el8`.
The service unit is a new one coming with 8.4 content, as I don't see the package in previous RHCOS images based on 8.3 content (thus it wasn't running at all in the "before" measurement).
So far I haven't found an explicit reference to it in RHCOS manifests, so I think this is coming in as a transitive dependency of some other RPMs.
It seems to have first appeared in `48.84.202104131710-0`.
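To make the arithmetic behind the query in the description concrete, here is a minimal Python sketch. It assumes the trailing `/ 60 / 60 / 6` is meant to average cumulative CPU seconds over a window of roughly six hours; the counter values passed in are hypothetical, chosen only so that the result matches the 79.8 millicore figure quoted above. The function names are illustrative and not part of any tooling mentioned in this bug.

```python
def unattributed_millicores(root_s, system_slice_s, kubepods_slice_s,
                            window_s=6 * 60 * 60):
    """Average CPU, in millicores, not charged to system.slice or kubepods.slice.

    Inputs are cumulative CPU seconds (the value of
    container_cpu_usage_seconds_total) for "/", "/system.slice" and
    "/kubepods.slice". window_s is an assumption: the query divides by
    60 * 60 * 6, i.e. six hours expressed in seconds.
    """
    unattributed_s = root_s - system_slice_s - kubepods_slice_s
    return unattributed_s / window_s * 1000.0


def relative_change(before, after):
    """Fractional change from before to after (0.10 means +10%)."""
    return (after - before) / before


# Hypothetical counter values, not taken from the bug report.
before_mc = unattributed_millicores(30000.0, 8000.0, 20276.32)

# The two millicore figures quoted in the description.
change = relative_change(79.8, 89.6)
print(f"{before_mc:.1f} millicores, change {change:.1%}")
```

With the two figures from the description, the relative change comes out slightly above 12%, which is consistent with the "10% regression" estimate being a rough one.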
Will be included in next openshift/os bump.

`udisks2` was removed from RHCOS 4.8 in 48.84.202105071421-0

Newer builds of RHCOS 4.8 have been included in the OCP 4.8 nightly payloads, so moving this to MODIFIED.

Verified on 4.8.0-0.nightly-2021-05-21-200728. udisks2 is removed from RHCOS 48.84.202105211054-0.

    $ oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.8.0-0.nightly-2021-05-21-200728   True        False         10m     Cluster version is 4.8.0-0.nightly-2021-05-21-200728

    $ oc get nodes
    NAME                                       STATUS   ROLES    AGE   VERSION
    ci-ln-d7rw6f2-f76d1-9wg8c-master-0         Ready    master   28m   v1.21.0-rc.0+c656d63
    ci-ln-d7rw6f2-f76d1-9wg8c-master-1         Ready    master   28m   v1.21.0-rc.0+c656d63
    ci-ln-d7rw6f2-f76d1-9wg8c-master-2         Ready    master   28m   v1.21.0-rc.0+c656d63
    ci-ln-d7rw6f2-f76d1-9wg8c-worker-b-7hgrx   Ready    worker   21m   v1.21.0-rc.0+c656d63
    ci-ln-d7rw6f2-f76d1-9wg8c-worker-c-vmrbc   Ready    worker   21m   v1.21.0-rc.0+c656d63
    ci-ln-d7rw6f2-f76d1-9wg8c-worker-d-j5l8b   Ready    worker   21m   v1.21.0-rc.0+c656d63

    $ oc debug node/ci-ln-d7rw6f2-f76d1-9wg8c-worker-b-7hgrx
    Starting pod/ci-ln-d7rw6f2-f76d1-9wg8c-worker-b-7hgrx-debug ...
    To use host binaries, run `chroot /host`
    If you don't see a command prompt, try pressing enter.
    sh-4.2# chroot /host
    sh-4.4# rpm -qa | grep udisk
    sh-4.4# rpm-ostree status
    State: idle
    Deployments:
    * pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f620068b78e684b615ac01c5b79d6043bee9727644b1a976d45ae023d49fa850
                  CustomOrigin: Managed by machine-config-operator
                       Version: 48.84.202105211054-0 (2021-05-21T10:58:00Z)

      ostree://92ede04b462bc884de5562062fb45e06d803754cbaa466e3a2d34b4ee5e9634b
                       Version: 48.84.202105190318-0 (2021-05-19T03:22:10Z)
    sh-4.4# exit
    exit
    sh-4.2# exit
    exit

    Removing debug pod ...

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
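
For anyone triaging a node against the build versions mentioned in this bug, the following Python sketch checks whether a given RHCOS 4.8 build falls in the window where `udisks2` was present. It is only a sketch under stated assumptions: that version strings have the shape `<major>.<minor>.<12-digit timestamp>-<n>`, that builds within the 48.84 stream sort by those fields, and that `48.84.202104131710-0` really is the first affected build (the comment above only says it "seems" to be). The function and constant names are hypothetical.

```python
import re
from typing import Tuple

# Versions quoted in this bug; FIRST_SEEN is tentative per the comment above.
FIRST_SEEN = "48.84.202104131710-0"
REMOVED_IN = "48.84.202105071421-0"

# Assumed shape: <major>.<minor>.<YYYYMMDDHHMM>-<n>
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d{12})-(\d+)$")


def parse_rhcos_version(version: str) -> Tuple[int, ...]:
    """Split an RHCOS build version into a tuple of integers for comparison."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized RHCOS version: {version!r}")
    return tuple(int(part) for part in match.groups())


def udisks2_expected(version: str) -> bool:
    """True if the build lies in [FIRST_SEEN, REMOVED_IN) under the assumptions above."""
    parsed = parse_rhcos_version(version)
    return parse_rhcos_version(FIRST_SEEN) <= parsed < parse_rhcos_version(REMOVED_IN)


# The two deployments shown in the rpm-ostree status output above.
for build in ("48.84.202105211054-0", "48.84.202105190318-0"):
    print(build, udisks2_expected(build))
```

Both deployments from the verification output are newer than the removal build, which agrees with the empty `rpm -qa | grep udisk` result on that node.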