Bug 2004585 - prometheus-k8s-0 cpu usage keeps increasing for the first 3 days
Summary: prometheus-k8s-0 cpu usage keeps increasing for the first 3 days
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.9
Hardware: x86_64
OS: Linux
Target Milestone: ---
Target Release: 4.10.0
Assignee: Luis Sanchez
QA Contact: Ke Wang
Depends On:
Blocks: 2012346
Reported: 2021-09-15 15:33 UTC by yliu1
Modified: 2022-03-10 16:11 UTC
10 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2022-03-10 16:10:42 UTC
Target Upstream Version:

Attachments
prometheus pod cpu usage for the first 5 days (67.21 KB, image/png)
2021-09-15 15:33 UTC, yliu1

Links:
Github openshift cluster-kube-apiserver-operator pull 1238 (Draft): "Bug 2004585: prometheus-k8s-0 cpu usage keeps increasing for the first 3 days", last updated 2021-10-05 17:42:37 UTC
Red Hat Product Errata RHSA-2022:0056, last updated 2022-03-10 16:11:15 UTC

Description yliu1 2021-09-15 15:33:31 UTC
Created attachment 1823359
prometheus pod cpu usage for the first 5 days

Description of problem:

On a freshly deployed SNO (single-node OpenShift) cluster, the CPU usage of the prometheus pod keeps increasing from ~0.15 to ~0.45 cores over the first 3 days. It takes too long to settle.

Version-Release number of selected component (if applicable):
4.9.0-rc.0

How reproducible:

Steps to Reproduce:
1. Deploy an SNO cluster.
2. Deploy some workload pods.
3. Observe the CPU usage of each platform pod via Prometheus queries for a week.
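Step 3 relies on Prometheus queries without naming them. The following is a minimal sketch of how such a query could be built, assuming the cAdvisor metric container_cpu_usage_seconds_total with namespace/pod/container labels and the standard Prometheus HTTP API endpoint /api/v1/query_range. PROM_BASE_URL is a hypothetical placeholder, and the sketch only builds the URL: it does not send the request or handle the bearer-token authentication that an OpenShift Prometheus route requires.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Hypothetical base URL (assumption); a real cluster exposes Prometheus or the
# Thanos Querier through a route that also needs a bearer token (not shown).
PROM_BASE_URL = "https://prometheus.example.com"


def pod_cpu_query(namespace: str, pod: str, window: str = "5m") -> str:
    """PromQL for a pod's CPU usage in cores, summed over its containers.

    Assumes cAdvisor's container_cpu_usage_seconds_total metric.
    container!="" drops the pod-level aggregate series and
    container!="POD" drops the pause container, so nothing is double-counted.
    """
    selector = (
        f'namespace="{namespace}",pod="{pod}",'
        'container!="",container!="POD"'
    )
    # rate() of a CPU-seconds counter gives CPU seconds per second, i.e. cores.
    return f"sum(rate(container_cpu_usage_seconds_total{{{selector}}}[{window}]))"


def range_query_url(query: str, start: datetime, end: datetime, step: str = "5m") -> str:
    """Build a GET URL for the Prometheus /api/v1/query_range endpoint."""
    params = {
        "query": query,
        "start": start.timestamp(),  # Unix timestamps are accepted by the API
        "end": end.timestamp(),
        "step": step,                # resolution between returned samples
    }
    return f"{PROM_BASE_URL}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=7)  # observe for a week, as in step 3
    q = pod_cpu_query("openshift-monitoring", "prometheus-k8s-0")
    print(q)
    print(range_query_url(q, start, end))
```

Summing per-container rates gives a single per-pod series in cores, which is the unit used in the results below (for example ~0.15 to ~0.45).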

Actual results:
3. prometheus-k8s-0 CPU usage increased steadily over the first 3 days, from ~0.15 to ~0.45 cores, and settled on the 4th day.

Expected results:
3. CPU usage should settle earlier and, possibly, at a lower level.

Additional info:
A Prometheus chart of the prometheus-k8s-0 pod's CPU usage is attached.

Comment 1 Philip Gough 2021-09-15 16:09:11 UTC
Looking at the attached screenshot, I don't see any major issues.

What alarms you about the situation and what results/threshold would you expect to see instead?

Comment 2 yliu1 2021-09-15 16:58:14 UTC
We expect < 300 millicores (mc), based on previous measurements.

Comment 15 yliu1 2021-10-26 16:37:43 UTC
Verified on 4.9.1.
The prometheus-k8s-0 pod's CPU usage has been relatively stable over the 4 days since a fresh deployment. It uses ~0.17 cores in steady state.

Comment 16 Ke Wang 2021-10-28 07:33:34 UTC
Verified on a 4.10 nightly payload: over the past 2 days, prometheus-k8s-0 pod CPU usage has been around 0.1 cores, with workload pods created by the clusterbuster tool (./clusterbuster -P server -b 5 -p 2 -D .01 -M 1 -N 4 -r 4 -d 2 -c 6 -m 1000 -v -s 5 -x).

Comment 21 errata-xmlrpc 2022-03-10 16:10:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update) and where to find the updated
files, see advisory RHSA-2022:0056.

If the solution does not work for you, open a new bug report.
