Bug 1951705
| Summary: | kube-apiserver needs alerts on CPU utilization | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | David Eads <deads> |
| Component: | kube-apiserver | Assignee: | David Eads <deads> |
| Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> |
| Severity: | high | Priority: | high |
| Version: | 4.8 | CC: | aos-bugs, mfojtik, xxia |
| Target Release: | 4.8.0 | Type: | Bug |
| Hardware: | Unspecified | OS: | Unspecified |
| Last Closed: | 2021-07-27 23:02:18 UTC | | |

Description
David Eads
2021-04-20 19:09:48 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-30-201824   True        False         88m     Cluster version is 4.8.0-0.nightly-2021-04-30-201824
$ oc get prometheusrules.monitoring.coreos.com/cpu-utilization -n openshift-kube-apiserver -oyaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2021-05-06T02:03:48Z"
  generation: 1
  managedFields:
  - apiVersion: monitoring.coreos.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:groups: {}
    manager: cluster-kube-apiserver-operator
    operation: Update
    time: "2021-05-06T02:03:48Z"
  name: cpu-utilization
  namespace: openshift-kube-apiserver
  resourceVersion: "17201"
  uid: ce838b0c-07b0-47e7-a915-4ab4ef4ce2a4
spec:
  groups:
  - name: control-plane-cpu-utilization
    rules:
    - alert: HighOverallControlPlaneCPU
      expr: |
        sum(
          100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
          AND on (instance) label_replace( kube_node_role{role="master"}, "instance", "$1", "node", "(.+)" )
        )
        /
        count(kube_node_role{role="master"})
        > 60
      for: 10m
      labels:
        severity: warning
    - alert: ExtremelyHighIndividualControlPlaneCPU
      expr: |
        100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90 AND on (instance) label_replace( kube_node_role{role="master"}, "instance", "$1", "node", "(.+)" )
      for: 5m
      labels:
        severity: critical
The HighOverallControlPlaneCPU alert rule has been applied and is also visible in the web console.
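As a rough illustration of what the two alert expressions compute, here is a minimal Python sketch of the arithmetic only. It is not how Prometheus evaluates the rules. It assumes the per-node idle fraction (the result of `avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))`) is already known for each master, that every master reports CPU metrics, and it ignores the `for:` durations. The function names and the sample node names are hypothetical.

```python
from typing import Dict, List, Union

# Thresholds copied from the PrometheusRule above.
OVERALL_THRESHOLD = 60.0      # HighOverallControlPlaneCPU: > 60
INDIVIDUAL_THRESHOLD = 90.0   # ExtremelyHighIndividualControlPlaneCPU: > 90


def utilization_percent(idle_fraction: float) -> float:
    """Mirror `100 - (idle_rate * 100)` for a single node."""
    return 100.0 - idle_fraction * 100.0


def overall_control_plane_cpu(idle_by_master: Dict[str, float]) -> float:
    """Sum of per-master utilization divided by the number of masters."""
    if not idle_by_master:
        return 0.0
    total = sum(utilization_percent(idle) for idle in idle_by_master.values())
    return total / len(idle_by_master)


def evaluate(idle_by_master: Dict[str, float]) -> Dict[str, Union[float, bool, List[str]]]:
    """Report which of the two conditions would hold for one sample.

    The `for: 10m` / `for: 5m` clauses are not modeled here; a real alert
    fires only after the condition has held for that long.
    """
    overall = overall_control_plane_cpu(idle_by_master)
    hot_nodes = sorted(
        node
        for node, idle in idle_by_master.items()
        if utilization_percent(idle) > INDIVIDUAL_THRESHOLD
    )
    return {
        "overall_percent": overall,
        "high_overall": overall > OVERALL_THRESHOLD,
        "extremely_high_nodes": hot_nodes,
    }


if __name__ == "__main__":
    # Hypothetical idle fractions for three masters.
    sample = {"master-0": 0.05, "master-1": 0.50, "master-2": 0.60}
    print(evaluate(sample))
```

The sketch shows why both rules exist: a single saturated master can trip the per-node condition while the control-plane average stays low, and a moderately loaded set of masters can trip the overall condition without any one node exceeding 90%.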
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438