Bug 1996785
| Summary: | Unused rules in CMO | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Haoyu Sun <hasun> |
| Component: | Monitoring | Assignee: | Haoyu Sun <hasun> |
| Status: | CLOSED ERRATA | QA Contact: | hongyan li <hongyli> |
| Severity: | low | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.9 | CC: | amuller, anpicker, aos-bugs, arajkuma, erooth, hongyli, mrobson |
| Target Milestone: | --- | ||
| Target Release: | 4.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-10-18 17:47:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Haoyu Sun
2021-08-23 16:37:00 UTC
$ oc -n openshift-monitoring get cm prometheus-k8s-rulefiles-0 -oyaml|grep -E '.yaml|build_error_rate|cluster_quantile:apiserver_request_duration_seconds:histogram_quantile|code:registry_api_request_count:rate:sum|instance:node_cpu:ratio|kube_pod_status_ready:etcd:sum|kube_pod_status_ready:image_registry:sum|namespace:container_spec_cpu_shares:sum|node:node_num_cpu:sum|pod:container_spec_cpu_shares:sum'
openshift-kube-apiserver-kube-apiserver-slos.yaml: |
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
$ oc -n openshift-monitoring get cm telemetry-config -oyaml|grep -E 'build_error_rate|cluster_quantile:apiserver_request_duration_seconds:histogram_quantile|code:registry_api_request_count:rate:sum|instance:node_cpu:ratio|kube_pod_status_ready:etcd:sum|kube_pod_status_ready:image_registry:sum|namespace:container_spec_cpu_shares:sum|node:node_num_cpu:sum|pod:container_spec_cpu_shares:sum'
no result
$ for i in $(oc -n openshift-monitoring get cm | grep grafana | awk '{print $1}'); do echo $i; oc -n openshift-monitoring get cm $i -oyaml | grep -E "build_error_rate|cluster_quantile:apiserver_request_duration_seconds:histogram_quantile|code:registry_api_request_count:rate:sum|instance:node_cpu:ratio|kube_pod_status_ready:etcd:sum|kube_pod_status_ready:image_registry:sum|namespace:container_spec_cpu_shares:sum|node:node_num_cpu:sum|pod:container_spec_cpu_shares:sum"; echo -e "\n"; done
grafana-dashboard-cluster-total
no result
$ oc -n openshift-monitoring get cm prometheus-k8s-rulefiles-0 -oyaml|grep -E -C20 cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
---
- expr: |
    histogram_quantile(0.99, sum by (le, resource) (rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET"}[5m]))) > 0
  labels:
    quantile: "0.99"
    verb: read
  record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
- expr: |
    histogram_quantile(0.99, sum by (le, resource) (rate(apiserver_request_duration_seconds_bucket{job="apiserver",verb=~"POST|PUT|PATCH|DELETE"}[5m]))) > 0
  labels:
    quantile: "0.99"
    verb: write
  record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
- expr: |
    histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",subresource!="log",verb!~"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT"}[5m])) without(instance, pod))
  labels:
    quantile: "0.99"
  record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
- expr: |
    histogram_quantile(0.9, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",subresource!="log",verb!~"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT"}[5m])) without(instance, pod))
  labels:
    quantile: "0.9"
  record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
- expr: |
    histogram_quantile(0.5, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver",subresource!="log",verb!~"LIST|WATCH|WATCHLIST|DELETECOLLECTION|PROXY|CONNECT"}[5m])) without(instance, pod))
  labels:
    quantile: "0.5"
  record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
$ oc -n openshift-kube-apiserver get prometheusrules kube-apiserver-slos -oyaml | grep "cluster_quantile:apiserver_request_duration_seconds:histogram_quantile"
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
record: cluster_quantile:apiserver_request_duration_seconds:histogram_quantile
The recording rules above, in the openshift-kube-apiserver namespace, are not used in alerts, telemetry metrics, console, or dashboard definitions, so they waste resources in the OpenShift cluster and should be removed. However, they are not shipped by monitoring, so a new bug will be filed for them.
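The checks in this report run the same rule-name pattern against several sources (rule files, telemetry config, Grafana dashboards). As a minimal sketch, the pattern could be kept in one variable and wrapped in a small helper. `RULES` and `count_rule_refs` are hypothetical names introduced here for illustration; they are not part of CMO or `oc`. The `oc` invocations in the comments are the ones from the description.

```shell
#!/bin/sh
# Rule names checked in this bug, joined into one extended regex.
# RULES and count_rule_refs are illustrative names (assumptions).
RULES='build_error_rate|cluster_quantile:apiserver_request_duration_seconds:histogram_quantile|code:registry_api_request_count:rate:sum|instance:node_cpu:ratio|kube_pod_status_ready:etcd:sum|kube_pod_status_ready:image_registry:sum|namespace:container_spec_cpu_shares:sum|node:node_num_cpu:sum|pod:container_spec_cpu_shares:sum'

# Read YAML on stdin and print how many lines mention any of the rules.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# helper usable under "set -e".
count_rule_refs() {
    grep -cE "$RULES" || true
}

# Usage against a cluster (commands taken from the description):
#   oc -n openshift-monitoring get cm telemetry-config -oyaml | count_rule_refs
#   oc -n openshift-monitoring get cm prometheus-k8s-rulefiles-0 -oyaml | count_rule_refs
```

A count of 0 for the telemetry config and the dashboards corresponds to the "no result" lines in the description; a non-zero count for the rule files shows the rules are still being evaluated.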
Tested with payload 4.9.0-0.nightly-2021-09-05-192114.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:3759