+++ This bug was initially created as a clone of Bug #2025230 +++

Description of problem:
The ClusterAutoscalerUnschedulablePods alert created by this component doesn't describe an actual problematic condition that requires human action. Per the documentation:

> In many cases this alert is normal and expected depending on the configuration of the autoscaler.

This doesn't meet the warning alert criteria: https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#warning-alerts

Version-Release number of selected component (if applicable):
4.9

How reproducible:
Consistent

Steps to Reproduce:
1.
2.
3.

Actual results:
warning level alert that doesn't require human interaction

Expected results:
warning and critical level alerts require human interaction

Additional info:
Set up cluster using cluster-bot with https://github.com/openshift/cluster-autoscaler-operator/pull/230
Verified severity of ClusterAutoscalerUnschedulablePods alert is "info" now.

liuhuali@Lius-MacBook-Pro huali-test % oc create -f clusterautoscale.yaml
clusterautoscaler.autoscaling.openshift.io/default created
liuhuali@Lius-MacBook-Pro huali-test % oc -n openshift-machine-api get prometheusrule
NAME                                    AGE
cluster-autoscaler-default              17s
machine-api-operator-prometheus-rules   24m
liuhuali@Lius-MacBook-Pro huali-test % oc -n openshift-machine-api get prometheusrule cluster-autoscaler-default -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2021-11-24T01:50:58Z"
  generation: 1
  labels:
    prometheus: k8s
    role: alert-rules
  name: cluster-autoscaler-default
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: autoscaling.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterAutoscaler
    name: default
    uid: b2ba55d9-6674-4fa8-9a0f-0770f9899dbb
  resourceVersion: "26060"
  uid: ad2cc225-e346-4b36-bbb0-4dcb9db342a7
spec:
  groups:
  - name: general.rules
    rules:
    - alert: ClusterAutoscalerUnschedulablePods
      annotations:
        message: Cluster Autoscaler has {{ $value }} unschedulable pods
      expr: cluster_autoscaler_unschedulable_pods_count{service="cluster-autoscaler-default"} > 0
      for: 20m
      labels:
        severity: info
    - alert: ClusterAutoscalerNotSafeToScale
      annotations:
        message: Cluster Autoscaler is reporting that the cluster is not ready for scaling
      expr: cluster_autoscaler_cluster_safe_to_autoscale{service="cluster-autoscaler-default"} != 1
      for: 15m
      labels:
        severity: warning
    - alert: ClusterAutoscalerUnableToScaleCPULimitReached
      annotations:
        message: Cluster Autoscaler has reached its CPU core limit and is unable to scale out
      expr: cluster_autoscaler_cluster_cpu_current_cores >= cluster_autoscaler_cpu_limits_cores{direction="maximum"}
      for: 15m
      labels:
        severity: info
    - alert: ClusterAutoscalerUnableToScaleMemoryLimitReached
      annotations:
        message: Cluster Autoscaler has reached its Memory bytes limit and is unable to scale out
      expr: cluster_autoscaler_cluster_memory_current_bytes >= cluster_autoscaler_memory_limits_bytes{direction="maximum"}
      for: 15m
      labels:
        severity: info
liuhuali@Lius-MacBook-Pro huali-test %
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2021-11-24-011831-ci-ln-wigsmwb-latest   True        False         2m16s   Cluster version is 4.9.0-0.ci.test-2021-11-24-011831-ci-ln-wigsmwb-latest
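
The manual check above (reading the PrometheusRule and confirming that ClusterAutoscalerUnschedulablePods now carries severity "info") could also be scripted. The following is only a minimal sketch, not part of the fix: the function name `alert_severities` and the `SAMPLE_RULE` dictionary are hypothetical, and the sample is a trimmed copy of the output shown in this comment, expressed as the equivalent JSON structure.

```python
# Sketch: collect the severity label of every alert in a PrometheusRule object.
# Assumption: the object has the spec.groups[].rules[] layout shown in the
# `oc get prometheusrule ... -o yaml` output above.


def alert_severities(rule: dict) -> dict:
    """Map each alert name in a PrometheusRule object to its severity label."""
    severities = {}
    for group in rule.get("spec", {}).get("groups", []):
        for entry in group.get("rules", []):
            name = entry.get("alert")
            if name is None:
                continue  # recording rules carry "record" instead of "alert"
            severities[name] = entry.get("labels", {}).get("severity", "")
    return severities


# Hypothetical sample, trimmed from the rule printed in this comment.
SAMPLE_RULE = {
    "kind": "PrometheusRule",
    "spec": {
        "groups": [
            {
                "name": "general.rules",
                "rules": [
                    {
                        "alert": "ClusterAutoscalerUnschedulablePods",
                        "for": "20m",
                        "labels": {"severity": "info"},
                    },
                    {
                        "alert": "ClusterAutoscalerNotSafeToScale",
                        "for": "15m",
                        "labels": {"severity": "warning"},
                    },
                ],
            }
        ]
    },
}

found = alert_severities(SAMPLE_RULE)
for alert_name, severity in sorted(found.items()):
    print(f"{alert_name}: {severity}")
```

To run the same check against a live cluster, one option is to request the rule with `-o json` instead of `-o yaml`, load it with `json.load`, and pass the result to the function.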
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.9.10 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4889