Description of problem:
The ClusterAutoscalerUnschedulablePods alert created by this component doesn't describe an actual problematic condition that requires human action. Per the documentation:

> In many cases this alert is normal and expected depending on the configuration of the autoscaler.

This doesn't meet the warning alert criteria:
https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#warning-alerts

Version-Release number of selected component (if applicable):
4.9

How reproducible:
Consistent

Steps to Reproduce:
1.
2.
3.

Actual results:
A warning-level alert that doesn't require human interaction.

Expected results:
Warning- and critical-level alerts are used only for conditions that require human interaction.

Additional info:
Verified on 4.10.0-0.nightly-2021-11-22-195410, severity of ClusterAutoscalerUnschedulablePods alert is "info" now.

liuhuali@Lius-MacBook-Pro ~ % oc create -f clusterautoscaler.yaml
clusterautoscaler.autoscaling.openshift.io/default created
liuhuali@Lius-MacBook-Pro ~ % oc -n openshift-machine-api get prometheusrule
NAME                                    AGE
cluster-autoscaler-default              2m32s
machine-api-operator-prometheus-rules   62m
liuhuali@Lius-MacBook-Pro ~ % oc -n openshift-machine-api get prometheusrule cluster-autoscaler-default -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2021-11-23T14:44:28Z"
  generation: 1
  labels:
    prometheus: k8s
    role: alert-rules
  name: cluster-autoscaler-default
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: autoscaling.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterAutoscaler
    name: default
    uid: 057f8bf9-ec0b-4fc8-8ab2-a3afb40a0251
  resourceVersion: "37912"
  uid: 3a070375-df61-43e8-b9e2-acaf45dfcadd
spec:
  groups:
  - name: general.rules
    rules:
    - alert: ClusterAutoscalerUnschedulablePods
      annotations:
        message: Cluster Autoscaler has {{ $value }} unschedulable pods
      expr: cluster_autoscaler_unschedulable_pods_count{service="cluster-autoscaler-default"} > 0
      for: 20m
      labels:
        severity: info
    - alert: ClusterAutoscalerNotSafeToScale
      annotations:
        message: Cluster Autoscaler is reporting that the cluster is not ready for scaling
      expr: cluster_autoscaler_cluster_safe_to_autoscale{service="cluster-autoscaler-default"} != 1
      for: 15m
      labels:
        severity: warning
    - alert: ClusterAutoscalerUnableToScaleCPULimitReached
      annotations:
        message: Cluster Autoscaler has reached its CPU core limit and is unable to scale out
      expr: cluster_autoscaler_cluster_cpu_current_cores >= cluster_autoscaler_cpu_limits_cores{direction="maximum"}
      for: 15m
      labels:
        severity: info
    - alert: ClusterAutoscalerUnableToScaleMemoryLimitReached
      annotations:
        message: Cluster Autoscaler has reached its Memory bytes limit and is unable to scale out
      expr: cluster_autoscaler_cluster_memory_current_bytes >= cluster_autoscaler_memory_limits_bytes{direction="maximum"}
      for: 15m
      labels:
        severity: info
liuhuali@Lius-MacBook-Pro ~ %
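
The manual check above (reading the severity label of each alert out of the PrometheusRule) could also be scripted. The following is only a minimal sketch, not part of the original verification: the helper name `alert_severities` and the `SAMPLE` dictionary are hypothetical, and `SAMPLE` is a hand-trimmed copy of the rule shown above (annotations, `expr`, and `for` fields omitted) rather than real cluster output.

```python
# Hypothetical helper: list the severity label of every alert in a
# PrometheusRule object that has already been parsed into a dict.

def alert_severities(rule: dict) -> dict:
    """Map each alert name in a PrometheusRule object to its severity label."""
    severities = {}
    for group in rule.get("spec", {}).get("groups", []):
        for entry in group.get("rules", []):
            if "alert" in entry:  # entries without "alert" are recording rules
                severities[entry["alert"]] = entry.get("labels", {}).get("severity")
    return severities


# Assumption: trimmed, hand-copied subset of the rule from the comment above.
SAMPLE = {
    "kind": "PrometheusRule",
    "spec": {
        "groups": [
            {
                "name": "general.rules",
                "rules": [
                    {"alert": "ClusterAutoscalerUnschedulablePods",
                     "labels": {"severity": "info"}},
                    {"alert": "ClusterAutoscalerNotSafeToScale",
                     "labels": {"severity": "warning"}},
                    {"alert": "ClusterAutoscalerUnableToScaleCPULimitReached",
                     "labels": {"severity": "info"}},
                    {"alert": "ClusterAutoscalerUnableToScaleMemoryLimitReached",
                     "labels": {"severity": "info"}},
                ],
            }
        ]
    },
}

if __name__ == "__main__":
    for name, severity in alert_severities(SAMPLE).items():
        print(f"{name}: {severity}")
```

Against a live cluster, the same function could presumably be fed the output of the `oc ... get prometheusrule cluster-autoscaler-default -o json` form of the command via `json.load(sys.stdin)`; that wiring is left out here because it was not part of the verification above.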
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056