Bug 2025582

Summary: ClusterAutoscalerUnschedulablePods should not be a warning
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Cloud Compute
Assignee: Michael McCune <mimccune>
Cloud Compute sub component: Cluster Autoscaler
QA Contact: Huali Liu <huliu>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: aos-bugs, huliu
Version: 4.9
Keywords: ServiceDeliveryImpact
Target Release: 4.9.z   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2021-12-06 11:22:27 UTC
Bug Depends On: 2025230    
Bug Blocks: 2026237    

Description OpenShift BugZilla Robot 2021-11-22 14:23:57 UTC
+++ This bug was initially created as a clone of Bug #2025230 +++

Description of problem:
The ClusterAutoscalerUnschedulablePods alert created by this component does not describe an actual problematic condition that requires human action.

Per the documentation:
> In many cases this alert is normal and expected depending on the configuration of the autoscaler.

The alert therefore does not meet the criteria for warning alerts: https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#warning-alerts


Version-Release number of selected component (if applicable):
4.9

How reproducible:
Consistent


Actual results:
A warning-level alert fires for a condition that does not require human interaction.


Expected results:
Warning- and critical-level alerts are reserved for conditions that require human interaction.


Additional info:

Comment 1 Huali Liu 2021-11-24 01:58:07 UTC
Set up a cluster using cluster-bot with https://github.com/openshift/cluster-autoscaler-operator/pull/230.

Verified that the severity of the ClusterAutoscalerUnschedulablePods alert is now "info".

liuhuali@Lius-MacBook-Pro huali-test % oc create -f clusterautoscale.yaml 
clusterautoscaler.autoscaling.openshift.io/default created
liuhuali@Lius-MacBook-Pro huali-test % oc -n openshift-machine-api get prometheusrule
NAME                                    AGE
cluster-autoscaler-default              17s
machine-api-operator-prometheus-rules   24m
liuhuali@Lius-MacBook-Pro huali-test % oc -n openshift-machine-api get prometheusrule cluster-autoscaler-default -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2021-11-24T01:50:58Z"
  generation: 1
  labels:
    prometheus: k8s
    role: alert-rules
  name: cluster-autoscaler-default
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: autoscaling.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterAutoscaler
    name: default
    uid: b2ba55d9-6674-4fa8-9a0f-0770f9899dbb
  resourceVersion: "26060"
  uid: ad2cc225-e346-4b36-bbb0-4dcb9db342a7
spec:
  groups:
  - name: general.rules
    rules:
    - alert: ClusterAutoscalerUnschedulablePods
      annotations:
        message: Cluster Autoscaler has {{ $value }} unschedulable pods
      expr: cluster_autoscaler_unschedulable_pods_count{service="cluster-autoscaler-default"}
        > 0
      for: 20m
      labels:
        severity: info
    - alert: ClusterAutoscalerNotSafeToScale
      annotations:
        message: Cluster Autoscaler is reporting that the cluster is not ready for
          scaling
      expr: cluster_autoscaler_cluster_safe_to_autoscale{service="cluster-autoscaler-default"}
        != 1
      for: 15m
      labels:
        severity: warning
    - alert: ClusterAutoscalerUnableToScaleCPULimitReached
      annotations:
        message: Cluster Autoscaler has reached its CPU core limit and is unable to
          scale out
      expr: cluster_autoscaler_cluster_cpu_current_cores >= cluster_autoscaler_cpu_limits_cores{direction="maximum"}
      for: 15m
      labels:
        severity: info
    - alert: ClusterAutoscalerUnableToScaleMemoryLimitReached
      annotations:
        message: Cluster Autoscaler has reached its Memory bytes limit and is unable
          to scale out
      expr: cluster_autoscaler_cluster_memory_current_bytes >= cluster_autoscaler_memory_limits_bytes{direction="maximum"}
      for: 15m
      labels:
        severity: info
liuhuali@Lius-MacBook-Pro huali-test %
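To check only the severity labels rather than reading the whole manifest, the YAML output can be piped through a small filter. This is a minimal sketch under stated assumptions: alert_severities is a hypothetical helper (it is not part of oc), and it is a naive line scan that assumes each rule's "- alert:" line comes before its "severity:" label, as in the output above. It is not a YAML parser, and rules without a severity label are simply skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc): print "<alert> <severity>" pairs
# from PrometheusRule YAML read on stdin. Naive line scan, not a YAML parser;
# assumes each "- alert:" line precedes that rule's "severity:" label.
alert_severities() {
  awk '
    $1 == "-" && $2 == "alert:" { alert = $3 }          # remember the current rule name
    $1 == "severity:"           { print alert, $2 }     # emit it with its severity
  '
}

# Usage against a live cluster (requires oc and cluster access):
# oc -n openshift-machine-api get prometheusrule cluster-autoscaler-default -o yaml \
#   | alert_severities
```

Fed the rule listed above, the filter would print one line per alert: ClusterAutoscalerUnschedulablePods info, ClusterAutoscalerNotSafeToScale warning, and info for the two limit-reached alerts.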

Comment 2 Huali Liu 2021-11-24 02:00:01 UTC
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2021-11-24-011831-ci-ln-wigsmwb-latest   True        False         2m16s   Cluster version is 4.9.0-0.ci.test-2021-11-24-011831-ci-ln-wigsmwb-latest

Comment 7 errata-xmlrpc 2021-12-06 11:22:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.10 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4889