Bug 2025582 - ClusterAutoscalerUnschedulablePods should not be a warning
Summary: ClusterAutoscalerUnschedulablePods should not be a warning
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.z
Assignee: Michael McCune
QA Contact: Huali Liu
URL:
Whiteboard:
Depends On: 2025230
Blocks: 2026237
Reported: 2021-11-22 14:23 UTC by OpenShift BugZilla Robot
Modified: 2021-12-06 11:22 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-06 11:22:27 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-autoscaler-operator pull 230 | 0 | None | open | [release-4.9] Bug 2025582: Change ClusterAutoscalerUnschedulablePods severity to info | 2021-11-23 15:03:51 UTC
Red Hat Product Errata | RHBA-2021:4889 | 0 | None | None | None | 2021-12-06 11:22:45 UTC

Description OpenShift BugZilla Robot 2021-11-22 14:23:57 UTC
+++ This bug was initially created as a clone of Bug #2025230 +++

Description of problem:
The ClusterAutoscalerUnschedulablePods alert created by this component doesn't describe an actual problematic condition that requires human action.

Per the documentation:
> In many cases this alert is normal and expected depending on the configuration of the autoscaler.

This doesn't meet the warning alert criteria: https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#warning-alerts


Version-Release number of selected component (if applicable):
4.9

How reproducible:
Consistent


Steps to Reproduce:
1. 
2.
3.

Actual results:
warning level alert that doesn't require human interaction


Expected results:
Only warning and critical level alerts should require human interaction; this alert should use a lower severity.


Additional info:

Comment 1 Huali Liu 2021-11-24 01:58:07 UTC
Set up cluster using cluster-bot with https://github.com/openshift/cluster-autoscaler-operator/pull/230

Verified severity of ClusterAutoscalerUnschedulablePods alert is "info" now.

liuhuali@Lius-MacBook-Pro huali-test % oc create -f clusterautoscale.yaml 
clusterautoscaler.autoscaling.openshift.io/default created
liuhuali@Lius-MacBook-Pro huali-test % oc -n openshift-machine-api get prometheusrule
NAME                                    AGE
cluster-autoscaler-default              17s
machine-api-operator-prometheus-rules   24m
liuhuali@Lius-MacBook-Pro huali-test % oc -n openshift-machine-api get prometheusrule cluster-autoscaler-default -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: "2021-11-24T01:50:58Z"
  generation: 1
  labels:
    prometheus: k8s
    role: alert-rules
  name: cluster-autoscaler-default
  namespace: openshift-machine-api
  ownerReferences:
  - apiVersion: autoscaling.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: ClusterAutoscaler
    name: default
    uid: b2ba55d9-6674-4fa8-9a0f-0770f9899dbb
  resourceVersion: "26060"
  uid: ad2cc225-e346-4b36-bbb0-4dcb9db342a7
spec:
  groups:
  - name: general.rules
    rules:
    - alert: ClusterAutoscalerUnschedulablePods
      annotations:
        message: Cluster Autoscaler has {{ $value }} unschedulable pods
      expr: cluster_autoscaler_unschedulable_pods_count{service="cluster-autoscaler-default"}
        > 0
      for: 20m
      labels:
        severity: info
    - alert: ClusterAutoscalerNotSafeToScale
      annotations:
        message: Cluster Autoscaler is reporting that the cluster is not ready for
          scaling
      expr: cluster_autoscaler_cluster_safe_to_autoscale{service="cluster-autoscaler-default"}
        != 1
      for: 15m
      labels:
        severity: warning
    - alert: ClusterAutoscalerUnableToScaleCPULimitReached
      annotations:
        message: Cluster Autoscaler has reached its CPU core limit and is unable to
          scale out
      expr: cluster_autoscaler_cluster_cpu_current_cores >= cluster_autoscaler_cpu_limits_cores{direction="maximum"}
      for: 15m
      labels:
        severity: info
    - alert: ClusterAutoscalerUnableToScaleMemoryLimitReached
      annotations:
        message: Cluster Autoscaler has reached its Memory bytes limit and is unable
          to scale out
      expr: cluster_autoscaler_cluster_memory_current_bytes >= cluster_autoscaler_memory_limits_bytes{direction="maximum"}
      for: 15m
      labels:
        severity: info
liuhuali@Lius-MacBook-Pro huali-test %
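
The same check could be scripted rather than read off the YAML by eye. The following is a minimal sketch, not part of the original verification: it assumes the rule is fetched with `-o json` instead of `-o yaml`, and the helper name `alert_severities` and the trimmed `SAMPLE` document are hypothetical, built only from the fields shown in the output above.

```python
import json


def alert_severities(rule_json):
    """Map each alert name in a PrometheusRule JSON document to its severity label."""
    rule = json.loads(rule_json)
    severities = {}
    for group in rule.get("spec", {}).get("groups", []):
        for entry in group.get("rules", []):
            # Recording rules have no "alert" key, so skip them.
            if "alert" in entry:
                severities[entry["alert"]] = entry.get("labels", {}).get("severity", "")
    return severities


# Hypothetical sample: a trimmed JSON rendering of two rules from the output above.
SAMPLE = json.dumps({
    "kind": "PrometheusRule",
    "spec": {"groups": [{"name": "general.rules", "rules": [
        {"alert": "ClusterAutoscalerUnschedulablePods",
         "for": "20m",
         "labels": {"severity": "info"}},
        {"alert": "ClusterAutoscalerNotSafeToScale",
         "for": "15m",
         "labels": {"severity": "warning"}},
    ]}]},
})

if __name__ == "__main__":
    for name, severity in sorted(alert_severities(SAMPLE).items()):
        print(f"{name}: {severity}")
```

To run this against a live cluster, SAMPLE would be replaced with the output of `oc -n openshift-machine-api get prometheusrule cluster-autoscaler-default -o json`.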

Comment 2 Huali Liu 2021-11-24 02:00:01 UTC
liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2021-11-24-011831-ci-ln-wigsmwb-latest   True        False         2m16s   Cluster version is 4.9.0-0.ci.test-2021-11-24-011831-ci-ln-wigsmwb-latest

Comment 7 errata-xmlrpc 2021-12-06 11:22:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.9.10 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4889

