Bug 1819029
Summary: | How to handle alert ClusterAutoscalerUnschedulablePods | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Hongkai Liu <hongkliu> | |
Component: | Cloud Compute | Assignee: | Michael McCune <mimccune> | |
Cloud Compute sub component: | Other Providers | QA Contact: | sunzhaohua <zhsun> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | low | |||
Priority: | unspecified | CC: | agarcial, mgugino | |
Version: | 4.3.0 | |||
Target Milestone: | --- | |||
Target Release: | 4.6.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1820654 | Environment: | ||
Last Closed: | 2020-10-27 15:56:40 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1827307 | |||
Attachments: |
Description
Hongkai Liu
2020-03-31 01:33:50 UTC
Created attachment 1674917
openshift-machine-api.cluster-autoscaler-default-5476d56447-5ww92.24h.log
Created attachment 1674918
openshift-machine-api.machine-api-controllers-7c696b9657-m8t4c.machine-controller.24h.log
Created attachment 1674920
prometheus.query
This alert is caused by the cluster autoscaler's inability to scale up. Depending on the cluster autoscaler's configuration, the alert can be normal and expected. In this particular case, however, there is a bug in the cluster autoscaler. I'm going to open a new BZ and link it here. In the meantime, this bug should remain open until we document the cause and remedy of this alert under normal circumstances.

Thanks, Michael, for helping me fix the autoscaler.

Assigning to Michael McCune, as he has a Jira card to document all of the alerts over the next sprint.

Tagging with upcomingSprint to re-evaluate priority.

Just adding a note here that I am starting to investigate this issue. I think the next best action we can take is to start a document for the cluster-autoscaler-operator that describes these alerts and gives possible guidance around them. Michael Gugino started a pull request [0] for the machine-api-operator to document its alerts; we should do the same for the cluster-autoscaler-operator.

[0] https://github.com/openshift/machine-api-operator/pull/606

I have created an issue on the cluster-autoscaler-operator to track this: https://github.com/openshift/cluster-autoscaler-operator/issues/153

Ideally we will have a PR in place for the documentation in the next sprint.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196