Bug 1918910
Summary: | Scale from zero annotations should not requeue if instance type missing | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Michael Gugino <mgugino> |
Component: | Cloud Compute | Assignee: | Danil Grigorev <dgrigore> |
Cloud Compute sub component: | Other Providers | QA Contact: | Milind Yadav <miyadav> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | low | ||
Priority: | low | CC: | dgrigore, zhsun |
Version: | 4.7 | ||
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The machine-set controller requeued a MachineSet whenever the instance type could not be resolved while setting scale from zero annotations.
Consequence: Constant requeue and error spam in machine-set controller logs.
Fix: The controller no longer requeues when the instance type cannot be resolved automatically; the user should set the annotations manually. The logs suggest the steps to take and list the annotations to check.
Result: Scale from zero works for unknown instance types, provided the user sets the annotations manually.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-07-27 22:36:39 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Michael Gugino
2021-01-21 16:49:45 UTC
To whoever ends up implementing this, we should check for the same issue in GCP and Azure as well.

Marking for 4.8 as this is a relatively simple fix and we should be able to get it done.

Checked all providers, seems only AWS and Azure are currently affected. Posted fix for both.

Validated for AWS cluster on :

Steps :
1. Create a machineset by copying an existing machineset; provide an invalid instance type.
2. Create cas using the YAML below:

    apiVersion: "autoscaling.openshift.io/v1"
    kind: "ClusterAutoscaler"
    metadata:
      name: "default"
    spec:
      scaleDown:
        enabled: true
        delayAfterAdd: 10s
        delayAfterDelete: 10s
        delayAfterFailure: 10s

3. Create machine autoscaler using the YAML below, pointing it at the machineset that has the invalid instance type:

    apiVersion: "autoscaling.openshift.io/v1beta1"
    kind: "MachineAutoscaler"
    metadata:
      name: mas1
      namespace: "openshift-machine-api"
    spec:
      minReplicas: 1
      maxReplicas: 4
      scaleTargetRef:
        apiVersion: machine.openshift.io/v1beta1
        kind: MachineSet
        name: miyadav-aws-15-sxwkl-worker-invalid

Monitor machine-controller logs.

machine-controller logs :
.
.
.
E0415 09:35:29.465499 1 controller.go:115] Unable to set scale from zero annotations: unknown instance type: %sm5.invalid
E0415 09:35:29.465515 1 controller.go:116] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]
I0415 09:35:29.466144 1 recorder.go:98] controller-runtime/manager/events "msg"="Warning" "message"="Failed to set autoscaling from zero annotations, instance type unknown" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"miyadav-aws-15-sxwkl-worker-invalid","uid":"3d43cb5f-d28a-45ae-9616-0eae26e11dde","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"53367"} "reason"="FailedUpdate"
.
.
.
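The behaviour this report asks for (warn once about an unknown instance type, then stop instead of requeueing) can be illustrated with a small sketch. This is not the controller's code, which is written in Go; it is a simplified Python model under stated assumptions. The `MachineSet` and `Result` classes, the `reconcile` function, and the `example.large` lookup entry are all hypothetical. Only the three annotation keys and the wording of the warnings are taken from the log output above.

```python
from dataclasses import dataclass, field

# Annotation keys as they appear in the controller log message in this report.
CPU_KEY = "machine.openshift.io/vCPU"
MEMORY_KEY = "machine.openshift.io/memoryMb"
GPU_KEY = "machine.openshift.io/GPU"

# Hypothetical lookup table; the entry and its values are placeholders,
# not real provider data.
KNOWN_INSTANCE_TYPES = {
    "example.large": {CPU_KEY: "2", MEMORY_KEY: "8192", GPU_KEY: "0"},
}


@dataclass
class MachineSet:
    """Simplified stand-in for a MachineSet object (assumption)."""
    name: str
    instance_type: str
    annotations: dict = field(default_factory=dict)


@dataclass
class Result:
    """Simplified stand-in for a reconcile result (assumption)."""
    requeue: bool = False
    warnings: list = field(default_factory=list)


def reconcile(machine_set: MachineSet) -> Result:
    """Set scale-from-zero annotations, or warn and stop if the type is unknown."""
    info = KNOWN_INSTANCE_TYPES.get(machine_set.instance_type)
    if info is None:
        # Retrying cannot resolve an unknown type, so report the problem,
        # leave any manually set annotations untouched, and do not requeue.
        return Result(
            requeue=False,
            warnings=[
                f"unknown instance type: {machine_set.instance_type}",
                "manually populate machine annotations for your instance type: "
                + " ".join([CPU_KEY, MEMORY_KEY, GPU_KEY]),
            ],
        )
    machine_set.annotations.update(info)
    return Result()
```

Calling `reconcile(MachineSet("worker-invalid", "m5.invalid"))` in this model returns a result with `requeue` set to `False` and two warnings, and it leaves the annotations as the user set them, which mirrors the "no requeue logs" observation from the verification.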
Additional info, along with the previous comment #4: the log appears during creation of each object (machineset, cas, mas).

On the same build, validated for GCP and Azure. GCP has different log messages than Azure and AWS.

Additional info :
GCP : "....To fix this, manually populate machine annotations for your instance type.."
Azure and AWS : "....Failed to set autoscaling from zero annotations, instance type unknown..."

There are no requeue logs after these messages.

Moved to VERIFIED, based on testing data.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438