Bug 1918910 - Scale from zero annotations should not requeue if instance type missing
Summary: Scale from zero annotations should not requeue if instance type missing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.8.0
Assignee: Danil Grigorev
QA Contact: Milind Yadav
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-01-21 16:49 UTC by Michael Gugino
Modified: 2024-10-01 17:21 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The machine-set controller returned an error, and therefore requeued, when the instance type needed for the scale from zero annotations was missing.
Consequence: Constant requeues and error spam in the machine-set controller logs.
Fix: If the instance type cannot be resolved automatically, the controller only logs the error, and the user should set the annotations manually. The logs suggest the steps to take and which annotations to check.
Result: Scale from zero works for unknown instance types, provided the user manually sets the annotations.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:36:39 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-api-provider-aws pull 397 | 0 | None | open | Bug 1918910: Only log error on wrong instance type for scale from zero | 2021-03-25 09:29:58 UTC
Github | openshift cluster-api-provider-azure pull 213 | 0 | None | open | Bug 1918910: Only log error on wrong instance type for scale from zero | 2021-03-25 09:43:59 UTC
Github | openshift cluster-api-provider-gcp pull 157 | 0 | None | open | Bug 1918910: Only log error on nonexistent instance type for scale from zero | 2021-03-25 11:27:45 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:36:56 UTC

Description Michael Gugino 2021-01-21 16:49:45 UTC
https://github.com/openshift/cluster-api-provider-aws/blob/master/pkg/actuators/machineset/controller.go#L114

We should return nil, not error, so we don't requeue.  It's not like the type is going to magically show up later.  We should log a message about it.

Must do this ^^^
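
A minimal sketch of the requested behavior, not the actual controller code: `reconcile`, `lookupInstanceType`, `errUnknownInstanceType`, and the `knownTypes` table are hypothetical names, and a non-nil return value stands in for "the controller requeues". The annotation keys are the ones the machine-controller log in this bug names.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"strconv"
)

// errUnknownInstanceType is a hypothetical sentinel error for this sketch.
var errUnknownInstanceType = errors.New("unknown instance type")

// instanceInfo is a hypothetical record of the capacity of one instance type.
type instanceInfo struct {
	VCPU     int
	MemoryMb int
	GPU      int
}

// knownTypes stands in for the provider's instance type table.
// The values are illustrative only.
var knownTypes = map[string]instanceInfo{
	"m5.large": {VCPU: 2, MemoryMb: 8192, GPU: 0},
}

// lookupInstanceType returns errUnknownInstanceType for types not in the table.
func lookupInstanceType(name string) (instanceInfo, error) {
	info, ok := knownTypes[name]
	if !ok {
		return instanceInfo{}, fmt.Errorf("%w: %s", errUnknownInstanceType, name)
	}
	return info, nil
}

// reconcile sets the scale from zero annotations on the given map.
// A non-nil return value stands for "requeue"; nil stands for "done".
func reconcile(instanceType string, annotations map[string]string) error {
	info, err := lookupInstanceType(instanceType)
	if errors.Is(err, errUnknownInstanceType) {
		// Retrying cannot help here, so log and return nil to avoid a requeue.
		log.Printf("Unable to set scale from zero annotations: %v", err)
		return nil
	}
	if err != nil {
		// Any other failure may be transient, so let the caller requeue.
		return err
	}
	annotations["machine.openshift.io/vCPU"] = strconv.Itoa(info.VCPU)
	annotations["machine.openshift.io/memoryMb"] = strconv.Itoa(info.MemoryMb)
	annotations["machine.openshift.io/GPU"] = strconv.Itoa(info.GPU)
	return nil
}

func main() {
	annotations := map[string]string{}
	fmt.Println(reconcile("m5.invalid", annotations), len(annotations))
	fmt.Println(reconcile("m5.large", annotations), len(annotations))
}
```

The unknown type is separated from other errors so that only the permanent failure is swallowed; anything that might succeed on retry is still returned.
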
Can do the following later:

Potentially we could set an annotation as well to inform the user on the machineset object itself, TBD if that's desirable.

Also, we can instruct the user to add the annotations themselves so they don't have to wait until the next release / some future time to take advantage themselves.
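
For users who add the annotations themselves, a small check like the following could report which ones are still missing. This is a sketch under assumptions: the three keys are taken from the machine-controller log in this bug, while `missingScaleFromZeroAnnotations` and the sample values are hypothetical.

```go
package main

import "fmt"

// requiredAnnotations lists the keys named in the controller log message.
var requiredAnnotations = []string{
	"machine.openshift.io/vCPU",
	"machine.openshift.io/memoryMb",
	"machine.openshift.io/GPU",
}

// missingScaleFromZeroAnnotations returns the required keys that are
// absent or empty in the given machineset annotations, in a fixed order.
func missingScaleFromZeroAnnotations(annotations map[string]string) []string {
	var missing []string
	for _, key := range requiredAnnotations {
		if value, ok := annotations[key]; !ok || value == "" {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	// Illustrative annotations for a machineset where only vCPU was set.
	annotations := map[string]string{
		"machine.openshift.io/vCPU": "2",
	}
	for _, key := range missingScaleFromZeroAnnotations(annotations) {
		fmt.Println("missing annotation:", key)
	}
}
```
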

Comment 1 Joel Speed 2021-02-05 15:28:20 UTC
To whoever ends up implementing this: we should check for the same issue in GCP and Azure as well.

Marking for 4.8, as this is a relatively simple fix and we should be able to get it done.

Comment 2 Danil Grigorev 2021-03-25 09:45:13 UTC
Checked all providers, seems only AWS and Azure are currently affected. Posted fix for both.

Comment 4 Milind Yadav 2021-04-15 09:47:44 UTC
Validated for aws cluster on :


Steps:
1. Create a machineset by copying an existing machineset and setting an invalid instance type.

2. Create a ClusterAutoscaler (cas) using the YAML below:
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s


3. Create a MachineAutoscaler (mas) using the YAML below, targeting the machineset that has the invalid instance type:
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: mas1
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: miyadav-aws-15-sxwkl-worker-invalid



4. Monitor the machine-controller logs.

machine-controller logs:

.
.
.
E0415 09:35:29.465499       1 controller.go:115] Unable to set scale from zero annotations: unknown instance type: %sm5.invalid
E0415 09:35:29.465515       1 controller.go:116] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]
I0415 09:35:29.466144       1 recorder.go:98] controller-runtime/manager/events "msg"="Warning"  "message"="Failed to set autoscaling from zero annotations, instance type unknown" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"miyadav-aws-15-sxwkl-worker-invalid","uid":"3d43cb5f-d28a-45ae-9616-0eae26e11dde","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"53367"} "reason"="FailedUpdate"
.
.
.
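
The literal `%s` and `%v` in the two error lines above are consistent with a format string being passed to a print-style call rather than a printf-style one. That is an assumption about the cause, not something this report confirms. The sketch below reproduces the effect with the standard library's `fmt.Sprint` and `fmt.Sprintf`; `printStyle` and `printfStyle` are hypothetical helper names.

```go
package main

import "fmt"

// printStyle concatenates its operands the way print-style calls do.
// Format verbs in msg are not interpreted and stay in the output.
func printStyle(msg, arg string) string {
	return fmt.Sprint(msg, arg)
}

// printfStyle interprets format verbs and substitutes arg for them.
func printfStyle(format, arg string) string {
	return fmt.Sprintf(format, arg)
}

func main() {
	fmt.Println(printStyle("unknown instance type: %s", "m5.invalid"))
	fmt.Println(printfStyle("unknown instance type: %s", "m5.invalid"))
}
```

The first line printed keeps the verb (`unknown instance type: %sm5.invalid`), matching the log above; the second substitutes it (`unknown instance type: m5.invalid`).
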

Comment 5 Milind Yadav 2021-04-15 10:03:33 UTC
Additional info for comment 4:

The log appears during creation of each object (machineset, cas, mas).

Comment 6 Milind Yadav 2021-04-15 14:08:02 UTC
Validated on the same build for GCP and Azure.

GCP has different log messages than Azure and AWS.

Additional info:

GCP: "....To fix this, manually populate machine annotations for your instance type.."

Azure and AWS: "....Failed to set autoscaling from zero annotations, instance type unknown..."

There are no requeue logs after these messages.


Moved to VERIFIED, based on testing data.

Comment 9 errata-xmlrpc 2021-07-27 22:36:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

