Bug 1918910

Summary:	Scale from zero annotations should not requeue if instance type missing
Product:	OpenShift Container Platform	Reporter:	Michael Gugino <mgugino>
Component:	Cloud Compute	Assignee:	Danil Grigorev <dgrigore>
Cloud Compute sub component:	Other Providers	QA Contact:	Milind Yadav <miyadav>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	low
Priority:	low	CC:	dgrigore, zhsun
Version:	4.7
Target Milestone:	---
Target Release:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: The machine-set controller returned an error, and therefore requeued, when it could not resolve the instance type while setting scale-from-zero annotations. Consequence: Constant requeues and error spam in the machine-set controller logs. Fix: If the instance type is not resolved automatically, the controller no longer requeues; the user should set the annotations manually. The logs suggest the steps to take and which annotations to check. Result: Scale from zero works for unknown instance types, provided the user manually sets the annotations.	Story Points:	---
Clone Of:		Environment:
Last Closed:	2021-07-27 22:36:39 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Michael Gugino 2021-01-21 16:49:45 UTC

https://github.com/openshift/cluster-api-provider-aws/blob/master/pkg/actuators/machineset/controller.go#L114

We should return nil, not error, so we don't requeue.  It's not like the type is going to magically show up later.  We should log a message about it.

Must do this ^^^
Can do the following later:

Potentially we could set an annotation as well to inform the user on the machineset object itself, TBD if that's desirable.

Also, we can instruct the user to add the annotations themselves so they don't have to wait until the next release / some future time to take advantage themselves.
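
The must-do change above can be sketched as follows. This is a minimal, self-contained illustration in Go, not the actual controller code: `reconcile`, `setScaleFromZeroAnnotations`, `errUnknownInstanceType`, and the `knownTypes` table are hypothetical names, and the real controller's signatures differ. It shows only the control flow: an unknown instance type is logged and swallowed (so nothing is requeued), while any other error is still returned.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errUnknownInstanceType is a hypothetical sentinel error for this sketch.
var errUnknownInstanceType = errors.New("unknown instance type")

// knownTypes is a hypothetical stand-in for the provider's instance type table.
var knownTypes = map[string]bool{"m5.large": true}

// setScaleFromZeroAnnotations is a hypothetical helper. It fails with
// errUnknownInstanceType when the type is not in the table, and with a
// different error when no type is given at all.
func setScaleFromZeroAnnotations(instanceType string) error {
	if instanceType == "" {
		return errors.New("instance type not set")
	}
	if !knownTypes[instanceType] {
		return fmt.Errorf("%w: %s", errUnknownInstanceType, instanceType)
	}
	return nil
}

// reconcile logs and returns nil for an unknown instance type, because
// retrying cannot make the type appear. Other errors are returned unchanged
// so the caller can still requeue on them.
func reconcile(instanceType string) error {
	err := setScaleFromZeroAnnotations(instanceType)
	if errors.Is(err, errUnknownInstanceType) {
		log.Printf("Unable to set scale from zero annotations: %v", err)
		return nil
	}
	return err
}

func main() {
	fmt.Println(reconcile("m5.invalid") == nil) // unknown type: logged, no error
	fmt.Println(reconcile("") == nil)           // other failure: error is returned
}
```

Using a sentinel error with `errors.Is` keeps the "do not requeue" decision limited to the one failure that cannot fix itself.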

Comment 1 Joel Speed 2021-02-05 15:28:20 UTC

To whoever ends up implementing this: we should check for the same issue in GCP and Azure as well.

Marking for 4.8, as this is a relatively simple fix and we should be able to get it done.

Comment 2 Danil Grigorev 2021-03-25 09:45:13 UTC

Checked all providers; it seems only AWS and Azure are currently affected. Posted a fix for both.

Comment 4 Milind Yadav 2021-04-15 09:47:44 UTC

Validated for AWS cluster on:


Steps:
1. Create a machineset by copying an existing machineset and giving it an invalid instance type.

2. Create a ClusterAutoscaler (CAS) using the YAML below:
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s


3. Create a MachineAutoscaler (MAS) using the YAML below, pointing it at the machineset that has the invalid instance type:
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: mas1
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1
  maxReplicas: 4
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: miyadav-aws-15-sxwkl-worker-invalid



4. Monitor the machine-controller logs.

machine-controller logs:

.
.
.
E0415 09:35:29.465499       1 controller.go:115] Unable to set scale from zero annotations: unknown instance type: %sm5.invalid
E0415 09:35:29.465515       1 controller.go:116] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]
I0415 09:35:29.466144       1 recorder.go:98] controller-runtime/manager/events "msg"="Warning"  "message"="Failed to set autoscaling from zero annotations, instance type unknown" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"miyadav-aws-15-sxwkl-worker-invalid","uid":"3d43cb5f-d28a-45ae-9616-0eae26e11dde","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"53367"} "reason"="FailedUpdate"
.
.
.
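
The literal `%s` and `%v` in the two error lines above are consistent with format verbs being handed to a print-style (non-formatting) logging call. That is an inference from the output, not something confirmed in this report. A minimal sketch using only Go's standard `fmt` package reproduces the shape of the first message; `printStyle` and `formatStyle` are hypothetical helper names for illustration.

```go
package main

import "fmt"

// printStyle mimics a Print-style call: the verb in msg is not interpreted,
// and the operands are simply concatenated. No space is inserted between
// them because at least one operand is a string.
func printStyle(msg string, arg interface{}) string {
	return fmt.Sprint(msg, arg)
}

// formatStyle mimics a Printf-style call: the verb in msg is substituted.
func formatStyle(msg string, arg interface{}) string {
	return fmt.Sprintf(msg, arg)
}

func main() {
	msg := "unknown instance type: %s"
	fmt.Println(printStyle(msg, "m5.invalid"))  // unknown instance type: %sm5.invalid
	fmt.Println(formatStyle(msg, "m5.invalid")) // unknown instance type: m5.invalid
}
```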

Comment 5 Milind Yadav 2021-04-15 10:03:33 UTC

Additional info for comment 4:

The log appears during creation of each object (machineset, CAS, MAS).

Comment 6 Milind Yadav 2021-04-15 14:08:02 UTC

Validated for GCP and Azure on the same build.

GCP has a different log message than Azure and AWS.

Additional info:

GCP - "....To fix this, manually populate machine annotations for your instance type.."

Azure and AWS: "....Failed to set autoscaling from zero annotations, instance type unknown..."

There are no requeue logs after these messages
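
The manual workaround those messages point to amounts to setting three annotations on the MachineSet. As a minimal sketch in Go: the annotation keys are taken from the AWS log line in comment 4, while the helper name `scaleFromZeroAnnotations`, the example capacity values, and the plain-integer value format are assumptions made for illustration and should be checked against the product documentation.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys as listed in the controller's log message.
const (
	annotationVCPU     = "machine.openshift.io/vCPU"
	annotationMemoryMb = "machine.openshift.io/memoryMb"
	annotationGPU      = "machine.openshift.io/GPU"
)

// scaleFromZeroAnnotations is a hypothetical helper that builds the
// annotation map a user would add to the MachineSet by hand.
// Values are rendered as plain integers (an assumption in this sketch).
func scaleFromZeroAnnotations(vCPU, memoryMb, gpu int) map[string]string {
	return map[string]string{
		annotationVCPU:     strconv.Itoa(vCPU),
		annotationMemoryMb: strconv.Itoa(memoryMb),
		annotationGPU:      strconv.Itoa(gpu),
	}
}

func main() {
	// Example capacity values only; use the real figures for your instance type.
	for key, value := range scaleFromZeroAnnotations(2, 8192, 0) {
		fmt.Printf("%s=%s\n", key, value)
	}
}
```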


Moved to VERIFIED based on testing data.

Comment 9 errata-xmlrpc 2021-07-27 22:36:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438