Bug 2085390

Summary: machine-controller is case sensitive which can lead to false-positive errors
Product: OpenShift Container Platform Reporter: Simon Reber <sreber>
Component: Cloud Compute Assignee: Mike Fedosin <mfedosin>
Cloud Compute sub component: Cluster Autoscaler QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA Docs Contact: Jeana Routh <jrouth>
Severity: low    
Priority: low CC: mfedosin
Version: 4.10   
Target Milestone: ---   
Target Release: 4.12.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
* Previously, the Machine API provider Azure treated user-provided values for instance types as case sensitive. This led to false-positive errors when an instance type was correct but its case did not match. With this release, instance types are converted to lowercase so that users get correct results without false-positive errors for mismatched case. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2085390[*BZ#2085390*])
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-01-17 19:48:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2109487    

Description Simon Reber 2022-05-13 08:42:00 UTC
Description of problem:

The `machine-controller` appears to be case-sensitive for the `InstanceTypes` specified in https://github.com/openshift/cluster-api-provider-azure/blob/release-4.10/pkg/cloud/azure/actuators/machineset/azure_instance_types.go#L933 and thus generates an error when `Standard_D8s_V4` is specified instead of `Standard_D8s_v4` (`V4` instead of `v4`).

The error reported by `machine-controller` is as follows:

E0302 12:34:48.199121       1 controller.go:130] Unable to set scale from zero annotations: unknown instance type: %sStandard_D8s_V4
E0302 12:34:48.199139       1 controller.go:131] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]

The problem is that from a customer point of view the `InstanceType` is correct but OpenShift Container Platform 4 does not consider this to be true because it expects the `InstanceType` in the format written in https://github.com/openshift/cluster-api-provider-azure/blob/release-4.10/pkg/cloud/azure/actuators/machineset/azure_instance_types.go#L933


The same also happens on AWS:

E0513 08:25:03.938273       1 controller.go:115] Unable to set scale from zero annotations: unknown instance type: %sT3.2XLARGE
E0513 08:25:03.938306       1 controller.go:116] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]
I0513 08:25:03.953200       1 logr.go:252] events "msg"="Warning"  "message"="Failed to set autoscaling from zero annotations, instance type unknown" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"sreber03180849-pptkt-worker-test-us-west-1c","uid":"9d97eb81-1456-4a51-b499-0324fcb1e036","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"1020816"} "reason"="FailedUpdate"

I know defining an instance in AWS like `T3.2XLARGE` is silly but it may happen and it's still the `InstanceType` that seems available and therefore the misbehavior is confusing.

Version-Release number of selected component (if applicable):

 - OpenShift Container Platform 4.10.12

How reproducible:

 - Always

Steps to Reproduce:
1. Install OpenShift Container Platform 4 on AWS, Azure or GCP
2. Create a new `MachineSet` and specify an available `InstanceType`, but with casing that differs from the canonical name (for example, `Standard_D8s_V4` instead of `Standard_D8s_v4`)
3. Observe that creation of the Machine fails and `machine-controller` reports the above errors

Actual results:

E0302 12:34:48.199121       1 controller.go:130] Unable to set scale from zero annotations: unknown instance type: %sStandard_D8s_V4
E0302 12:34:48.199139       1 controller.go:131] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]

Expected results:

If the specified `InstanceType` is correct once case is normalised, it should be recognized and should work. This matters especially because cloud APIs may not be case sensitive, so it is confusing if OpenShift Container Platform 4 is case sensitive when it comes to `InstanceType`.
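
For illustration only, a minimal Go sketch of the kind of lookup this request has in mind: the table is keyed by the lowercased name, and the user-provided value is lowercased before the lookup. The names `instanceInfo`, `knownTypes` and `lookupInstanceType`, as well as the table entries, are hypothetical and are not taken from `azure_instance_types.go`; this is not necessarily how the provider implements it.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceInfo holds the values needed for scale-from-zero annotations.
// Hypothetical type, for illustration only.
type instanceInfo struct {
	Name     string // name as the cloud provider spells it
	VCPU     int
	MemoryMb int
	GPU      int
}

// knownTypes is keyed by the lowercased instance type name.
// Illustrative entries only; not copied from the provider's table.
var knownTypes = map[string]instanceInfo{
	"standard_d8s_v4": {Name: "Standard_D8s_v4", VCPU: 8, MemoryMb: 32768},
	"standard_d4s_v3": {Name: "Standard_D4s_v3", VCPU: 4, MemoryMb: 16384},
}

// lookupInstanceType lowercases the user-provided value before the map
// lookup, so a mismatch in case alone no longer yields "unknown instance type".
func lookupInstanceType(userValue string) (instanceInfo, bool) {
	info, ok := knownTypes[strings.ToLower(userValue)]
	return info, ok
}

func main() {
	for _, v := range []string{"Standard_D8s_V4", "standard_d4s_V3", "Standard_Unknown"} {
		if info, ok := lookupInstanceType(v); ok {
			fmt.Printf("%s -> %s (vCPU=%d, memoryMb=%d)\n", v, info.Name, info.VCPU, info.MemoryMb)
		} else {
			fmt.Printf("%s -> unknown instance type\n", v)
		}
	}
}
```

Normalising only at lookup time leaves the value in the `MachineSet` spec untouched, so whether the cloud API itself accepts the mismatched case remains a separate question per provider.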

Additional info:

Comment 2 Joel Speed 2022-05-26 13:10:26 UTC
Mike is planning to look into this next week

Comment 3 Joel Speed 2022-07-04 15:46:41 UTC
Mike is going to look into this, bumped to low as this has an easy workaround

Comment 5 sunzhaohua 2022-07-19 03:53:37 UTC
Verified on Azure, failed on GCP and AWS.
@mfedosin On AWS and GCP the instance type is still case sensitive. Or do we plan to implement case insensitivity only on Azure?

Azure:
$ oc edit machine zhsun19-flpj5-worker-southcentralus1-hcqwk
      vmSize: standard_d4s_V3

$ oc get machine
NAME                                         PHASE     TYPE              REGION           ZONE   AGE
zhsun19-flpj5-worker-southcentralus1-hcqwk   Running   Standard_D4s_v3   southcentralus   1      22m

AWS:
$ oc get machine
NAME                                              PHASE     TYPE         REGION      ZONE         AGE
pewang-0719awsefs-47p49-worker-us-east-2c-tjmdc   Failed                                          7m55s
  errorMessage: 'error launching instance: Invalid value ''m6i.XLARGE'' for InstanceType.'
  errorReason: InvalidConfiguration
  lastUpdated: "2022-07-19T02:30:25Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastTransitionTime: "2022-07-19T02:30:25Z"
      message: 'error launching instance: Invalid value ''m6i.XLARGE'' for InstanceType.'
      reason: MachineCreationFailed
      status: "False"

E0719 02:35:01.757187       1 controller.go:115] Unable to set scale from zero annotations: unknown instance type: m6i.XLARGE
E0719 02:35:01.757195       1 controller.go:116] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: [machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]
I0719 02:35:01.757324       1 logr.go:252] events "msg"="Warning"  "message"="Failed to set autoscaling from zero annotations, instance type unknown" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"pewang-0719awsefs-47p49-worker-us-east-2c","uid":"ff21a73a-330f-43f3-b11c-b14b67dc17bc","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"68997"} "reason"="FailedUpdate"

GCP:
$ oc get machine                                                                                             
NAME                            PHASE     TYPE            REGION        ZONE            AGE
zhsungcp-zb9kx-worker-a-mxrht   Failed                                                  23m

$ oc edit machine zhsungcp-zb9kx-worker-a-mxrht
      message: 'googleapi: Error 400: Invalid value for field ''resource.machineType'':
        ''zones/us-central1-a/machineTypes/N1-Standard-4''. Machine type with name
        ''N1-Standard-4'' does not exist in zone ''us-central1-a''., invalid'
      reason: MachineCreationFailed

Comment 6 sunzhaohua 2022-07-21 02:59:43 UTC
Moving to verified. Checked on AWS and GCP: instance types are case sensitive there.

$ gcloud compute instances create gcelab2 --machine-type N1-Standard-2 --zone us-central1-a
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - Invalid value for field 'resource.machineType': 'https://compute.googleapis.com/compute/v1/projects/openshift-gce-devel/zones/us-central1-a/machineTypes/N1-Standard-2'. Machine type with name 'N1-Standard-2' does not exist in zone 'us-central1-a'.

$ aws ec2 describe-instances --filters "Name=instance-type,Values=m6i.XLARGE" 
$ aws ec2 describe-instances --filters "Name=instance-type,Values=m6i.xlarge"
RESERVATIONS    301721915996    r-09c74d5f04a91f15c
INSTANCES       0       x86_64  9F5DEE65-B8F9-47CB-9894-7480FF980314    False   True    xen     ami-026e5701f495c94a2   i-03281d1184c94b51d     m6i.xlarge      2022-07-20T02:42:12+00:00                         /dev/xvda       ebs     User initiated (2022-07-21 02:24:53 GMT)        hvm
CAPACITYRESERVATIONSPECIFICATION        open
CPUOPTIONS      2       2

Comment 10 errata-xmlrpc 2023-01-17 19:48:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

Comment 11 Red Hat Bugzilla 2023-09-18 04:37:06 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days