Bug 2085390 - machine-controller is case sensitive which can lead to false/positive errors
Summary: machine-controller is case sensitive which can lead to false/positive errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.10
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.12.0
Assignee: Mike Fedosin
QA Contact: sunzhaohua
Docs Contact: Jeana Routh
URL:
Whiteboard:
Depends On:
Blocks: 2109487
Reported: 2022-05-13 08:42 UTC by Simon Reber
Modified: 2023-09-18 04:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
* Previously, the Machine API provider Azure treated user-provided values for instance types as case sensitive. This led to false-positive errors when instance types were correct but did not match the case. With this release, instance types are converted to lowercase so that users get correct results without false-positive errors for mismatched case. (link:https://bugzilla.redhat.com/show_bug.cgi?id=2085390[*BZ#2085390*])
Clone Of:
Environment:
Last Closed: 2023-01-17 19:48:48 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift machine-api-provider-azure pull 27 | 0 | None | open | Bug 2085390: make Azure instance types case-insensitive | 2022-07-04 18:31:48 UTC
Red Hat Product Errata | RHSA-2022:7399 | 0 | None | None | None | 2023-01-17 19:49:04 UTC

Description Simon Reber 2022-05-13 08:42:00 UTC
Description of problem:

The `machine-controller` appears to be case-sensitive for the `InstanceTypes` specified in https://github.com/openshift/cluster-api-provider-azure/blob/release-4.10/pkg/cloud/azure/actuators/machineset/azure_instance_types.go#L933 and thus generates an error when `Standard_D8s_V4` is specified instead of `Standard_D8s_v4` (`V4` instead of `v4`).

The error reported by `machine-controller` is as follows:

E0302 12:34:48.199121       1 controller.go:130] Unable to set scale from zero annotations: unknown instance type: %sStandard_D8s_V4
E0302 12:34:48.199139       1 controller.go:131] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]

The problem is that, from a customer's point of view, the `InstanceType` is correct, but OpenShift Container Platform 4 does not accept it because it expects the `InstanceType` exactly in the format written in https://github.com/openshift/cluster-api-provider-azure/blob/release-4.10/pkg/cloud/azure/actuators/machineset/azure_instance_types.go#L933


The same also happens on AWS:

E0513 08:25:03.938273       1 controller.go:115] Unable to set scale from zero annotations: unknown instance type: %sT3.2XLARGE
E0513 08:25:03.938306       1 controller.go:116] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]
I0513 08:25:03.953200       1 logr.go:252] events "msg"="Warning"  "message"="Failed to set autoscaling from zero annotations, instance type unknown" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"sreber03180849-pptkt-worker-test-us-west-1c","uid":"9d97eb81-1456-4a51-b499-0324fcb1e036","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"1020816"} "reason"="FailedUpdate"

I know defining an instance type in AWS like `T3.2XLARGE` is silly, but it may happen, and because that `InstanceType` appears to be available, the misbehavior is confusing.

Version-Release number of selected component (if applicable):

 - OpenShift Container Platform 4.10.12

How reproducible:

 - Always

Steps to Reproduce:
1. Install OpenShift Container Platform 4 on AWS, Azure or GCP
2. Create a new `MachineSet` and specify an available `InstanceType`, but with letter case that differs from the expected name (for example, `Standard_D8s_V4` instead of `Standard_D8s_v4`)
3. Observe that creation of the Machine fails and `machine-controller` reports the above errors

Actual results:

E0302 12:34:48.199121       1 controller.go:130] Unable to set scale from zero annotations: unknown instance type: %sStandard_D8s_V4
E0302 12:34:48.199139       1 controller.go:131] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: %v[machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]

Expected results:

If the specified `InstanceType` is correct once case is normalised, it should be recognized and should work. This matters especially because cloud APIs may not be case sensitive, so it causes confusion if OpenShift Container Platform 4 is case sensitive when it comes to `InstanceType`.
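
A minimal sketch of the expected behavior, assuming the known instance types are kept in a map keyed by lowercased name. The `instanceType` struct, the `knownInstanceTypes` table and the `lookupInstanceType` helper are hypothetical names for illustration, not the provider's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// instanceType holds the values used for the scale-from-zero annotations.
// Hypothetical type for illustration only.
type instanceType struct {
	VCPU     int64
	MemoryMb int64
	GPU      int64
}

// knownInstanceTypes is an illustrative table keyed by the LOWERCASED name.
var knownInstanceTypes = map[string]instanceType{
	"standard_d8s_v4": {VCPU: 8, MemoryMb: 32768, GPU: 0},
	"standard_d4s_v3": {VCPU: 4, MemoryMb: 16384, GPU: 0},
}

// lookupInstanceType lowercases the user-provided name before the lookup,
// so `Standard_D8s_V4` and `Standard_D8s_v4` resolve to the same entry.
func lookupInstanceType(name string) (instanceType, error) {
	it, ok := knownInstanceTypes[strings.ToLower(name)]
	if !ok {
		return instanceType{}, fmt.Errorf("unknown instance type: %s", name)
	}
	return it, nil
}

func main() {
	for _, name := range []string{"Standard_D8s_v4", "Standard_D8s_V4", "Standard_Bogus"} {
		it, err := lookupInstanceType(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: vCPU=%d memoryMb=%d GPU=%d\n", name, it.VCPU, it.MemoryMb, it.GPU)
	}
}
```

Normalising only at lookup time leaves the user-provided value untouched in the spec, which is one possible design choice. Whether the value passed on to the cloud API should also be normalised depends on whether that API is itself case sensitive.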

Additional info:

Comment 2 Joel Speed 2022-05-26 13:10:26 UTC
Mike is planning to look into this next week

Comment 3 Joel Speed 2022-07-04 15:46:41 UTC
Mike is going to look into this, bumped to low as this has an easy workaround

Comment 5 sunzhaohua 2022-07-19 03:53:37 UTC
Verified on Azure; failed on GCP and AWS.
@mfedosin On AWS and GCP the instance type is still case sensitive. Or do we plan to implement case insensitivity only on Azure?

Azure:
$ oc edit machine zhsun19-flpj5-worker-southcentralus1-hcqwk
      vmSize: standard_d4s_V3

$ oc get machine
NAME                                         PHASE     TYPE              REGION           ZONE   AGE
zhsun19-flpj5-worker-southcentralus1-hcqwk   Running   Standard_D4s_v3   southcentralus   1      22m

AWS:
$ oc get machine
NAME                                              PHASE     TYPE         REGION      ZONE         AGE
pewang-0719awsefs-47p49-worker-us-east-2c-tjmdc   Failed                                          7m55s
  errorMessage: 'error launching instance: Invalid value ''m6i.XLARGE'' for InstanceType.'
  errorReason: InvalidConfiguration
  lastUpdated: "2022-07-19T02:30:25Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastTransitionTime: "2022-07-19T02:30:25Z"
      message: 'error launching instance: Invalid value ''m6i.XLARGE'' for InstanceType.'
      reason: MachineCreationFailed
      status: "False"

E0719 02:35:01.757187       1 controller.go:115] Unable to set scale from zero annotations: unknown instance type: m6i.XLARGE
E0719 02:35:01.757195       1 controller.go:116] Autoscaling from zero will not work. To fix this, manually populate machine annotations for your instance type: [machine.openshift.io/vCPU machine.openshift.io/memoryMb machine.openshift.io/GPU]
I0719 02:35:01.757324       1 logr.go:252] events "msg"="Warning"  "message"="Failed to set autoscaling from zero annotations, instance type unknown" "object"={"kind":"MachineSet","namespace":"openshift-machine-api","name":"pewang-0719awsefs-47p49-worker-us-east-2c","uid":"ff21a73a-330f-43f3-b11c-b14b67dc17bc","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"68997"} "reason"="FailedUpdate"

GCP:
$ oc get machine                                                                                             
NAME                            PHASE     TYPE            REGION        ZONE            AGE
zhsungcp-zb9kx-worker-a-mxrht   Failed                                                  23m

$ oc edit machine zhsungcp-zb9kx-worker-a-mxrht
      message: 'googleapi: Error 400: Invalid value for field ''resource.machineType'':
        ''zones/us-central1-a/machineTypes/N1-Standard-4''. Machine type with name
        ''N1-Standard-4'' does not exist in zone ''us-central1-a''., invalid'
      reason: MachineCreationFailed

Comment 6 sunzhaohua 2022-07-21 02:59:43 UTC
Moving to verified. Checked on AWS and GCP: instance types are case sensitive there.

$ gcloud compute instances create gcelab2 --machine-type N1-Standard-2 --zone us-central1-a
ERROR: (gcloud.compute.instances.create) Could not fetch resource:
 - Invalid value for field 'resource.machineType': 'https://compute.googleapis.com/compute/v1/projects/openshift-gce-devel/zones/us-central1-a/machineTypes/N1-Standard-2'. Machine type with name 'N1-Standard-2' does not exist in zone 'us-central1-a'.

$ aws ec2 describe-instances --filters "Name=instance-type,Values=m6i.XLARGE" 
$ aws ec2 describe-instances --filters "Name=instance-type,Values=m6i.xlarge"
RESERVATIONS    301721915996    r-09c74d5f04a91f15c
INSTANCES       0       x86_64  9F5DEE65-B8F9-47CB-9894-7480FF980314    False   True    xen     ami-026e5701f495c94a2   i-03281d1184c94b51d     m6i.xlarge      2022-07-20T02:42:12+00:00                         /dev/xvda       ebs     User initiated (2022-07-21 02:24:53 GMT)        hvm
CAPACITYRESERVATIONSPECIFICATION        open
CPUOPTIONS      2       2

Comment 10 errata-xmlrpc 2023-01-17 19:48:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399

Comment 11 Red Hat Bugzilla 2023-09-18 04:37:06 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

