1921627 – GCP UPI installation failed due to exceeding gcp limitation of instance group name

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	---
Target Release:	4.10.0
Assignee:	Aditya Narayanaswamy
QA Contact:	Jianli Wei
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-01-28 10:36 UTC by Yang Yang
Modified:	2022-03-10 16:03 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Sometimes the instance group name exceeded the maximum length of 63 characters. Because the "-instance-group" suffix was appended to the name, users were restricted in the names they could choose. The suffix has been shortened to "-ig" to relax this restriction.
Clone Of:
Environment:
Last Closed:	2022-03-10 16:02:37 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift installer pull 4828	0	None	open	Bug 1921627: Shorten instance group suffix to ig	2021-04-07 23:54:18 UTC
Red Hat Product Errata	RHSA-2022:0056	0	None	None	None	2022-03-10 16:03:19 UTC

Description Yang Yang 2021-01-28 10:36:23 UTC

Description:

GCP UPI installation failed because the instance group name exceeds the GCP limit of 63 characters.

The UPI template defines the instance group name as
'name': context.properties['infra_id'] + '-master-' + zone + '-instance-group'. In IPI it is name = "${var.cluster_id}-master-${var.zones[count.index]}".

It would be good to shorten the instance group name in UPI and align it with IPI.
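
To illustrate the difference between the two naming formats, here is a minimal sketch that rebuilds both names from the values in the failing deployment and compares their lengths to the limit. The variable names and the constant GCP_NAME_MAX are made up for this illustration and do not come from the installer.

```python
# Values taken from the failing deployment in this report.
GCP_NAME_MAX = 63  # limit implied by the regex in the GCP error message

infra_id = "storage-47h3p-5z6c9"
zone = "northamerica-northeast1-c"

# Same concatenation as the UPI template.
upi_name = infra_id + "-master-" + zone + "-instance-group"
# Same shape as the IPI name (no resource-type suffix).
ipi_name = f"{infra_id}-master-{zone}"

for label, name in (("UPI", upi_name), ("IPI", ipi_name)):
    status = "ok" if len(name) <= GCP_NAME_MAX else "too long"
    print(f"{label}: {len(name)} chars ({status})")
```

With these inputs the UPI name comes out at 67 characters and the IPI name at 52, so only the UPI form goes over the limit.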


gcloud deployment-manager deployments create ${INFRA_ID}-infra --config 02_infra.yaml

ERROR: (gcloud.deployment-manager.deployments.create) Error in Operation [operation-1611632607807-5b9c57518c84f-43398d57-d16056e6]: errors:
- code: RESOURCE_ERROR
  location: /deployments/storage-47h3p-5z6c9-lb/resources/storage-47h3p-5z6c9-master-northamerica-northeast1-c-instance-group
  message: "{\"ResourceType\":\"compute.v1.instanceGroup\",\"ResourceErrorCode\":\"\
    400\",\"ResourceErrorMessage\":{\"code\":400,\"errors\":[{\"domain\":\"global\"\
    ,\"message\":\"Invalid value for field 'resource.name': 'storage-47h3p-5z6c9-master-northamerica-northeast1-c-instance-group'.\
    \ Must be a match of regex '(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)'\"

Version:

4.7.0-0.nightly-2021-01-22-134922

Platform:

GCP

Please specify:

* UPI (semi-manual installation on customized infrastructure)

Actual result:

GCP UPI failed

Expected result:

GCP UPI passed

Comment 1 To Hung Sze 2021-01-28 14:48:37 UTC

Problem originally found through our automation using:
cluster name: storage-47h3p
region: northamerica-northeast1

Comment 2 To Hung Sze 2021-01-28 18:59:43 UTC

With 
name: tsze-longname-1234567890-1234567890-1234567890-1a
region: us-central1

We get a slightly different error:
$ gcloud deployment-manager deployments create ${INFRA_ID}-infra --config 02_infra.yaml
The fingerprint of the deployment is b'7OO8Dixsswx3FLSVKnstoQ=='
Waiting for create [operation-1611860224697-5b9fa741ee455-05c0b600-5f0735e7]...failed.                                    
ERROR: (gcloud.deployment-manager.deployments.create) Error in Operation [operation-1611860224697-5b9fa741ee455-05c0b600-5f0735e7]: errors:
- code: RESOURCE_ERROR
  location: /deployments/tsze-longname-1234567-82bbw-infra/resources/tsze-longname-1234567-82bbw-cluster-ip
  message: "{\"ResourceType\":\"compute.v1.address\",\"ResourceErrorCode\":\"400\"\
    ,\"ResourceErrorMessage\":{\"code\":400,\"errors\":[{\"domain\":\"global\",\"\
    message\":\"Invalid value for field 'resource.subnetwork': 'https://www.googleapis.com/compute/v1/projects/openshift-qe/regions/northamerica-northeast1/subnetworks/tsze-upi-long-1234567-4mfs6-master-subnet'.\
    \ Subnetwork must be in the same region.\",\"reason\":\"invalid\"}],\"message\"\
    :\"Invalid value for field 'resource.subnetwork': 'https://www.googleapis.com/compute/v1/projects/openshift-qe/regions/northamerica-northeast1/subnetworks/tsze-upi-long-1234567-4mfs6-master-subnet'.\
    \ Subnetwork must be in the same region.\",\"statusMessage\":\"Bad Request\",\"\
    requestPath\":\"https://compute.googleapis.com/compute/v1/projects/openshift-qe/regions/us-central1/addresses\"\
    ,\"httpMethod\":\"POST\"}}"

Comment 3 Johnny Liu 2021-01-29 03:08:25 UTC

Personally, I don't think "storage-47h3p" is an unacceptably long cluster name.
An IPI install with the same cluster name and region uses a different format to name these instance groups.

Comment 4 To Hung Sze 2021-01-29 17:19:43 UTC

Please ignore my last comment.
Looks like I made a mistake.

Tried again for a UPI with long name in us-central1.
Install finished and the nodes are:
$ ./oc get nodes
NAME                                                      STATUS   ROLES    AGE    VERSION
tsze-alongname-123456-65kln-m-0.c.openshift-qe.internal   Ready    master   120m   v1.20.0+4b40bb4
tsze-alongname-123456-65kln-m-1.c.openshift-qe.internal   Ready    master   120m   v1.20.0+4b40bb4
tsze-alongname-123456-65kln-m-2.c.openshift-qe.internal   Ready    master   120m   v1.20.0+4b40bb4
tsze-alongname-123456-65kln-worker-a-gdbr6                Ready    worker   111m   v1.20.0+4b40bb4
tsze-alongname-123456-65kln-worker-b-pnf7d                Ready    worker   111m   v1.20.0+4b40bb4

Sorry.

Comment 5 Brenton Leanhardt 2021-02-04 18:31:35 UTC

We're lowering the severity since the templates are mostly for reference.

Comment 6 Jeremiah Stuever 2021-03-29 17:30:44 UTC

The differences between IPI and UPI can be seen here:

IPI: https://github.com/openshift/installer/blob/3bc71bb64699e5f96b0e9716609bf8ed0560fcfe/data/data/gcp/master/main.tf#L74
UPI: https://github.com/openshift/installer/blob/3bc71bb64699e5f96b0e9716609bf8ed0560fcfe/upi/gcp/02_lb_int.py#L54

Basically, the UPI template appends `-instance-group` to the end of the name used by IPI. This is because GCP Deployment Manager objects must have unique names, so we appended the resource type to the end of all names. We could shorten the name here to be `-ig`, which would reduce the overall count.
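
As a rough check of the proposed suffix, the sketch below validates both suffixes against the regex quoted in the GCP error message from the description. The helpers instance_group_name and is_valid_gcp_name are hypothetical and exist only for this illustration; they are not part of the UPI templates.

```python
import re

# Pattern copied from the GCP error message in the description.
GCP_NAME_RE = re.compile(r"(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)")


def instance_group_name(infra_id: str, zone: str, suffix: str) -> str:
    """Hypothetical helper mirroring the UPI template's concatenation."""
    return infra_id + "-master-" + zone + suffix


def is_valid_gcp_name(name: str) -> bool:
    """True if the whole name matches the pattern (at most 63 characters)."""
    return GCP_NAME_RE.fullmatch(name) is not None


# Infra ID and zone from the failing deployment in this report.
infra_id = "storage-47h3p-5z6c9"
zone = "northamerica-northeast1-c"

old = instance_group_name(infra_id, zone, "-instance-group")
new = instance_group_name(infra_id, zone, "-ig")

print(len(old), is_valid_gcp_name(old))
print(len(new), is_valid_gcp_name(new))
```

For the reported infra ID and zone, the `-ig` suffix saves 12 characters, which brings the name from 67 down to 55 and within the pattern.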

Comment 7 Matthew Staebler 2021-03-30 01:41:07 UTC

Note that even with an IPI install, a cluster name over 22 characters will fail due to the instance group name being too long when the region is northamerica-northeast1.
With unadulterated UPI scripts, the cluster name can only be 8 characters long in northamerica-northeast1.

Comment 8 Matthew Staebler 2021-03-30 01:41:50 UTC

(In reply to Jeremiah Stuever from comment #6)
> We could shorten the name here to be `-ig`, which would reduce the overall
> count.

I like the idea of reducing the suffix from "-instance-group" to something smaller, such as "-ig".

Comment 9 Matthew Staebler 2021-06-08 18:31:22 UTC

Moving this out of the 4.8.0 release. There are two PRs that still need to be merged for this BZ. The severity of this BZ is low enough to justify skipping it entirely for 4.8.

Comment 10 Russell Teague 2021-07-12 17:33:49 UTC

Needs PR review.

Comment 11 Aditya Narayanaswamy 2021-08-02 15:40:46 UTC

Needs PR review.

Comment 12 Aditya Narayanaswamy 2021-08-24 12:29:59 UTC

Needs PR review.

Comment 13 Aditya Narayanaswamy 2021-11-16 19:09:26 UTC

Needs PR review.

Comment 21 errata-xmlrpc 2022-03-10 16:02:37 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056
