Bug 1878108 - OCP 4.6 installation fails in CPU quota check for OSD on GCP
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Abhinav Dahiya
QA Contact: To Hung Sze
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-09-11 10:55 UTC by Manuel Dewald
Modified: 2020-10-27 16:40 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:40:10 UTC
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
GitHub | openshift installer pull 4163 | 0 | None | closed | Bug 1878108: asset/quota/gcp: use GCP api to find CPU count for constraint and guess only on failure | 2020-12-02 23:53:02 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:40:24 UTC

Description Manuel Dewald 2020-09-11 10:55:12 UTC
Description of problem:

When installing 4.6 in OSD, the quota check returns:

failed to generate asset "Platform Quota Check": error(MissingQuota): compute.googleapis.com/cpus is not available in us-east1 because The required number of resources (114688) is more than the limit of 2400
level=fatal msg="bootstrap host address and at least one control plane host address must be provided"
installID=cz4gg5hr

How reproducible:

Install an OCP cluster on GCP with an install config like the following:


compute:
- name: worker
  platform:
    gcp:
      osDisk:
        DiskSizeGB: 0
        DiskType: ""
      type: custom-4-16384
  replicas: 4
controlPlane:
  name: master
  platform:
    gcp:
      osDisk:
        DiskSizeGB: 0
        DiskType: ""
      type: custom-4-16384
  replicas: 3
kind: InstallConfig
metadata:
  creationTimestamp: null
  labels:
    api.openshift.com/environment: integration
    api.openshift.com/id: some-id
    api.openshift.com/managed: "true"
    api.openshift.com/name: some-name
    hive.openshift.io/cluster-type: managed
  name: gshereme-test1
  namespace: uhc-integration-1fkg61fq9oieabah0b1i03k2es3l1mgs
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  serviceNetwork:
  - 172.30.0.0/16
platform:
  gcp:
    projectID: some-project
    region: us-east1
pullSecret: ""
sshKey: ssh-rsa.....


Actual results:

failed to generate asset "Platform Quota Check": error(MissingQuota): compute.googleapis.com/cpus is not available in us-east1 because The required number of resources (114688) is more than the limit of 2400
level=fatal msg="bootstrap host address and at least one control plane host address must be provided"
installID=cz4gg5hr


Expected results:

The cluster installs successfully

Potential cause:
The cause is most likely in the installer quota check:
https://github.com/openshift/installer/blob/5972c875c3ef1cd13c43c52b7cda2660efdda1b3/pkg/asset/quota/gcp/gcp.go#L164

We use custom machine types (for example, custom-4-16384). The quota check assumes the second number in the name is the number of CPUs, but for custom types it is the memory size. It extracts this second number as the CPU count here: https://github.com/openshift/installer/blob/5972c875c3ef1cd13c43c52b7cda2660efdda1b3/pkg/asset/quota/gcp/gcp.go#L199

This number is then multiplied by the total machine count (compute: 4, controlPlane: 3, so 7 in total), which results in 7 * 16384 = 114688 CPUs.
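
The arithmetic above can be reproduced with a small Go sketch. This is an illustration only, not the installer's code: naiveCPUs is a hypothetical helper, and it assumes the faulty logic amounts to reading the third dash-separated field of the machine type name as the vCPU count.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// naiveCPUs is a hypothetical stand-in for the assumption described in this
// bug: the third dash-separated field of the name is taken as the vCPU count.
// That holds for "m2-ultramem-208" but not for "custom-4-16384", where the
// third field is the memory size.
func naiveCPUs(machineType string) (int64, error) {
	parts := strings.Split(machineType, "-")
	if len(parts) < 3 {
		return 0, fmt.Errorf("unexpected machine type %q", machineType)
	}
	return strconv.ParseInt(parts[2], 10, 64)
}

func main() {
	// 4 compute replicas + 3 controlPlane replicas from the install config.
	const replicas = 4 + 3

	for _, mt := range []string{"m2-ultramem-208", "custom-4-16384"} {
		cpus, err := naiveCPUs(mt)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %d vCPUs per machine, %d for %d machines\n",
			mt, cpus, cpus*replicas, replicas)
	}
}
```

For custom-4-16384 the sketch arrives at the same 114688 figure that appears in the MissingQuota error, while the real requirement would be 7 * 4 = 28 vCPUs.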

Comment 2 Greg Sheremeta 2020-09-11 11:22:23 UTC
The default GCP machine names look like this: m2-ultramem-208
where Abhinav's assumption in machineTypeToQuota() holds: 208 is the vCPU count.

But GCP can have custom types too, and we use these extensively in OSD.
custom-4-16384

Comment 3 Rick Rackow 2020-09-11 15:41:04 UTC
Preserving information from Slack:
GCP allows custom machine type names that don't start with the machine family, and then assumes "N1" as the family.

```
$ gcloud beta compute instances create example-instance-test --machine-type custom-4-3840
Created [https://www.googleapis.com/compute/beta/projects/innate-attic-182119/zones/us-east1-b/instances/example-instance-test].
NAME                   ZONE        MACHINE_TYPE               PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
example-instance-test  us-east1-b  custom (4 vCPU, 3.75 GiB)               10.142.0.4   34.75.155.249  RUNNING
```

```
$ gcloud beta compute instances create example-instance-test-1 --machine-type test-custom-4-3840
ERROR: (gcloud.beta.compute.instances.create) Could not fetch machine type:
 - The resource 'projects/innate-attic-182119/zones/us-east1-b/machineTypes/test-custom-4-3840' was not found
```

That causes a problem when the name of the machine type is analyzed here: https://github.com/openshift/installer/blob/287658271951b5f8dbf1a77c77ff1557d81c5931/pkg/asset/quota/gcp/gcp.go#L199

Comment 4 To Hung Sze 2020-09-17 18:07:18 UTC
@mdewald@redhat.com, did you create the custom type: custom-4-16384?
If yes, could you please include more details about the type?
I want to add a test case to capture this change and reflect what you have / had.
Thanks in advance.

Comment 6 Greg Sheremeta 2020-09-17 19:17:04 UTC
custom-4-16384 is a shortened alias for n1-custom-4-16384, which is a built-in GCP machine type.

It means 4 vCPUs and 16384 MB of memory.

https://cloud.google.com/compute/docs/machine-types#custom_machine_types
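
A minimal Go sketch of reading that naming scheme, assuming names of the form [<family>-]custom-<vCPUs>-<memoryMB> with "n1" as the default family. The customType struct and parseCustomType function are hypothetical names for illustration. Going by the title of the linked pull request, the actual fix queries the GCP API for the CPU count and only guesses from the name on failure, so this shows name parsing alone, not the fix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// customType holds the fields encoded in a custom machine type name.
type customType struct {
	Family   string
	VCPUs    int64
	MemoryMB int64
}

// parseCustomType parses names such as "custom-4-16384" or
// "n1-custom-4-16384". It does not check that the family prefix is one
// GCP actually accepts; that would need a lookup against the API.
func parseCustomType(name string) (customType, error) {
	parts := strings.Split(name, "-")
	family := "n1" // assumed default when no family prefix is given
	if len(parts) == 4 {
		family, parts = parts[0], parts[1:]
	}
	if len(parts) != 3 || parts[0] != "custom" {
		return customType{}, fmt.Errorf("%q is not a custom machine type name", name)
	}
	cpus, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return customType{}, fmt.Errorf("invalid vCPU count in %q: %w", name, err)
	}
	mem, err := strconv.ParseInt(parts[2], 10, 64)
	if err != nil {
		return customType{}, fmt.Errorf("invalid memory size in %q: %w", name, err)
	}
	return customType{Family: family, VCPUs: cpus, MemoryMB: mem}, nil
}

func main() {
	for _, name := range []string{"custom-4-16384", "n1-custom-4-16384", "m2-ultramem-208"} {
		ct, err := parseCustomType(name)
		if err != nil {
			fmt.Println("skipped:", err)
			continue
		}
		fmt.Printf("%s: family=%s vCPUs=%d memoryMB=%d\n", name, ct.Family, ct.VCPUs, ct.MemoryMB)
	}
}
```

Predefined names such as m2-ultramem-208 are rejected by this parser on purpose, so a caller can fall back to other logic for them.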

Comment 9 To Hung Sze 2020-09-21 19:27:21 UTC
Using 4.6.0-0.nightly-2020-09-21-114202 and
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      osDisk:
        DiskSizeGB: 0
        DiskType: ""
      type: custom-4-16384
  replicas: 3
(same for controlPlane)
I am able to bring up a cluster, and the machine (as shown in the web console) has the correct type:
Instance Type
custom-4-16384

From the GCP console:
Machine type
custom (4 vCPUs, 16 GB memory)

Comment 14 errata-xmlrpc 2020-10-27 16:40:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

