Bug 1908171

Summary: GCP: Installation fails when installing a cluster with a custom machine type (n1-custom-4-16384)
Product: OpenShift Container Platform
Reporter: To Hung Sze <tsze>
Component: Installer
Assignee: Patrick Dillon <padillon>
Installer sub component: openshift-installer
QA Contact: To Hung Sze <tsze>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: padillon, yanyang
Version: 4.7
Target Milestone: ---
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The installer runs the "terraform apply" command a second time in order to destroy the bootstrap resources.
Consequence: Installing a GCP cluster with a custom machine type fails, because the Terraform implementation falsely interprets the second apply as a change to the machine type of the existing VM.
Fix: The installer now tells Terraform to ignore lifecycle changes to the machine type.
Result: Installation with custom machine types succeeds.
Last Closed: 2021-02-24 15:44:54 UTC
Type: Bug
Attachments:
.openshift-install.log (flags: none)

Description To Hung Sze 2020-12-16 01:40:25 UTC
Created attachment 1739499 [details]
.openshift-install.log

(This defect was originally reported as an install failure with a long cluster name and a custom machine type. It turns out that the custom type n1-custom-4-16384 alone is enough to trigger it.)

Version:
openshift-install-linux-4.7.0-0.nightly-2020-12-14-080124


Platform: GCP

Install-config used:
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: 
    gcp:
      type: n1-custom-4-16384
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: 
    gcp:
      type: n1-custom-4-16384
  replicas: 3
metadata:
  creationTimestamp: null
  name: tszegcp121520d-1234567890


Install fails with:
time="2020-12-15T20:32:26-05:00" level=error msg="Error: Changing the machine_type, min_cpu_platform, service_account, or enable display on a started instance requires stopping it. To acknowledge this, please set allow_stopping_for_update = true in your config. You can also stop it by setting desired_status = \"TERMINATED\", but the instance will not be restarted after the update."
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error msg="  on ../../../../../tmp/openshift-install-672567699/master/main.tf line 31, in resource \"google_compute_instance\" \"master\":"
time="2020-12-15T20:32:26-05:00" level=error msg="  31: resource \"google_compute_instance\" \"master\" {"
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error msg="Error: Changing the machine_type, min_cpu_platform, service_account, or enable display on a started instance requires stopping it. To acknowledge this, please set allow_stopping_for_update = true in your config. You can also stop it by setting desired_status = \"TERMINATED\", but the instance will not be restarted after the update."
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error msg="  on ../../../../../tmp/openshift-install-672567699/master/main.tf line 31, in resource \"google_compute_instance\" \"master\":"
time="2020-12-15T20:32:26-05:00" level=error msg="  31: resource \"google_compute_instance\" \"master\" {"
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error msg="Error: Changing the machine_type, min_cpu_platform, service_account, or enable display on a started instance requires stopping it. To acknowledge this, please set allow_stopping_for_update = true in your config. You can also stop it by setting desired_status = \"TERMINATED\", but the instance will not be restarted after the update."
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error msg="  on ../../../../../tmp/openshift-install-672567699/master/main.tf line 31, in resource \"google_compute_instance\" \"master\":"
time="2020-12-15T20:32:26-05:00" level=error msg="  31: resource \"google_compute_instance\" \"master\" {"
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=fatal msg="failed disabling bootstrap load balancing: failed to apply Terraform: failed to complete the change"


Note:
Installations with the long name and custom type separately both succeed.

Comment 1 To Hung Sze 2020-12-16 01:46:11 UTC
I have the must-gather.
Please let me know if it can help.

Comment 2 Patrick Dillon 2020-12-16 18:11:28 UTC
Note, this is happening during bootstrap destroy. 

Similar PR: https://github.com/openshift/installer/pull/2325 fixes BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1746119

Comment 3 Patrick Dillon 2020-12-18 22:48:03 UTC
I reproduced this with both a 3-character cluster name and a 25-character cluster name, so I think the cluster name is unrelated. The fix should be similar to the one in https://bugzilla.redhat.com/show_bug.cgi?id=1908171#c2, but for machine_type. A PR should be up soon.
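
For illustration, a fix along these lines might look roughly like the Terraform sketch below. It is not the installer's actual template. It assumes Terraform 0.12 or later syntax, and the name, zone, image, and network values are hypothetical placeholders. Only the resource type, the resource name "master", and the machine type are taken from this report.

```hcl
# Sketch only: shows the general shape of a lifecycle rule that ignores
# machine_type diffs. All values marked "hypothetical" are placeholders.
resource "google_compute_instance" "master" {
  name         = "example-master-0"  # hypothetical
  zone         = "us-east1-b"        # hypothetical
  machine_type = "n1-custom-4-16384" # custom type from the install-config

  boot_disk {
    initialize_params {
      image = "example-image" # hypothetical
    }
  }

  network_interface {
    network = "default" # hypothetical
  }

  lifecycle {
    # Ignore any planned change to machine_type on later applies, so the
    # second "terraform apply" (run during bootstrap destroy) does not try
    # to update a running instance.
    ignore_changes = [machine_type]
  }
}
```

With a rule like this, a later apply should no longer plan an in-place update to machine_type, which is the update that fails with the "requires stopping it" error in the description.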

Comment 5 To Hung Sze 2021-01-07 17:45:57 UTC
Verified with 4.7 fc1
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      type: n1-custom-4-16384
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    gcp:
      type: n1-custom-4-16384
  replicas: 3
metadata:
  creationTimestamp: null
  name: tszegcp010720c-1234567890

Comment 8 errata-xmlrpc 2021-02-24 15:44:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633