Cause: Installing with a custom machine type fails because the installer runs "terraform apply" a second time to destroy the bootstrap resources.
Consequence: Installing a GCP cluster with a custom machine type fails because Terraform incorrectly interprets the second apply as a change to the machine type of the existing VMs.
Fix: Terraform is configured, through the instance resource's lifecycle settings, to ignore changes to the machine type.
Result: Installation with custom machine types succeeds.
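The fix described above can be sketched as a Terraform "lifecycle" block with "ignore_changes" on the instance resource. This is a minimal sketch under stated assumptions, not the actual installer patch: only the resource address google_compute_instance.master comes from the log below, and the variables, count, and other arguments are hypothetical placeholders. The error message suggests setting allow_stopping_for_update = true instead, but that would let Terraform stop the control-plane VMs for a machine-type change that is not real.

```hcl
# Hypothetical inputs; the real installer module defines its own variables.
variable "cluster_id" { type = string }
variable "machine_type" { type = string } # e.g. "n1-custom-4-16384"
variable "zone" { type = string }
variable "image" { type = string }
variable "subnet" { type = string }

resource "google_compute_instance" "master" {
  count        = 3
  name         = "${var.cluster_id}-master-${count.index}"
  machine_type = var.machine_type
  zone         = var.zone

  boot_disk {
    initialize_params {
      image = var.image
    }
  }

  network_interface {
    subnetwork = var.subnet
  }

  lifecycle {
    # Do not plan an update when the machine_type recorded in state differs
    # from the configured value. Without this, the second "terraform apply"
    # (run only to remove bootstrap resources) tries to change machine_type
    # on running masters, which the Google provider refuses to do on a
    # started instance unless allow_stopping_for_update = true.
    ignore_changes = [machine_type]
  }
}
```

Because ignore_changes only suppresses diffs, a deliberate machine-type change on these instances would also be ignored by later applies; that is presumably acceptable here, since the installer only runs Terraform during installation.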
Created attachment 1739499
.openshift-install.log
(This defect was originally filed as an install failure with a long cluster name and a custom machine type; it turns out the custom type n1-custom-4-16384 alone triggers it.)
Version:
openshift-install-linux-4.7.0-0.nightly-2020-12-14-080124
Platform: GCP
Install-config used:
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      type: n1-custom-4-16384
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    gcp:
      type: n1-custom-4-16384
  replicas: 3
metadata:
  creationTimestamp: null
  name: tszegcp121520d-1234567890
Install fails with:
time="2020-12-15T20:32:26-05:00" level=error msg="Error: Changing the machine_type, min_cpu_platform, service_account, or enable display on a started instance requires stopping it. To acknowledge this, please set allow_stopping_for_update = true in your config. You can also stop it by setting desired_status = \"TERMINATED\", but the instance will not be restarted after the update."
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error msg=" on ../../../../../tmp/openshift-install-672567699/master/main.tf line 31, in resource \"google_compute_instance\" \"master\":"
time="2020-12-15T20:32:26-05:00" level=error msg=" 31: resource \"google_compute_instance\" \"master\" {"
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=error
time="2020-12-15T20:32:26-05:00" level=fatal msg="failed disabling bootstrap load balancing: failed to apply Terraform: failed to complete the change"
Note:
Installing with only the long name, or with only the custom type, succeeds.
I reproduced this with both a 3-character and a 25-character cluster name, so the cluster name appears to be unrelated. The fix should be similar to https://bugzilla.redhat.com/show_bug.cgi?id=1908171#c2, but for machine_type. A PR should be up soon.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633