Description of problem:
For the disaster recovery scenario of replacing one failed master (where the VM disappears from the underlying infrastructure), we need to be able to provision a new master node on IPI clusters using the `machine` resource. I tried to create a machine resource in GCP, but it fails with something like:

-----
I0220 16:36:10.612023 1 actuator.go:80] alpate-hqg7k-m-3: Checking if machine exists
I0220 16:36:11.043501 1 controller.go:260] alpate-hqg7k-m-3: reconciling machine triggers idempotent update
I0220 16:36:11.043706 1 actuator.go:98] alpate-hqg7k-m-3: Updating machine
I0220 16:36:11.225755 1 reconciler.go:372] alpate-hqg7k-m-3: reconciling instance for targetpool with cloud provider; desired state: true
I0220 16:36:11.751938 1 machine_scope.go:159] alpate-hqg7k-m-3: status unchanged
E0220 16:36:11.759837 1 controller.go:262] alpate-hqg7k-m-3: error updating machine: failed to add instance alpate-hqg7k-m-3 to target pool alpate-hqg7k-api: googleapi: Error 403: Required 'compute.targetPools.addInstance' permission for 'projects/openshift-gce-devel/regions/us-east1/targetPools/alpate-hqg7k-api', forbidden

Note: A similar step works for AWS.
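The 403 in the log above names both the missing IAM permission and the target pool it applies to. As a minimal sketch for pulling those two values out of a machine controller log line, the following could be used. The function name `missing_permission` and the regular expression are illustrative assumptions based only on the wording of this one message; they are not part of any OpenShift or Google tooling.

```python
import re

# Assumed message shape, taken from the single googleapi 403 line in this report.
PERMISSION_RE = re.compile(
    r"Error 403: Required '(?P<permission>[^']+)' permission for '(?P<resource>[^']+)'"
)


def missing_permission(log_line):
    """Return (permission, resource) if the line reports a missing permission, else None."""
    match = PERMISSION_RE.search(log_line)
    if match is None:
        return None
    return match.group("permission"), match.group("resource")


# The error line from the description, with the broken word "permission" rejoined.
line = (
    "E0220 16:36:11.759837 1 controller.go:262] alpate-hqg7k-m-3: error updating machine: "
    "failed to add instance alpate-hqg7k-m-3 to target pool alpate-hqg7k-api: "
    "googleapi: Error 403: Required 'compute.targetPools.addInstance' permission for "
    "'projects/openshift-gce-devel/regions/us-east1/targetPools/alpate-hqg7k-api', forbidden"
)

print(missing_permission(line))
```

Lines that do not contain a 403 of this shape return `None`, so the helper can be applied to a whole log without filtering first.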
Verified
clusterversion: 4.5.0-0.nightly-2020-03-26-211208

Created a new master machine; the machine was created successfully and joined the cluster.

$ oc get node
NAME                                             STATUS   ROLES    AGE     VERSION
zhsun5-6f4rk-m-0.c.openshift-qe.internal         Ready    master   45m     v1.17.1
zhsun5-6f4rk-m-00.c.openshift-qe.internal        Ready    master   9m48s   v1.17.1
zhsun5-6f4rk-m-1.c.openshift-qe.internal         Ready    master   45m     v1.17.1
zhsun5-6f4rk-m-2.c.openshift-qe.internal         Ready    master   45m     v1.17.1
zhsun5-6f4rk-w-a-92p5n.c.openshift-qe.internal   Ready    worker   33m     v1.17.1
zhsun5-6f4rk-w-b-txrc5.c.openshift-qe.internal   Ready    worker   33m     v1.17.1
zhsun5-6f4rk-w-c-zphwl.c.openshift-qe.internal   Ready    worker   33m     v1.17.1

I0327 02:52:01.131989 1 controller.go:163] zhsun5-6f4rk-m-00: reconciling Machine
I0327 02:52:01.143928 1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled" "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun5-6f4rk-m-00"}
I0327 02:52:01.143986 1 controller.go:163] zhsun5-6f4rk-m-00: reconciling Machine
I0327 02:52:01.143997 1 actuator.go:75] zhsun5-6f4rk-m-00: Checking if machine exists
I0327 02:52:01.550094 1 reconciler.go:302] zhsun5-6f4rk-m-00: Machine does not exist
I0327 02:52:01.550125 1 controller.go:419] zhsun5-6f4rk-m-00: going into phase "Provisioning"
I0327 02:52:01.558883 1 controller.go:307] zhsun5-6f4rk-m-00: reconciling machine triggers idempotent create
I0327 02:52:01.560790 1 actuator.go:57] zhsun5-6f4rk-m-00: Creating machine
I0327 02:52:03.426919 1 reconciler.go:168] zhsun5-6f4rk-m-00: Reconciling machine object with cloud state
I0327 02:52:03.632213 1 reconciler.go:216] zhsun5-6f4rk-m-00: machine status is "PROVISIONING", requeuing...
I0327 02:52:03.632329 1 machine_scope.go:161] "zhsun5-6f4rk-m-00": patching machine
W0327 02:52:03.650669 1 controller.go:309] zhsun5-6f4rk-m-00: failed to create machine: requeue in: 20s
I0327 02:52:03.650697 1 controller.go:400] Actuator returned requeue-after error: requeue in: 20s
I0327 02:52:03.650738 1 controller.go:163] zhsun5-6f4rk-m-00: reconciling Machine
I0327 02:52:03.650744 1 actuator.go:75] zhsun5-6f4rk-m-00: Checking if machine exists
I0327 02:52:03.655245 1 recorder.go:52] controller-runtime/manager/events "msg"="Warning" "message"="requeue in: 20s" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun5-6f4rk-m-00","uid":"825a4ec6-a0d4-4701-9572-0f66180b2854","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"26443"} "reason"="FailedCreate"
I0327 02:52:03.963629 1 controller.go:271] zhsun5-6f4rk-m-00: reconciling machine triggers idempotent update
I0327 02:52:03.963660 1 actuator.go:92] zhsun5-6f4rk-m-00: Updating machine
I0327 02:52:04.188030 1 reconciler.go:372] zhsun5-6f4rk-m-00: reconciling instance for targetpool with cloud provider; desired state: true
I0327 02:52:04.826296 1 reconciler.go:168] zhsun5-6f4rk-m-00: Reconciling machine object with cloud state
I0327 02:52:04.948397 1 reconciler.go:216] zhsun5-6f4rk-m-00: machine status is "PROVISIONING", requeuing...
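The `oc get node` listing in this comment is the evidence that the new master joined the cluster. As a rough sketch of how that check could be scripted against the plain-text output, the snippet below parses the table and lists the Ready masters. The helper names `parse_nodes` and `ready_masters` are hypothetical, and the parsing assumes the default five-column layout shown in this comment with no spaces inside any field.

```python
# Node listing copied from the verification comment.
OC_GET_NODE = """\
NAME STATUS ROLES AGE VERSION
zhsun5-6f4rk-m-0.c.openshift-qe.internal Ready master 45m v1.17.1
zhsun5-6f4rk-m-00.c.openshift-qe.internal Ready master 9m48s v1.17.1
zhsun5-6f4rk-m-1.c.openshift-qe.internal Ready master 45m v1.17.1
zhsun5-6f4rk-m-2.c.openshift-qe.internal Ready master 45m v1.17.1
zhsun5-6f4rk-w-a-92p5n.c.openshift-qe.internal Ready worker 33m v1.17.1
zhsun5-6f4rk-w-b-txrc5.c.openshift-qe.internal Ready worker 33m v1.17.1
zhsun5-6f4rk-w-c-zphwl.c.openshift-qe.internal Ready worker 33m v1.17.1
"""


def parse_nodes(text):
    """Turn whitespace-separated `oc get node` output into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def ready_masters(nodes):
    """Return the names of nodes that are Ready and carry the master role."""
    return [
        node["NAME"]
        for node in nodes
        if node["STATUS"] == "Ready" and "master" in node["ROLES"].split(",")
    ]


masters = ready_masters(parse_nodes(OC_GET_NODE))
print(len(masters), masters)
```

Splitting `ROLES` on commas is a precaution for nodes that report more than one role; the listing here has a single role per node.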
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days