Bug 1807125

Summary: Unable to add a new master gcp
Product: OpenShift Container Platform
Reporter: Alay Patel <alpatel>
Component: Cloud Compute
Assignee: Alberto <agarcial>
Cloud Compute sub component: BareMetal Provider
QA Contact: sunzhaohua <zhsun>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: mfojtik, skolicha, stbenjam, zhsun
Version: 4.4
Target Milestone: ---
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1812815
Environment:
Last Closed: 2020-08-27 22:34:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1812815

Description Alay Patel 2020-02-25 16:28:17 UTC
Description of problem:

To support the disaster recovery scenario of replacing one failed master (where the VM disappears from the underlying infra), we need to be able to provision a new master node on IPI clusters using the `machine` resource. I tried to create a machine resource in GCP, but it fails with something like:


-----
I0220 16:36:10.612023       1 actuator.go:80] alpate-hqg7k-m-3: Checking if machine exists
I0220 16:36:11.043501       1 controller.go:260] alpate-hqg7k-m-3: reconciling machine triggers idempotent update
I0220 16:36:11.043706       1 actuator.go:98] alpate-hqg7k-m-3: Updating machine
I0220 16:36:11.225755       1 reconciler.go:372] alpate-hqg7k-m-3: reconciling instance for targetpool with cloud provider; desired state: true
I0220 16:36:11.751938       1 machine_scope.go:159] alpate-hqg7k-m-3: status unchanged
E0220 16:36:11.759837       1 controller.go:262] alpate-hqg7k-m-3: error updating machine: failed to add instance alpate-hqg7k-m-3 to target pool alpate-hqg7k-api: googleapi: Error 403: Required 'compute.targetPools.addInstance' permission for 'projects/openshift-gce-devel/regions/us-east1/targetPools/alpate-hqg7k-api', forbidden
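
For triage, the 403 message in the log above already names both the missing IAM permission and the resource it was checked against. The following is only a minimal sketch of pulling those two values out of such a line; the function name `missing_permission` and the regular expression are illustrative assumptions, not part of the machine-api controller or of any Google client library.

```python
import re

# Matches the "Required '<permission>' permission for '<resource>'" text
# seen in the googleapi 403 message from the controller log.
# Hypothetical helper pattern, written for this log line only.
PERMISSION_RE = re.compile(
    r"Error 403: Required '([^']+)' permission for '([^']+)'"
)


def missing_permission(log_line):
    """Return (permission, resource) from a googleapi 403 message, or None."""
    match = PERMISSION_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# The error line from the report, written as one logical line.
line = (
    "E0220 16:36:11.759837       1 controller.go:262] alpate-hqg7k-m-3: "
    "error updating machine: failed to add instance alpate-hqg7k-m-3 to "
    "target pool alpate-hqg7k-api: googleapi: Error 403: Required "
    "'compute.targetPools.addInstance' permission for "
    "'projects/openshift-gce-devel/regions/us-east1/targetPools/alpate-hqg7k-api', "
    "forbidden"
)

result = missing_permission(line)
if result is not None:
    permission, resource = result
    print("missing permission:", permission)
    print("on resource:", resource)
```

Here the extracted permission is `compute.targetPools.addInstance`, which suggests that the credentials used by the machine controller lacked the right to add instances to the API target pool.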


Note: A similar step works for AWS

Comment 5 sunzhaohua 2020-03-27 03:07:54 UTC
Verified
clusterversion: 4.5.0-0.nightly-2020-03-26-211208

Created a new master machine; it was created successfully and joined the cluster.
$ oc get node
NAME                                             STATUS   ROLES    AGE     VERSION
zhsun5-6f4rk-m-0.c.openshift-qe.internal         Ready    master   45m     v1.17.1
zhsun5-6f4rk-m-00.c.openshift-qe.internal        Ready    master   9m48s   v1.17.1
zhsun5-6f4rk-m-1.c.openshift-qe.internal         Ready    master   45m     v1.17.1
zhsun5-6f4rk-m-2.c.openshift-qe.internal         Ready    master   45m     v1.17.1
zhsun5-6f4rk-w-a-92p5n.c.openshift-qe.internal   Ready    worker   33m     v1.17.1
zhsun5-6f4rk-w-b-txrc5.c.openshift-qe.internal   Ready    worker   33m     v1.17.1
zhsun5-6f4rk-w-c-zphwl.c.openshift-qe.internal   Ready    worker   33m     v1.17.1

I0327 02:52:01.131989       1 controller.go:163] zhsun5-6f4rk-m-00: reconciling Machine
I0327 02:52:01.143928       1 controller.go:282] controller-runtime/controller "msg"="Successfully Reconciled"  "controller"="machine_controller" "request"={"Namespace":"openshift-machine-api","Name":"zhsun5-6f4rk-m-00"}
I0327 02:52:01.143986       1 controller.go:163] zhsun5-6f4rk-m-00: reconciling Machine
I0327 02:52:01.143997       1 actuator.go:75] zhsun5-6f4rk-m-00: Checking if machine exists
I0327 02:52:01.550094       1 reconciler.go:302] zhsun5-6f4rk-m-00: Machine does not exist
I0327 02:52:01.550125       1 controller.go:419] zhsun5-6f4rk-m-00: going into phase "Provisioning"
I0327 02:52:01.558883       1 controller.go:307] zhsun5-6f4rk-m-00: reconciling machine triggers idempotent create
I0327 02:52:01.560790       1 actuator.go:57] zhsun5-6f4rk-m-00: Creating machine
I0327 02:52:03.426919       1 reconciler.go:168] zhsun5-6f4rk-m-00: Reconciling machine object with cloud state
I0327 02:52:03.632213       1 reconciler.go:216] zhsun5-6f4rk-m-00: machine status is "PROVISIONING", requeuing...
I0327 02:52:03.632329       1 machine_scope.go:161] "zhsun5-6f4rk-m-00": patching machine
W0327 02:52:03.650669       1 controller.go:309] zhsun5-6f4rk-m-00: failed to create machine: requeue in: 20s
I0327 02:52:03.650697       1 controller.go:400] Actuator returned requeue-after error: requeue in: 20s
I0327 02:52:03.650738       1 controller.go:163] zhsun5-6f4rk-m-00: reconciling Machine
I0327 02:52:03.650744       1 actuator.go:75] zhsun5-6f4rk-m-00: Checking if machine exists
I0327 02:52:03.655245       1 recorder.go:52] controller-runtime/manager/events "msg"="Warning"  "message"="requeue in: 20s" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"zhsun5-6f4rk-m-00","uid":"825a4ec6-a0d4-4701-9572-0f66180b2854","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"26443"} "reason"="FailedCreate"
I0327 02:52:03.963629       1 controller.go:271] zhsun5-6f4rk-m-00: reconciling machine triggers idempotent update
I0327 02:52:03.963660       1 actuator.go:92] zhsun5-6f4rk-m-00: Updating machine
I0327 02:52:04.188030       1 reconciler.go:372] zhsun5-6f4rk-m-00: reconciling instance for targetpool with cloud provider; desired state: true
I0327 02:52:04.826296       1 reconciler.go:168] zhsun5-6f4rk-m-00: Reconciling machine object with cloud state
I0327 02:52:04.948397       1 reconciler.go:216] zhsun5-6f4rk-m-00: machine status is "PROVISIONING", requeuing...
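
The controller log above uses the klog line header (severity letter, date, time, thread id, `file:line]`, then the message). As a rough sketch for scanning such output, the snippet below splits a line into those fields and keeps only the non-info entries. The names `KLOG_RE`, `parse_klog`, and `non_info` are hypothetical helpers introduced for this illustration. In this verification run the only `W` line is the `requeue in: 20s` entry, which appears to be an expected requeue while the instance is still `PROVISIONING`, not the permission failure from the original report.

```python
import re

# klog header: <severity><mmdd> <hh:mm:ss.uuuuuu> <thread id> <file>:<line>] <message>
KLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_klog(line):
    """Split one klog line into its header fields; return None if it does not match."""
    match = KLOG_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    return fields


def non_info(lines):
    """Return parsed entries whose severity is warning, error, or fatal."""
    parsed = (parse_klog(entry) for entry in lines)
    return [p for p in parsed if p is not None and p["severity"] != "I"]


# Three lines copied from the verification log above.
sample = [
    'I0327 02:52:01.550125       1 controller.go:419] zhsun5-6f4rk-m-00: going into phase "Provisioning"',
    'I0327 02:52:03.632213       1 reconciler.go:216] zhsun5-6f4rk-m-00: machine status is "PROVISIONING", requeuing...',
    "W0327 02:52:03.650669       1 controller.go:309] zhsun5-6f4rk-m-00: failed to create machine: requeue in: 20s",
]

for entry in non_info(sample):
    print(entry["severity"], entry["file"], entry["line"], entry["message"])
```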

Comment 6 Luke Meyer 2020-08-27 22:34:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

Comment 7 Red Hat Bugzilla 2023-09-14 05:53:21 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days