Bug 1816806

Summary: system:serviceaccount:kube-system:cloud-provider cannot create resource events
Product: OpenShift Container Platform
Reporter: Jeremiah Stuever <jstuever>
Component: kube-controller-manager
Assignee: Tomáš Nožička <tnozicka>
Status: CLOSED ERRATA
QA Contact: zhou ying <yinzhou>
Severity: medium
Priority: medium
Version: 4.4
CC: agarcial, aos-bugs, mfojtik, mgugino, yanyang
Target Milestone: ---
Target Release: 4.4.z
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Clones: 1853171 (view as bug list)
Last Closed: 2020-07-21 10:31:05 UTC
Type: Bug
Bug Depends On: 1821671
Bug Blocks: 1853171

Description Jeremiah Stuever 2020-03-24 19:04:54 UTC
Description of problem:

When creating a GCP UPI cluster using a Shared VPC (XPN), the cloud provider recognizes that it cannot create firewall rules in the host project and attempts to emit a Kubernetes event in the cluster informing the user to create the firewall rules manually. However, it appears it does not have permission to do so.


Version-Release number of selected component (if applicable):

4.4; presumably also 4.3 and earlier.

How reproducible:

Always

Steps to Reproduce:
1. Follow the steps to create a GCP UPI cluster using a Shared VPC (XPN).
2. Wait for ingress to configure load balancers, health checks, and firewall rules.
3. Monitor the kube-controller-manager logs.

Actual results:

The kube-controller-manager fails to emit the events.

Expected results:

Events are emitted informing the user to create the additional firewall rules.

Additional info:

Shared VPC (XPN) instructions (PR):
https://github.com/openshift/installer/pull/3278

Upstream documentation (known issue):
https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-shared-vpc#known_issues
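For reference, the missing access boils down to an RBAC rule allowing the cloud provider's service account to create events. A minimal sketch of what such a grant looks like (the resource names here are illustrative, not the actual manifests shipped by any operator):

```yaml
# Hypothetical sketch: grant event creation to the cloud-provider
# service account. Names are illustrative only.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cloud-provider-events   # illustrative name
rules:
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cloud-provider-events   # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cloud-provider-events
subjects:
- kind: ServiceAccount
  name: cloud-provider
  namespace: kube-system
```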

Comment 2 Maciej Szulik 2020-03-25 10:38:23 UTC
Moving to cloud team since they own the cloud provider bits.

Comment 3 Alberto 2020-03-25 11:29:15 UTC
Hey Maciej, based on those logs, openshift-kube-controller-manager has no permissions to record events in the "openshift-ingress" namespace. Which one is the operator owning the RBAC for this pod?
https://github.com/openshift/cluster-openshift-controller-manager-operator/
https://github.com/openshift/cluster-kube-controller-manager-operator

Comment 5 Danil Grigorev 2020-05-13 12:48:47 UTC
Could you check whether your cluster was provisioned with the ClusterRoleBinding gce:cloud-provider and the ClusterRole of the same name? Both of those should be added from the add-ons folder. I'm not entirely sure this is the right track, but my GCP 4.2 cluster contains AWS-specific ClusterRoles only.

- https://github.com/openshift/kubernetes/blob/bea625fd65446cca33974e904e4d8c374f047c34/cluster/gce/addons/loadbalancing/cloud-provider-binding.yaml#L16-L31
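The upstream GCE add-on binding referenced above has roughly this shape (paraphrased from memory as a sketch; see the linked cloud-provider-binding.yaml for the authoritative content):

```yaml
# Rough sketch of the upstream GCE add-on ClusterRoleBinding;
# field values paraphrased, not copied from the linked file.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gce:cloud-provider
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gce:cloud-provider
subjects:
- kind: ServiceAccount
  name: cloud-provider
  namespace: kube-system
```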

Comment 6 Michael Gugino 2020-05-13 15:24:18 UTC
I'm moving this to target 4.6 for now. If the fix ends up being super simple, we can ship it in 4.5; otherwise, ship it in 4.6 and backport after 4.5 GA.

Comment 7 Jeremiah Stuever 2020-05-13 19:54:12 UTC
Looking at output from an `oc adm must-gather`, it doesn't appear to have any cloud-provider specific roles/bindings. This mirrors what I see in a GCP IPI cluster.

$ tree cluster-scoped-resources/rbac.authorization.k8s.io
cluster-scoped-resources/rbac.authorization.k8s.io
├── clusterrolebindings
│   ├── multus-admission-controller-webhook.yaml
│   ├── multus-whereabouts.yaml
│   ├── multus.yaml
│   ├── openshift-sdn-controller.yaml
│   ├── openshift-sdn.yaml
│   └── registry-registry-role.yaml
└── clusterroles
    ├── machine-api-controllers.yaml
    ├── machine-api-operator.yaml
    ├── multus-admission-controller-webhook.yaml
    ├── multus.yaml
    ├── openshift-sdn-controller.yaml
    ├── openshift-sdn.yaml
    ├── system:registry.yaml
    └── whereabouts-cni.yaml

Comment 9 Alberto 2020-05-29 10:41:13 UTC
This needs further investigation. Tagging with upcomingSprint

Comment 10 Danil Grigorev 2020-06-23 08:36:43 UTC
Please provide a full must-gather for this BZ.

Comment 12 Alberto 2020-06-30 15:00:00 UTC
These permissions are owned by the cluster-kube-controller-manager-operator, and the issue seems to be fixed in 4.5 by https://github.com/openshift/cluster-kube-controller-manager-operator/commit/44559f8a9cb25b7fbf704cad970edd0db13be019#diff-48bd7b72cf07e126ec75a6359cdbeecd. I'm moving this back to the kube-controller-manager component to evaluate backporting or closing this.

Comment 13 Jeremiah Stuever 2020-07-01 17:16:17 UTC
(In reply to Alberto from comment #12)
> This perms are owned by the cluster-kube-controller-manager-operator and it
> seems to be fixed in 4.5 by
> https://github.com/openshift/cluster-kube-controller-manager-operator/commit/44559f8a9cb25b7fbf704cad970edd0db13be019#diff-48bd7b72cf07e126ec75a6359cdbeecd.
> I'm moving this back to kube-controller-manager component to evaluate
> backporting or closing this.

This explains why I was struggling to reproduce it using 4.5. I'll give it another look and verify the event is being generated as expected.

Comment 17 zhou ying 2020-07-13 08:23:10 UTC
@Jeremiah Stuever:

XPN is supported since 4.5. I'm wondering how I could verify this issue for OCP 4.4? Could you please give some advice?

Comment 21 errata-xmlrpc 2020-07-21 10:31:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2913