Bug 1816806 - system:serviceaccount:kube-system:cloud-provider cannot create resource events
Summary: system:serviceaccount:kube-system:cloud-provider cannot create resource events
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-controller-manager
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.4.z
Assignee: Tomáš Nožička
QA Contact: zhou ying
URL:
Whiteboard:
Depends On: 1821671
Blocks: 1853171
 
Reported: 2020-03-24 19:04 UTC by Jeremiah Stuever
Modified: 2020-07-21 10:31 UTC
5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1853171 (view as bug list)
Environment:
Last Closed: 2020-07-21 10:31:05 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-kube-controller-manager-operator pull 426 0 None closed [release-4.4] Bug 1816806: Fix gce cloud provider permissions 2020-09-09 19:42:35 UTC
Red Hat Product Errata RHBA-2020:2913 0 None None None 2020-07-21 10:31:46 UTC

Description Jeremiah Stuever 2020-03-24 19:04:54 UTC
Description of problem:

When creating a GCP UPI cluster using Shared VPC (XPN), the cloud provider realizes it cannot create firewall rules in the host project and attempts to emit a kube event in the cluster informing the user to manually create firewall rules. However, it appears it does not have access to do so.
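For context, the permission in question is the ability of the cloud provider's service account to create Event objects. A minimal RBAC sketch of the missing grant might look like the following (the role/binding names follow the gce:cloud-provider naming mentioned later in this bug, but this is an illustration of the shape of the grant, not the actual manifest from the fix):

```yaml
# Illustrative sketch only: grants the kube-system cloud-provider
# service account permission to record events cluster-wide.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: gce:cloud-provider
rules:
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gce:cloud-provider
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: gce:cloud-provider
subjects:
- kind: ServiceAccount
  name: cloud-provider
  namespace: kube-system
```

The exact resources shipped by the operator can be confirmed in the PR linked below; this sketch only shows what a sufficient grant would contain.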


Version-Release number of selected component (if applicable):

4.4, presumably 4.3 and before.

How reproducible:

Always

Steps to Reproduce:
1. Follow the steps to provision a GCP UPI cluster using Shared VPC (XPN).
2. Wait for ingress to configure load balancers, health checks, and firewall rules.
3. Monitor kube-controller-manager logs.

Actual results:

The kube-controller-manager fails to create the events.

Expected results:

Events should be added informing user to add additional firewall rules.

Additional info:

Shared VPC (XPN) instructions (PR):
https://github.com/openshift/installer/pull/3278

Upstream documentation (known issue):
https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-shared-vpc#known_issues

Comment 2 Maciej Szulik 2020-03-25 10:38:23 UTC
Moving to cloud team since they own the cloud provider bits.

Comment 3 Alberto 2020-03-25 11:29:15 UTC
Hey Maciej, based on those logs openshift-kube-controller-manager has no permissions to record events in the "openshift-ingress" namespace. Which one of these operators owns the RBAC for this pod?
https://github.com/openshift/cluster-openshift-controller-manager-operator/
https://github.com/openshift/cluster-kube-controller-manager-operator

Comment 5 Danil Grigorev 2020-05-13 12:48:47 UTC
Could you check if your cluster was provisioned with the ClusterRoleBinding gce:cloud-provider and the matching ClusterRole? Both of those should be added from the add-ons folder. I'm not entirely sure if this is the right track, but my GCP 4.2 cluster contained AWS-specific ClusterRoles only.

- https://github.com/openshift/kubernetes/blob/bea625fd65446cca33974e904e4d8c374f047c34/cluster/gce/addons/loadbalancing/cloud-provider-binding.yaml#L16-L31

Comment 6 Michael Gugino 2020-05-13 15:24:18 UTC
I'm moving this to target 4.6 for now.  If the fix ends up being super simple, we can ship in 4.5, otherwise, ship to 4.6 and backport after 4.5 GA.

Comment 7 Jeremiah Stuever 2020-05-13 19:54:12 UTC
Looking at output from an `oc adm must-gather`, it doesn't appear to have any cloud-provider specific roles/bindings. This mirrors what I see in a GCP IPI cluster.

$ tree cluster-scoped-resources/rbac.authorization.k8s.io
cluster-scoped-resources/rbac.authorization.k8s.io
├── clusterrolebindings
│   ├── multus-admission-controller-webhook.yaml
│   ├── multus-whereabouts.yaml
│   ├── multus.yaml
│   ├── openshift-sdn-controller.yaml
│   ├── openshift-sdn.yaml
│   └── registry-registry-role.yaml
└── clusterroles
    ├── machine-api-controllers.yaml
    ├── machine-api-operator.yaml
    ├── multus-admission-controller-webhook.yaml
    ├── multus.yaml
    ├── openshift-sdn-controller.yaml
    ├── openshift-sdn.yaml
    ├── system:registry.yaml
    └── whereabouts-cni.yaml

Comment 9 Alberto 2020-05-29 10:41:13 UTC
This needs further investigation. Tagging with upcomingSprint

Comment 10 Danil Grigorev 2020-06-23 08:36:43 UTC
Please provide a full must-gather for this BZ

Comment 12 Alberto 2020-06-30 15:00:00 UTC
These permissions are owned by the cluster-kube-controller-manager-operator and the issue seems to be fixed in 4.5 by https://github.com/openshift/cluster-kube-controller-manager-operator/commit/44559f8a9cb25b7fbf704cad970edd0db13be019#diff-48bd7b72cf07e126ec75a6359cdbeecd. I'm moving this back to the kube-controller-manager component to evaluate backporting or closing this.

Comment 13 Jeremiah Stuever 2020-07-01 17:16:17 UTC
(In reply to Alberto from comment #12)
> This perms are owned by the cluster-kube-controller-manager-operator and it
> seems to be fixed in 4.5 by
> https://github.com/openshift/cluster-kube-controller-manager-operator/commit/
> 44559f8a9cb25b7fbf704cad970edd0db13be019#diff-
> 48bd7b72cf07e126ec75a6359cdbeecd. I'm moving this back to
> kube-controller-manager component to evaluate backporting or closing this.

This explains why I was struggling to reproduce it using 4.5. I'll give it another look and verify the event is being generated as expected.

Comment 17 zhou ying 2020-07-13 08:23:10 UTC
@Jeremiah Stuever:

XPN is supported since 4.5; I'm wondering how I could verify this issue for OCP 4.4. Could you please give some advice?

Comment 21 errata-xmlrpc 2020-07-21 10:31:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2913

