Description of problem:
When creating a GCP UPI cluster using Shared VPC (XPN), the cloud provider realizes it cannot create firewall rules in the host project and attempts to emit a kube event in the cluster informing the user to manually create the firewall rules. However, it does not appear to have access to do so.

Version-Release number of selected component (if applicable):
4.4, presumably 4.3 and earlier.

How reproducible:
Always

Steps to Reproduce:
1. Follow the steps to produce a GCP UPI cluster using Shared VPC (XPN).
2. Wait for ingress to configure load balancers, health checks, and firewall rules.
3. Monitor the kube-controller-manager logs.

Actual results:
Fails to add the events.

Expected results:
Events should be added informing the user to add the additional firewall rules.

Additional info:
Shared VPC (XPN) instructions (PR): https://github.com/openshift/installer/pull/3278
Upstream documentation (known issue): https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-shared-vpc#known_issues
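For step 3 above, the symptom can be observed with something like the following (a hedged sketch; pod names vary per control-plane node and the grep pattern is an assumption):

```shell
# Sketch: inspect the kube-controller-manager logs for the failed event emission,
# then check whether the "create firewall rules" event ever reached the
# ingress namespace. Replace <kube-controller-manager-pod> with a real pod name.
oc get pods -n openshift-kube-controller-manager
oc logs -n openshift-kube-controller-manager <kube-controller-manager-pod> | grep -i firewall
oc get events -n openshift-ingress
```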
Moving to the cloud team since they own the cloud provider bits.
Hey Maciej, based on those logs, openshift-kube-controller-manager has no permissions to record events in the "openshift-ingress" namespace. Which operator owns the RBAC for this pod?
https://github.com/openshift/cluster-openshift-controller-manager-operator/
https://github.com/openshift/cluster-kube-controller-manager-operator
Could you check whether your cluster was provisioned with the gce:cloud-provider ClusterRoleBinding and the matching ClusterRole? Both of those should be added from the add-ons folder. I'm not entirely sure this is the right track, but my GCP 4.2 cluster contained AWS-specific ClusterRoles only.
- https://github.com/openshift/kubernetes/blob/bea625fd65446cca33974e904e4d8c374f047c34/cluster/gce/addons/loadbalancing/cloud-provider-binding.yaml#L16-L31
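One way to check this from the CLI (a hedged sketch; the object names are taken from the upstream add-ons manifest linked above, and the identity passed to --as is an assumption about how the controller manager authenticates):

```shell
# Check whether the upstream cloud-provider RBAC objects exist on the cluster.
oc get clusterrole gce:cloud-provider
oc get clusterrolebinding gce:cloud-provider
# Check whether the controller manager's identity is allowed to record events
# in the openshift-ingress namespace.
oc auth can-i create events -n openshift-ingress --as system:kube-controller-manager
```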
I'm moving this to target 4.6 for now. If the fix ends up being super simple, we can ship it in 4.5; otherwise, ship in 4.6 and backport after 4.5 GA.
Looking at output from an `oc adm must-gather`, it doesn't appear to have any cloud-provider-specific roles/bindings. This mirrors what I see in a GCP IPI cluster.

$ tree cluster-scoped-resources/rbac.authorization.k8s.io
cluster-scoped-resources/rbac.authorization.k8s.io
├── clusterrolebindings
│   ├── multus-admission-controller-webhook.yaml
│   ├── multus-whereabouts.yaml
│   ├── multus.yaml
│   ├── openshift-sdn-controller.yaml
│   ├── openshift-sdn.yaml
│   └── registry-registry-role.yaml
└── clusterroles
    ├── machine-api-controllers.yaml
    ├── machine-api-operator.yaml
    ├── multus-admission-controller-webhook.yaml
    ├── multus.yaml
    ├── openshift-sdn-controller.yaml
    ├── openshift-sdn.yaml
    ├── system:registry.yaml
    └── whereabouts-cni.yaml
This needs further investigation. Tagging with upcomingSprint.
Please provide a full must-gather for this BZ
These perms are owned by the cluster-kube-controller-manager-operator, and it seems to be fixed in 4.5 by https://github.com/openshift/cluster-kube-controller-manager-operator/commit/44559f8a9cb25b7fbf704cad970edd0db13be019#diff-48bd7b72cf07e126ec75a6359cdbeecd. I'm moving this back to the kube-controller-manager component to evaluate backporting or closing this.
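To confirm on a given cluster that the operator-managed RBAC now grants event creation, something like the following should work (a hedged sketch; the --as identity is an assumption about how the controller manager authenticates, not taken from the linked commit):

```shell
# After upgrading to a build containing the fix, this should report "yes".
oc auth can-i create events -n openshift-ingress --as system:kube-controller-manager
```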
(In reply to Alberto from comment #12)
> This perms are owned by the cluster-kube-controller-manager-operator and it
> seems to be fixed in 4.5 by
> https://github.com/openshift/cluster-kube-controller-manager-operator/commit/44559f8a9cb25b7fbf704cad970edd0db13be019#diff-48bd7b72cf07e126ec75a6359cdbeecd.
> I'm moving this back to kube-controller-manager component to evaluate
> backporting or closing this.

This explains why I was struggling to reproduce it using 4.5. I'll give it another look and verify the event is being generated as expected.
@Jeremiah Stuever: XPN is supported since 4.5, so I'm wondering how I can verify this issue on OCP 4.4. Could you please give some advice?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2913