Bug 2046554 - infrastructure-operator pod crashes due to insufficient privileges in ACM 2.5
Summary: infrastructure-operator pod crashes due to insufficient privileges in ACM 2.5
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Advanced Cluster Management for Kubernetes
Classification: Red Hat
Component: Cluster Lifecycle
Version: rhacm-2.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: rhacm-2.5
Assignee: Jian Qiu
QA Contact: Hui Chen
Docs Contact: Christopher Dawson
URL:
Whiteboard:
Duplicates: 2050363
Depends On:
Blocks:
 
Reported: 2022-01-26 23:01 UTC by Thuy Nguyen
Modified: 2023-09-18 04:30 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-06-09 02:08:48 UTC
Target Upstream Version:
Embargoed:
bot-tracker-sync: rhacm-2.5+


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github stolostron backlog issues 19478 0 None None None 2022-01-27 01:31:00 UTC
Red Hat Product Errata RHSA-2022:4956 0 None None None 2022-06-09 02:09:16 UTC

Description Thuy Nguyen 2022-01-26 23:01:38 UTC
Description of the problem: infrastructure-operator pod crashes due to insufficient privileges in ACM 2.5

Release version: ACM 2.5.0

Operator snapshot version: 2.5.0-DOWNSTREAM-2022-01-19-20-35-27 (Final Sprint 0)

OCP version: 4.9.11

Browser Info: Firefox 91.5.0esr (64-bit)

Steps to reproduce:
1. Install ACM 2.5 onto ocm namespace
2. Check infrastructure-operator pod 

Actual results:
The infrastructure-operator pod crashes

# oc get po -n ocm | grep Crash
infrastructure-operator-694dfdf9f6-qtr57                          0/1     CrashLoopBackOff   5 (2m11s ago)   19m
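
To capture the crash details, a minimal sketch using the pod name from the output above (the pod suffix will differ per install):

# Describe the crashing pod and pull the logs of the previous (crashed) container
oc describe pod -n ocm infrastructure-operator-694dfdf9f6-qtr57
oc logs -n ocm infrastructure-operator-694dfdf9f6-qtr57 --previous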

Expected results:
All pods are up and running

Additional info:

infrastructure-operator pod logs -

I0126 22:56:20.692369       1 request.go:668] Waited for 1.034635988s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/imageregistry.open-cluster-management.io/v1alpha1?timeout=32s
{"level":"info","ts":1643237784.394883,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1643237789.1602755,"logger":"setup","msg":"starting manager"}
I0126 22:56:29.160639       1 leaderelection.go:243] attempting to acquire leader lease ocm/86f835c3.agent-install.openshift.io...
{"level":"info","ts":1643237789.160651,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
I0126 22:56:46.196847       1 leaderelection.go:253] successfully acquired lease ocm/86f835c3.agent-install.openshift.io
{"level":"info","ts":1643237806.1971397,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1971996,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972117,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.19722,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972282,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972363,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.197244,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.197251,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972582,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972685,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972754,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.197282,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972885,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.197295,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting Controller","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig"}
E0126 22:56:46.210237       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:56:46.228175       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:56:47.590934       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:56:47.685548       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:56:49.993785       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:56:50.113672       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:56:53.837276       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:56:54.188915       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:57:03.254215       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:57:03.656575       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:57:22.132234       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:57:22.812778       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:58:03.239809       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:58:11.510740       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:58:36.529824       1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
{"level":"error","ts":1643237926.1978145,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Could not wait for Cache to sync","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","error":"failed to wait for agentserviceconfig caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/remote-source/assisted-service/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:195\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/remote-source/assisted-service/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:221\nsigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1\n\t/remote-source/assisted-service/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/manager/internal.go:697"}
{"level":"error","ts":1643237926.19815,"logger":"controller-runtime.manager","msg":"error received after stop sequence was engaged","error":"leader election lost"}
{"level":"error","ts":1643237926.1982,"logger":"controller-runtime.manager","msg":"error received after stop sequence was engaged","error":"context canceled"}
{"level":"error","ts":1643237926.1982052,"logger":"setup","msg":"problem running manager","error":"failed to wait for agentserviceconfig caches to sync: timed out waiting for cache to be synced","stacktrace":"main.main\n\t/remote-source/assisted-service/app/cmd/operator/main.go:164\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:255"}

Comment 1 ian zhang 2022-02-04 14:32:36 UTC
@mfilanov 

Hi Michael, 

Do you know how we set up the RBAC for AI on ACM? If so, can you please let me know, and I will try to follow the previous steps to add the above.

Comment 2 Michael Filanov 2022-02-06 07:53:32 UTC
Hey, all the RBAC config is located at https://github.com/openshift/assisted-service/tree/master/config/rbac
@asegurap do you know how it is being deployed or how we can debug it?

Comment 3 Michael Hrivnak 2022-02-07 14:28:56 UTC
Wasn't there some automation to convert the CSV (https://github.com/openshift/assisted-service/blob/master/deploy/olm-catalog/manifests/assisted-service-operator.clusterserviceversion.yaml) into manifests for ACM's helm chart?

I understood that the artifact being released and handed off would be an OLM bundle, which ACM could then consume for integration via a Helm chart. Is that what's happening, or did we settle on some other process?

Comment 4 Chad Crum 2022-02-07 15:55:48 UTC
*** Bug 2050363 has been marked as a duplicate of this bug. ***

Comment 5 ian zhang 2022-02-07 16:21:49 UTC
I believe ACM is using https://github.com/stolostron/assisted-service-chart to convert the CSV into ACM deployments.

It seems AI added some new cluster-scoped entries, such as clusterrolebindings, to the CSV at https://github.com/openshift/assisted-service/blob/master/deploy/olm-catalog/manifests/assisted-service-operator.clusterserviceversion.yaml; however, the ACM side's converter code was never updated with these new clusterroles.

@jagray, can you please help us update ACM's converter code? Also, @mhrivnak, can you please let us know if there's a way to identify all the new changes to the CSV? Do we just do a diff of the commit? (One possible approach is sketched below.)
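
As a sketch of that approach: extract the cluster-scoped rules declared in the operator's CSV and diff them against what the ACM chart currently renders. The yq expression assumes mikefarah yq v4 and the standard OLM CSV schema; the chart file path below is a placeholder, not the actual repo layout.

# Cluster-scoped rules the operator declares in its CSV (run from an assisted-service checkout)
yq '.spec.install.spec.clusterPermissions[].rules' deploy/olm-catalog/manifests/assisted-service-operator.clusterserviceversion.yaml > csv-cluster-rules.yaml

# Compare against the rules the ACM chart ships (path below is a placeholder)
diff csv-cluster-rules.yaml ../assisted-service-chart/templates/assisted-service-clusterrole.yaml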

Comment 6 Jakob 2022-02-07 21:35:52 UTC
I've updated the automation, and it seems to have picked up some role changes: https://github.com/stolostron/assisted-service-chart/commit/103d0e9c4c73cc9ae87e1e2984640fbd4afb559f. The workflow had been disabled because there had been no activity in the repo for 60 days, which I wasn't aware had happened.
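
For reference, GitHub disables scheduled workflows after 60 days without repository activity. A sketch of re-enabling one with the GitHub CLI (the workflow file name here is a placeholder):

# Re-enable the disabled scheduled sync workflow in the chart repo
gh workflow enable sync-chart.yaml --repo stolostron/assisted-service-chart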

Comment 7 ian zhang 2022-02-09 21:13:39 UTC
Hi @thnguyen 

Can you please try out our latest image to see whether the above changes made by Jakob are working?

Comment 8 Michael Hrivnak 2022-02-14 16:14:33 UTC
The automation is best, but a diff of the CSV would of course show any changes. Looks like this should be resolved now.

Comment 9 Thuy Nguyen 2022-02-14 16:33:51 UTC
@izhang, please change the status to ON_QA if the fix is already in and ready to test. Please also include the upstream/downstream build that contains the fix. Thank you.

Comment 10 ian zhang 2022-02-15 14:33:35 UTC
@jagray 

Hi Jakob, 

Do you know the specific build for this issue?

Comment 11 Jakob 2022-02-15 14:46:15 UTC
I only know the change was committed on Feb 7, 2022. Any build after that date will have the change.

Comment 12 Alex Krzos 2022-02-15 16:59:18 UTC
I have version 2.5.0-DOWNSTREAM-2022-02-10-07-31-45 of ACM deployed, and I no longer see the infrastructure operator crash-looping, FWIW.

Comment 13 Thuy Nguyen 2022-02-16 14:48:14 UTC
Validated on 2.5.0-DOWNSTREAM-2022-02-14-13-53-12.

Comment 18 errata-xmlrpc 2022-06-09 02:08:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4956

Comment 20 Red Hat Bugzilla 2023-09-18 04:30:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

