Description of the problem:
The infrastructure-operator pod crashes due to insufficient privileges in ACM 2.5.

Release version: ACM 2.5.0
Operator snapshot version: 2.5.0-DOWNSTREAM-2022-01-19-20-35-27 (Final Sprint 0)
OCP version: 4.9.11
Browser Info: Firefox 91.5.0esr (64-bit)

Steps to reproduce:
1. Install ACM 2.5 into the ocm namespace
2. Check the infrastructure-operator pod

Actual results:
The infrastructure-operator pod crashes:

# oc get po -n ocm | grep Crash
infrastructure-operator-694dfdf9f6-qtr57   0/1   CrashLoopBackOff   5 (2m11s ago)   19m

Expected results:
All pods are up and running.

Additional info:
infrastructure-operator pod logs:

I0126 22:56:20.692369 1 request.go:668] Waited for 1.034635988s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/imageregistry.open-cluster-management.io/v1alpha1?timeout=32s
{"level":"info","ts":1643237784.394883,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1643237789.1602755,"logger":"setup","msg":"starting manager"}
I0126 22:56:29.160639 1 leaderelection.go:243] attempting to acquire leader lease ocm/86f835c3.agent-install.openshift.io...
{"level":"info","ts":1643237789.160651,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
I0126 22:56:46.196847 1 leaderelection.go:253] successfully acquired lease ocm/86f835c3.agent-install.openshift.io
{"level":"info","ts":1643237806.1971397,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1971996,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972117,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.19722,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972282,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972363,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.197244,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.197251,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972582,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972685,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972754,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.197282,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.1972885,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting EventSource","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","source":"kind source: /, Kind="}
{"level":"info","ts":1643237806.197295,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Starting Controller","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig"}
E0126 22:56:46.210237 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:56:46.228175 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:56:47.590934 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:56:47.685548 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:56:49.993785 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:56:50.113672 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:56:53.837276 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:56:54.188915 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:57:03.254215 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:57:03.656575 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:57:22.132234 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:57:22.812778 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:58:03.239809 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
E0126 22:58:11.510740 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.ClusterRoleBinding: failed to list *v1.ClusterRoleBinding: clusterrolebindings.rbac.authorization.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "clusterrolebindings" in API group "rbac.authorization.k8s.io" at the cluster scope
E0126 22:58:36.529824 1 reflector.go:138] pkg/mod/k8s.io/client-go.1/tools/cache/reflector.go:167: Failed to watch *v1.APIService: failed to list *v1.APIService: apiservices.apiregistration.k8s.io is forbidden: User "system:serviceaccount:ocm:assisted-service" cannot list resource "apiservices" in API group "apiregistration.k8s.io" at the cluster scope
{"level":"error","ts":1643237926.1978145,"logger":"controller-runtime.manager.controller.agentserviceconfig","msg":"Could not wait for Cache to sync","reconciler group":"agent-install.openshift.io","reconciler kind":"AgentServiceConfig","error":"failed to wait for agentserviceconfig caches to sync: timed out waiting for cache to be synced","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2\n\t/remote-source/assisted-service/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:195\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/remote-source/assisted-service/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/internal/controller/controller.go:221\nsigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1\n\t/remote-source/assisted-service/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.0/pkg/manager/internal.go:697"}
{"level":"error","ts":1643237926.19815,"logger":"controller-runtime.manager","msg":"error received after stop sequence was engaged","error":"leader election lost"}
{"level":"error","ts":1643237926.1982,"logger":"controller-runtime.manager","msg":"error received after stop sequence was engaged","error":"context canceled"}
{"level":"error","ts":1643237926.1982052,"logger":"setup","msg":"problem running manager","error":"failed to wait for agentserviceconfig caches to sync: timed out waiting for cache to be synced","stacktrace":"main.main\n\t/remote-source/assisted-service/app/cmd/operator/main.go:164\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:255"}
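The forbidden errors in the log name the exact cluster-scoped permissions the assisted-service service account lacks: list/watch on clusterrolebindings (rbac.authorization.k8s.io) and on apiservices (apiregistration.k8s.io). As a rough sketch of the gap (the ClusterRole name below is hypothetical; the resources, API groups, and verbs are taken from the log, and the proper fix is for the ACM chart to carry the operator's declared permissions), the missing RBAC corresponds to something like:

```yaml
# Hypothetical ClusterRole illustrating the permissions the log reports as missing.
# The metadata.name is made up for this sketch; a matching ClusterRoleBinding to
# the ocm/assisted-service ServiceAccount would also be required.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: assisted-service-missing-perms  # illustrative name only
rules:
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterrolebindings"]
  verbs: ["list", "watch"]
- apiGroups: ["apiregistration.k8s.io"]
  resources: ["apiservices"]
  verbs: ["list", "watch"]
```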
@mfilanov Hi Michael, do you know how we set up the RBAC for AI on ACM? If so, can you please let me know? I will try to follow the previous steps to add the permissions above.
Hey, all the RBAC config is located at https://github.com/openshift/assisted-service/tree/master/config/rbac. @asegurap do you know how it is being deployed or how we can debug it?
Wasn't there some automation to convert the CSV (https://github.com/openshift/assisted-service/blob/master/deploy/olm-catalog/manifests/assisted-service-operator.clusterserviceversion.yaml) into manifests for ACM's Helm chart? My understanding was that the artifact being released and handed off would be an OLM bundle, which ACM would then consume for integration via a Helm chart. Is that what's happening, or did we settle on some other process?
*** Bug 2050363 has been marked as a duplicate of this bug. ***
I believe ACM is using https://github.com/stolostron/assisted-service-chart to convert the CSV into ACM deployments. It seems AI added some new ClusterRole entries, such as clusterrolebindings, in https://github.com/openshift/assisted-service/blob/master/deploy/olm-catalog/manifests/assisted-service-operator.clusterserviceversion.yaml; however, the ACM side never picked up these new ClusterRoles in the converter code. @jagray can you please help us update ACM's converter code? Also @mhrivnak, can you please let us know if there's a way to identify all the new changes to the CSV? Do we just do a diff of the commit?
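To identify RBAC changes between two versions of the CSV, one option is to diff the cluster-scoped rules programmatically rather than eyeballing the YAML. A minimal sketch follows; the rule data below is made up for illustration, and in practice the two dicts would come from parsing the old and new CSV files (e.g. with PyYAML). It assumes the standard OLM CSV layout, where cluster-scoped rules live under spec.install.spec.clusterPermissions.

```python
# Sketch: diff the cluster-scoped RBAC rules of two OLM ClusterServiceVersions
# to spot permissions a downstream converter (e.g. the ACM chart) may have missed.

def cluster_rules(csv_doc):
    """Flatten clusterPermissions rules into a set of (group, resource, verb) tuples."""
    rules = set()
    for perm in csv_doc["spec"]["install"]["spec"].get("clusterPermissions", []):
        for rule in perm["rules"]:
            for group in rule.get("apiGroups", [""]):
                for resource in rule["resources"]:
                    for verb in rule["verbs"]:
                        rules.add((group, resource, verb))
    return rules

def new_rules(old_csv, new_csv):
    """Rules present in the new CSV but missing from the old one."""
    return sorted(cluster_rules(new_csv) - cluster_rules(old_csv))

# Made-up sample data standing in for two parsed CSVs.
old = {"spec": {"install": {"spec": {"clusterPermissions": [
    {"rules": [
        {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list"]},
    ]},
]}}}}
new = {"spec": {"install": {"spec": {"clusterPermissions": [
    {"rules": [
        {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get", "list"]},
        {"apiGroups": ["rbac.authorization.k8s.io"],
         "resources": ["clusterrolebindings"], "verbs": ["list", "watch"]},
    ]},
]}}}}

for group, resource, verb in new_rules(old, new):
    print(f"missing downstream: {verb} {resource} ({group or 'core'})")
```

A plain `git diff` of the CSV commit works too, of course; the advantage of flattening to tuples is that reordered or reworded rules don't show up as spurious changes.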
I've updated the automation. It picked up some role changes: https://github.com/stolostron/assisted-service-chart/commit/103d0e9c4c73cc9ae87e1e2984640fbd4afb559f. The workflow had been disabled because there hadn't been any activity in the repo for 60 days, which I wasn't aware could happen.
Hi @thnguyen, can you please try out our latest image to see whether the above changes made by Jakob are working?
The automation is best, but a diff of the CSV would of course show any changes. It looks like this should be resolved now.
@izhang, please change the status to ON_QA if the fix is already in and ready to test. Please also include the upstream/downstream build that contains the fix. Thank you.
@jagray Hi Jakob, Do you know the specific build for this issue?
I only know the change was committed on Feb 7, 2022. Any build after that date will have the change.
I have version 2.5.0-DOWNSTREAM-2022-02-10-07-31-45 of ACM deployed, and I no longer see the infrastructure operator crashlooping, FWIW.
Validated on 2.5.0-DOWNSTREAM-2022-02-14-13-53-12.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Advanced Cluster Management 2.5 security updates, images, and bug fixes), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:4956
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days