+++ This bug was initially created as a clone of Bug #1888073 +++

I installed a 4.6 cluster and then installed the OCS and ACM operators. olm-operator now uses a constant 1 CPU of actual usage. In its logs it is reconciling in a very tight loop; I see the following about 5-6 times per second for both OCS and ACM:

2020-10-13T13:44:47.147Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.195Z  DEBUG  controller-runtime.controller  Successfully Reconciled  {"controller": "operator", "request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.338Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.400Z  DEBUG  controller-runtime.controller  Successfully Reconciled  {"controller": "operator", "request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.487Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.588Z  DEBUG  controller-runtime.controller  Successfully Reconciled  {"controller": "operator", "request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.653Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.734Z  DEBUG  controller-runtime.controller  Successfully Reconciled  {"controller": "operator", "request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.819Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.874Z  DEBUG  controller-runtime.controller  Successfully Reconciled  {"controller": "operator", "request": "/ocs-operator.openshift-storage"}

--- Additional comment from sjenning on 2020-10-14 02:56:07 UTC ---

Saw this as well.
Indicative of a racing change on the resource:

2020-10-14T02:54:25.816Z  DEBUG  controllers.operator  reconciling operator  {"request": "/advanced-cluster-management.open-cluster-management"}
2020-10-14T02:54:25.891Z  ERROR  controllers.operator  Could not update Operator status  {"request": "/advanced-cluster-management.open-cluster-management", "error": "Operation cannot be fulfilled on operators.operators.coreos.com \"advanced-cluster-management.open-cluster-management\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
	/build/vendor/github.com/go-logr/zapr/zapr.go:128
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators.(*OperatorReconciler).Reconcile
	/build/pkg/controller/operators/operator_controller.go:148
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
2020-10-14T02:54:25.892Z  ERROR  controller-runtime.controller  Reconciler error  {"controller": "operator", "request": "/advanced-cluster-management.open-cluster-management", "error": "Operation cannot be fulfilled on operators.operators.coreos.com \"advanced-cluster-management.open-cluster-management\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
	/build/vendor/github.com/go-logr/zapr/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90

--- Additional comment from sjenning on 2020-10-14 14:57:58 UTC ---

I confirmed that if I do

    oc get operators.operators.coreos.com -w

I see rapid changes to the operators in the tight reconciliation loop.
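The "object has been modified" errors above are the apiserver's optimistic-concurrency check rejecting a write made with a stale resourceVersion; the usual client-side remedy (which client-go packages as retry.RetryOnConflict) is to refetch the object and re-apply the change. A minimal, stdlib-only sketch of that mechanism; the store type, updateWithRetry helper, and integer versions are all hypothetical illustrations, not OLM or client-go code:

```go
package main

import (
	"errors"
	"fmt"
)

// store mimics the apiserver's optimistic concurrency: an update is
// rejected unless it carries the object's current resourceVersion.
type store struct {
	resourceVersion int
	value           string
}

var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

func (s *store) get() (int, string) { return s.resourceVersion, s.value }

func (s *store) update(rv int, v string) error {
	if rv != s.resourceVersion {
		return errConflict
	}
	s.resourceVersion++
	s.value = v
	return nil
}

// updateWithRetry is the client-side idiom: on a conflict, re-GET the
// object to pick up the latest resourceVersion and re-apply the change.
// Returns the number of attempts needed.
func updateWithRetry(s *store, staleRV int, v string) int {
	attempts := 0
	rv := staleRV
	for {
		attempts++
		if s.update(rv, v) == nil {
			return attempts
		}
		rv, _ = s.get() // conflict: refetch and retry
	}
}

func main() {
	s := &store{}
	rv, _ := s.get()                // client reads at version 0
	_ = s.update(rv, "racing write") // a racing writer lands first (version -> 1)
	fmt.Println(updateWithRetry(s, rv, "status update")) // 2: succeeds on the retry
}
```

Retrying only hides the symptom, though; the constant churn on the resource seen in the comments below is what kept producing fresh conflicts here.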
I also changed the apiserver audit profile to WriteRequestBodies and confirmed that all the updates come from system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount with "userAgent":"olm/v0.0.0 (linux/amd64) kubernetes/$Format". Looking at the audit log, there is constant churn on status.components.refs.

--- Additional comment from sjenning on 2020-10-14 18:40:53 UTC ---

Confirmed that it is the change to the Operator resource itself that is triggering the Reconcile, using https://github.com/operator-framework/operator-lifecycle-manager/pull/1810/files

If it were caused by any of the resources registered via Watches(), we would see it being enqueued in the requests in mapComponentRequests:

https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/operator_controller.go#L69
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/operator_controller.go#L218-L228

--- Additional comment from sjenning on 2020-10-15 18:38:18 UTC ---

This is pretty bad. Raising severity to high.
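The loop mechanics described above match a controller that unconditionally writes status on every pass while also watching its own primary resource: each status write bumps the resourceVersion, fires the watch, and re-enqueues the object. A stdlib-only simulation of that feedback loop (hypothetical, not OLM code) showing how a no-op guard before the status write lets the work queue drain:

```go
package main

import (
	"fmt"
	"reflect"
)

type status struct{ Refs []string }

// simulate runs an event-driven reconcile loop in which every status
// write re-fires the controller's watch on its own resource. With
// guard=true the reconciler skips no-op status updates; with
// guard=false it writes unconditionally, as the hot-looping controller
// effectively did. Returns the number of reconciles before the queue
// drains, capped at 10.
func simulate(guard bool) int {
	current := status{}
	desired := status{Refs: []string{"Deployment/ocs-operator"}}

	queue := []string{"ocs-operator"} // initial watch event
	reconciles := 0

	for len(queue) > 0 && reconciles < 10 {
		queue = queue[1:]
		reconciles++
		if guard && reflect.DeepEqual(current, desired) {
			continue // status already correct: nothing to write
		}
		current = desired
		// The status write triggers the watch on the resource
		// itself, re-enqueueing the object.
		queue = append(queue, "ocs-operator")
	}
	return reconciles
}

func main() {
	fmt.Println(simulate(false)) // 10: spins until the cap
	fmt.Println(simulate(true))  // 2: second pass is a no-op, queue drains
}
```

In real controller-runtime code the same effect is achieved by only issuing the status Update when the computed status differs from the observed one, or by filtering watch events with predicates.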
verify it on 4.6. LGTM --

[root@preserve-olm-env 1889745]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-11-05-171833   True        False         22m     Cluster version is 4.6.0-0.nightly-2020-11-05-171833
[root@preserve-olm-env 1889745]# oc get pod -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-67745dcff4-wxq7p   1/1     Running   0          15m
olm-operator-6f9b978c56-d9bxt       1/1     Running   0          15m
packageserver-7f9c7f74b-4cbn9       1/1     Running   0          15m
packageserver-7f9c7f74b-vvvx7       1/1     Running   0          18m
[root@preserve-olm-env 1889745]# oc exec olm-operator-6f9b978c56-d9bxt -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.16.1
git commit: aba96a92b6e17452ed239598c1d266ba8d6a1b51
[root@preserve-olm-env 1889745]# oc new-project test-storage
Now using project "test-storage" on server "https://api.kui110625.qe.devcluster.openshift.com:6443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app rails-postgresql-example

to build a new example application in Ruby.

Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=k8s.gcr.io/serve_hostname

[root@preserve-olm-env 1889745]# cat og-ocs.yaml
kind: OperatorGroup
apiVersion: operators.coreos.com/v1
metadata:
  name: og-single
  namespace: test-storage
spec:
  targetNamespaces:
  - test-storage
[root@preserve-olm-env 1889745]# oc apply -f og-ocs.yaml
operatorgroup.operators.coreos.com/og-single created
[root@preserve-olm-env 1889745]# cat sub-ocs.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ocs-operator
  namespace: test-storage
spec:
  channel: stable-4.5
  installPlanApproval: Automatic
  name: ocs-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: ocs-operator.v4.5.1
[root@preserve-olm-env 1889745]# oc apply -f sub-ocs.yaml
subscription.operators.coreos.com/ocs-operator created
[root@preserve-olm-env 1889745]# oc get csv
NAME                  DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.5.1   OpenShift Container Storage   4.5.1                Installing
[root@preserve-olm-env 1889745]# oc get csv
NAME                  DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.5.1   OpenShift Container Storage   4.5.1                Succeeded
[root@preserve-olm-env 1889745]# oc new-project open-cluster-management
Now using project "open-cluster-management" on server "https://api.kui110625.qe.devcluster.openshift.com:6443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app rails-postgresql-example

to build a new example application in Ruby.

Or use kubectl to deploy a simple Kubernetes application:

    kubectl create deployment hello-node --image=k8s.gcr.io/serve_hostname

[root@preserve-olm-env 1889745]# cat og-acm.yaml
kind: OperatorGroup
apiVersion: operators.coreos.com/v1
metadata:
  name: og-single
  namespace: open-cluster-management
spec:
  targetNamespaces:
  - open-cluster-management
[root@preserve-olm-env 1889745]# oc apply -f og-acm.yaml
operatorgroup.operators.coreos.com/og-single created
[root@preserve-olm-env 1889745]# cat sub-acm.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: advanced-cluster-management
  namespace: open-cluster-management
spec:
  channel: release-2.0
  installPlanApproval: Automatic
  name: advanced-cluster-management
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: advanced-cluster-management.v2.0.3
[root@preserve-olm-env 1889745]# oc apply -f sub-acm.yaml
subscription.operators.coreos.com/advanced-cluster-management created
[root@preserve-olm-env 1889745]# oc get csv -n open-cluster-management
NAME                                 DISPLAY                                      VERSION   REPLACES                             PHASE
advanced-cluster-management.v2.0.3   Advanced Cluster Management for Kubernetes   2.0.3     advanced-cluster-management.v2.0.2   Installing
advanced-cluster-management.v2.0.4   Advanced Cluster Management for Kubernetes   2.0.4     advanced-cluster-management.v2.0.3
[root@preserve-olm-env 1889745]# oc get csv -n open-cluster-management
NAME                                 DISPLAY                                      VERSION   REPLACES                             PHASE
advanced-cluster-management.v2.0.3   Advanced Cluster Management for Kubernetes   2.0.3     advanced-cluster-management.v2.0.2   Replacing
advanced-cluster-management.v2.0.4   Advanced Cluster Management for Kubernetes   2.0.4     advanced-cluster-management.v2.0.3   Installing
[root@preserve-olm-env 1889745]# oc get csv -n open-cluster-management
NAME                                 DISPLAY                                      VERSION   REPLACES                             PHASE
advanced-cluster-management.v2.0.4   Advanced Cluster Management for Kubernetes   2.0.4     advanced-cluster-management.v2.0.3   Succeeded
[root@preserve-olm-env 1889745]# oc logs --since=5s olm-operator-6f9b978c56-d9bxt -n openshift-operator-lifecycle-manager | grep DEBUG | grep "ocs-operator.test-storage" | grep "reconciling operator"
2020-11-06T02:50:23.161Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.test-storage"}
2020-11-06T02:50:23.210Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.test-storage"}
2020-11-06T02:50:25.325Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.test-storage"}
2020-11-06T02:50:25.423Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.test-storage"}
2020-11-06T02:50:27.299Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.test-storage"}
2020-11-06T02:50:27.395Z  DEBUG  controllers.operator  reconciling operator  {"request": "/ocs-operator.test-storage"}
[root@preserve-olm-env 1889745]# oc logs --since=180s olm-operator-6f9b978c56-d9bxt -n openshift-operator-lifecycle-manager | grep ERROR | grep "advanced-cluster-management.open-cluster-management"
[root@preserve-olm-env 1889745]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.4 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4987