I installed a 4.6 cluster and installed the OCS and ACM operators. olm-operator now uses a consistent 1 CPU of actual usage. When I go into the logs, it is reconciling in a super tight loop; I see about 5-6 entries like this per second for both OCS and ACM:

2020-10-13T13:44:47.147Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.195Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "operator", "request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.338Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.400Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "operator", "request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.487Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.588Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "operator", "request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.653Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.734Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "operator", "request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.819Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-13T13:44:47.874Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "operator", "request": "/ocs-operator.openshift-storage"}
Saw this as well. Indicative of a racing change on the resource:

2020-10-14T02:54:25.816Z DEBUG controllers.operator reconciling operator {"request": "/advanced-cluster-management.open-cluster-management"}
2020-10-14T02:54:25.891Z ERROR controllers.operator Could not update Operator status {"request": "/advanced-cluster-management.open-cluster-management", "error": "Operation cannot be fulfilled on operators.operators.coreos.com \"advanced-cluster-management.open-cluster-management\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
	/build/vendor/github.com/go-logr/zapr/zapr.go:128
github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators.(*OperatorReconciler).Reconcile
	/build/pkg/controller/operators/operator_controller.go:148
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
2020-10-14T02:54:25.892Z ERROR controller-runtime.controller Reconciler error {"controller": "operator", "request": "/advanced-cluster-management.open-cluster-management", "error": "Operation cannot be fulfilled on operators.operators.coreos.com \"advanced-cluster-management.open-cluster-management\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
	/build/vendor/github.com/go-logr/zapr/zapr.go:128
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:258
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
	/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.Until
	/build/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
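For what it's worth, "the object has been modified; please apply your changes to the latest version and try again" is the standard optimistic-concurrency conflict, and the usual client-go way to handle it is to re-read the object and retry the write. Below is a minimal sketch of that pattern, not OLM's actual code: the function name, the mutate callback, and the operatorsv1 import path are my assumptions, and retrying would only clear the error, not stop the underlying churn.

package example

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/util/retry"
	"sigs.k8s.io/controller-runtime/pkg/client"

	operatorsv1 "github.com/operator-framework/api/pkg/operators/v1"
)

// updateOperatorStatus is a hypothetical helper, not OLM's actual code. It
// re-reads the Operator and reapplies the status change whenever the API
// server reports a resourceVersion conflict.
func updateOperatorStatus(ctx context.Context, c client.Client, key types.NamespacedName, mutate func(*operatorsv1.Operator)) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		var op operatorsv1.Operator
		// Fetch the latest version so the write is based on the current resourceVersion.
		if err := c.Get(ctx, key, &op); err != nil {
			return err
		}
		mutate(&op)
		// If another writer got in between, this returns a Conflict error and
		// RetryOnConflict loops back to the Get above.
		return c.Status().Update(ctx, &op)
	})
}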
I confirmed that if I do oc get operators.operators.coreos.com -w I see rapid changes to the Operators caught in the tight reconciliation loop. I also changed the apiserver audit profile to WriteRequestBodies and confirmed that all of the updates are coming from system:serviceaccount:openshift-operator-lifecycle-manager:olm-operator-serviceaccount with "userAgent":"olm/v0.0.0 (linux/amd64) kubernetes/$Format". Looking at the audit log, there is constant churn on status.components.refs.
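To illustrate how this kind of churn can feed itself (a hypothetical sketch, not OLM's actual code): if status.components.refs is rebuilt in nondeterministic order, for example from a Go map, and then written unconditionally, every reconcile produces a byte-different status, which bumps resourceVersion, which re-enqueues the Operator. Deterministic ordering plus a no-op check would break the cycle:

package example

import (
	"reflect"
	"sort"
)

// componentRef is a hypothetical stand-in for the entries under
// status.components.refs.
type componentRef struct {
	Kind, Namespace, Name string
}

// buildRefs collects refs from a map. Without the sort, Go's randomized map
// iteration order would make the result differ from one reconcile to the
// next even when the underlying components are unchanged.
func buildRefs(components map[string]componentRef) []componentRef {
	refs := make([]componentRef, 0, len(components))
	for _, ref := range components {
		refs = append(refs, ref)
	}
	sort.Slice(refs, func(i, j int) bool {
		if refs[i].Kind != refs[j].Kind {
			return refs[i].Kind < refs[j].Kind
		}
		if refs[i].Namespace != refs[j].Namespace {
			return refs[i].Namespace < refs[j].Namespace
		}
		return refs[i].Name < refs[j].Name
	})
	return refs
}

// needsStatusUpdate gates the write: if nothing actually changed, skipping
// the Update means no new resourceVersion and no self-triggered reconcile.
func needsStatusUpdate(current, desired []componentRef) bool {
	return !reflect.DeepEqual(current, desired)
}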
Confirmed that it is the change to the Operator resource itself that is triggering the Reconcile, using https://github.com/operator-framework/operator-lifecycle-manager/pull/1810/files. If it were caused by any of the watched component resources (the Watches() sources), we would see the requests being enqueued in mapComponentRequests:
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/operator_controller.go#L69
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/operator_controller.go#L218-L228
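Since the Operator's own status writes are what re-enqueue it, the common controller-runtime remedy for this shape of loop is to filter the primary watch so status-only updates are dropped, while component events still arrive via mapComponentRequests. A minimal sketch using predicate.GenerationChangedPredicate follows; the stub reconciler is mine and I am not asserting OLM's eventual fix took exactly this form. Note this relies on the CRD having a status subresource, so status writes do not bump metadata.generation.

package example

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"

	operatorsv1 "github.com/operator-framework/api/pkg/operators/v1"
)

// OperatorReconciler is a stub for illustration, not OLM's type.
type OperatorReconciler struct {
	client.Client
}

func (r *OperatorReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Real reconcile logic elided.
	return ctrl.Result{}, nil
}

func (r *OperatorReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		// GenerationChangedPredicate drops update events where only the status
		// subresource changed, so the reconciler's own status writes no longer
		// requeue the Operator in a tight loop; spec changes still get through.
		For(&operatorsv1.Operator{}, builder.WithPredicates(predicate.GenerationChangedPredicate{})).
		Complete(r)
}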
This is pretty bad. Raising severity to high.
Verified it. LGTM.
--
[root@preserve-olm-env 1888073]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-10-21-001511   True        False         6h4m    Cluster version is 4.7.0-0.nightly-2020-10-21-001511
[root@preserve-olm-env 1888073]# oc get pod -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-764f988744-qsrrf   1/1     Running   0          6h1m
olm-operator-5b867f9884-rbplb       1/1     Running   0          6h1m
packageserver-68c5c96db8-6cf8p      1/1     Running   0          5h59m
packageserver-68c5c96db8-8gf8w      1/1     Running   0          6h1m
[root@preserve-olm-env 1888073]# oc exec olm-operator-5b867f9884-rbplb -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.16.1
git commit: 8979865647c29e2541c8efd65162cdba1f242f70
[root@preserve-olm-env 1888073]# cat og-ocs.yaml
kind: OperatorGroup
apiVersion: operators.coreos.com/v1
metadata:
  name: og-single
  namespace: openshift-storage
spec:
  targetNamespaces:
  - openshift-storage
[root@preserve-olm-env 1888073]# oc apply -f og-ocs.yaml
operatorgroup.operators.coreos.com/og-single created
[root@preserve-olm-env 1888073]# cat sub-ocs.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: ocs-operator
  namespace: openshift-storage
spec:
  channel: stable-4.5
  installPlanApproval: Automatic
  name: ocs-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: ocs-operator.v4.5.1
[root@preserve-olm-env 1888073]# oc apply -f sub-ocs.yaml
subscription.operators.coreos.com/ocs-operator created
[root@preserve-olm-env 1888073]# oc get csv -n openshift-storage
NAME                  DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.5.1   OpenShift Container Storage   4.5.1                Succeeded
[root@preserve-olm-env 1888073]# cat og-acm.yaml
kind: OperatorGroup
apiVersion: operators.coreos.com/v1
metadata:
  name: og-single
  namespace: open-cluster-management
spec:
  targetNamespaces:
  - open-cluster-management
[root@preserve-olm-env 1888073]# oc apply -f og-acm.yaml
operatorgroup.operators.coreos.com/og-single created
[root@preserve-olm-env 1888073]# cat sub-acm.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: advanced-cluster-management
  namespace: open-cluster-management
spec:
  channel: release-2.0
  installPlanApproval: Automatic
  name: advanced-cluster-management
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: advanced-cluster-management.v2.0.3
[root@preserve-olm-env 1888073]# oc apply -f sub-acm.yaml
subscription.operators.coreos.com/advanced-cluster-management created
[root@preserve-olm-env 1888073]# oc get csv -n open-cluster-management
NAME                                 DISPLAY                                      VERSION   REPLACES                             PHASE
advanced-cluster-management.v2.0.3   Advanced Cluster Management for Kubernetes   2.0.3     advanced-cluster-management.v2.0.2   Succeeded
[root@preserve-olm-env 1888073]# oc logs --since=5s olm-operator-5b867f9884-rbplb -n openshift-operator-lifecycle-manager|grep DEBUG|grep "ocs-operator.openshift-storage"|grep "reconciling operator"
2020-10-22T05:45:36.629Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-22T05:45:36.676Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-22T05:45:38.135Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-22T05:45:38.215Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-22T05:45:39.622Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-22T05:45:39.695Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-22T05:45:41.137Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
2020-10-22T05:45:41.201Z DEBUG controllers.operator reconciling operator {"request": "/ocs-operator.openshift-storage"}
[root@preserve-olm-env 1888073]# oc logs --since=180s olm-operator-5b867f9884-rbplb -n openshift-operator-lifecycle-manager|grep ERROR|grep "advanced-cluster-management.open-cluster-management"
[root@preserve-olm-env 1888073]#
--
*** Bug 1917537 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633