Created attachment 1918018: olm-operator log
> Description of problem:
After installing ODF 4.12 on a fresh OCP 4.12 cluster we observe massive CPU consumption by the olm-operator in the openshift-operator-lifecycle-manager namespace.
The same issue is observed with OCP 4.11 + ODF 4.11, both on s390x and on x86.
After the ODF installation the olm-operator reports a sustained CPU usage of 700-1200m (see the CPU(cores) column below), an increase of more than 500m:
----------------------------------------------------------------
$ oc adm top pods -n openshift-operator-lifecycle-manager
NAME                                      CPU(cores)   MEMORY(bytes)
catalog-operator-76d8bd4744-q7grv         1m           155Mi
olm-operator-59f6f9d47c-4gnh7             766m         257Mi
package-server-manager-845c445bcc-q9mh4   0m           26Mi
packageserver-7bc9dcb955-vfw54            3m           190Mi
packageserver-7bc9dcb955-wqpzj            6m           182Mi
----------------------------------------------------------------
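The figures above are a single snapshot; as noted under Actual results below, the usage stays in this range continuously. For reference, the same per-container numbers can also be read programmatically from the Kubernetes metrics API. The following is a minimal Go sketch using the standard k8s.io/metrics client; kubeconfig and error handling are simplified assumptions, and it only mirrors what "oc adm top" already reports:
----------------------------------------------------------------
// Sketch: list per-container CPU usage in the OLM namespace via the metrics API,
// mirroring "oc adm top pods -n openshift-operator-lifecycle-manager".
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	// Assumes a kubeconfig in the default location ($HOME/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	mc, err := metricsclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	pods, err := mc.MetricsV1beta1().PodMetricses("openshift-operator-lifecycle-manager").
		List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pm := range pods.Items {
		for _, c := range pm.Containers {
			// Usage is a ResourceList; Cpu() returns the CPU usage as a resource.Quantity.
			fmt.Printf("%s/%s: cpu=%s\n", pm.Name, c.Name, c.Usage.Cpu().String())
		}
	}
}
----------------------------------------------------------------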
The logs of the olm-operator reveal a sync issue:
----------------------------------------------------------------
$ oc logs olm-operator-59f6f9d47c-4gnh7 -n openshift-operator-lifecycle-manager
{"level":"error","ts":1665648524.4145186,"logger":"controllers.operator","msg":"Could not update Operator status","request":"/ocs-operator.openshift-storage","error":"Operation cannot be fulfilled on operators.operators.coreos.com \"ocs-operator.openshift-storage\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}
E1013 09:04:29.346569 1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: could not update operatorgroups olm.providedAPIs annotation: Operation cannot be fulfilled on operatorgroups.operators.coreos.com "olm-operators": the object has been modified; please apply your changes to the latest version and try again
E1013 09:07:53.629534 1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: could not update operatorgroups olm.providedAPIs annotation: Operation cannot be fulfilled on operatorgroups.operators.coreos.com "olm-operators": the object has been modified; please apply your changes to the latest version and try again
E1013 09:21:30.863620 1 queueinformer_operator.go:290] sync {"update" "openshift-operator-lifecycle-manager/packageserver"} failed: could not update operatorgroups olm.providedAPIs annotation: Operation cannot be fulfilled on operatorgroups.operators.coreos.com "olm-operators": the object has been modified; please apply your changes to the latest version and try again
----------------------------------------------------------------
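For context on these messages: "the object has been modified; please apply your changes to the latest version and try again" is the API server's optimistic-concurrency conflict, returned when an update is submitted with a stale resourceVersion. Controllers normally absorb it by re-reading the object and retrying the write, for example with client-go's retry.RetryOnConflict; a controller that instead requeues immediately on every conflict can end up in the kind of sustained busy loop that would explain the CPU usage shown above. The sketch below only illustrates that generic conflict-retry idiom against a hypothetical ConfigMap in a hypothetical namespace; it does not claim to show how OLM performs, or will fix, the failing operatorgroup/status updates:
----------------------------------------------------------------
// Sketch of the standard conflict-retry pattern with client-go.
// The ConfigMap name, namespace and annotation below are hypothetical.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/retry"
)

func main() {
	// Assumes a kubeconfig in the default location ($HOME/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Re-fetch the object inside the retry closure so every Update attempt
	// carries the latest resourceVersion instead of failing with
	// "the object has been modified".
	err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cm, getErr := cs.CoreV1().ConfigMaps("example-ns").Get(context.TODO(), "example-cm", metav1.GetOptions{})
		if getErr != nil {
			return getErr
		}
		if cm.Annotations == nil {
			cm.Annotations = map[string]string{}
		}
		cm.Annotations["example.com/touched"] = "true" // hypothetical annotation, illustration only
		_, updateErr := cs.CoreV1().ConfigMaps("example-ns").Update(context.TODO(), cm, metav1.UpdateOptions{})
		return updateErr
	})
	if err != nil {
		fmt.Println("update still failing after retries:", err)
	}
}
----------------------------------------------------------------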
> Version of all relevant components (if applicable):
4.12: We have observed this issue with OCP 4.12.0-ec.2 and odf-operator.4.12.0.
4.11: We have observed this issue with OCP 4.11.7/4.11.6 and odf-operator.v4.11.0
> Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
The olm-operator's high CPU usage has a severe impact on customers.
The sustained usage translates into one CPU being fully utilized.
This raises the cost of operating ODF on our platform immensely.
On IBM Z the number of consumed CPUs is the key driver of hardware costs, and customers are extremely sensitive in that regard.
> Is there any workaround available to the best of your knowledge?
There is no known workaround.
> Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1
> Is this issue reproducible?
Yes, by installing OCP 4.11.7 (or 4.12.0-ec.2) on s390x or x86 and installing ODF 4.11 (or ODF 4.12) on top of it.
> Can this issue be reproduced from the UI?
-
> If this is a regression, please provide more details to justify this:
-
> Steps to Reproduce:
1. Install OCP 4.12.0-ec.2 on s390x or x86
2. Install ODF 4.12
3. Observe the olm-operator CPU usage
> Actual results:
The olm-operator continuously consumes up to 1000m, even after a seemingly successful installation and with the cluster left idle for hours or days.
> Expected results:
The olm-operator should not consume that much CPU.
> Additional info:
Please see the attached olm-operator log for ODF 4.12 (s390x).
Thanks for the confirmation, Alex. I really appreciate the quick response.
Closing the bug on the ODF side, as the fix is in OLM and Alex is working on it.