Bug 1932182 - catalog operator causing CPU spikes and bad etcd performance
Summary: catalog operator causing CPU spikes and bad etcd performance
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Ben Luddy
QA Contact: xzha
URL:
Whiteboard:
Depends On:
Blocks: 1938405
Reported: 2021-02-24 07:44 UTC by Jian Zhang
Modified: 2022-10-11 07:11 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: When less than one minute remains before an upcoming catalog update polling attempt, the interval jitter function truncates the resync interval down to zero.
Consequence: The catalog operator enters a hot-loop, wasting CPU cycles.
Fix: Increase precision of the jitter function used to calculate resync delays.
Result: The catalog operator remains mostly idle until the next catalog update poll.
Clone Of:
Environment:
Last Closed: 2021-07-27 22:48:06 UTC
Target Upstream Version:
Embargoed:
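
The Doc Text above describes a precision loss in the resync jitter calculation. The following Go sketch illustrates how that kind of truncation can arise and how working at nanosecond precision avoids it. It is not the operator's actual code: the function names `jitterWholeMinutes` and `jitterNanoseconds`, the `factor` parameter, and the explicit random fraction `r` (passed in so the result is deterministic) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// jitterWholeMinutes is a hypothetical jitter helper that does its arithmetic
// in whole minutes. Converting a fractional minute count to time.Duration
// truncates toward zero, so a period shorter than one minute collapses to 0.
// r is a random fraction in [0, 1) supplied by the caller.
func jitterWholeMinutes(period time.Duration, factor, r float64) time.Duration {
	lo := period.Minutes() * (1 - factor)
	hi := period.Minutes() * (1 + factor)
	return time.Duration(lo)*time.Minute + time.Duration(r*(hi-lo))*time.Minute
}

// jitterNanoseconds is a hypothetical higher-precision variant. It keeps the
// arithmetic in nanoseconds (the native unit of time.Duration), so
// sub-minute periods survive the conversion.
func jitterNanoseconds(period time.Duration, factor, r float64) time.Duration {
	lo := float64(period) * (1 - factor)
	hi := float64(period) * (1 + factor)
	return time.Duration(lo + r*(hi-lo))
}

func main() {
	period := 30 * time.Second // less than one minute until the next poll

	// A zero delay means "requeue immediately", which is what drives a hot loop.
	fmt.Println("whole-minute jitter:", jitterWholeMinutes(period, 0.2, 0.5))

	// The nanosecond version stays within roughly +/-20% of the period.
	fmt.Println("nanosecond jitter:  ", jitterNanoseconds(period, 0.2, 0.5))
}
```

With a 30-second period the whole-minute version returns a zero delay, which matches the hot-loop symptom in the Consequence field, while the nanosecond version returns a delay close to the requested period.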


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | operator-framework operator-lifecycle-manager pull 2025 | 0 | None | open | Bug 1932182: Support jittering relatively small resync intervals. | 2021-03-09 22:19:55 UTC
Red Hat Knowledge Base (Solution) | 5759731 | 0 | None | None | None | 2021-02-24 13:08:33 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:48:17 UTC

Comment 7 xzha 2021-03-15 08:39:14 UTC
Verification:

zhaoxia@wangshanshandeMacBook-Pro test % oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-03-14-134919   True        False         4h36m   Cluster version is 4.8.0-0.nightly-2021-03-14-134919
[root@preserve-olm-env ~]# oc adm release info registry.ci.openshift.org/ocp/release:4.8.0-0.nightly-2021-03-14-134919 --commits|grep operator-lifecycle-manager
  operator-lifecycle-manager                     https://github.com/operator-framework/operator-lifecycle-manager            45512f1de672e399b95220c9088836e289ee1cfe


zhaoxia@wangshanshandeMacBook-Pro test % oc get all
NAME                                                                  READY   STATUS              RESTARTS   AGE
pod/03a576bf8e62913785f8817df821a649bd2219ec8f1386cd0c0908e75avdtvb   0/1     Completed           0          59m
pod/24c6f4b3f063ca9db570182fb4c4784b9b9390003ed9a31ea4a0003270chhh5   0/1     Completed           0          4h46m
pod/8cc7f4add666d4b2bd624f5b2ebc5f0dc940bb77327e333b208b8453f3bd2r4   0/1     Completed           0          60m
pod/c4f671fb78d53d18fafda972f3b9d53e629a16de30d8935d4d90cf3a10gfqbl   0/1     Completed           0          4h44m
pod/cde075aec9bc4f68c1c637351c28307630eadde39635e491c6028c8260kp7nk   0/1     Completed           0          4h45m
pod/certified-operators-kjndk                                         1/1     Running             0          5h24m
pod/community-operators-8q9h8                                         1/1     Running             0          62m
pod/db279975a990e2e6860fec8eb66ca4b437a990fe99c56517303ec4a8efqg85c   0/1     Completed           0          4h45m
pod/eff5a04e02ff20331620c17b99eeb682586f23da740269bc78e96d58bfdw6kh   0/1     Completed           0          5h1m
pod/etcd-tesit-1-c9wg5                                                1/1     Running             0          123m
pod/etcd-tesit-2-7c4bv                                                1/1     Running             0          123m
pod/etcd-tesit-3-xqqvw                                                1/1     Running             0          122m
pod/etcd-tesit-4-7rv5s                                                1/1     Running             0          122m
pod/etcd-tesit-4-gqtn8                                                0/1     ContainerCreating   0          1s
pod/etcd-tesit-5-bwfjv                                                1/1     Running             0          122m
pod/etcd-tesit-6-qx67q                                                1/1     Running             0          121m
pod/etcd-tesit-7-thh9x                                                1/1     Running             0          121m
pod/etcd-tesit-8-7lgq7                                                1/1     Running             0          120m
pod/etcd-test-22dt9                                                   1/1     Running             0          125m
pod/marketplace-operator-7bdbd77c-wv9pz                               1/1     Running             0          5h41m
pod/qe-app-registry-4dk9f                                             0/1     ImagePullBackOff    0          5h31m
pod/qe-app-registry-m62s8                                             0/1     ImagePullBackOff    0          5h44m
pod/redhat-marketplace-7r84d                                          0/1     ContainerCreating   0          1s
pod/redhat-marketplace-bdb75                                          1/1     Running             0          5h43m
pod/redhat-operators-k2vpc                                            1/1     Running             0          5h43m

NAME                                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/certified-operators            ClusterIP   172.30.181.107   <none>        50051/TCP           6h13m
service/community-operators            ClusterIP   172.30.183.191   <none>        50051/TCP           6h13m
service/etcd-tesit-1                   ClusterIP   172.30.197.212   <none>        50051/TCP           123m
service/etcd-tesit-2                   ClusterIP   172.30.8.145     <none>        50051/TCP           123m
service/etcd-tesit-3                   ClusterIP   172.30.101.53    <none>        50051/TCP           122m
service/etcd-tesit-4                   ClusterIP   172.30.17.36     <none>        50051/TCP           122m
service/etcd-tesit-5                   ClusterIP   172.30.209.0     <none>        50051/TCP           122m
service/etcd-tesit-6                   ClusterIP   172.30.168.226   <none>        50051/TCP           122m
service/etcd-tesit-7                   ClusterIP   172.30.151.35    <none>        50051/TCP           121m
service/etcd-tesit-8                   ClusterIP   172.30.207.56    <none>        50051/TCP           120m
service/etcd-test                      ClusterIP   172.30.94.24     <none>        50051/TCP           125m
service/marketplace-operator-metrics   ClusterIP   172.30.145.246   <none>        8383/TCP,8081/TCP   6h18m
service/qe-app-registry                ClusterIP   172.30.100.251   <none>        50051/TCP           5h46m
service/redhat-marketplace             ClusterIP   172.30.216.140   <none>        50051/TCP           6h13m
service/redhat-operators               ClusterIP   172.30.253.127   <none>        50051/TCP           6h13m

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/marketplace-operator   1/1     1            1           6h18m

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/marketplace-operator-7bdbd77c   1         1         1       6h18m

NAME                                                                        COMPLETIONS   DURATION   AGE
job.batch/03a576bf8e62913785f8817df821a649bd2219ec8f1386cd0c0908e75abdfed   1/1           8s         59m
job.batch/24c6f4b3f063ca9db570182fb4c4784b9b9390003ed9a31ea4a0003270b1e41   1/1           10s        4h46m
job.batch/8cc7f4add666d4b2bd624f5b2ebc5f0dc940bb77327e333b208b8453f3b6423   1/1           8s         60m
job.batch/c4f671fb78d53d18fafda972f3b9d53e629a16de30d8935d4d90cf3a10e8e23   1/1           8s         4h44m
job.batch/cde075aec9bc4f68c1c637351c28307630eadde39635e491c6028c826010368   1/1           9s         4h45m
job.batch/db279975a990e2e6860fec8eb66ca4b437a990fe99c56517303ec4a8efc00a4   1/1           9s         4h45m
job.batch/eff5a04e02ff20331620c17b99eeb682586f23da740269bc78e96d58bfbff3b   1/1           27s        5h1m


Monitor CPU:
https://user-images.githubusercontent.com/77608951/111125478-996f2800-85ac-11eb-9c64-b8420b3fecfb.png

There is no significant CPU consumption.

Verified.

Comment 12 errata-xmlrpc 2021-07-27 22:48:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

