1903034 – OLM continuously printing debug logs

Bug 1903034 - OLM continuously printing debug logs

Summary: OLM continuously printing debug logs

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	OLM
Sub Component:
Version:	4.6.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.7.0
Assignee:	Alexander Greene
QA Contact:	Jian Zhang
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1924764 (view as bug list)
Depends On:
Blocks:	1908463
TreeView+	depends on / blocked

Reported:	2020-12-01 07:29 UTC by Felipe M
Modified:	2024-03-25 17:18 UTC (History)
CC List:	3 users (show)
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Clones:	1908436 (view as bug list)
Environment:
Last Closed:	2021-02-24 15:36:34 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
Includes files used to test effects of change (143.81 KB, application/gzip) 2020-12-09 23:13 UTC, Alexander Greene	no flags	Details
View All

Links
System	ID	Priority	Status	Summary	Last Updated
Github	operator-framework operator-lifecycle-manager pull 1901	None	closed	Bug 1903034: Reduce log noise from OLM	2021-02-09 09:10:43 UTC
Red Hat Bugzilla	1888073	medium	CLOSED	Operator controller continuously busy looping	2024-06-13 23:13:01 UTC
Red Hat Product Errata	RHSA-2020:5633	None	None	None	2021-02-24 15:37:05 UTC

Description Felipe M 2020-12-01 07:29:02 UTC

Description of problem:
The OLM logs contain a lot of debug traces.

Version-Release number of selected component (if applicable):
OCP 4.6.4

How reproducible:
Getting the logs from the operator

Steps to Reproduce:
--

Actual results:

```
[...]
2020-11-16T14:59:38.274Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "subscription", "request": "openshift-operators/openshift-pipelines-operator-rh"}
2020-11-16T14:59:38.275Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "subscription", "request": "openshift-operators/jaeger-product"}
time="2020-11-16T14:59:38Z" level=info msg="csv in operatorgroup" csv=redhat-openshift-pipelines-operator.v1.1.2 id=TqYh8 namespace=openshift-operators opgroup=global-operators phase=Succeeded
time="2020-11-16T14:59:38Z" level=info msg="csv in operatorgroup" csv=elasticsearch-operator.4.6.0-202010311441.p0 id=Zkrs+ namespace=openshift-operators-redhat opgroup=openshift-operators-redhat phase=Succeeded
time="2020-11-16T14:59:38Z" level=info msg="checking elasticsearch-operator.4.6.0-202010311441.p0"
time="2020-11-16T14:59:38Z" level=info msg="checking jaeger-operator.v1.20.0"
time="2020-11-16T14:59:38Z" level=info msg="checking kiali-operator.v1.24.2"
time="2020-11-16T14:59:38Z" level=info msg="checking serverless-operator.v1.10.0"
time="2020-11-16T14:59:38Z" level=info msg="checking web-terminal.v1.1.0"
time="2020-11-16T14:59:38Z" level=info msg="checking servicemeshoperator.v2.0.0.2"
time="2020-11-16T14:59:38Z" level=info msg="checking redhat-openshift-pipelines-operator.v1.1.2"
2020-11-16T14:59:38.401Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "openshift-operators/jaeger-operator.v1.20.0"}
2020-11-16T14:59:38.435Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "operator", "request": "/servicemeshoperator.openshift-operators"}
2020-11-16T14:59:38.435Z	DEBUG	controllers.operator	reconciling operator	{"request": "/jaeger-product.openshift-operators"}
2020-11-16T14:59:38.520Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "openshift-operators/web-terminal.v1.1.0"}
2020-11-16T14:59:38.520Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "ocp-ops-view/kiali-operator.v1.24.2"}
2020-11-16T14:59:38.520Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "kube-public/servicemeshoperator.v2.0.0.2"}
2020-11-16T14:59:38.520Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "openshift-kube-apiserver/kiali-operator.v1.24.2"}
2020-11-16T14:59:38.544Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "operator", "request": "/jaeger-product.openshift-operators"}
2020-11-16T14:59:38.544Z	DEBUG	controllers.operator	reconciling operator	{"request": "/kiali-ossm.openshift-operators"}
2020-11-16T14:59:38.565Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "iam-test/servicemeshoperator.v2.0.0.2"}
2020-11-16T14:59:38.609Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "openshift-network-operator/kiali-operator.v1.24.2"}
2020-11-16T14:59:38.637Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "operator", "request": "/kiali-ossm.openshift-operators"}
2020-11-16T14:59:38.637Z	DEBUG	controllers.operator	reconciling operator	{"request": "/web-terminal.openshift-operators"}
```

Expected results:
Only non-debug logs.

Additional info:
Customer had this problem for a while and their logs are filling up quickly. They updated to 4.6.4 due to BZ#1888073 but problem persists.

Comment 8 Alexander Greene 2020-12-09 23:08:02 UTC

Hello, I am excited to provide an update and announce that a PR is available that should significantly reduce the number of logs generated by OLM. I have verified the effects of this change by conducting the following test.

TEST CASE:
Environment - A cluster with 100 namespaces and a ClusterScoped operator installed. OLM will copy the ClusterScoped operator into everynamespace, leaving OLM with 100 CSVs to reconcile.
Data Collection Proceedure -  I tracked the size of the OLM file over 10 minutes.

Case 1: No Fix
```
120K olm.log at 0 minutes
132K olm.log at 1 minutes
364K olm.log at 2 minutes
456K olm.log at 3 minutes
520K olm.log at 4 minutes
540K olm.log at 5 minutes
540K olm.log at 6 minutes
540K olm.log at 7 minutes
548K olm.log at 8 minutes
548K olm.log at 9 minutes
```

I reviewed the logs generated by OLM and found that three messages were constantly being logged:
1. DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion"... which appeared 604 times
2. level=info msg="csv in operatorgroup" ... which appeared 561 times
3. level=info msg="syncing APIService" .. which appeared 113 times

CASE 2: Includes Fix
For the 1st log mentioned above, I found that three new controllers recently introduced to OLM used a logger that was configured for developer mode which I have disabled in the provided fix.
For the 2nd and 3rd log mentioned above, I reclassified them as debugs logs since they do not provide information that is useful outside of debugging.
The changes above reduced the size of the log file significantly:
```
40K olm.log at 0 minutes
44K olm.log at 1 minutes
76K olm.log at 2 minutes
76K olm.log at 3 minutes
76K olm.log at 4 minutes
80K olm.log at 5 minutes
80K olm.log at 6 minutes
80K olm.log at 7 minutes
80K olm.log at 8 minutes
80K olm.log at 9 minutes
```

Comment 9 Alexander Greene 2020-12-09 23:13:58 UTC

Created attachment 1738087 [details]
Includes files used to test effects of change

Please look at the attached olm-log-noise-bug.tar.gz for files used in performing this testcase along with the results.

Comment 10 Alexander Greene 2020-12-09 23:27:42 UTC

Follow up, I also installed two additional ClusterScoped operators on a cluster with the fix and the additional operators had a minimal effect on file size:

```
$ oc get csvs -n operators
NAME                         DISPLAY                 VERSION   REPLACES   PHASE
chaosblade-operator.v0.5.1   Chaosblade Operator     0.5.1                Succeeded
credstash-operator.v1.13.0   credstash-operator      1.13.0               Succeeded
sparkoperator.v1.0.2         Apache Spark Operator   1.0.2                Succeeded

$ oc logs -n olm olm-operator-865b874cbc-gd6tz  > olm.log

$ ls -sh olm.log 
156K olm.log
```

Comment 12 Jian Zhang 2020-12-14 07:44:10 UTC

Cluster version is 4.7.0-0.nightly-2020-12-13-224840
[root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-66ccb94685-cpfqj -- olm --version
OLM version: 0.17.0
git commit: 4b66803055a8ab611447c33ed86e755ad39cb313


1, Install two cluster-scoped operators, and there are no DEBUG level logs in the olm logs. 

[root@preserve-olm-env data]# oc get sub -A
NAMESPACE             NAME                              PACKAGE                           SOURCE                CHANNEL
openshift-operators   akka-cluster-operator-certified   akka-cluster-operator-certified   certified-operators   beta
openshift-operators   etcd                              etcd                              community-operators   clusterwide-alpha
[root@preserve-olm-env data]# oc logs catalog-operator-66ccb94685-cpfqj > olm.1214
[root@preserve-olm-env data]# ls -lh olm.1214
-rw-r--r--. 1 root root 81K Dec 14 07:29 olm.1214
[root@preserve-olm-env data]# 
[root@preserve-olm-env data]# oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-66ccb94685-cpfqj   1/1     Running   0          72m
olm-operator-d6dc885f4-lrx28        1/1     Running   0          72m
packageserver-c4dcc95bb-22wfp       1/1     Running   0          65m
packageserver-c4dcc95bb-rr67k       1/1     Running   0          65m
[root@preserve-olm-env data]# cat olm.1214 |grep -i debug

LGTM, verify it.

Comment 14 Anik 2021-02-09 18:57:12 UTC

*** Bug 1924764 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2021-02-24 15:36:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Note You need to log in before you can comment on or make changes to this bug.