Bug 1903034 - OLM continuously printing debug logs
Summary: OLM continuously printing debug logs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.6.z
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.7.0
Assignee: Alexander Greene
QA Contact: Jian Zhang
URL:
Whiteboard:
: 1924764 (view as bug list)
Depends On:
Blocks: 1908463
TreeView+ depends on / blocked
 
Reported: 2020-12-01 07:29 UTC by Felipe M
Modified: 2021-02-24 15:37 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1908436 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:36:34 UTC
Target Upstream Version:


Attachments (Terms of Use)
Includes files used to test effects of change (143.81 KB, application/gzip)
2020-12-09 23:13 UTC, Alexander Greene
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github operator-framework operator-lifecycle-manager pull 1901 0 None closed Bug 1903034: Reduce log noise from OLM 2021-02-09 09:10:43 UTC
Red Hat Bugzilla 1888073 0 medium CLOSED Operator controller continuously busy looping 2021-02-24 15:26:09 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:37:05 UTC

Description Felipe M 2020-12-01 07:29:02 UTC
Description of problem:
The OLM logs contain a lot of debug traces.

Version-Release number of selected component (if applicable):
OCP 4.6.4

How reproducible:
Getting the logs from the operator

Steps to Reproduce:
--

Actual results:

```
[...]
2020-11-16T14:59:38.274Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "subscription", "request": "openshift-operators/openshift-pipelines-operator-rh"}
2020-11-16T14:59:38.275Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "subscription", "request": "openshift-operators/jaeger-product"}
time="2020-11-16T14:59:38Z" level=info msg="csv in operatorgroup" csv=redhat-openshift-pipelines-operator.v1.1.2 id=TqYh8 namespace=openshift-operators opgroup=global-operators phase=Succeeded
time="2020-11-16T14:59:38Z" level=info msg="csv in operatorgroup" csv=elasticsearch-operator.4.6.0-202010311441.p0 id=Zkrs+ namespace=openshift-operators-redhat opgroup=openshift-operators-redhat phase=Succeeded
time="2020-11-16T14:59:38Z" level=info msg="checking elasticsearch-operator.4.6.0-202010311441.p0"
time="2020-11-16T14:59:38Z" level=info msg="checking jaeger-operator.v1.20.0"
time="2020-11-16T14:59:38Z" level=info msg="checking kiali-operator.v1.24.2"
time="2020-11-16T14:59:38Z" level=info msg="checking serverless-operator.v1.10.0"
time="2020-11-16T14:59:38Z" level=info msg="checking web-terminal.v1.1.0"
time="2020-11-16T14:59:38Z" level=info msg="checking servicemeshoperator.v2.0.0.2"
time="2020-11-16T14:59:38Z" level=info msg="checking redhat-openshift-pipelines-operator.v1.1.2"
2020-11-16T14:59:38.401Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "openshift-operators/jaeger-operator.v1.20.0"}
2020-11-16T14:59:38.435Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "operator", "request": "/servicemeshoperator.openshift-operators"}
2020-11-16T14:59:38.435Z	DEBUG	controllers.operator	reconciling operator	{"request": "/jaeger-product.openshift-operators"}
2020-11-16T14:59:38.520Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "openshift-operators/web-terminal.v1.1.0"}
2020-11-16T14:59:38.520Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "ocp-ops-view/kiali-operator.v1.24.2"}
2020-11-16T14:59:38.520Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "kube-public/servicemeshoperator.v2.0.0.2"}
2020-11-16T14:59:38.520Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "openshift-kube-apiserver/kiali-operator.v1.24.2"}
2020-11-16T14:59:38.544Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "operator", "request": "/jaeger-product.openshift-operators"}
2020-11-16T14:59:38.544Z	DEBUG	controllers.operator	reconciling operator	{"request": "/kiali-ossm.openshift-operators"}
2020-11-16T14:59:38.565Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "iam-test/servicemeshoperator.v2.0.0.2"}
2020-11-16T14:59:38.609Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion", "request": "openshift-network-operator/kiali-operator.v1.24.2"}
2020-11-16T14:59:38.637Z	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "operator", "request": "/kiali-ossm.openshift-operators"}
2020-11-16T14:59:38.637Z	DEBUG	controllers.operator	reconciling operator	{"request": "/web-terminal.openshift-operators"}
```

Expected results:
Only non-debug logs.

Additional info:
Customer had this problem for a while and their logs are filling up quickly. They updated to 4.6.4 due to BZ#1888073 but problem persists.

Comment 8 Alexander Greene 2020-12-09 23:08:02 UTC
Hello, I am excited to provide an update and announce that a PR is available that should significantly reduce the number of logs generated by OLM. I have verified the effects of this change by conducting the following test.

TEST CASE:
Environment - A cluster with 100 namespaces and a ClusterScoped operator installed. OLM will copy the ClusterScoped operator into everynamespace, leaving OLM with 100 CSVs to reconcile.
Data Collection Proceedure -  I tracked the size of the OLM file over 10 minutes.

Case 1: No Fix
```
120K olm.log at 0 minutes
132K olm.log at 1 minutes
364K olm.log at 2 minutes
456K olm.log at 3 minutes
520K olm.log at 4 minutes
540K olm.log at 5 minutes
540K olm.log at 6 minutes
540K olm.log at 7 minutes
548K olm.log at 8 minutes
548K olm.log at 9 minutes
```

I reviewed the logs generated by OLM and found that three messages were constantly being logged:
1. DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "clusterserviceversion"... which appeared 604 times
2. level=info msg="csv in operatorgroup" ... which appeared 561 times
3. level=info msg="syncing APIService" .. which appeared 113 times

CASE 2: Includes Fix
For the 1st log mentioned above, I found that three new controllers recently introduced to OLM used a logger that was configured for developer mode which I have disabled in the provided fix.
For the 2nd and 3rd log mentioned above, I reclassified them as debugs logs since they do not provide information that is useful outside of debugging.
The changes above reduced the size of the log file significantly:
```
40K olm.log at 0 minutes
44K olm.log at 1 minutes
76K olm.log at 2 minutes
76K olm.log at 3 minutes
76K olm.log at 4 minutes
80K olm.log at 5 minutes
80K olm.log at 6 minutes
80K olm.log at 7 minutes
80K olm.log at 8 minutes
80K olm.log at 9 minutes
```

Comment 9 Alexander Greene 2020-12-09 23:13:58 UTC
Created attachment 1738087 [details]
Includes files used to test effects of change

Please look at the attached olm-log-noise-bug.tar.gz for files used in performing this testcase along with the results.

Comment 10 Alexander Greene 2020-12-09 23:27:42 UTC
Follow up, I also installed two additional ClusterScoped operators on a cluster with the fix and the additional operators had a minimal effect on file size:

```
$ oc get csvs -n operators
NAME                         DISPLAY                 VERSION   REPLACES   PHASE
chaosblade-operator.v0.5.1   Chaosblade Operator     0.5.1                Succeeded
credstash-operator.v1.13.0   credstash-operator      1.13.0               Succeeded
sparkoperator.v1.0.2         Apache Spark Operator   1.0.2                Succeeded

$ oc logs -n olm olm-operator-865b874cbc-gd6tz  > olm.log

$ ls -sh olm.log 
156K olm.log
```

Comment 12 Jian Zhang 2020-12-14 07:44:10 UTC
Cluster version is 4.7.0-0.nightly-2020-12-13-224840
[root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-66ccb94685-cpfqj -- olm --version
OLM version: 0.17.0
git commit: 4b66803055a8ab611447c33ed86e755ad39cb313


1, Install two cluster-scoped operators, and there are no DEBUG level logs in the olm logs. 

[root@preserve-olm-env data]# oc get sub -A
NAMESPACE             NAME                              PACKAGE                           SOURCE                CHANNEL
openshift-operators   akka-cluster-operator-certified   akka-cluster-operator-certified   certified-operators   beta
openshift-operators   etcd                              etcd                              community-operators   clusterwide-alpha
[root@preserve-olm-env data]# oc logs catalog-operator-66ccb94685-cpfqj > olm.1214
[root@preserve-olm-env data]# ls -lh olm.1214
-rw-r--r--. 1 root root 81K Dec 14 07:29 olm.1214
[root@preserve-olm-env data]# 
[root@preserve-olm-env data]# oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-66ccb94685-cpfqj   1/1     Running   0          72m
olm-operator-d6dc885f4-lrx28        1/1     Running   0          72m
packageserver-c4dcc95bb-22wfp       1/1     Running   0          65m
packageserver-c4dcc95bb-rr67k       1/1     Running   0          65m
[root@preserve-olm-env data]# cat olm.1214 |grep -i debug

LGTM, verify it.

Comment 14 Anik 2021-02-09 18:57:12 UTC
*** Bug 1924764 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2021-02-24 15:36:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633


Note You need to log in before you can comment on or make changes to this bug.