Bug 1903034
| Summary: | OLM continuously printing debug logs | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Felipe M <fmarting> | ||||
| Component: | OLM | Assignee: | Alexander Greene <agreene> | ||||
| OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> | ||||
| Status: | CLOSED ERRATA | Docs Contact: | |||||
| Severity: | high | ||||||
| Priority: | high | CC: | dageoffr, dahernan, krizza | ||||
| Version: | 4.6.z | ||||||
| Target Milestone: | --- | ||||||
| Target Release: | 4.7.0 | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | No Doc Update | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | |||||||
| : | 1908436 (view as bug list) | Environment: | |||||
| Last Closed: | 2021-02-24 15:36:34 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Bug Depends On: | |||||||
| Bug Blocks: | 1908463 | ||||||
| Attachments: |
|
||||||
Hello, I am excited to provide an update and announce that a PR is available that should significantly reduce the number of logs generated by OLM. I have verified the effects of this change by conducting the following test.
TEST CASE:
Environment - A cluster with 100 namespaces and a ClusterScoped operator installed. OLM will copy the ClusterScoped operator into everynamespace, leaving OLM with 100 CSVs to reconcile.
Data Collection Proceedure - I tracked the size of the OLM file over 10 minutes.
Case 1: No Fix
```
120K olm.log at 0 minutes
132K olm.log at 1 minutes
364K olm.log at 2 minutes
456K olm.log at 3 minutes
520K olm.log at 4 minutes
540K olm.log at 5 minutes
540K olm.log at 6 minutes
540K olm.log at 7 minutes
548K olm.log at 8 minutes
548K olm.log at 9 minutes
```
I reviewed the logs generated by OLM and found that three messages were constantly being logged:
1. DEBUG controller-runtime.controller Successfully Reconciled {"controller": "clusterserviceversion"... which appeared 604 times
2. level=info msg="csv in operatorgroup" ... which appeared 561 times
3. level=info msg="syncing APIService" .. which appeared 113 times
CASE 2: Includes Fix
For the 1st log mentioned above, I found that three new controllers recently introduced to OLM used a logger that was configured for developer mode which I have disabled in the provided fix.
For the 2nd and 3rd log mentioned above, I reclassified them as debugs logs since they do not provide information that is useful outside of debugging.
The changes above reduced the size of the log file significantly:
```
40K olm.log at 0 minutes
44K olm.log at 1 minutes
76K olm.log at 2 minutes
76K olm.log at 3 minutes
76K olm.log at 4 minutes
80K olm.log at 5 minutes
80K olm.log at 6 minutes
80K olm.log at 7 minutes
80K olm.log at 8 minutes
80K olm.log at 9 minutes
```
Created attachment 1738087 [details]
Includes files used to test effects of change
Please look at the attached olm-log-noise-bug.tar.gz for files used in performing this testcase along with the results.
Follow up, I also installed two additional ClusterScoped operators on a cluster with the fix and the additional operators had a minimal effect on file size: ``` $ oc get csvs -n operators NAME DISPLAY VERSION REPLACES PHASE chaosblade-operator.v0.5.1 Chaosblade Operator 0.5.1 Succeeded credstash-operator.v1.13.0 credstash-operator 1.13.0 Succeeded sparkoperator.v1.0.2 Apache Spark Operator 1.0.2 Succeeded $ oc logs -n olm olm-operator-865b874cbc-gd6tz > olm.log $ ls -sh olm.log 156K olm.log ``` Cluster version is 4.7.0-0.nightly-2020-12-13-224840 [root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-66ccb94685-cpfqj -- olm --version OLM version: 0.17.0 git commit: 4b66803055a8ab611447c33ed86e755ad39cb313 1, Install two cluster-scoped operators, and there are no DEBUG level logs in the olm logs. [root@preserve-olm-env data]# oc get sub -A NAMESPACE NAME PACKAGE SOURCE CHANNEL openshift-operators akka-cluster-operator-certified akka-cluster-operator-certified certified-operators beta openshift-operators etcd etcd community-operators clusterwide-alpha [root@preserve-olm-env data]# oc logs catalog-operator-66ccb94685-cpfqj > olm.1214 [root@preserve-olm-env data]# ls -lh olm.1214 -rw-r--r--. 1 root root 81K Dec 14 07:29 olm.1214 [root@preserve-olm-env data]# [root@preserve-olm-env data]# oc get pods NAME READY STATUS RESTARTS AGE catalog-operator-66ccb94685-cpfqj 1/1 Running 0 72m olm-operator-d6dc885f4-lrx28 1/1 Running 0 72m packageserver-c4dcc95bb-22wfp 1/1 Running 0 65m packageserver-c4dcc95bb-rr67k 1/1 Running 0 65m [root@preserve-olm-env data]# cat olm.1214 |grep -i debug LGTM, verify it. *** Bug 1924764 has been marked as a duplicate of this bug. *** Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633 |
Description of problem: The OLM logs contain a lot of debug traces. Version-Release number of selected component (if applicable): OCP 4.6.4 How reproducible: Getting the logs from the operator Steps to Reproduce: -- Actual results: ``` [...] 2020-11-16T14:59:38.274Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "subscription", "request": "openshift-operators/openshift-pipelines-operator-rh"} 2020-11-16T14:59:38.275Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "subscription", "request": "openshift-operators/jaeger-product"} time="2020-11-16T14:59:38Z" level=info msg="csv in operatorgroup" csv=redhat-openshift-pipelines-operator.v1.1.2 id=TqYh8 namespace=openshift-operators opgroup=global-operators phase=Succeeded time="2020-11-16T14:59:38Z" level=info msg="csv in operatorgroup" csv=elasticsearch-operator.4.6.0-202010311441.p0 id=Zkrs+ namespace=openshift-operators-redhat opgroup=openshift-operators-redhat phase=Succeeded time="2020-11-16T14:59:38Z" level=info msg="checking elasticsearch-operator.4.6.0-202010311441.p0" time="2020-11-16T14:59:38Z" level=info msg="checking jaeger-operator.v1.20.0" time="2020-11-16T14:59:38Z" level=info msg="checking kiali-operator.v1.24.2" time="2020-11-16T14:59:38Z" level=info msg="checking serverless-operator.v1.10.0" time="2020-11-16T14:59:38Z" level=info msg="checking web-terminal.v1.1.0" time="2020-11-16T14:59:38Z" level=info msg="checking servicemeshoperator.v2.0.0.2" time="2020-11-16T14:59:38Z" level=info msg="checking redhat-openshift-pipelines-operator.v1.1.2" 2020-11-16T14:59:38.401Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "clusterserviceversion", "request": "openshift-operators/jaeger-operator.v1.20.0"} 2020-11-16T14:59:38.435Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "operator", "request": "/servicemeshoperator.openshift-operators"} 2020-11-16T14:59:38.435Z DEBUG controllers.operator reconciling operator {"request": "/jaeger-product.openshift-operators"} 2020-11-16T14:59:38.520Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "clusterserviceversion", "request": "openshift-operators/web-terminal.v1.1.0"} 2020-11-16T14:59:38.520Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "clusterserviceversion", "request": "ocp-ops-view/kiali-operator.v1.24.2"} 2020-11-16T14:59:38.520Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "clusterserviceversion", "request": "kube-public/servicemeshoperator.v2.0.0.2"} 2020-11-16T14:59:38.520Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "clusterserviceversion", "request": "openshift-kube-apiserver/kiali-operator.v1.24.2"} 2020-11-16T14:59:38.544Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "operator", "request": "/jaeger-product.openshift-operators"} 2020-11-16T14:59:38.544Z DEBUG controllers.operator reconciling operator {"request": "/kiali-ossm.openshift-operators"} 2020-11-16T14:59:38.565Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "clusterserviceversion", "request": "iam-test/servicemeshoperator.v2.0.0.2"} 2020-11-16T14:59:38.609Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "clusterserviceversion", "request": "openshift-network-operator/kiali-operator.v1.24.2"} 2020-11-16T14:59:38.637Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "operator", "request": "/kiali-ossm.openshift-operators"} 2020-11-16T14:59:38.637Z DEBUG controllers.operator reconciling operator {"request": "/web-terminal.openshift-operators"} ``` Expected results: Only non-debug logs. Additional info: Customer had this problem for a while and their logs are filling up quickly. They updated to 4.6.4 due to BZ#1888073 but problem persists.