Bug 2034493
Summary: | Change cluster version operator log level | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | yaoli | |
Component: | Cluster Version Operator | Assignee: | W. Trevor King <wking> | |
Status: | CLOSED ERRATA | QA Contact: | liujia <jiajliu> | |
Severity: | medium | Docs Contact: | ||
Priority: | low | |||
Version: | 4.8 | CC: | aos-bugs, jiajliu, rsandu, sdodson, wking | |
Target Milestone: | --- | Keywords: | Reopened | |
Target Release: | 4.10.0 | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | No Doc Update | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2038275 (view as bug list) | Environment: | ||
Last Closed: | 2022-03-10 16:35:34 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2038275 |
Description
yaoli
2021-12-21 06:37:13 UTC
There is no knob for this today; growing one would be an RFE [1]. We have had bugs in the past around blanket reductions in CVO log volume, e.g. bug 1984414 and bug 1886900. Repeating the analysis from bug 1886900 on a recent 4.10 CI job [2]: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws/1473280349592096768/artifacts/e2e-aws/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-57cb78fc99-4kswj_cluster-version-operator.log | sed -n 's/.* \([^ ]*[.]go:[0-9]*\)].*/\1/p' | sort | uniq -c | sort -n | tail -n7 184 task_graph.go:474 242 apps.go:38 760 sync_worker.go:393 1053 task_graph.go:477 7569 request.go:597 9940 sync_worker.go:771 9941 sync_worker.go:759 Getting the relevant CVO commit: $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws/1473280349592096768/artifacts/e2e-aws/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-57cb78fc99-4kswj_cluster-version-operator.log | head -n1 I1221 13:29:51.914804 1 start.go:23] ClusterVersionOperator 4.10.0-202112180033.p0.gede5b83.assembly.stream-ede5b83 Checking those noisy entries: $ git cat-file -p ede5b83:pkg/cvo/sync_worker.go | grep -n . | grep '^\(759\|771\):' 759: klog.V(4).Infof("Running sync for %s", task) 771: klog.V(4).Infof("Done syncing for %s", task) So logging the start and end of each manifest task is nothing new for us, and it's useful to confirm that the CVO is walking the manifest task-node graph appropriately. $ git cat-file -p ede5b83:./vendor/k8s.io/client-go/rest/request.go | grep -n . | grep -B7 '^597:' 589: switch { 590: case len(retryInfo) > 0: 591: message = fmt.Sprintf("Waited for %v, %s - request: %s:%s", latency, retryInfo, r.verb, r.URL().String()) 592: default: 593: message = fmt.Sprintf("Waited for %v due to client-side throttling, not priority and fairness, request: %s:%s", latency, r.verb, r.URL().String()) 594: } 596: if latency > longThrottleLatency { 597: klog.V(3).Info(message) We've received requests to reduce that client-side throttling volume in the past, and are completely fine with dropping it, but there's currently no library knob that allows us to drop it [3]. So options: a. File an RFE ticket asking for a CVO knob to allow folks to tune log verbosity. b. File an upstream RFE asking for configurable logging in k8s.io/client-go/rest, to unblock [3]. c. Comment in this bug explaining that in your particular case, the unwanted volume was really from some other type of entry that we didn't see as much of in the CI run I picked for analysis. d. I'm sure there are other options besides the three I'm listing above; feel free to get creative and not be constrained to the option's I'm presenting ;). [1]: https://issues.redhat.com/projects/RFE/issues [2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.10-e2e-aws/1473280349592096768 [3]: https://issues.redhat.com/browse/OTA-414 Hello Thank you for your confirm, I will open a RFE for the question, because of the customer feed back them ocp 4.8 cluster one day will produce 800 000 logs from the cluster version operator, those logs will make the EFK logging system pressureļ¼so I think only keep the Error level log can mitigate the issue. Thank you, I will close the bug and open a RFE Regards Li Yao Scott had the good idea of adjusting the default log level to work around the klog.V(3) noise, so I'll re-open while we think about that. Reproduced on 4.10.0-0.nightly-2022-01-05-181126 # ./oc -n openshift-cluster-version get deployment cluster-version-operator -oyaml|grep -e "--v" - --v=5 Cut out 1 hour's logs from cvo log for statistics: # ./oc -n openshift-cluster-version logs cluster-version-operator-bbfbc5fc5-b889s|grep "I0107 07:[0-5][0-9]" > cvo1.log # ll cvo1.log -rw-r--r--. 1 root root 6755882 Jan 7 07:50 cvo1.log # grep "due to client-side throttling" cvo1.log |wc -l 9590 Verified on 4.10.0-0.nightly-2022-01-06-183041(Also cutting out 1 hour's logs from cvo log for statistics.) 1. The log level is updated to --v=2 # ./oc -n openshift-cluster-version get deployment cluster-version-operator -oyaml|grep -e "--v" - --v=2 2. The total size reduced about 34% for 1 hour's log # ./oc -n openshift-cluster-version logs cluster-version-operator-5d97cf9bd8-h2m9t|grep "I0107 07:[0-5][0-9]" > cvo2.log # ll cvo2.log -rw-r--r--. 1 root root 4486883 Jan 7 07:50 cvo2.log 3. Most of client-side throttling noise dropped. # grep "due to client-side throttling" cvo2.log |wc -l 14 *** Bug 1997508 has been marked as a duplicate of this bug. *** For folks interested in backports, click 'Show advanced fields', and from Blocks head to bug 2038275 (shipped in 4.9.15). From there, follow Blocks again to bug 2038936 (shipped in 4.8.28). Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056 |