Bug 1886900 - Cluster-version operator fills logs with "Manifest: ..." spew
Summary: Cluster-version operator fills logs with "Manifest: ..." spew
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 4.7.0
Assignee: W. Trevor King
QA Contact: liujia
URL:
Whiteboard:
Depends On:
Blocks: 1886947
TreeView+ depends on / blocked
 
Reported: 2020-10-09 17:19 UTC by W. Trevor King
Modified: 2021-02-24 15:25 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:24:43 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 469 0 None closed Bug 1886900: pkg/cvo/sync_worker: Drop "Manifest: ..." logging 2021-01-19 06:10:49 UTC
Github openshift cluster-version-operator pull 472 0 None closed Bug 1886900: pkg/cvo/sync_worker: Demote "Status change: ..." logging to v6 2021-01-19 06:10:49 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:25:21 UTC

Description W. Trevor King 2020-10-09 17:19:57 UTC
Since bug 1873900's [1], we've had lots of log noise like:

  I1009 15:40:43.074056       1 sync_worker.go:702] Manifest: {"apiVersion...

This line is descended from logging we grew way back in [2].  But we triggered it recently with the bump to v5 in [1].  We should drop the line, because it's noisy spew in the log files, and we can get the manifest content via:

  $ oc adm release extract --to=manifests $PULLSPEC

[1]: https://github.com/openshift/cluster-version-operator/pull/448
[2]: https://github.com/openshift/cluster-version-operator/pull/14

Comment 1 W. Trevor King 2020-10-09 22:06:00 UTC
Verfication notes, comparing a 4.5 job [1] with a recent 4.6 job [2].  Here's 4.5:

$ mkdir /tmp/4.5
$ cd /tmp/4.5
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/1314579507780259840/artifacts/e2e-aws/must-gather.tar | tar xz
$ mv */namespaces .
'quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-7b377b7e0f2d1d27fd61af0d6386b657a9c8201a9935dbffba5b8bd5518787b0/namespaces' -> './namespaces'
$ ls -lS $(find namespaces -name '*.log') | head
-rwxr-xr-x. 1 wking wking 5270980 Oct  9 08:56 namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ip-10-0-162-78.us-west-1.compute.internal/kube-controller-manager/kube-controller-manager/logs/current.log
-rwxr-xr-x. 1 wking wking 3333279 Oct  9 08:55 namespaces/openshift-etcd-operator/pods/etcd-operator-589f65dfcf-zbl8t/operator/operator/logs/current.log
-rwxr-xr-x. 1 wking wking 3080766 Oct  9 08:58 namespaces/openshift-monitoring/pods/prometheus-k8s-1/prometheus/prometheus/logs/current.log
-rwxr-xr-x. 1 wking wking 2622626 Oct  9 08:55 namespaces/openshift-cluster-version/pods/cluster-version-operator-c457ddd7d-lzd86/cluster-version-operator/cluster-version-operator/logs/current.log
-rwxr-xr-x. 1 wking wking 1365449 Oct  9 08:55 namespaces/openshift-cluster-version/pods/cluster-version-operator-c457ddd7d-lzd86/cluster-version-operator/cluster-version-operator/logs/previous.log
-rwxr-xr-x. 1 wking wking 1120686 Oct  9 08:59 namespaces/openshift-controller-manager/pods/controller-manager-4v69t/controller-manager/controller-manager/logs/current.log
-rwxr-xr-x. 1 wking wking 1020273 Oct  9 08:57 namespaces/openshift-kube-scheduler/pods/openshift-kube-scheduler-ip-10-0-162-78.us-west-1.compute.internal/kube-scheduler/kube-scheduler/logs/current.log
-rwxr-xr-x. 1 wking wking 1017372 Oct  9 08:56 namespaces/openshift-image-registry/pods/image-registry-dcb8964bf-5724k/registry/registry/logs/current.log
-rwxr-xr-x. 1 wking wking 1013980 Oct  9 08:55 namespaces/openshift-machine-api/pods/machine-api-controllers-7c84d85f78-5jtmc/nodelink-controller/nodelink-controller/logs/current.log
-rwxr-xr-x. 1 wking wking  941678 Oct  9 08:59 namespaces/openshift-operator-lifecycle-manager/pods/packageserver-5469f4cfc5-pkp7s/packageserver/packageserver/logs/current.log

So we were noisy, especially if you add those two log files together, but comfortably behind kube-controller-manager.  With 4.6's bump to v5 [3], the 4.6 job had:

$ mkdir /tmp/4.6
$ cd /tmp/4.6
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1314591016124157952/artifacts/e2e-aws/must-gather.tar | tar xz
$ mv */namespaces .
‘quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-fa63640328598f72567027e9cd0d50f00d4ec058dacc61f3be3c6cca7fbefac5/namespaces’ -> ‘./namespaces’
$ ls -lS $(find namespaces -name '*.log') | head
-rwxr-xr-x. 1 wking wking 38577943 Oct  9 09:40 namespaces/openshift-cluster-version/pods/cluster-version-operator-569b894c55-c6shd/cluster-version-operator/cluster-version-operator/logs/current.log
-rwxr-xr-x. 1 wking wking  5352172 Oct  9 09:41 namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ip-10-0-129-107.us-west-2.compute.internal/kube-controller-manager/kube-controller-manager/logs/current.log
-rwxr-xr-x. 1 wking wking  4397464 Oct  9 09:41 namespaces/openshift-operator-lifecycle-manager/pods/olm-operator-546c5b9d98-t4nsq/olm-operator/olm-operator/logs/current.log
-rwxr-xr-x. 1 wking wking  1165889 Oct  9 09:41 namespaces/openshift-controller-manager/pods/controller-manager-wmxqk/controller-manager/controller-manager/logs/current.log
-rwxr-xr-x. 1 wking wking   963737 Oct  9 09:41 namespaces/openshift-image-registry/pods/image-registry-58c49c4bf7-46krz/registry/registry/logs/current.log
-rwxr-xr-x. 1 wking wking   957577 Oct  9 09:41 namespaces/openshift-kube-apiserver/pods/kube-apiserver-ip-10-0-129-107.us-west-2.compute.internal/kube-apiserver/kube-apiserver/logs/current.log
-rwxr-xr-x. 1 wking wking   955558 Oct  9 09:40 namespaces/openshift-machine-api/pods/machine-api-controllers-57968448cb-v9dmz/nodelink-controller/nodelink-controller/logs/current.log
-rwxr-xr-x. 1 wking wking   936896 Oct  9 09:41 namespaces/openshift-operator-lifecycle-manager/pods/packageserver-64587ffd78-tcprx/packageserver/packageserver/logs/current.log
-rwxr-xr-x. 1 wking wking   911007 Oct  9 09:41 namespaces/openshift-operator-lifecycle-manager/pods/packageserver-64587ffd78-4lk6s/packageserver/packageserver/logs/current.log
-rwxr-xr-x. 1 wking wking   881131 Oct  9 09:41 namespaces/openshift-sdn/pods/ovs-j6d68/openvswitch/openvswitch/logs/current.log

Yikes :p.  Comparing noisy lines:

$ cd /tmp
$ for X in 4.5 4.6; do echo "$X"; sed -n 's/.* \([^ ]*[.]go:[0-9]*\)].*/\1/p' "${X}"/namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | sort | uniq -c | sort -n | tail -n7; done
4.5
    109 cvo.go:463
    140 reflector.go:153
    144 reflector.go:185
    483 task_graph.go:575
   4102 sync_worker.go:653
   4102 sync_worker.go:666
   4310 request.go:565
4.6
   1234 sync_worker.go:376
   4927 sync_worker.go:464
   5374 request.go:581
   5574 sync_worker.go:458
   6792 sync_worker.go:714
   6796 sync_worker.go:701
   6796 sync_worker.go:702

Lots of new sync-worker noise.  Getting the commit for the 4.6 job:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1314591016124157952/artifacts/release-images-latest/release-images-latest | jq -r '.spec.tags[] | select(.name == "cluster-version-operator").annotations["io.openshift.build.commit.id"]'
2c9d4a2200216ad417206b24e7bf114ec22f4422

And checking the noisy lines and the ones I've dropped (there is some overlap):

$ git cat-file -p 2c9d4a2200:pkg/cvo/sync_worker.go | grep -n . | grep '^\(376\|464458\|701\|702\|714\):\|"Manifest:\|"Status change:\|"Status report channel was full'
376:				klog.V(5).Infof("Dropping status report from earlier in sync loop")
464:			klog.Infof("Status report channel was full %#v", update)
701:			klog.V(4).Infof("Running sync for %s", task)
702:			klog.V(5).Infof("Manifest: %s", string(task.Manifest.Raw))
714:			klog.V(4).Infof("Done syncing for %s", task)

701/714 are important for debugging stuck manifests, so we want to keep them.  In 4.7 presubmits, with both my dropping PRs in place [4]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/472/pull-ci-openshift-cluster-version-operator-master-e2e/1314664392163332096/artifacts/e2e/gather-must-gather/must-gather.tar | tar xz
$ mv */namespaces .
‘registry-build01-ci-openshift-org-ci-op-t0srpcxz-stable-sha256-34e15c6ae8185a8824fc6db4e95552b168a8256987a7aca5d90936a92768f398/namespaces’ -> ‘./namespaces’
$ ls -lS $(find namespaces -name '*.log') | head
-rwxr-xr-x. 1 wking wking 6194477 Oct  9 14:42 namespaces/openshift-cluster-version/pods/cluster-version-operator-69687954f8-2bwsc/cluster-version-operator/cluster-version-operator/logs/current.log
-rwxr-xr-x. 1 wking wking 5469366 Oct  9 14:43 namespaces/openshift-kube-controller-manager/pods/kube-controller-manager-ci-op-t0srpcxz-11cd2-mj7fv-master-1/kube-controller-manager/kube-controller-manager/logs/current.log
-rwxr-xr-x. 1 wking wking 3992209 Oct  9 14:44 namespaces/openshift-operator-lifecycle-manager/pods/olm-operator-7cdbc86f56-fmkk5/olm-operator/olm-operator/logs/current.log
-rwxr-xr-x. 1 wking wking 2267782 Oct  9 14:43 namespaces/openshift-kube-apiserver/pods/kube-apiserver-ci-op-t0srpcxz-11cd2-mj7fv-master-0/kube-apiserver/kube-apiserver/logs/current.log
-rwxr-xr-x. 1 wking wking 1261733 Oct  9 14:42 namespaces/openshift-authentication-operator/pods/authentication-operator-df94df968-pstd2/authentication-operator/authentication-operator/logs/current.log
-rwxr-xr-x. 1 wking wking 1213395 Oct  9 14:42 namespaces/openshift-machine-api/pods/machine-api-controllers-84fcd67f76-5gs9d/nodelink-controller/nodelink-controller/logs/current.log
-rwxr-xr-x. 1 wking wking 1116802 Oct  9 14:44 namespaces/openshift-controller-manager/pods/controller-manager-2sdtp/controller-manager/controller-manager/logs/current.log
-rwxr-xr-x. 1 wking wking 1098037 Oct  9 14:44 namespaces/openshift-operator-lifecycle-manager/pods/packageserver-6fb5f8db49-q9lgt/packageserver/packageserver/logs/current.log
-rwxr-xr-x. 1 wking wking 1051920 Oct  9 14:43 namespaces/openshift-image-registry/pods/image-registry-5cd54574fc-v5rw9/registry/registry/logs/current.log
-rwxr-xr-x. 1 wking wking 1032457 Oct  9 14:44 namespaces/openshift-operator-lifecycle-manager/pods/packageserver-6fb5f8db49-hbpp7/packageserver/packageserver/logs/current.log
$ sed -n 's/.* \([^ ]*[.]go:[0-9]*\)].*/\1/p' namespaces/openshift-cluster-version/pods/cluster-version-operator-*/cluster-version-operator/cluster-version-operator/logs/current.log | sort | uniq -c | sort -n | tail -n7
    502 task_graph.go:483
    658 warnings.go:67
   1370 task_graph.go:486
   2765 sync_worker.go:376
   6570 request.go:581
   8375 sync_worker.go:713
   8390 sync_worker.go:701
$ grep -n . pkg/cvo/sync_worker.go | grep '^376:'
376:				klog.V(5).Infof("Dropping status report from earlier in sync loop")

Hmm.  We're at least back in the pack of "normal-sized logs", and I'm not convinced we want to drop that one yet.  I think that's a sufficient reduction to mark this verified as it stands.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/1314579507780259840
[2]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.6/1314591016124157952
[3]: https://github.com/openshift/cluster-version-operator/pull/448
[4]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/472/pull-ci-openshift-cluster-version-operator-master-e2e/1314664392163332096

Comment 3 Lalatendu Mohanty 2020-10-12 11:22:19 UTC
Increasing the severity of the bug to high as this increase in log size can lead to capacity issues.

Comment 4 liujia 2020-10-26 08:03:29 UTC
Verified on 4.7.0-0.nightly-2020-10-24-155529

# cat cvo.log |grep "Done syncing for" | wc -l
9984
# cat cvo.log |grep "Manifest:" | wc -l
0
# cat cvo.log |grep "Running sync for" | wc -l
9984

Comment 7 errata-xmlrpc 2021-02-24 15:24:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633


Note You need to log in before you can comment on or make changes to this bug.