Bug 1984414 - Excessive resource diff logging during updates
Summary: Excessive resource diff logging during updates
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 4.9.0
Assignee: Vadim Rutkovsky
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-07-21 11:55 UTC by Arjun Naik
Modified: 2021-10-18 17:40 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:40:26 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 628 0 None open Bug 1984414: Log resource diffs on update only in reconcile mode 2021-07-21 12:01:45 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:40:43 UTC

Description Arjun Naik 2021-07-21 11:55:36 UTC
Description of problem:
Due to incorrect diffing in the CVO payload resources are constantly updated with a log of the diff between the expected resource in the payload and the actual resource on cluster. The resource diffs are also logged when there is an expected change during updates. This makes the logs less useful 


How reproducible:
Update to 4.8.0 from 4.7.20

Actual results:
The CVO logs contains a lot of diff lines

I0708 19:43:34.013506       1 generic.go:81] Updating PrometheusRule openshift-cluster-version/cluster-version-operator due to diff:   &unstructured.Unstructured{
I0708 19:43:34.128546       1 apiext.go:66] Updating CRD clusterresourcequotas.quota.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.183628       1 apiext.go:66] Updating CRD rolebindingrestrictions.authorization.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.224653       1 apiext.go:66] Updating CRD operatorhubs.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.267313       1 apiext.go:66] Updating CRD rangeallocations.security.internal.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.333429       1 apiext.go:66] Updating CRD securitycontextconstraints.security.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.626818       1 apiext.go:66] Updating CRD proxies.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:35.667498       1 apiext.go:66] Updating CRD apiservers.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:35.718848       1 apiext.go:66] Updating CRD authentications.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:35.864000       1 apiext.go:66] Updating CRD builds.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:35.925033       1 apiext.go:66] Updating CRD configs.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.019190       1 apiext.go:66] Updating CRD consoles.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.120559       1 apiext.go:66] Updating CRD dnses.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.223597       1 apiext.go:66] Updating CRD featuregates.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.329532       1 apiext.go:66] Updating CRD images.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.419314       1 apiext.go:66] Updating CRD imagecontentsourcepolicies.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.538190       1 apiext.go:66] Updating CRD infrastructures.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.620048       1 apiext.go:66] Updating CRD ingresses.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.721331       1 apiext.go:66] Updating CRD networks.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.835839       1 apiext.go:66] Updating CRD oauths.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.919724       1 core.go:40] Updating Namespace openshift-config-managed due to diff:   &v1.Namespace{
I0708 19:43:37.016237       1 core.go:40] Updating Namespace openshift-config due to diff:   &v1.Namespace{
I0708 19:43:37.123508       1 apiext.go:66] Updating CRD projects.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:37.221428       1 apiext.go:66] Updating CRD schedulers.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:37.342817       1 rbac.go:64] Updating ClusterRole system:openshift:cluster-config-operator:cluster-reader due to diff:   &v1.ClusterRole{
I0708 19:43:37.777741       1 apiext.go:66] Updating CRD configs.samples.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:37.866508       1 core.go:40] Updating Namespace openshift-etcd-operator due to diff:   &v1.Namespace{
I0708 19:43:37.982816       1 apiext.go:66] Updating CRD etcds.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:38.470311       1 apps.go:38] Updating Deployment openshift-etcd-operator/etcd-operator due to diff:   &v1.Deployment{
I0708 19:48:00.712707       1 core.go:40] Updating Namespace openshift-kube-apiserver-operator due to diff:   &v1.Namespace{
I0708 19:48:01.172035       1 apiext.go:66] Updating CRD kubeapiservers.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:48:01.541882       1 apps.go:38] Updating Deployment openshift-kube-apiserver-operator/kube-apiserver-operator due to diff:   &v1.Deployment{


Expected results:
The CVO logs should not contain those lines during updates.

Comment 2 W. Trevor King 2021-07-23 20:28:12 UTC
Bug work should be done here, but it's Arjun's PR, so retroactively assigning to Arjun.

Comment 3 Yang Yang 2021-08-10 07:03:57 UTC
Verifying it by upgrading to 4.9.0-0.nightly-2021-08-07-175228 from 4.8.4

Checking the CVO logs:

# grep "Updating .* due to diff" *
2021-08-10T03:27:46+0000-cluster-version-operator-5cf576bd97-94rwj.log:I0810 03:29:58.258187       1 apps.go:38] Updating Deployment openshift-cluster-version/cluster-version-operator due to diff:   &v1.Deployment{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:46:58.632612       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:47:14.380679       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:50:28.899752       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:50:44.599673       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:54:03.368863       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:54:19.069436       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:57:33.635413       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:57:53.736112       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:01:12.301683       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:01:27.952481       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:04:42.570408       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:05:02.472065       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:08:17.060403       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:08:32.739151       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:11:47.303756       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:12:07.208489       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:15:21.878653       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:15:37.427271       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:18:56.344382       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:19:11.996115       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:22:26.615730       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:22:42.367613       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:26:01.088580       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:26:16.685415       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:29:31.349948       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:29:47.052702       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{

Arjun, I appreciate if you can help with the following questions:

1)  The diff message seems not human readable. Do you think it deserve an improvement? 

10009 I0810 04:50:28.899752       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
10010         Object: map[string]interface{}{
10011                 "apiVersion": string("monitoring.coreos.com/v1"),
10012                 "kind":       string("PrometheusRule"),
10013                 "metadata":   map[string]interface{}{"annotations": map[string]interface{}{"exclude.releas      e.openshift.io/internal-openshift-hosted": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true"), "include.release.openshift.io/single-node-developer": string("true")}, "creationTimestamp": string("2021-08-10T02:48:07Z"), "generation": int64(2), "labels": map[string]interface{}      {"prometheus": string("k8s"), "role": string("alert-rules")}, ...},
10014                 "spec": map[string]interface{}{
10015                         "groups": []interface{}{
10016                                 ... // 2 identical elements
10017                                 map[string]interface{}{"name": string("machine-not-yet-deleted"), "rules":       []interface{}{map[string]interface{}{"alert": string("MachineNotYetDeleted"), "annotations": map[string]i      nterface{}{"message": string("machine {{ $labels.name }} has been in Deleting phase for more t"...)}, "exp      r": string("(mapi_machine_created_timestamp_seconds{phase=\"Deleting\"}) > 0\n"), "for": string("360m"), .      ..}}},
10018                                 map[string]interface{}{"name": string("machine-api-operator-metrics-collec      tor-up"), "rules": []interface{}{map[string]interface{}{"alert": string("MachineAPIOperatorMetricsCollecti      onFailing"), "annotations": map[string]interface{}{"message": string("machine api operator metrics collection is failing. For more det"...)}, "expr": string("mapi_mao_collector_up == 0\n"), "for": string("5m"), .      ..}}},
10019                                 map[string]interface{}{
10020                                         "name": string("machine-health-check-unterminated-short-circuit"),
10021                                         "rules": []interface{}{
10022                                                 map[string]interface{}{
10023                                                         "alert": string("MachineHealthCheckUnterminatedShortCircuit"),
10024 -                                                       "annotation": map[string]interface{}{
10025 -                                                               "message": string("machine health check {{       $labels.name }} has been disabled by short circuit for more than 30 minutes"),
10026 -                                                       },
10027                                                         "expr":   string("mapi_machinehealthcheck_short_circuit == 1\n"),
10028                                                         "for":    string("30m"),
10029                                                         "labels": map[string]interface{}{"severity": string("warning")},
10030                                                 },
10031                                         },
10032                                 },
10033                         },
10034                 },
10035         },
10036   }

2) How can I know when it's in reconcile mode? Does it go to reconcile mode once I get Reconciling:true?

# grep "Reconciling:true" 2021-08-10T03:27:46+0000-cluster-version-operator-5cf576bd97-94rwj.log | head -n1
I0810 03:11:38.601372       1 status.go:161] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Done:676, Total:676, Completed:1, Reconciling:true, Initial:false, VersionHash:"Ca4Rg3WnQAk=", LastProgress:time.Time{wall:0xc03c9978d537681d, ext:513779096593, loc:(*time.Location)(0x2ab2a20)}, Actual:v1.Release{Version:"4.8.4", Image:"quay.io/openshift-release-dev/ocp-release@sha256:841535acc09ca8412cd17e8f7702eceda1cac688ccc281278f108675c30de270", URL:"https://access.redhat.com/errata/RHSA-2021:2983", Channels:[]string(nil)}, Verified:false}

# grep "Reconciling:true" 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log | head -n1
I0810 04:43:48.161640       1 status.go:161] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:2, Step:"", Failure:error(nil), Done:729, Total:729, Completed:1, Reconciling:true, Initial:false, VersionHash:"_6QM_p4p2-w=", LastProgress:time.Time{wall:0xc03c9edf439eac63, ext:843309600346, loc:(*time.Location)(0x2ad0dc0)}, Actual:v1.Release{Version:"4.9.0-0.nightly-2021-08-07-175228", Image:"registry.ci.openshift.org/ocp/release@sha256:62f43b04f6af74ea568c135367b33f7cc8212ca4aa760388ea7884ebccbfb1f1", URL:"", Channels:[]string(nil)}, Verified:false}

3) Most of the diff message come from generic.go. Does it mean that it's less likely to touch the diff log in the API in the resourceapply package?

Comment 4 Vadim Rutkovsky 2021-08-16 08:10:38 UTC
>1)  The diff message seems not human readable. Do you think it deserve an improvement?

I don't think its worth looking into. CVO hotloop logging is useful to file bugs, which are not occurring frequently (once per minor release usually).

>2) How can I know when it's in reconcile mode? Does it go to reconcile mode once I get Reconciling:true?

Yes

>) Most of the diff message come from generic.go. Does it mean that it's less likely to touch the diff log in the API in the resourceapply package?

Not quite sure what you mean. `generic.go` handles unstructured resources, so diffs are more likely to occur there, but k8s changes may bring in hotloops in resourceapply as well

Comment 5 Yang Yang 2021-08-18 01:58:16 UTC
Many thanks Vadim!

>>1)  The diff message seems not human readable. Do you think it deserve an improvement?

>I don't think its worth looking into. CVO hotloop logging is useful to file bugs, which are not occurring frequently (once per minor release usually).

Ah, I didn't notice the '-' indicator. That's clear enough.


The resource diff is logged in reconcile mode. Moving it to verified state.

Comment 8 errata-xmlrpc 2021-10-18 17:40:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759


Note You need to log in before you can comment on or make changes to this bug.