Bug 1984414

Summary: Excessive resource diff logging during updates
Product: OpenShift Container Platform Reporter: Arjun Naik <anaik>
Component: Cluster Version OperatorAssignee: Vadim Rutkovsky <vrutkovs>
Status: CLOSED ERRATA QA Contact: Yang Yang <yanyang>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.8CC: aos-bugs, jokerman, vrutkovs, wking
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-18 17:40:26 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Arjun Naik 2021-07-21 11:55:36 UTC
Description of problem:
Due to incorrect diffing in the CVO payload resources are constantly updated with a log of the diff between the expected resource in the payload and the actual resource on cluster. The resource diffs are also logged when there is an expected change during updates. This makes the logs less useful 


How reproducible:
Update to 4.8.0 from 4.7.20

Actual results:
The CVO logs contains a lot of diff lines

I0708 19:43:34.013506       1 generic.go:81] Updating PrometheusRule openshift-cluster-version/cluster-version-operator due to diff:   &unstructured.Unstructured{
I0708 19:43:34.128546       1 apiext.go:66] Updating CRD clusterresourcequotas.quota.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.183628       1 apiext.go:66] Updating CRD rolebindingrestrictions.authorization.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.224653       1 apiext.go:66] Updating CRD operatorhubs.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.267313       1 apiext.go:66] Updating CRD rangeallocations.security.internal.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.333429       1 apiext.go:66] Updating CRD securitycontextconstraints.security.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:34.626818       1 apiext.go:66] Updating CRD proxies.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:35.667498       1 apiext.go:66] Updating CRD apiservers.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:35.718848       1 apiext.go:66] Updating CRD authentications.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:35.864000       1 apiext.go:66] Updating CRD builds.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:35.925033       1 apiext.go:66] Updating CRD configs.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.019190       1 apiext.go:66] Updating CRD consoles.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.120559       1 apiext.go:66] Updating CRD dnses.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.223597       1 apiext.go:66] Updating CRD featuregates.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.329532       1 apiext.go:66] Updating CRD images.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.419314       1 apiext.go:66] Updating CRD imagecontentsourcepolicies.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.538190       1 apiext.go:66] Updating CRD infrastructures.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.620048       1 apiext.go:66] Updating CRD ingresses.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.721331       1 apiext.go:66] Updating CRD networks.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.835839       1 apiext.go:66] Updating CRD oauths.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:36.919724       1 core.go:40] Updating Namespace openshift-config-managed due to diff:   &v1.Namespace{
I0708 19:43:37.016237       1 core.go:40] Updating Namespace openshift-config due to diff:   &v1.Namespace{
I0708 19:43:37.123508       1 apiext.go:66] Updating CRD projects.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:37.221428       1 apiext.go:66] Updating CRD schedulers.config.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:37.342817       1 rbac.go:64] Updating ClusterRole system:openshift:cluster-config-operator:cluster-reader due to diff:   &v1.ClusterRole{
I0708 19:43:37.777741       1 apiext.go:66] Updating CRD configs.samples.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:37.866508       1 core.go:40] Updating Namespace openshift-etcd-operator due to diff:   &v1.Namespace{
I0708 19:43:37.982816       1 apiext.go:66] Updating CRD etcds.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:43:38.470311       1 apps.go:38] Updating Deployment openshift-etcd-operator/etcd-operator due to diff:   &v1.Deployment{
I0708 19:48:00.712707       1 core.go:40] Updating Namespace openshift-kube-apiserver-operator due to diff:   &v1.Namespace{
I0708 19:48:01.172035       1 apiext.go:66] Updating CRD kubeapiservers.operator.openshift.io due to diff:   &v1.CustomResourceDefinition{
I0708 19:48:01.541882       1 apps.go:38] Updating Deployment openshift-kube-apiserver-operator/kube-apiserver-operator due to diff:   &v1.Deployment{


Expected results:
The CVO logs should not contain those lines during updates.

Comment 2 W. Trevor King 2021-07-23 20:28:12 UTC
Bug work should be done here, but it's Arjun's PR, so retroactively assigning to Arjun.

Comment 3 Yang Yang 2021-08-10 07:03:57 UTC
Verifying it by upgrading to 4.9.0-0.nightly-2021-08-07-175228 from 4.8.4

Checking the CVO logs:

# grep "Updating .* due to diff" *
2021-08-10T03:27:46+0000-cluster-version-operator-5cf576bd97-94rwj.log:I0810 03:29:58.258187       1 apps.go:38] Updating Deployment openshift-cluster-version/cluster-version-operator due to diff:   &v1.Deployment{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:46:58.632612       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:47:14.380679       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:50:28.899752       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:50:44.599673       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:54:03.368863       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:54:19.069436       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:57:33.635413       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:57:53.736112       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:01:12.301683       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:01:27.952481       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:04:42.570408       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:05:02.472065       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:08:17.060403       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:08:32.739151       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:11:47.303756       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:12:07.208489       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:15:21.878653       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:15:37.427271       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:18:56.344382       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:19:11.996115       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:22:26.615730       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:22:42.367613       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:26:01.088580       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:26:16.685415       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:29:31.349948       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:29:47.052702       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{

Arjun, I appreciate if you can help with the following questions:

1)  The diff message seems not human readable. Do you think it deserve an improvement? 

10009 I0810 04:50:28.899752       1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff:   &unstructured.Unstructured{
10010         Object: map[string]interface{}{
10011                 "apiVersion": string("monitoring.coreos.com/v1"),
10012                 "kind":       string("PrometheusRule"),
10013                 "metadata":   map[string]interface{}{"annotations": map[string]interface{}{"exclude.releas      e.openshift.io/internal-openshift-hosted": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true"), "include.release.openshift.io/single-node-developer": string("true")}, "creationTimestamp": string("2021-08-10T02:48:07Z"), "generation": int64(2), "labels": map[string]interface{}      {"prometheus": string("k8s"), "role": string("alert-rules")}, ...},
10014                 "spec": map[string]interface{}{
10015                         "groups": []interface{}{
10016                                 ... // 2 identical elements
10017                                 map[string]interface{}{"name": string("machine-not-yet-deleted"), "rules":       []interface{}{map[string]interface{}{"alert": string("MachineNotYetDeleted"), "annotations": map[string]i      nterface{}{"message": string("machine {{ $labels.name }} has been in Deleting phase for more t"...)}, "exp      r": string("(mapi_machine_created_timestamp_seconds{phase=\"Deleting\"}) > 0\n"), "for": string("360m"), .      ..}}},
10018                                 map[string]interface{}{"name": string("machine-api-operator-metrics-collec      tor-up"), "rules": []interface{}{map[string]interface{}{"alert": string("MachineAPIOperatorMetricsCollecti      onFailing"), "annotations": map[string]interface{}{"message": string("machine api operator metrics collection is failing. For more det"...)}, "expr": string("mapi_mao_collector_up == 0\n"), "for": string("5m"), .      ..}}},
10019                                 map[string]interface{}{
10020                                         "name": string("machine-health-check-unterminated-short-circuit"),
10021                                         "rules": []interface{}{
10022                                                 map[string]interface{}{
10023                                                         "alert": string("MachineHealthCheckUnterminatedShortCircuit"),
10024 -                                                       "annotation": map[string]interface{}{
10025 -                                                               "message": string("machine health check {{       $labels.name }} has been disabled by short circuit for more than 30 minutes"),
10026 -                                                       },
10027                                                         "expr":   string("mapi_machinehealthcheck_short_circuit == 1\n"),
10028                                                         "for":    string("30m"),
10029                                                         "labels": map[string]interface{}{"severity": string("warning")},
10030                                                 },
10031                                         },
10032                                 },
10033                         },
10034                 },
10035         },
10036   }

2) How can I know when it's in reconcile mode? Does it go to reconcile mode once I get Reconciling:true?

# grep "Reconciling:true" 2021-08-10T03:27:46+0000-cluster-version-operator-5cf576bd97-94rwj.log | head -n1
I0810 03:11:38.601372       1 status.go:161] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Done:676, Total:676, Completed:1, Reconciling:true, Initial:false, VersionHash:"Ca4Rg3WnQAk=", LastProgress:time.Time{wall:0xc03c9978d537681d, ext:513779096593, loc:(*time.Location)(0x2ab2a20)}, Actual:v1.Release{Version:"4.8.4", Image:"quay.io/openshift-release-dev/ocp-release@sha256:841535acc09ca8412cd17e8f7702eceda1cac688ccc281278f108675c30de270", URL:"https://access.redhat.com/errata/RHSA-2021:2983", Channels:[]string(nil)}, Verified:false}

# grep "Reconciling:true" 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log | head -n1
I0810 04:43:48.161640       1 status.go:161] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:2, Step:"", Failure:error(nil), Done:729, Total:729, Completed:1, Reconciling:true, Initial:false, VersionHash:"_6QM_p4p2-w=", LastProgress:time.Time{wall:0xc03c9edf439eac63, ext:843309600346, loc:(*time.Location)(0x2ad0dc0)}, Actual:v1.Release{Version:"4.9.0-0.nightly-2021-08-07-175228", Image:"registry.ci.openshift.org/ocp/release@sha256:62f43b04f6af74ea568c135367b33f7cc8212ca4aa760388ea7884ebccbfb1f1", URL:"", Channels:[]string(nil)}, Verified:false}

3) Most of the diff message come from generic.go. Does it mean that it's less likely to touch the diff log in the API in the resourceapply package?

Comment 4 Vadim Rutkovsky 2021-08-16 08:10:38 UTC
>1)  The diff message seems not human readable. Do you think it deserve an improvement?

I don't think its worth looking into. CVO hotloop logging is useful to file bugs, which are not occurring frequently (once per minor release usually).

>2) How can I know when it's in reconcile mode? Does it go to reconcile mode once I get Reconciling:true?

Yes

>) Most of the diff message come from generic.go. Does it mean that it's less likely to touch the diff log in the API in the resourceapply package?

Not quite sure what you mean. `generic.go` handles unstructured resources, so diffs are more likely to occur there, but k8s changes may bring in hotloops in resourceapply as well

Comment 5 Yang Yang 2021-08-18 01:58:16 UTC
Many thanks Vadim!

>>1)  The diff message seems not human readable. Do you think it deserve an improvement?

>I don't think its worth looking into. CVO hotloop logging is useful to file bugs, which are not occurring frequently (once per minor release usually).

Ah, I didn't notice the '-' indicator. That's clear enough.


The resource diff is logged in reconcile mode. Moving it to verified state.

Comment 8 errata-xmlrpc 2021-10-18 17:40:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759