Description of problem: Due to incorrect diffing in the CVO payload resources are constantly updated with a log of the diff between the expected resource in the payload and the actual resource on cluster. The resource diffs are also logged when there is an expected change during updates. This makes the logs less useful How reproducible: Update to 4.8.0 from 4.7.20 Actual results: The CVO logs contains a lot of diff lines I0708 19:43:34.013506 1 generic.go:81] Updating PrometheusRule openshift-cluster-version/cluster-version-operator due to diff: &unstructured.Unstructured{ I0708 19:43:34.128546 1 apiext.go:66] Updating CRD clusterresourcequotas.quota.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:34.183628 1 apiext.go:66] Updating CRD rolebindingrestrictions.authorization.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:34.224653 1 apiext.go:66] Updating CRD operatorhubs.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:34.267313 1 apiext.go:66] Updating CRD rangeallocations.security.internal.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:34.333429 1 apiext.go:66] Updating CRD securitycontextconstraints.security.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:34.626818 1 apiext.go:66] Updating CRD proxies.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:35.667498 1 apiext.go:66] Updating CRD apiservers.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:35.718848 1 apiext.go:66] Updating CRD authentications.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:35.864000 1 apiext.go:66] Updating CRD builds.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:35.925033 1 apiext.go:66] Updating CRD configs.operator.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.019190 1 apiext.go:66] Updating CRD consoles.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.120559 1 apiext.go:66] Updating CRD dnses.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.223597 1 apiext.go:66] Updating CRD featuregates.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.329532 1 apiext.go:66] Updating CRD images.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.419314 1 apiext.go:66] Updating CRD imagecontentsourcepolicies.operator.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.538190 1 apiext.go:66] Updating CRD infrastructures.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.620048 1 apiext.go:66] Updating CRD ingresses.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.721331 1 apiext.go:66] Updating CRD networks.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.835839 1 apiext.go:66] Updating CRD oauths.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:36.919724 1 core.go:40] Updating Namespace openshift-config-managed due to diff: &v1.Namespace{ I0708 19:43:37.016237 1 core.go:40] Updating Namespace openshift-config due to diff: &v1.Namespace{ I0708 19:43:37.123508 1 apiext.go:66] Updating CRD projects.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:37.221428 1 apiext.go:66] Updating CRD schedulers.config.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:37.342817 1 rbac.go:64] Updating ClusterRole system:openshift:cluster-config-operator:cluster-reader due to diff: &v1.ClusterRole{ I0708 19:43:37.777741 1 apiext.go:66] Updating CRD configs.samples.operator.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:37.866508 1 core.go:40] Updating Namespace openshift-etcd-operator due to diff: &v1.Namespace{ I0708 19:43:37.982816 1 apiext.go:66] Updating CRD etcds.operator.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:43:38.470311 1 apps.go:38] Updating Deployment openshift-etcd-operator/etcd-operator due to diff: &v1.Deployment{ I0708 19:48:00.712707 1 core.go:40] Updating Namespace openshift-kube-apiserver-operator due to diff: &v1.Namespace{ I0708 19:48:01.172035 1 apiext.go:66] Updating CRD kubeapiservers.operator.openshift.io due to diff: &v1.CustomResourceDefinition{ I0708 19:48:01.541882 1 apps.go:38] Updating Deployment openshift-kube-apiserver-operator/kube-apiserver-operator due to diff: &v1.Deployment{ Expected results: The CVO logs should not contain those lines during updates.
Bug work should be done here, but it's Arjun's PR, so retroactively assigning to Arjun.
Verifying it by upgrading to 4.9.0-0.nightly-2021-08-07-175228 from 4.8.4 Checking the CVO logs: # grep "Updating .* due to diff" * 2021-08-10T03:27:46+0000-cluster-version-operator-5cf576bd97-94rwj.log:I0810 03:29:58.258187 1 apps.go:38] Updating Deployment openshift-cluster-version/cluster-version-operator due to diff: &v1.Deployment{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:46:58.632612 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:47:14.380679 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:50:28.899752 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:50:44.599673 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:54:03.368863 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:54:19.069436 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:57:33.635413 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:57:53.736112 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:01:12.301683 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:01:27.952481 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:04:42.570408 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:05:02.472065 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:08:17.060403 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:08:32.739151 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:11:47.303756 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:12:07.208489 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:15:21.878653 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:15:37.427271 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:18:56.344382 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:19:11.996115 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:22:26.615730 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:22:42.367613 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:26:01.088580 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:26:16.685415 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:29:31.349948 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:29:47.052702 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ Arjun, I appreciate if you can help with the following questions: 1) The diff message seems not human readable. Do you think it deserve an improvement? 10009 I0810 04:50:28.899752 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 10010 Object: map[string]interface{}{ 10011 "apiVersion": string("monitoring.coreos.com/v1"), 10012 "kind": string("PrometheusRule"), 10013 "metadata": map[string]interface{}{"annotations": map[string]interface{}{"exclude.releas e.openshift.io/internal-openshift-hosted": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true"), "include.release.openshift.io/single-node-developer": string("true")}, "creationTimestamp": string("2021-08-10T02:48:07Z"), "generation": int64(2), "labels": map[string]interface{} {"prometheus": string("k8s"), "role": string("alert-rules")}, ...}, 10014 "spec": map[string]interface{}{ 10015 "groups": []interface{}{ 10016 ... // 2 identical elements 10017 map[string]interface{}{"name": string("machine-not-yet-deleted"), "rules": []interface{}{map[string]interface{}{"alert": string("MachineNotYetDeleted"), "annotations": map[string]i nterface{}{"message": string("machine {{ $labels.name }} has been in Deleting phase for more t"...)}, "exp r": string("(mapi_machine_created_timestamp_seconds{phase=\"Deleting\"}) > 0\n"), "for": string("360m"), . ..}}}, 10018 map[string]interface{}{"name": string("machine-api-operator-metrics-collec tor-up"), "rules": []interface{}{map[string]interface{}{"alert": string("MachineAPIOperatorMetricsCollecti onFailing"), "annotations": map[string]interface{}{"message": string("machine api operator metrics collection is failing. For more det"...)}, "expr": string("mapi_mao_collector_up == 0\n"), "for": string("5m"), . ..}}}, 10019 map[string]interface{}{ 10020 "name": string("machine-health-check-unterminated-short-circuit"), 10021 "rules": []interface{}{ 10022 map[string]interface{}{ 10023 "alert": string("MachineHealthCheckUnterminatedShortCircuit"), 10024 - "annotation": map[string]interface{}{ 10025 - "message": string("machine health check {{ $labels.name }} has been disabled by short circuit for more than 30 minutes"), 10026 - }, 10027 "expr": string("mapi_machinehealthcheck_short_circuit == 1\n"), 10028 "for": string("30m"), 10029 "labels": map[string]interface{}{"severity": string("warning")}, 10030 }, 10031 }, 10032 }, 10033 }, 10034 }, 10035 }, 10036 } 2) How can I know when it's in reconcile mode? Does it go to reconcile mode once I get Reconciling:true? # grep "Reconciling:true" 2021-08-10T03:27:46+0000-cluster-version-operator-5cf576bd97-94rwj.log | head -n1 I0810 03:11:38.601372 1 status.go:161] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Done:676, Total:676, Completed:1, Reconciling:true, Initial:false, VersionHash:"Ca4Rg3WnQAk=", LastProgress:time.Time{wall:0xc03c9978d537681d, ext:513779096593, loc:(*time.Location)(0x2ab2a20)}, Actual:v1.Release{Version:"4.8.4", Image:"quay.io/openshift-release-dev/ocp-release@sha256:841535acc09ca8412cd17e8f7702eceda1cac688ccc281278f108675c30de270", URL:"https://access.redhat.com/errata/RHSA-2021:2983", Channels:[]string(nil)}, Verified:false} # grep "Reconciling:true" 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log | head -n1 I0810 04:43:48.161640 1 status.go:161] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:2, Step:"", Failure:error(nil), Done:729, Total:729, Completed:1, Reconciling:true, Initial:false, VersionHash:"_6QM_p4p2-w=", LastProgress:time.Time{wall:0xc03c9edf439eac63, ext:843309600346, loc:(*time.Location)(0x2ad0dc0)}, Actual:v1.Release{Version:"4.9.0-0.nightly-2021-08-07-175228", Image:"registry.ci.openshift.org/ocp/release@sha256:62f43b04f6af74ea568c135367b33f7cc8212ca4aa760388ea7884ebccbfb1f1", URL:"", Channels:[]string(nil)}, Verified:false} 3) Most of the diff message come from generic.go. Does it mean that it's less likely to touch the diff log in the API in the resourceapply package?
>1) The diff message seems not human readable. Do you think it deserve an improvement? I don't think its worth looking into. CVO hotloop logging is useful to file bugs, which are not occurring frequently (once per minor release usually). >2) How can I know when it's in reconcile mode? Does it go to reconcile mode once I get Reconciling:true? Yes >) Most of the diff message come from generic.go. Does it mean that it's less likely to touch the diff log in the API in the resourceapply package? Not quite sure what you mean. `generic.go` handles unstructured resources, so diffs are more likely to occur there, but k8s changes may bring in hotloops in resourceapply as well
Many thanks Vadim! >>1) The diff message seems not human readable. Do you think it deserve an improvement? >I don't think its worth looking into. CVO hotloop logging is useful to file bugs, which are not occurring frequently (once per minor release usually). Ah, I didn't notice the '-' indicator. That's clear enough. The resource diff is logged in reconcile mode. Moving it to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759