Bug 1984414
Summary: | Excessive resource diff logging during updates | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Arjun Naik <anaik> |
Component: | Cluster Version Operator | Assignee: | Vadim Rutkovsky <vrutkovs> |
Status: | CLOSED ERRATA | QA Contact: | Yang Yang <yanyang> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.8 | CC: | aos-bugs, jokerman, vrutkovs, wking |
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-10-18 17:40:26 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Arjun Naik
2021-07-21 11:55:36 UTC
Bug work should be done here, but it's Arjun's PR, so retroactively assigning to Arjun. Verifying it by upgrading to 4.9.0-0.nightly-2021-08-07-175228 from 4.8.4 Checking the CVO logs: # grep "Updating .* due to diff" * 2021-08-10T03:27:46+0000-cluster-version-operator-5cf576bd97-94rwj.log:I0810 03:29:58.258187 1 apps.go:38] Updating Deployment openshift-cluster-version/cluster-version-operator due to diff: &v1.Deployment{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:46:58.632612 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:47:14.380679 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:50:28.899752 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:50:44.599673 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:54:03.368863 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:54:19.069436 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:57:33.635413 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 04:57:53.736112 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:01:12.301683 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:01:27.952481 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:04:42.570408 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:05:02.472065 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:08:17.060403 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:08:32.739151 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:11:47.303756 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:12:07.208489 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:15:21.878653 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:15:37.427271 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:18:56.344382 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:19:11.996115 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:22:26.615730 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:22:42.367613 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:26:01.088580 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:26:16.685415 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:29:31.349948 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log:I0810 05:29:47.052702 1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff: &unstructured.Unstructured{ Arjun, I appreciate if you can help with the following questions: 1) The diff message seems not human readable. Do you think it deserve an improvement? 10009 I0810 04:50:28.899752 1 generic.go:109] Updating PrometheusRule openshift-machine-api/machine-api-operator-prometheus-rules due to diff: &unstructured.Unstructured{ 10010 Object: map[string]interface{}{ 10011 "apiVersion": string("monitoring.coreos.com/v1"), 10012 "kind": string("PrometheusRule"), 10013 "metadata": map[string]interface{}{"annotations": map[string]interface{}{"exclude.releas e.openshift.io/internal-openshift-hosted": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true"), "include.release.openshift.io/single-node-developer": string("true")}, "creationTimestamp": string("2021-08-10T02:48:07Z"), "generation": int64(2), "labels": map[string]interface{} {"prometheus": string("k8s"), "role": string("alert-rules")}, ...}, 10014 "spec": map[string]interface{}{ 10015 "groups": []interface{}{ 10016 ... // 2 identical elements 10017 map[string]interface{}{"name": string("machine-not-yet-deleted"), "rules": []interface{}{map[string]interface{}{"alert": string("MachineNotYetDeleted"), "annotations": map[string]i nterface{}{"message": string("machine {{ $labels.name }} has been in Deleting phase for more t"...)}, "exp r": string("(mapi_machine_created_timestamp_seconds{phase=\"Deleting\"}) > 0\n"), "for": string("360m"), . ..}}}, 10018 map[string]interface{}{"name": string("machine-api-operator-metrics-collec tor-up"), "rules": []interface{}{map[string]interface{}{"alert": string("MachineAPIOperatorMetricsCollecti onFailing"), "annotations": map[string]interface{}{"message": string("machine api operator metrics collection is failing. For more det"...)}, "expr": string("mapi_mao_collector_up == 0\n"), "for": string("5m"), . ..}}}, 10019 map[string]interface{}{ 10020 "name": string("machine-health-check-unterminated-short-circuit"), 10021 "rules": []interface{}{ 10022 map[string]interface{}{ 10023 "alert": string("MachineHealthCheckUnterminatedShortCircuit"), 10024 - "annotation": map[string]interface{}{ 10025 - "message": string("machine health check {{ $labels.name }} has been disabled by short circuit for more than 30 minutes"), 10026 - }, 10027 "expr": string("mapi_machinehealthcheck_short_circuit == 1\n"), 10028 "for": string("30m"), 10029 "labels": map[string]interface{}{"severity": string("warning")}, 10030 }, 10031 }, 10032 }, 10033 }, 10034 }, 10035 }, 10036 } 2) How can I know when it's in reconcile mode? Does it go to reconcile mode once I get Reconciling:true? # grep "Reconciling:true" 2021-08-10T03:27:46+0000-cluster-version-operator-5cf576bd97-94rwj.log | head -n1 I0810 03:11:38.601372 1 status.go:161] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:1, Step:"", Failure:error(nil), Done:676, Total:676, Completed:1, Reconciling:true, Initial:false, VersionHash:"Ca4Rg3WnQAk=", LastProgress:time.Time{wall:0xc03c9978d537681d, ext:513779096593, loc:(*time.Location)(0x2ab2a20)}, Actual:v1.Release{Version:"4.8.4", Image:"quay.io/openshift-release-dev/ocp-release@sha256:841535acc09ca8412cd17e8f7702eceda1cac688ccc281278f108675c30de270", URL:"https://access.redhat.com/errata/RHSA-2021:2983", Channels:[]string(nil)}, Verified:false} # grep "Reconciling:true" 2021-08-10T04:28:28+0000-cluster-version-operator-7d4d65b58f-2p7nd.log | head -n1 I0810 04:43:48.161640 1 status.go:161] Synchronizing errs=field.ErrorList{} status=&cvo.SyncWorkerStatus{Generation:2, Step:"", Failure:error(nil), Done:729, Total:729, Completed:1, Reconciling:true, Initial:false, VersionHash:"_6QM_p4p2-w=", LastProgress:time.Time{wall:0xc03c9edf439eac63, ext:843309600346, loc:(*time.Location)(0x2ad0dc0)}, Actual:v1.Release{Version:"4.9.0-0.nightly-2021-08-07-175228", Image:"registry.ci.openshift.org/ocp/release@sha256:62f43b04f6af74ea568c135367b33f7cc8212ca4aa760388ea7884ebccbfb1f1", URL:"", Channels:[]string(nil)}, Verified:false} 3) Most of the diff message come from generic.go. Does it mean that it's less likely to touch the diff log in the API in the resourceapply package? >1) The diff message seems not human readable. Do you think it deserve an improvement? I don't think its worth looking into. CVO hotloop logging is useful to file bugs, which are not occurring frequently (once per minor release usually). >2) How can I know when it's in reconcile mode? Does it go to reconcile mode once I get Reconciling:true? Yes >) Most of the diff message come from generic.go. Does it mean that it's less likely to touch the diff log in the API in the resourceapply package? Not quite sure what you mean. `generic.go` handles unstructured resources, so diffs are more likely to occur there, but k8s changes may bring in hotloops in resourceapply as well Many thanks Vadim! >>1) The diff message seems not human readable. Do you think it deserve an improvement? >I don't think its worth looking into. CVO hotloop logging is useful to file bugs, which are not occurring frequently (once per minor release usually). Ah, I didn't notice the '-' indicator. That's clear enough. The resource diff is logged in reconcile mode. Moving it to verified state. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759 |