Created attachment 1882186 [details]
CVO log file

Description of problem:
In a cluster, the CVO hotloops on the following resources:

$ grep -o 'Updating .*due to diff' cvo.log | sort | uniq -c
     32 Updating CRD performanceprofiles.performance.openshift.io due to diff
     32 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
     32 Updating OperatorGroup openshift-monitoring/openshift-cluster-monitoring due to diff
     32 Updating OperatorGroup openshift-operator-lifecycle-manager/olm-operators due to diff
     32 Updating OperatorGroup openshift-operators/global-operators due to diff
     32 Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff

$ grep "Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff" -A20 cvo.log
I0519 06:24:01.286135       1 generic.go:109] Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff:
&unstructured.Unstructured{
  Object: map[string]interface{}{
    "apiVersion": string("admissionregistration.k8s.io/v1"),
    "kind":       string("ValidatingWebhookConfiguration"),
    "metadata": map[string]interface{}{
      "annotations": map[string]interface{}{"include.release.openshift.io/ibm-cloud-managed": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true"), "include.release.openshift.io/single-node-developer": string("true"), "service.beta.openshift.io/inject-cabundle": string("true")},
      "creationTimestamp": string("2022-05-19T03:44:02Z"),
      "generation": int64(72),
      "managedFields": []interface{}{
        map[string]interface{}{"apiVersion": string("admissionregistration.k8s.io/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:metadata": map[string]interface{}{"f:annotations": map[string]interface{}{".": map[string]interface{}{}, "f:include.release.openshift.io/ibm-cloud-managed": map[string]interface{}{}, "f:include.release.openshift.io/self-managed-high-availability": map[string]interface{}{}, "f:include.release.openshift.io/single-node-developer": map[string]interface{}{}, ...}, "f:ownerReferences": map[string]interface{}{".": map[string]interface{}{}, `k:{"uid":"5d28ef1d-78f0-4a88-9c2c-1f0fb01e3c57"}`: map[string]interface{}{}}}, "f:webhooks": map[string]interface{}{".": map[string]interface{}{}, `k:{"name":"vwb.performance.openshift.io"}`: map[string]interface{}{".": map[string]interface{}{}, "f:admissionReviewVersions": map[string]interface{}{}, "f:clientConfig": map[string]interface{}{".": map[string]interface{}{}, "f:service": map[string]interface{}{".": map[string]interface{}{}, "f:name": map[string]interface{}{}, "f:namespace": map[string]interface{}{}, "f:path": map[string]interface{}{}, ...}}, "f:failurePolicy": map[string]interface{}{}, ...}}}, "manager": string("cluster-version-operator"), ...},
        map[string]interface{}{"apiVersion": string("admissionregistration.k8s.io/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:webhooks": map[string]interface{}{`k:{"name":"vwb.performance.openshift.io"}`: map[string]interface{}{"f:clientConfig": map[string]interface{}{"f:caBundle": map[string]interface{}{}}}}}, "manager": string("Go-http-client"), ...},
      },
      ...,
    },
    "webhooks": []interface{}{
      map[string]interface{}{
        "admissionReviewVersions": []interface{}{string("v1")},
        "clientConfig": map[string]interface{}{
+         "caBundle": string("LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURVVENDQWptZ0F3SUJBZ0lJQVBWdk9qMzhFNjB3RFFZSktvWklodmNOQVFFTEJRQXdOakUwTURJR0ExVUUKQXd3cmIzQmxibk5vYVdaMExYTmxjblpwWTJVdGMyVnlkbWx1WnkxemFXZHVaWEpBTVRZMU1qa3pNVGs0TlRBZQpGdzB5TWpBMU1Ua3dNelEyTWpSYUZ3MHlOREEzTVRjd016"...),
          "service": map[string]interface{}{"name": string("performance-addon-operator-service"), "namespace": string("openshift-cluster-node-tuning-operator"), "path": string("/validate-performance-openshift-io-v2-performanceprofile"), "port": int64(443)},
        },
        "failurePolicy": string("Fail"),
        "matchPolicy":   string("Equivalent"),
        "name":          string("vwb.performance.openshift.io"),
+       "namespaceSelector": map[string]interface{}{},
+       "objectSelector":    map[string]interface{}{},
        "rules": []interface{}{map[string]interface{}{"apiGroups": []interface{}{string("performance.openshift.io")}, "apiVersions": []interface{}{string("v2")}, "operations": []interface{}{string("CREATE"), string("UPDATE")}, "resources": []interface{}{string("performanceprofiles")}, ...}},
        "sideEffects":    string("None"),
        "timeoutSeconds": int64(10),
      },
    },
  },
}

$ cat manifests/0000_50_cluster-node-tuning-operator_45-webhook-configuration.yaml
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    service.beta.openshift.io/serving-cert-secret-name: performance-addon-operator-webhook-cert
  labels:
    name: performance-addon-operator-service
  name: performance-addon-operator-service
  namespace: openshift-cluster-node-tuning-operator
spec:
  ports:
  - name: "443"
    port: 443
    protocol: TCP
    targetPort: 4343
  selector:
    name: cluster-node-tuning-operator
  type: ClusterIP
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    service.beta.openshift.io/inject-cabundle: "true"
  name: performance-addon-operator
webhooks:
- admissionReviewVersions:
  - v1
  clientConfig:
    service:
      name: performance-addon-operator-service
      namespace: openshift-cluster-node-tuning-operator
      path: /validate-performance-openshift-io-v2-performanceprofile
      port: 443
  failurePolicy: Fail
  matchPolicy: Equivalent
  name: vwb.performance.openshift.io
  rules:
  - apiGroups:
    - performance.openshift.io
    apiVersions:
    - v2
    operations:
    - CREATE
    - UPDATE
    resources:
    - performanceprofiles
    scope: '*'
  sideEffects: None
  timeoutSeconds: 10

Version-Release number of the following components:
4.11.0-0.nightly-2022-05-18-053037

How reproducible:
1/1

Steps to Reproduce:
1. Install a cluster

Actual results:
CVO hotloops on ValidatingWebhookConfiguration /performance-addon-operator

Expected results:
CVO doesn't hotloop on the resource.
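The diff above shows the CVO repeatedly rewriting fields its manifest does not set: the service-ca-injected caBundle and the empty namespaceSelector/objectSelector. A quick way to see which fields a hotloop keeps touching is to pull the marked lines out of the go-cmp diff in the CVO log. This is a minimal sketch: the heredoc is fabricated sample data imitating the diff format above, and /tmp/cvo-sample.log is a hypothetical path; against a real cluster, run the grep on the actual cvo.log instead.

```shell
# Fabricated sample imitating the go-cmp diff lines from the CVO log.
cat > /tmp/cvo-sample.log <<'EOF'
I0519 06:24:01.286135 1 generic.go:109] Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff: &unstructured.Unstructured{
+ "caBundle": string("LS0tLS1..."),
+ "namespaceSelector": map[string]interface{}{},
+ "objectSelector": map[string]interface{}{},
EOF

# Extract only the added/removed field names (lines starting with + or -)
# and count how often each one shows up across the log.
grep -oE '^[+-][[:space:]]*"[a-zA-Z]+"' /tmp/cvo-sample.log | sort | uniq -c
```

On the sample this lists caBundle, namespaceSelector, and objectSelector once each; on a real hotlooping log the counts climb with every sync, which makes the stomped fields obvious at a glance.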
Reproducing on 4.13.0-0.nightly-2023-02-01-112003:

╰─ grep -o 'Updating .*due to diff' cluster-version-operator-dd645f75f-wmkdv.log | sort | uniq -c
     44 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
     44 Updating Namespace driver-toolkit due to diff
     44 Updating PriorityLevelConfiguration /openshift-control-plane-operators due to diff
     44 Updating ValidatingWebhookConfiguration /controlplanemachineset.machine.openshift.io due to diff
     44 Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff

ValidatingWebhookConfiguration hotlooping detected. Checking the audit logs:

╰─ oc adm node-logs `oc get no -l node-role.kubernetes.io/master -o jsonpath='{.items[*].metadata.name}'` --path=kube-apiserver/audit.log --raw | grep -h '"verb":"update".*"resource":"validatingwebhookconfigurations","name":"performance-addon-operator"' 2>/dev/null | jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)'
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T12:11:51.900533Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T12:11:51.908461Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T13:22:12.742398Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:07:12.716573Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:07:12.725835Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:10:25.442118Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:10:25.449630Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:13:38.315396Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:13:38.327977Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:16:51.091818Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:16:51.103642Z {"metadata":{},"code":200}

openshift-cluster-version and openshift-service-ca are fighting for the resource.

Verifying in 4.13.0-0.nightly-2023-02-06-123554:

╰─ grep -o 'Updating .*due to diff' cluster-version-operator-cb7688bb6-9f8p6.log | sort | uniq -c
    121 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
    121 Updating Namespace driver-toolkit due to diff
    121 Updating PriorityLevelConfiguration /openshift-control-plane-operators due to diff

Other hotloops detected (tracked in other bugs), but no ValidatingWebhookConfiguration. Good!
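The alternating usernames in the audit output above can be summarized per actor with plain grep/sort/uniq, no jq required. A minimal sketch with fabricated sample lines standing in for the real kube-apiserver audit stream (against a cluster, feed the filtered `oc adm node-logs` output through the same pipeline; /tmp/audit-sample.log is a hypothetical path):

```shell
# Fabricated audit-event sample: one CVO update, two service-ca updates.
printf '%s\n' \
  '{"user":{"username":"system:serviceaccount:openshift-cluster-version:default"},"verb":"update"}' \
  '{"user":{"username":"system:serviceaccount:openshift-service-ca:service-ca"},"verb":"update"}' \
  '{"user":{"username":"system:serviceaccount:openshift-service-ca:service-ca"},"verb":"update"}' \
  > /tmp/audit-sample.log

# Count updates per service account; two actors trading high counts is
# the signature of a fight over one resource.
grep -o '"username":"[^"]*"' /tmp/audit-sample.log | sort | uniq -c | sort -rn
```

In a healthy cluster this should show a single actor with a handful of updates; roughly equal counts for openshift-cluster-version and openshift-service-ca reproduce the fight diagnosed above.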
In the audit logs:

╰─ oc adm node-logs `oc get no -l node-role.kubernetes.io/master -o jsonpath='{.items[*].metadata.name}'` --path=kube-apiserver/audit.log --raw | grep -h '"verb":"update".*"resource":"validatingwebhookconfigurations","name":"performance-addon-operator"' 2>/dev/null | jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)'
<no output>

As expected!

Now let's check who responds to a resource update. First, deleted the caBundle - still no webhook hotloop in the CVO logs:

╰─ grep -o 'Updating .*due to diff' cluster-version-operator-cb7688bb6-9f8p6.log | sort | uniq -c
    172 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
    172 Updating Namespace driver-toolkit due to diff
    172 Updating PriorityLevelConfiguration /openshift-control-plane-operators due to diff

Only service-ca updated the resource - as expected:

╰─ oc adm node-logs `oc get no -l node-role.kubernetes.io/master -o jsonpath='{.items[*].metadata.name}'` --path=kube-apiserver/audit.log --raw | grep -h '"verb":"update".*"resource":"validatingwebhookconfigurations","name":"performance-addon-operator"' 2>/dev/null | jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)'
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:24:26.287783Z {"metadata":{},"code":200}

Then deleted the webhook - this time the CVO recreated the resource, service-ca injected the caBundle, and there was no further CVO intervention:

system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T15:14:35.259737Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T15:14:35.271870Z {"metadata":{},"code":200}

In the CVO log:

╰─ grep -ho 'Updating .*due to diff' *.log | sort | uniq -c
    235 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
    235 Updating Namespace driver-toolkit due to diff
    235 Updating PriorityLevelConfiguration /openshift-control-plane-operators due to diff
      1 Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff

Looks good to me.
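The final grep can be turned into a small pass/fail check: flag any resource the CVO updated more than a few times in the collected logs (a single update after a deliberate deletion, as above, is fine; dozens are a hotloop). A minimal sketch with a fabricated sample log; the threshold of 5 and the /tmp/cvo.log path are arbitrary choices, not anything the CVO itself defines:

```shell
# Build a fabricated sample: six updates of one resource (a hotloop),
# one update of another (normal reconciliation).
for i in $(seq 1 6); do
  echo 'Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff'
done > /tmp/cvo.log
echo 'Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff' >> /tmp/cvo.log

# Flag anything updated more than the threshold; awk sees the uniq -c
# count as field 1.
grep -ho 'Updating .*due to diff' /tmp/cvo.log | sort | uniq -c \
  | awk -v t=5 '$1 > t {$1=""; print "HOTLOOP:" $0}'
```

On the sample only the CronJob line is flagged; pointed at real cluster-version-operator pod logs, an empty output is the "no hotloop" verdict used throughout this verification.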
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326