Bug 2089138 - CVO hotloops on ValidatingWebhookConfiguration /performance-addon-operator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.13.0
Assignee: David Hurta
QA Contact: Evgeni Vakhonin
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-05-23 06:10 UTC by Yang Yang
Modified: 2023-05-17 22:46 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-17 22:46:32 UTC
Target Upstream Version:
Embargoed:


Attachments
CVO log file (11.11 MB, text/plain)
2022-05-23 06:10 UTC, Yang Yang


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-version-operator pull 893 | 0 | None | open | Bug 2089138: CVO hotloops on ValidatingWebhookConfiguration | 2023-01-26 14:59:11 UTC
Red Hat Product Errata | RHSA-2023:1326 | 0 | None | None | None | 2023-05-17 22:46:45 UTC

Description Yang Yang 2022-05-23 06:10:16 UTC
Created attachment 1882186
CVO log file

Description of problem:

In a cluster, the CVO hotloops on the following resources:
$ grep -o 'Updating .*due to diff' cvo.log | sort | uniq -c
     32 Updating CRD performanceprofiles.performance.openshift.io due to diff
     32 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
     32 Updating OperatorGroup openshift-monitoring/openshift-cluster-monitoring due to diff
     32 Updating OperatorGroup openshift-operator-lifecycle-manager/olm-operators due to diff
     32 Updating OperatorGroup openshift-operators/global-operators due to diff
     32 Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff
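
For logs that are easier to process in a script than with a shell pipeline, the count above can be reproduced with a short Python sketch. It is only an illustration: the function names and the `threshold` parameter are hypothetical, and the pattern is a close (non-greedy) equivalent of the grep expression rather than an exact copy.

```python
import re
from collections import Counter

# Roughly the same match as: grep -o 'Updating .*due to diff'
UPDATE_RE = re.compile(r"Updating (.*?) due to diff")


def count_updates(log_lines):
    """Count 'Updating <resource> due to diff' messages per resource."""
    counts = Counter()
    for line in log_lines:
        match = UPDATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def hotloop_candidates(counts, threshold=2):
    """Return resources updated at least `threshold` times (hypothetical cutoff)."""
    return sorted(name for name, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    import sys

    # Usage: python count_updates.py cvo.log
    with open(sys.argv[1], errors="replace") as log:
        totals = count_updates(log)
    for resource in hotloop_candidates(totals):
        print(totals[resource], resource)
```

A resource that is updated once after an upgrade or a manual change is normal; a resource that is updated on every sync round, as in the counts above, is a hotloop candidate.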

$ grep "Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff" -A20 cvo.log

I0519 06:24:01.286135       1 generic.go:109] Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff:   &unstructured.Unstructured{
  	Object: map[string]interface{}{
  		"apiVersion": string("admissionregistration.k8s.io/v1"),
  		"kind":       string("ValidatingWebhookConfiguration"),
  		"metadata":   map[string]interface{}{"annotations": map[string]interface{}{"include.release.openshift.io/ibm-cloud-managed": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true"), "include.release.openshift.io/single-node-developer": string("true"), "service.beta.openshift.io/inject-cabundle": string("true")}, "creationTimestamp": string("2022-05-19T03:44:02Z"), "generation": int64(72), "managedFields": []interface{}{map[string]interface{}{"apiVersion": string("admissionregistration.k8s.io/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:metadata": map[string]interface{}{"f:annotations": map[string]interface{}{".": map[string]interface{}{}, "f:include.release.openshift.io/ibm-cloud-managed": map[string]interface{}{}, "f:include.release.openshift.io/self-managed-high-availability": map[string]interface{}{}, "f:include.release.openshift.io/single-node-developer": map[string]interface{}{}, ...}, "f:ownerReferences": map[string]interface{}{".": map[string]interface{}{}, `k:{"uid":"5d28ef1d-78f0-4a88-9c2c-1f0fb01e3c57"}`: map[string]interface{}{}}}, "f:webhooks": map[string]interface{}{".": map[string]interface{}{}, `k:{"name":"vwb.performance.openshift.io"}`: map[string]interface{}{".": map[string]interface{}{}, "f:admissionReviewVersions": map[string]interface{}{}, "f:clientConfig": map[string]interface{}{".": map[string]interface{}{}, "f:service": map[string]interface{}{".": map[string]interface{}{}, "f:name": map[string]interface{}{}, "f:namespace": map[string]interface{}{}, "f:path": map[string]interface{}{}, ...}}, "f:failurePolicy": map[string]interface{}{}, ...}}}, "manager": string("cluster-version-operator"), ...}, map[string]interface{}{"apiVersion": string("admissionregistration.k8s.io/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:webhooks": map[string]interface{}{`k:{"name":"vwb.performance.openshift.io"}`: map[string]interface{}{"f:clientConfig": 
map[string]interface{}{"f:caBundle": map[string]interface{}{}}}}}, "manager": string("Go-http-client"), ...}}, ...},
  		"webhooks": []interface{}{
  			map[string]interface{}{
  				"admissionReviewVersions": []interface{}{string("v1")},
  				"clientConfig": map[string]interface{}{
+ 					"caBundle": string("LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURVVENDQWptZ0F3SUJBZ0lJQVBWdk9qMzhFNjB3RFFZSktvWklodmNOQVFFTEJRQXdOakUwTURJR0ExVUUKQXd3cmIzQmxibk5vYVdaMExYTmxjblpwWTJVdGMyVnlkbWx1WnkxemFXZHVaWEpBTVRZMU1qa3pNVGs0TlRBZQpGdzB5TWpBMU1Ua3dNelEyTWpSYUZ3MHlOREEzTVRjd016"...),
  					"service":  map[string]interface{}{"name": string("performance-addon-operator-service"), "namespace": string("openshift-cluster-node-tuning-operator"), "path": string("/validate-performance-openshift-io-v2-performanceprofile"), "port": int64(443)},
  				},
  				"failurePolicy":     string("Fail"),
  				"matchPolicy":       string("Equivalent"),
  				"name":              string("vwb.performance.openshift.io"),
+ 				"namespaceSelector": map[string]interface{}{},
+ 				"objectSelector":    map[string]interface{}{},
  				"rules":             []interface{}{map[string]interface{}{"apiGroups": []interface{}{string("performance.openshift.io")}, "apiVersions": []interface{}{string("v2")}, "operations": []interface{}{string("CREATE"), string("UPDATE")}, "resources": []interface{}{string("performanceprofiles")}, ...}},
  				"sideEffects":       string("None"),
  				"timeoutSeconds":    int64(10),
  			},
  		},
  	},
  }

$ cat manifests/0000_50_cluster-node-tuning-operator_45-webhook-configuration.yaml
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    service.beta.openshift.io/serving-cert-secret-name: performance-addon-operator-webhook-cert
  labels:
    name: performance-addon-operator-service
  name: performance-addon-operator-service
  namespace: openshift-cluster-node-tuning-operator
spec:
  ports:
    - name: "443"
      port: 443
      protocol: TCP
      targetPort: 4343
  selector:
    name: cluster-node-tuning-operator
  type: ClusterIP

---

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  annotations:
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    include.release.openshift.io/ibm-cloud-managed: "true"
    service.beta.openshift.io/inject-cabundle: "true"
  name: performance-addon-operator
webhooks:
  - admissionReviewVersions:
      - v1
    clientConfig:
      service:
        name: performance-addon-operator-service
        namespace: openshift-cluster-node-tuning-operator
        path: /validate-performance-openshift-io-v2-performanceprofile
        port: 443
    failurePolicy: Fail
    matchPolicy: Equivalent
    name: vwb.performance.openshift.io
    rules:
      - apiGroups:
          - performance.openshift.io
        apiVersions:
          - v2
        operations:
          - CREATE
          - UPDATE
        resources:
          - performanceprofiles
        scope: '*'
    sideEffects: None
    timeoutSeconds: 10
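
Comparing this manifest with the logged diff, the three `+` lines (`caBundle`, `namespaceSelector`, `objectSelector`) are fields that exist in the cluster but not in the manifest. The manifest requests `caBundle` injection through the `service.beta.openshift.io/inject-cabundle` annotation, and the two empty selectors appear to be defaults filled in on the server side. The following Python sketch illustrates why a whole-object comparison keeps reporting drift while a comparison limited to manifest-set fields does not. The helper name and the abridged objects are hypothetical; this is not the CVO code and not necessarily the approach taken in the linked pull request.

```python
def manifest_fields_match(desired, live):
    """Return True when every field set in `desired` has the same value in `live`.

    Fields that exist only in `live` are ignored, so values added by other
    components or by server-side defaulting do not count as drift.
    """
    if isinstance(desired, dict):
        return isinstance(live, dict) and all(
            key in live and manifest_fields_match(value, live[key])
            for key, value in desired.items()
        )
    if isinstance(desired, list):
        return (
            isinstance(live, list)
            and len(desired) == len(live)
            and all(manifest_fields_match(d, l) for d, l in zip(desired, live))
        )
    return desired == live


# Webhook entry as written in the manifest (abridged).
desired = {
    "name": "vwb.performance.openshift.io",
    "clientConfig": {
        "service": {"name": "performance-addon-operator-service", "port": 443},
    },
    "failurePolicy": "Fail",
}

# Same entry as stored in the cluster, with the three extra fields from the diff.
# The caBundle value is a placeholder, not real certificate data.
live = {
    "name": "vwb.performance.openshift.io",
    "clientConfig": {
        "caBundle": "<injected by service-ca>",
        "service": {"name": "performance-addon-operator-service", "port": 443},
    },
    "failurePolicy": "Fail",
    "namespaceSelector": {},
    "objectSelector": {},
}

print(desired == live)                       # whole-object comparison: False
print(manifest_fields_match(desired, live))  # manifest-set fields only: True
```

With the whole-object comparison, every sync round sees a difference and rewrites the object, which removes the injected `caBundle` and triggers a new injection. With the subset comparison, the object is only rewritten when a field that the manifest actually sets has changed.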

Version-Release number of the following components:
4.11.0-0.nightly-2022-05-18-053037

How reproducible:
1/1

Steps to Reproduce:
1. Install a cluster

Actual results:
CVO hotloops on ValidatingWebhookConfiguration /performance-addon-operator

Expected results:
CVO doesn't hotloop on the resource.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 3 Evgeni Vakhonin 2023-02-07 16:12:44 UTC
reproducing on 4.13.0-0.nightly-2023-02-01-112003

╰─ grep -o 'Updating .*due to diff' cluster-version-operator-dd645f75f-wmkdv.log | sort | uniq -c
     44 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
     44 Updating Namespace driver-toolkit due to diff
     44 Updating PriorityLevelConfiguration /openshift-control-plane-operators due to diff
     44 Updating ValidatingWebhookConfiguration /controlplanemachineset.machine.openshift.io due to diff
     44 Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff

ValidatingWebhookConfiguration hotlooping detected.

checking audit logs:
╰─ oc adm node-logs `oc get no -l node-role.kubernetes.io/master -o jsonpath='{.items[*].metadata.name}'` --path=kube-apiserver/audit.log --raw|grep -h '"verb":"update".*"resource":"validatingwebhookconfigurations","name":"performance-addon-operator"' 2>/dev/null|jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)'
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T12:11:51.900533Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T12:11:51.908461Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T13:22:12.742398Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:07:12.716573Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:07:12.725835Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:10:25.442118Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:10:25.449630Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:13:38.315396Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:13:38.327977Z {"metadata":{},"code":200}
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:16:51.091818Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:16:51.103642Z {"metadata":{},"code":200}

openshift-cluster-version and openshift-service-ca are fighting for the resource.
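
The same audit-log check can be sketched in Python, which may be easier to extend than the grep and jq pipeline. The sketch assumes one JSON audit event per line, as the jq filter above does, and uses only the event fields that filter already references (`verb`, `objectRef`, `user.username`). The function name is hypothetical.

```python
import json
from collections import Counter


def updates_by_user(audit_lines, resource, name):
    """Count 'update' audit events for one object, grouped by username."""
    counts = Counter()
    for line in audit_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON audit event
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        if (
            event.get("verb") == "update"
            and ref.get("resource") == resource
            and ref.get("name") == name
        ):
            user = (event.get("user") or {}).get("username", "<unknown>")
            counts[user] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: oc adm node-logs ... --path=kube-apiserver/audit.log --raw | python updates_by_user.py
    totals = updates_by_user(
        sys.stdin, "validatingwebhookconfigurations", "performance-addon-operator"
    )
    for user, n in totals.most_common():
        print(n, user)
```

If two service accounts both show a steadily growing count for the same object, as in the output above, they are most likely overwriting each other's changes.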



verifying in 4.13.0-0.nightly-2023-02-06-123554:

╰─ grep -o 'Updating .*due to diff' cluster-version-operator-cb7688bb6-9f8p6.log | sort | uniq -c
    121 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
    121 Updating Namespace driver-toolkit due to diff
    121 Updating PriorityLevelConfiguration /openshift-control-plane-operators due to diff

other hotloops detected (tracked in other bugs), but none on ValidatingWebhookConfiguration. good!

in audit logs:
╰─ oc adm node-logs `oc get no -l node-role.kubernetes.io/master -o jsonpath='{.items[*].metadata.name}'` --path=kube-apiserver/audit.log --raw|grep -h '"verb":"update".*"resource":"validatingwebhookconfigurations","name":"performance-addon-operator"' 2>/dev/null|jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)'
<no output>

as expected!

now let's check who responds when the resource is changed

first, deleted the ca-bundle:
still no webhook hotloop in the cvo logs
╰─ grep -o 'Updating .*due to diff' cluster-version-operator-cb7688bb6-9f8p6.log | sort | uniq -c
    172 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
    172 Updating Namespace driver-toolkit due to diff
    172 Updating PriorityLevelConfiguration /openshift-control-plane-operators due to diff

only service-ca updated the resource - as expected.
╰─ oc adm node-logs `oc get no -l node-role.kubernetes.io/master -o jsonpath='{.items[*].metadata.name}'` --path=kube-apiserver/audit.log --raw|grep -h '"verb":"update".*"resource":"validatingwebhookconfigurations","name":"performance-addon-operator"' 2>/dev/null|jq -r '.user.username + " " + (.objectRef | .resource + " " + .namespace + " " + .name + " " + .apiGroup) + " " + .stageTimestamp + " " + (.responseStatus | tostring)'
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T14:24:26.287783Z {"metadata":{},"code":200}

now deleted the webhook - this time cvo updated the resource, then service-ca injected the caBundle, and there was no further cvo intervention:
system:serviceaccount:openshift-cluster-version:default validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T15:14:35.259737Z {"metadata":{},"code":200}
system:serviceaccount:openshift-service-ca:service-ca validatingwebhookconfigurations  performance-addon-operator admissionregistration.k8s.io 2023-02-07T15:14:35.271870Z {"metadata":{},"code":200}

in cvo log:
╰─ grep -ho 'Updating .*due to diff' *.log | sort | uniq -c
    235 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
    235 Updating Namespace driver-toolkit due to diff
    235 Updating PriorityLevelConfiguration /openshift-control-plane-operators due to diff
      1 Updating ValidatingWebhookConfiguration /performance-addon-operator due to diff

looks good to me

Comment 6 errata-xmlrpc 2023-05-17 22:46:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.13.0 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:1326

