CVO keeps updating services that lack spec.sessionAffinity:

{"count":77,"path":"/api/v1/namespaces/openshift-cluster-samples-operator/services/metrics"}
{"count":78,"path":"/api/v1/namespaces/openshift-cluster-version/services/cluster-version-operator"}
{"count":50,"path":"/api/v1/namespaces/openshift-config-managed/configmaps/signatures-managed"}
{"count":77,"path":"/api/v1/namespaces/openshift-console/services/downloads"}
{"count":75,"path":"/api/v1/namespaces/openshift-image-registry/services/image-registry-operator"}
{"count":78,"path":"/api/v1/namespaces/openshift-machine-config-operator/services/machine-config-daemon"}
{"count":77,"path":"/api/v1/namespaces/openshift-marketplace/services/marketplace-operator-metrics"}
Stefan suggests possibly waiting until API-server support for server-side apply [1] goes GA and then reworking the CVO's apply logic to use that instead of client-side merging, which might help here. Bug 1879184 might also end up with a [Late] CI guard based on the audit logs. But whatever is going on here is unlikely to be new in 4.6, so punting to 4.7.

[1]: https://kubernetes.io/blog/2020/04/01/kubernetes-1.18-feature-server-side-apply-beta-2/
I don't think #c1 reflects what I meant. I meant that writing perfect client-side merging funcs for all types is an infinite game with lots of chances for mistakes and failure. Instead, the right solution is to triage these bugs, fix the manifests for now, and add an e2e test that uncovers the issues before new manifests merge.
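As a rough illustration of the kind of check such an e2e test could make, the sketch below takes per-path update counts in the same JSON shape as the audit summary in the bug description and flags paths that were updated more often than a threshold. The function name find_hotloops and the default threshold of 10 are assumptions made for illustration; this is not the actual guard tracked in bug 1879184.

```python
import json


def find_hotloops(lines, max_updates=10):
    """Return (path, count) pairs whose update count exceeds max_updates.

    Each input line is a JSON object with "count" and "path" keys.
    max_updates is a hypothetical threshold chosen for illustration.
    """
    offenders = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        if entry["count"] > max_updates:
            offenders.append((entry["path"], entry["count"]))
    # Worst offenders first.
    return sorted(offenders, key=lambda item: item[1], reverse=True)
```

A test built on this would fail whenever the returned list is non-empty, which points the manifest owner at the exact resource that is being rewritten.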
I fear rewriting services puts load on the endpoints controllers and possibly on the networking stack as it updates iptables. Increasing severity until proven otherwise.
I double-checked, and it seems that for this one kube-apiserver notices that nothing changed and skips updating the object. So this BZ is purely about unnecessary load on the apiserver and etcd.
> So this BZ is purely about unnecessary load on the apiserver and etcd.

Punting back to 4.7, because I don't see any new-in-4.6 regressions here, and it's really late in the 4.6 cycle to make new 4.6 blockers unless we have a solid story around why this is a critical issue.

> ... and add an e2e test that uncovers the issues before new manifests merge.

This is bug 1879184, right?
It's end of sprint, and this is not going to get fixed in the next few hours. Hopefully we will at least get the Late audit guard from bug 1879184 in next sprint, and then we'll see which team should fix this issue.
Adding UpcomingSprint as we have reached the end of the current sprint and pushing this bug to the next sprint.
Marking this as not a blocker, as this is not a regression and we have had this design for a long time.
CVO's Service validation does not check sessionAffinity; however, it does check spec.type. The spec.type check is incorrect: it should only compare the field if it was set in the manifest. Otherwise, CVO continuously tries to clear the server-set default of "ClusterIP".
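A minimal sketch of the intended comparison, written in Python for brevity (the CVO itself is written in Go). The helper name service_type_needs_update is hypothetical and is not taken from the CVO source; it only shows the rule that a field left unset in the manifest should not be compared against the server-side default.

```python
def service_type_needs_update(manifest_spec, live_spec):
    """Decide whether spec.type differs in a way that warrants an update.

    manifest_spec: the Service spec from the release manifest (dict).
    live_spec: the Service spec currently stored in the cluster (dict).
    """
    required = manifest_spec.get("type")
    if not required:
        # The manifest does not set spec.type, so whatever the server
        # defaulted (for example "ClusterIP") is acceptable as-is.
        return False
    # The manifest has an opinion, so any mismatch is real drift.
    return live_spec.get("type") != required
```

With a manifest spec that has no type key and a live spec of {"type": "ClusterIP"}, the helper reports that no update is needed, so the reconcile loop would no longer issue a PUT on every pass.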
Made the title more generic, because the issue is the presence of hotlooping, not a particular property. For example, if we had two separate properties that both triggered Service hotlooping, we'd want to fix both of them to close out this bug.
Reproduced with 4.8.0-fc.3-x86_64 (taking openshift-cluster-version/services/cluster-version-operator and openshift-marketplace/services/marketplace-operator-metrics as examples).

# cat manifests/0000_50_operator-marketplace_08_service.yaml|grep -A15 "spec:"
spec:
  selector:
    name: marketplace-operator
  ports:
  - name: metrics
    port: 8383
    protocol: TCP
    targetPort: 8383
  - name: https-metrics
    port: 8081
    protocol: TCP
    targetPort: 8081

# curl -s https://raw.githubusercontent.com/openshift/cluster-version-operator/master/install/0001_00_cluster-version-operator_03_service.yaml|grep -A7 "spec:"
spec:
  type: ClusterIP
  selector:
    k8s-app: cluster-version-operator
  ports:
  - name: metrics
    port: 9099 # chosen to be in the internal open range

The CVO logs show continuous Service update requests about every 3 minutes for the above two resources.

# ./oc logs cluster-version-operator-6b5f889866-d9t88|grep -E "request: PUT.*services.*marketplace-operator-metrics|request: PUT.*services.*cluster-version-operator"|head -n8
I0524 07:26:12.785812 1 request.go:591] Throttling request took 95.823647ms, request: PUT:https://api-int.jliu-48.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-marketplace/services/marketplace-operator-metrics
I0524 07:26:14.986352 1 request.go:591] Throttling request took 96.07034ms, request: PUT:https://api-int.jliu-48.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-cluster-version/services/cluster-version-operator
I0524 07:29:31.949862 1 request.go:591] Throttling request took 96.718612ms, request: PUT:https://api-int.jliu-48.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-marketplace/services/marketplace-operator-metrics
I0524 07:29:34.100938 1 request.go:591] Throttling request took 97.068142ms, request: PUT:https://api-int.jliu-48.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-cluster-version/services/cluster-version-operator
I0524 07:32:50.843349 1 request.go:591] Throttling request took 95.999508ms, request: PUT:https://api-int.jliu-48.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-marketplace/services/marketplace-operator-metrics
I0524 07:32:53.043228 1 request.go:591] Throttling request took 94.302534ms, request: PUT:https://api-int.jliu-48.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-cluster-version/services/cluster-version-operator
I0524 07:36:10.015879 1 request.go:591] Throttling request took 95.76657ms, request: PUT:https://api-int.jliu-48.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-marketplace/services/marketplace-operator-metrics
I0524 07:36:12.215440 1 request.go:591] Throttling request took 95.327531ms, request: PUT:https://api-int.jliu-48.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-cluster-version/services/cluster-version-operator

# ./oc logs cluster-version-operator-6b5f889866-d9t88|grep "request.go.*request: PUT.*services.*marketplace-operator-metrics"|wc -l
29
# ./oc logs cluster-version-operator-6b5f889866-d9t88|grep "request.go.*request: PUT.*services.*cluster-version-operator"|wc -l
31
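The per-resource grep and wc -l counts above can also be gathered in one pass. The sketch below is an assumption-laden convenience, not part of the original reproduction: it relies only on the "request: PUT:<url>" shape visible in the log lines, the function name count_puts is made up for illustration, and it only sees requests that happen to be logged (here, the throttled ones).

```python
import re
from collections import Counter

# Captures the API path that follows the scheme and host[:port] of a logged PUT.
PUT_RE = re.compile(r"request: PUT:https?://[^/\s]+(/\S+)")


def count_puts(log_lines):
    """Count logged PUT requests per API path in CVO log output."""
    counts = Counter()
    for line in log_lines:
        match = PUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the output of oc logs for the CVO pod, one line per element, and printing count_puts(lines).most_common() would list every hotlooping path at once instead of requiring one grep per resource.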
Verified on 4.8.0-0.nightly-2021-05-21-233425

# ./oc logs cluster-version-operator-69b746856b-6v6nt|grep "request.go.*request: PUT.*services"|wc -l
0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438