Description of problem: On OpenShift 4.9 the openshift-dns service is created with internalTrafficPolicy set to Cluster, and OVN-Kubernetes doesn't appear to have the same fix we introduced in openshift-sdn to have pods query the local DNS pod endpoint instead of the dns-default service: https://bugzilla.redhat.com/show_bug.cgi?id=1919737. The code seems to mention that this change will be removed once internalTrafficPolicy is implemented. Is the Cluster setting correct, or should the service have internalTrafficPolicy set to Local? If this setting on the service is not supposed to be changed because of possible other bad side effects on the cluster, can we have a similar implementation in OVN-Kubernetes, since it looks like one was supposed to be worked on per the same Bugzilla mentioned above?

OpenShift release version: OCP 4.9 with OVNKubernetes

Cluster Platform: All

How reproducible: Unknown

Steps to Reproduce (in detail): Unknown

Impact of the problem: Sporadic DNS failures.

Additional info:

$ oc get svc dns-default -o yaml -n openshift-dns
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1641922101
    service.beta.openshift.io/serving-cert-secret-name: dns-default-metrics-tls
    service.beta.openshift.io/serving-cert-signed-by: openshift-service-serving-signer@1641922101
  creationTimestamp: "2022-01-11T17:32:01Z"
  labels:
    dns.operator.openshift.io/owning-dns: default
  name: dns-default
  namespace: openshift-dns
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    controller: true
    kind: DNS
    name: default
    uid: 8397e280-d4f3-44a5-8a8f-a978bbdbaa7e
  resourceVersion: "10030"
  uid: ce8dcef3-7ba6-45c4-9ae2-c00f494e155c
spec:
  clusterIP: 172.32.0.10
  clusterIPs:
  - 172.32.0.10
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: dns
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: dns-tcp
  - name: metrics
    port: 9154
    protocol: TCP
    targetPort: metrics
  selector:
    dns.operator.openshift.io/daemonset-dns: default
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
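(Editor's note: for anyone checking their own cluster, a quick way to confirm just the policy field, rather than dumping the full manifest, is a jsonpath query; this is only an illustrative check, not part of the original report:

$ oc get svc dns-default -n openshift-dns -o jsonpath='{.spec.internalTrafficPolicy}{"\n"}'
Cluster
)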
Setting blocker- as this doesn't appear to be a regression, upgrade issue, or otherwise something that should block a release. This issue appears to be related to bug 1919737, which we fixed with a patch to openshift-sdn. This new BZ is about addressing the same issue in OVN-Kubernetes. The spec.internalTrafficPolicy API field is relatively new; "internalTrafficPolicy: Cluster" is the default the API sets. The DNS operator isn't explicitly setting internalTrafficPolicy. The Kubernetes documentation is contradictory as to when "internalTrafficPolicy" was enabled by default (<https://kubernetes.io/docs/concepts/services-networking/service-traffic-policy/> says Kubernetes 1.23, and <https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.22.md> says Kubernetes 1.22). The field seems to be present in Kubernetes 1.22 (as evidenced by bug 2002461), so we can set "internalTrafficPolicy: Local" in OpenShift 4.9 (which is based on Kubernetes 1.22; see <https://access.redhat.com/solutions/4870701>) and later. I'll check with the SDN team to see whether specifying "internalTrafficPolicy: Local" works or could break anything with openshift-sdn and OVN-Kubernetes.
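(Editor's note: purely for illustration of what "internalTrafficPolicy: Local" would involve, and not a recommended change, since the service is owned by the DNS operator and any one-off edit would presumably need to be made in the operator itself rather than patched by hand, the field could be set on the existing service roughly like this:

$ oc -n openshift-dns patch service dns-default --type=merge \
    -p '{"spec":{"internalTrafficPolicy":"Local"}}'

With "Local", service traffic is delivered only to endpoints on the same node, so a node with no running DNS pod would get no answer at all instead of falling back to another node's endpoint; that caveat is what the next comment addresses.)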
Surya from the SDN team reminded me about bug 2039698 (and 4.9.z backport bug 2055317), which adds a fix in OVN-Kubernetes similar to the one in openshift-sdn. Surya also reminded me that "internalTrafficPolicy: Local" is not really what we need for the DNS service; we need the service to *prefer* a local endpoint and fall back to any available endpoint if no local endpoint is available. There is work upstream to add "internalTrafficPolicy: PreferLocal" (see <https://github.com/kubernetes/enhancements/pull/3016>), but right now, "internalTrafficPolicy" does not fit our needs. I'm closing this report as a duplicate of bug 2055317; please let me know if I have misunderstood the request in this BZ. *** This bug has been marked as a duplicate of bug 2055317 ***
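(Editor's note: to make the distinction concrete, a rough sketch of the options discussed above; the "PreferLocal" value is only proposed in the upstream KEP PR linked in the comment and is not available in any shipped Kubernetes release:

spec:
  internalTrafficPolicy: Cluster        # current default: traffic may go to any DNS endpoint in the cluster
  # internalTrafficPolicy: Local        # node-local endpoints only; traffic is dropped if the node has no DNS pod
  # internalTrafficPolicy: PreferLocal  # proposed upstream: use a local endpoint if present, otherwise fall back
)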