Description of problem:
Because of a missing readiness probe, CoreDNS pods can report Ready while unable to service a request (due to initialization, apiserver connectivity loss, etc.). When in this state, service traffic will still be routed to the CoreDNS pod, potentially causing timeouts or other connection errors when performing DNS queries.

Version-Release number of selected component (if applicable):
Payload: 4.1.0-0.nightly-2019-05-15-151517

How reproducible:
See https://bugzilla.redhat.com/show_bug.cgi?id=1711364

In such a cluster state, the following command will sometimes fail when the request is distributed to the affected CoreDNS pod:

    dig @172.30.0.10 kubernetes.default.svc.cluster.local

Actual results:

Expected results:

Additional info:
This bug was first observed while debugging https://bugzilla.redhat.com/show_bug.cgi?id=1711364
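Because the failure is intermittent (it only shows up when a query lands on the affected pod), a single dig run can easily miss it. The following is a minimal sketch, not part of the original report, of repeating the query and counting failures. The helper names `dig_once` and `count_failures` are hypothetical, and the timeout value is an assumption; the server IP and query name are the ones from the description.

```python
import subprocess
from typing import Callable


def dig_once(server: str, name: str, timeout_s: int = 2) -> bool:
    """Run one dig query; return True only if it exits 0 and prints an answer."""
    cmd = ["dig", f"@{server}", name, "+short", f"+time={timeout_s}", "+tries=1"]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s + 3
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # dig hung past our own deadline, or is not installed on this host.
        return False
    return result.returncode == 0 and result.stdout.strip() != ""


def count_failures(attempts: int, query: Callable[[], bool]) -> int:
    """Call query() `attempts` times and return how many calls reported failure."""
    return sum(1 for _ in range(attempts) if not query())
```

For example, `count_failures(50, lambda: dig_once("172.30.0.10", "kubernetes.default.svc.cluster.local"))` run from inside the cluster network would be expected to return a non-zero count while an affected pod is still receiving traffic, and 0 once only healthy pods are endpoints.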
Verified with 4.2.0-0.nightly-2019-06-25-003324 and issue has been fixed.

$ oc get ds/dns-default -n openshift-dns -o yaml
spec:
  template:
    spec:
      containers:
      - args:
        name: dns
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10

Check iptables and ensure dns query will not be distributed to the affected pod.

-A KUBE-SERVICES -d 172.30.0.10/32 -p udp -m comment --comment "openshift-dns/dns-default:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-BGNS3J6UB7MMLVDO
-A KUBE-SVC-BGNS3J6UB7MMLVDO -m statistic --mode random --probability 0.20000000019 -j KUBE-SEP-5AKZ7CS5F5VMNQSF
-A KUBE-SVC-BGNS3J6UB7MMLVDO -m statistic --mode random --probability 0.25000000000 -j KUBE-SEP-6ZXZTDOJTY36K2QO
-A KUBE-SVC-BGNS3J6UB7MMLVDO -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-YQXEHXFWXRE6J2LE
-A KUBE-SVC-BGNS3J6UB7MMLVDO -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-UVEJRI5KEK43CMS5
-A KUBE-SVC-BGNS3J6UB7MMLVDO -j KUBE-SEP-IABYQC3JKZQOBY4V
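The probabilities in the KUBE-SVC chain look uneven, but the rules are evaluated in order and each one only sees traffic that earlier rules did not take. The sketch below, added for illustration and not part of the verification itself, works out the effective share per KUBE-SEP endpoint. The function name `effective_shares` is hypothetical, and the final unconditional jump is modeled as probability 1.0.

```python
from typing import List


def effective_shares(rule_probabilities: List[float]) -> List[float]:
    """Return the fraction of all traffic each rule takes, given in-order evaluation."""
    shares = []
    remaining = 1.0  # fraction of traffic that has not matched an earlier rule
    for p in rule_probabilities:
        shares.append(remaining * p)
        remaining *= 1.0 - p
    return shares


# Probabilities copied from the KUBE-SVC-BGNS3J6UB7MMLVDO rules above;
# the last rule has no statistic match, so it is treated as 1.0.
RULES = [0.20000000019, 0.25000000000, 0.33332999982, 0.50000000000, 1.0]
SHARES = effective_shares(RULES)
```

Each entry in `SHARES` comes out at roughly one fifth, so the five listed endpoints split queries evenly. An endpoint that is not Ready should have no KUBE-SEP jump in this chain at all, which is what the iptables check is looking for.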
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922