Verified in 4.7.0-0.nightly-2021-05-01-081439

$ oc get clusterversions.config.openshift.io
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-05-01-081439   True        False         109m    Cluster version is 4.7.0-0.nightly-2021-05-01-081439

1. Set up a custom nameserver:

$ oc adm new-project mydns
Created project mydns

$ oc -n mydns create configmap coredns --from-file=./test2/Corefile
configmap/coredns created

$ oc -n mydns create deployment coredns --image=openshift/origin-coredns:latest --replicas=0 --port=5353 -- coredns --conf=/etc/coredns/Corefile
deployment.apps/coredns created

$ oc -n mydns set volume deployments/coredns --add --mount-path=/etc/coredns --type=configmap --configmap-name=coredns
info: Generated volume name: volume-xhsx6
deployment.apps/coredns volume updated

$ oc -n mydns scale deployments/coredns --replicas=1
deployment.apps/coredns scaled

$ oc -n mydns expose deployments/coredns --port=5353 --protocol=UDP
service/coredns exposed

$ oc -n mydns get services
NAME      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
coredns   ClusterIP   172.30.168.121   <none>        5353/UDP   10s

2. Configured the cluster DNS to forward queries for the redhat.com zone to this custom nameserver:

$ oc patch dns.operator/default --type=merge --patch='{"spec":{"servers":[{"name":"mydns","zones":["redhat.com"],"forwardPlugin":{"upstreams":["172.30.168.121:5353"]}}]}}'
dns.operator.openshift.io/default patched

3. Created a test pod and, from the pod, continuously ran nslookup for a name in the zone the custom nameserver is responsible for; the lookups succeed:

$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/aosqe-pod-for-ping.json
pod/hello-pod created

$ oc rsh hello-pod
/ # while :; do nslookup www.redhat.com; sleep 0.5; done
Server:    172.30.0.10
Address:   172.30.0.10:53

Name:      www.redhat.com
Address:   1.2.3.4

Server:    172.30.0.10
Address:   172.30.0.10:53

Name:      www.redhat.com
Address:   1.2.3.4

Server:    172.30.0.10
Address:   172.30.0.10:53

^C
/ # exit
command terminated with exit code 130

4. Scaled the deployment down to 0 replicas:

$ oc -n mydns scale deployments/coredns --replicas=0
deployment.apps/coredns scaled

5. After step 4, nslookup from the test pod fails, as expected:

$ oc rsh hello-pod
/ # while :; do nslookup www.redhat.com; sleep 0.5; done
;; connection timed out; no servers could be reached

Server:    172.30.0.10
Address:   172.30.0.10:53

** server can't find www.redhat.com: SERVFAIL

** server can't find www.redhat.com: SERVFAIL

Server:    172.30.0.10
Address:   172.30.0.10:53

^C
/ # exit
command terminated with exit code 130

6. Checked the logs of the DNS pods; the failed forwards to the custom nameserver are reported by the "errors" plugin:

$ for pod in $(oc -n openshift-dns get pods -o name)
> do oc -n openshift-dns logs -c dns $pod
> done
.:5353
[INFO] plugin/reload: Running configuration MD5 = 88abbc1ca2ae7b733e12d8821a5b24b8
CoreDNS-1.6.6
<--snip-->
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:36385->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:59034->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:50547->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:59893->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:42549->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:33987->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:48401->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:39898->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:50438->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:46909->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:47045->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:55262->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:57168->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:47617->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:59960->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:53228->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:56946->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:48361->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:40388->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:33089->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:49169->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:51940->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:55137->172.30.168.121:5353: i/o timeout
.:5353
<---snip--->

7. Verified that, in the cluster DNS Corefile, the server block for the custom nameserver includes the "errors" plugin:

$ oc -n openshift-dns get configmaps/dns-default -o yaml
apiVersion: v1
data:
  Corefile: |
    # mydns
    redhat.com:5353 {
        forward . 172.30.168.121:5353
        errors      <-- verified the fix with https://github.com/openshift/cluster-dns-operator/pull/268
        bufsize 1232
    }
    .:5353 {
        bufsize 1232
        errors
        health {
            lameduck 20s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus 127.0.0.1:9153
        forward . /etc/resolv.conf {
            policy sequential
        }
        cache 900 {
            denial 9984 30
        }
        reload
    }
kind: ConfigMap
<---snip--->
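The merge patch from step 2 can be built from parameters instead of being typed by hand. This is a minimal sketch, not part of the original verification, assuming a POSIX shell; the function name dns_forward_patch is hypothetical, and the arguments are assumed to contain no quotes or backslashes because they are not JSON-escaped. Because the patch is applied with --type=merge (JSON merge patch), the "servers" array it carries replaces any existing spec.servers list as a whole rather than being appended to it.

```shell
# Hypothetical helper: print a JSON merge patch that makes the cluster DNS
# operator forward queries for one zone to one upstream resolver.
# Arguments: server name, zone, upstream address (host:port).
# Values are inserted verbatim (no JSON escaping), so they must not contain
# double quotes or backslashes.
# Note: with --type=merge, this "servers" array replaces the existing
# spec.servers list entirely; it is not merged element by element.
dns_forward_patch() {
  name=$1
  zone=$2
  upstream=$3
  printf '{"spec":{"servers":[{"name":"%s","zones":["%s"],"forwardPlugin":{"upstreams":["%s"]}}]}}\n' \
    "$name" "$zone" "$upstream"
}
```

With the values from step 2 this reproduces the same patch: oc patch dns.operator/default --type=merge --patch="$(dns_forward_patch mydns redhat.com 172.30.168.121:5353)"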
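The open-ended while loop in steps 3 and 5 has to be stopped with ^C and read by eye. A bounded variant that reports a success/failure count is sketched below. The function name probe_dns and the LOOKUP_CMD override are hypothetical, and the sketch assumes the lookup command exits with a non-zero status when resolution fails; that should be checked for the nslookup build in the test image before relying on the counts.

```shell
# Hypothetical helper: run a fixed number of lookups instead of an endless
# loop and print how many succeeded and how many failed.
# Assumes the lookup command exits 0 on success and non-zero on failure.
# Arguments: name to resolve, number of attempts, optional interval in
# seconds (default 0.5, as in the original loop).
# LOOKUP_CMD (default: nslookup) is an illustrative override, e.g. for testing.
probe_dns() {
  name=$1
  count=$2
  interval=${3:-0.5}
  cmd=${LOOKUP_CMD:-nslookup}
  ok=0
  fail=0
  i=0
  while [ "$i" -lt "$count" ]; do
    # Discard the lookup output; only the exit status is counted.
    if $cmd "$name" >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "ok=$ok fail=$fail"
}
```

Inside the pod (oc rsh hello-pod), after defining the function, probe_dns www.redhat.com 20 would be expected to report no failures while the custom nameserver has 1 replica, and failures after it is scaled to 0.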
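To see at a glance how the forwards in step 6 failed (i/o timeout versus connection refused), the DNS pod logs can be piped through a small filter. This is a sketch assuming POSIX sh with awk and sort; count_forward_errors is a hypothetical name, and the matched line format is taken from the log excerpt above.

```shell
# Hypothetical helper: count CoreDNS "plugin/errors" lines for forwards to one
# upstream (host:port), grouped by failure cause. Reads log text on stdin and
# prints one "COUNT CAUSE" line per distinct cause.
# Expected line shape (from the excerpt):
#   [ERROR] plugin/errors: 2 NAME. TYPE: read udp SRC->UPSTREAM: CAUSE
count_forward_errors() {
  upstream=$1
  awk -v up="->$upstream:" '
    index($0, "[ERROR] plugin/errors:") == 1 && index($0, up) > 0 {
      # The failure cause is everything after the "->host:port:" marker.
      cause = substr($0, index($0, up) + length(up))
      sub(/^ +/, "", cause)
      counts[cause]++
    }
    END {
      for (c in counts) print counts[c], c
    }
  ' | sort -k2
}
```

For example: for pod in $(oc -n openshift-dns get pods -o name); do oc -n openshift-dns logs -c dns "$pod"; done | count_forward_errors 172.30.168.121:5353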
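Step 7 is checked by reading the YAML. A scripted version could extract the Corefile with a jsonpath query and test whether the server block for the zone lists the plugin as a top-level directive. This is a minimal sketch assuming POSIX sh and awk; zone_block_has_plugin is a hypothetical name, and it assumes each server block header is written as "ZONE {" on one line and each closing brace starts its own line, as in the generated Corefile above.

```shell
# Hypothetical helper: exit 0 if the server block whose header is "ZONE {"
# contains a top-level directive named PLUGIN, exit 1 otherwise.
# Reads Corefile text on stdin. Brace depth is tracked so that directives
# nested inside sub-blocks (e.g. "lameduck" inside "health { ... }") are not
# counted, and so the closing brace of the server block ends the search.
zone_block_has_plugin() {
  zone=$1
  plugin=$2
  awk -v zone="$zone" -v plugin="$plugin" '
    # Header of the server block we are looking for.
    depth == 0 && $1 == zone && $2 == "{" { inblock = 1; depth = 1; next }
    # Header of any other server block.
    depth == 0 && $NF == "{" { depth = 1; next }
    {
      if (inblock && depth == 1 && $1 == plugin) found = 1
      if ($NF == "{") depth++
      if ($1 == "}") { depth--; if (depth == 0) inblock = 0 }
    }
    END { exit (found ? 0 : 1) }
  '
}
```

For example: oc -n openshift-dns get configmap/dns-default -o jsonpath='{.data.Corefile}' | zone_block_has_plugin redhat.com:5353 errors && echo "errors plugin present"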
This fix will ship in the next z-stream release, 4.7.11, on May 19th, because 4.7.10 was dropped due to a blocker: https://bugzilla.redhat.com/show_bug.cgi?id=1958518.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:1550