Bug 1953609

Summary: CoreDNS's "errors" plugin is not enabled for custom upstream resolvers
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking
Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: DNS
QA Contact: jechen <jechen>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: amcdermo, aos-bugs, hongli, sgreene
Version: 4.8
Target Milestone: ---
Target Release: 4.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-19 15:16:51 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1934905
Bug Blocks: 1954760

Comment 4 jechen 2021-05-03 17:17:45 UTC
Verified in 4.7.0-0.nightly-2021-05-01-081439

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-05-01-081439   True        False         109m    Cluster version is 4.7.0-0.nightly-2021-05-01-081439

1. Set up a custom nameserver
$ oc adm new-project mydns
Created project mydns

$ oc -n mydns create configmap coredns --from-file=./test2/Corefile
configmap/coredns created

$ oc -n mydns create deployment coredns --image=openshift/origin-coredns:latest --replicas=0 --port=5353 -- coredns --conf=/etc/coredns/Corefile
deployment.apps/coredns created

$ oc -n mydns set volume deployments/coredns --add --mount-path=/etc/coredns --type=configmap --configmap-name=coredns
info: Generated volume name: volume-xhsx6
deployment.apps/coredns volume updated

$ oc -n mydns scale deployments/coredns --replicas=1
deployment.apps/coredns scaled

$ oc -n mydns expose deployments/coredns --port=5353 --protocol=UDP
service/coredns exposed

$ oc -n mydns get services
NAME      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
coredns   ClusterIP   172.30.168.121   <none>        5353/UDP   10s
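
The contents of ./test2/Corefile used in step 1 are not shown in this comment. A minimal sketch of what such a file could look like follows. It is an assumption, not the file actually used: it relies on the CoreDNS "hosts" plugin being available in the image, and it hard-codes the 1.2.3.4 answer for www.redhat.com that appears in the nslookup output in step 3.

```shell
# Hypothetical Corefile for the custom nameserver (not the original file).
# Assumes the CoreDNS image ships the "hosts" plugin.
mkdir -p ./test2
cat > ./test2/Corefile <<'EOF'
.:5353 {
    hosts {
        1.2.3.4 www.redhat.com
    }
    errors
    log
}
EOF
```

Listening on port 5353 matches the --port=5353 used for the deployment and service.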


2. Configured the cluster DNS to forward queries for the redhat.com zone to this custom nameserver
$ oc patch dns.operator/default --type=merge --patch='{"spec":{"servers":[{"name":"mydns","zones":["redhat.com"],"forwardPlugin":{"upstreams":["172.30.168.121:5353"]}}]}}'
dns.operator.openshift.io/default patched
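
The patch in step 2 pastes the service's cluster IP by hand. As a rough sketch, the same JSON can be generated by a small helper so that the server name, zone, and upstream are parameters. The function name build_dns_patch is made up for illustration, and it does no JSON escaping, so it only suits simple values like the ones used here.

```shell
# build_dns_patch NAME ZONE UPSTREAM
# Prints a merge patch for dns.operator/default that forwards ZONE to UPSTREAM.
# Hypothetical helper; the arguments are inserted as-is, without JSON escaping.
build_dns_patch() {
    printf '{"spec":{"servers":[{"name":"%s","zones":["%s"],"forwardPlugin":{"upstreams":["%s"]}}]}}\n' \
        "$1" "$2" "$3"
}

# Example with the values from step 2:
build_dns_patch mydns redhat.com 172.30.168.121:5353

# On a live cluster (not run here), the upstream could be looked up instead:
#   ip=$(oc -n mydns get service coredns -o jsonpath='{.spec.clusterIP}')
#   oc patch dns.operator/default --type=merge \
#       --patch="$(build_dns_patch mydns redhat.com "$ip:5353")"
```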

3. Created a test pod and, from the pod, continuously ran nslookup for a name in the zone served by the custom nameserver; the lookups succeeded
$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/aosqe-pod-for-ping.json
pod/hello-pod created


[jechen@jechen ~]$ oc rsh hello-pod
/ # while :; do nslookup www.redhat.com; sleep 0.5; done
Server:		172.30.0.10
Address:	172.30.0.10:53

Name:	www.redhat.com
Address: 1.2.3.4

Server:		172.30.0.10
Address:	172.30.0.10:53

Name:	www.redhat.com
Address: 1.2.3.4

Server:		172.30.0.10
Address:	172.30.0.10:53

^C
/ # exit
command terminated with exit code 130


4. Scaled the custom nameserver deployment down to 0 replicas
$ oc -n mydns scale deployments/coredns --replicas=0
deployment.apps/coredns scaled

5. After step 4, nslookup from the test pod failed as expected
$ oc rsh hello-pod
/ # while :; do nslookup www.redhat.com; sleep 0.5; done
;; connection timed out; no servers could be reached

Server:		172.30.0.10
Address:	172.30.0.10:53

** server can't find www.redhat.com: SERVFAIL

** server can't find www.redhat.com: SERVFAIL

Server:		172.30.0.10
Address:	172.30.0.10:53

^C
/ # exit
command terminated with exit code 130


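6. Checked the logs of the cluster DNS pods; the "errors" plugin now logs the failed upstream queries
$ for pod in $(oc -n openshift-dns get pods -o name)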
> do oc -n openshift-dns logs -c dns $pod
> done
.:5353
[INFO] plugin/reload: Running configuration MD5 = 88abbc1ca2ae7b733e12d8821a5b24b8
CoreDNS-1.6.6
<--snip-->
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:36385->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:59034->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:50547->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:59893->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:42549->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:33987->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:48401->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:39898->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:50438->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:46909->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:47045->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:55262->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:57168->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:47617->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:59960->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:53228->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:56946->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:48361->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:40388->172.30.168.121:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:33089->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:49169->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.6:51940->172.30.168.121:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.6:55137->172.30.168.121:5353: i/o timeout
.:5353
<---snip--->
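
The log above is long and repetitive. One possible way to condense it is to count the "plugin/errors" lines per query type and failure reason. The sketch below assumes the log line layout shown above (query type in the fifth whitespace-separated field); summarize_forward_errors is a made-up name.

```shell
# summarize_forward_errors: reads CoreDNS log lines on stdin and prints
# "<count> <query type> <reason>" for each kind of plugin/errors line.
# Hypothetical helper; assumes the line layout shown in this comment.
summarize_forward_errors() {
    awk '
        /^\[ERROR\] plugin\/errors:/ {
            qtype = $5                  # e.g. "A:" or "AAAA:"
            sub(/:$/, "", qtype)        # drop the trailing colon
            if ($0 ~ /connection refused/) reason = "connection refused"
            else if ($0 ~ /i\/o timeout/)  reason = "i/o timeout"
            else                           reason = "other"
            count[qtype " " reason]++
        }
        END { for (key in count) print count[key], key }
    ' | LC_ALL=C sort -k2
}

# Possible use on a live cluster (not run here):
#   for pod in $(oc -n openshift-dns get pods -o name); do
#       oc -n openshift-dns logs -c dns "$pod"
#   done | summarize_forward_errors
```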

7. Verified that the server block for the custom nameserver in the cluster DNS Corefile includes the "errors" plugin
$ oc -n openshift-dns get configmaps/dns-default -o yaml
apiVersion: v1
data:
  Corefile: |
    # mydns
    redhat.com:5353 {
        forward . 172.30.168.121:5353
        errors                <-- verified the fix with https://github.com/openshift/cluster-dns-operator/pull/268
        bufsize 1232
    }
    .:5353 {
        bufsize 1232
        errors                
        health {
            lameduck 20s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus 127.0.0.1:9153
        forward . /etc/resolv.conf {
            policy sequential
        }
        cache 900 {
            denial 9984 30
        }
        reload
    }
kind: ConfigMap
<---snip--->
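
The check in step 7 was done by reading the ConfigMap. A rough way to automate it is to walk the Corefile and report, for each top-level server block, whether "errors" appears as a directive directly inside it. This is only a sketch: check_errors_plugin is a made-up name, and the parsing assumes the operator's layout, where an opening brace ends its line and a closing brace stands on its own line.

```shell
# check_errors_plugin: reads a Corefile on stdin, prints one line per server
# block, and returns non-zero if any block lacks the "errors" plugin.
# Hypothetical helper; assumes "{" ends a line and "}" is on its own line.
check_errors_plugin() {
    awk '
        BEGIN { depth = 0; bad = 0 }
        { line = $0; sub(/#.*/, "", line) }          # strip comments
        depth == 0 && line ~ /[{][ \t]*$/ {          # a server block opens
            name = line
            sub(/[ \t]*[{][ \t]*$/, "", name)
            sub(/^[ \t]+/, "", name)
            found = 0; depth = 1
            next
        }
        depth >= 1 {
            split(line, tok, " ")
            if (depth == 1 && tok[1] == "errors") found = 1
            if (line ~ /[{][ \t]*$/) depth++         # nested plugin block
            if (tok[1] == "}") {
                depth--
                if (depth == 0) {
                    print name ": " (found ? "errors enabled" : "errors MISSING")
                    if (!found) bad = 1
                }
            }
        }
        END { exit bad }
    '
}

# Possible use on a live cluster (not run here):
#   oc -n openshift-dns get configmaps/dns-default \
#       -o jsonpath='{.data.Corefile}' | check_errors_plugin
```

With the Corefile shown in step 7, both the redhat.com:5353 block and the .:5353 block should be reported as having "errors" enabled.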

Comment 5 Siddharth Sharma 2021-05-10 17:59:33 UTC
This bug will be shipped as part of the next z-stream release, 4.7.11, on May 19th, because 4.7.10 was dropped due to a blocker: https://bugzilla.redhat.com/show_bug.cgi?id=1958518

Comment 9 errata-xmlrpc 2021-05-19 15:16:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1550