Bug 1954760

Summary: CoreDNS's "errors" plugin is not enabled for custom upstream resolvers
Product: OpenShift Container Platform
Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Networking
Assignee: Miciah Dashiel Butler Masters <mmasters>
Networking sub component: DNS
QA Contact: jechen <jechen>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: amcdermo, aos-bugs, hongli, sgreene
Version: 4.8
Target Milestone: ---
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-05-20 11:52:25 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1953609
Bug Blocks:

Comment 2 jechen 2021-05-06 22:56:49 UTC
Verified in 4.6.0-0.nightly-2021-05-06-185359.

$ oc get clusterversions.config.openshift.io 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2021-05-06-185359   True        False         2m1s    Cluster version is 4.6.0-0.nightly-2021-05-06-185359

1. Set up a custom nameserver
$ oc adm new-project mydns
Created project mydns


$ oc -n mydns create configmap coredns --from-file=./test2/Corefile
configmap/coredns created
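
Note: the contents of ./test2/Corefile are not shown in this comment. The file has to exist before the configmap command above is run. A minimal sketch of what it could contain, assuming the test nameserver simply forwards every query to a public resolver (the 8.8.8.8 upstream and the choice of plugins are assumptions for illustration, not the values used in this verification):

```shell
# Hypothetical Corefile for the test nameserver; the real file used in
# the verification is not shown in this bug.
mkdir -p ./test2
cat > ./test2/Corefile <<'EOF'
.:5353 {
    # Assumption: forward everything to a public resolver.
    forward . 8.8.8.8
    errors
    log
}
EOF
```

The listen port 5353 matches the --port=5353 used for the deployment and service in this step.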

$ oc -n mydns create deployment coredns --image=openshift/origin-coredns:latest --replicas=0 --port=5353 -- coredns --conf=/etc/coredns/Corefile
deployment.apps/coredns created


$ oc -n mydns set volume deployments/coredns --add --mount-path=/etc/coredns --type=configmap --configmap-name=coredns
info: Generated volume name: volume-fhfnx
deployment.apps/coredns volume updated


$ oc -n mydns scale deployments/coredns --replicas=1
deployment.apps/coredns scaled


$ oc -n mydns expose deployments/coredns --port=5353 --protocol=UDP
service/coredns exposed

$ oc -n mydns get services
NAME      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
coredns   ClusterIP   172.30.249.112   <none>        5353/UDP   11s


2. Configured cluster DNS to forward queries for the redhat.com zone to this custom nameserver
$ oc patch dns.operator/default --type=merge --patch='{"spec":{"servers":[{"name":"mydns","zones":["redhat.com"],"forwardPlugin":{"upstreams":["172.30.249.112:5353"]}}]}}'
dns.operator.openshift.io/default patched
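
The upstream address in the patch is the ClusterIP of the service from step 1, so it differs per cluster. A rough sketch of building the same merge patch from variables rather than hard-coding the IP; build_dns_patch is a hypothetical helper name, not part of oc:

```shell
# Hypothetical helper: print the merge patch for one forwarded zone.
# Arguments: server name, zone, upstream in host:port form.
build_dns_patch() {
  local name="$1" zone="$2" upstream="$3"
  printf '{"spec":{"servers":[{"name":"%s","zones":["%s"],"forwardPlugin":{"upstreams":["%s"]}}]}}' \
    "$name" "$zone" "$upstream"
}

# Usage against a live cluster (not run here):
#   ip=$(oc -n mydns get service coredns -o jsonpath='{.spec.clusterIP}')
#   oc patch dns.operator/default --type=merge \
#     --patch="$(build_dns_patch mydns redhat.com "${ip}:5353")"
```

With the values from this verification, the helper prints the same JSON as the patch used in step 2.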


3. Created a test pod, then from the pod continuously performed nslookups for a name in the zone for which the custom nameserver is responsible
$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/aosqe-pod-for-ping.json
pod/hello-pod created


$ oc rsh hello-pod
/ # while :; do nslookup www.redhat.com; sleep 0.5; done
Server:		172.30.0.10
Address:	172.30.0.10:53

Non-authoritative answer:
www.redhat.com	canonical name = ds-www.redhat.com.edgekey.net
ds-www.redhat.com.edgekey.net	canonical name = ds-www.redhat.com.edgekey.net.globalredir.akadns.net
ds-www.redhat.com.edgekey.net.globalredir.akadns.net	canonical name = e3396.dscx.akamaiedge.net
Name:	e3396.dscx.akamaiedge.net
Address: 23.59.99.64

Non-authoritative answer:
www.redhat.com	canonical name = ds-www.redhat.com.edgekey.net
ds-www.redhat.com.edgekey.net	canonical name = ds-www.redhat.com.edgekey.net.globalredir.akadns.net
ds-www.redhat.com.edgekey.net.globalredir.akadns.net	canonical name = e3396.dscx.akamaiedge.net
Name:	e3396.dscx.akamaiedge.net
Address: 2600:1408:9000:481::d44
Name:	e3396.dscx.akamaiedge.net
Address: 2600:1408:9000:496::d44

Server:		172.30.0.10
Address:	172.30.0.10:53
<--snip-->


4. Changed the replicas for the deployment to 0
$ oc -n mydns scale deployments/coredns --replicas=0
deployment.apps/coredns scaled


5. After step 4, nslookup from the test pod failed as expected
$ oc rsh hello-pod
/ # while :; do nslookup www.redhat.com; sleep 0.5; done
;; connection timed out; no servers could be reached

Server:		172.30.0.10
Address:	172.30.0.10:53

** server can't find www.redhat.com: SERVFAIL

*** Can't find www.redhat.com: No answer

Server:		172.30.0.10
Address:	172.30.0.10:53

** server can't find www.redhat.com: SERVFAIL

** server can't find www.redhat.com: SERVFAIL

<--snip-->


6. Verified that the DNS pods started logging errors
$ for pod in $(oc -n openshift-dns get pods -o name)
> do oc -n openshift-dns logs -c dns $pod
> done
<--snip-->
.:5353
[INFO] plugin/reload: Running configuration MD5 = d587e9b1a9e78220deb55ca080b8f672
CoreDNS-1.6.6
linux/amd64, go1.15.7, 
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 20s
[INFO] plugin/reload: Running configuration MD5 = 9ce9e29fc6a4173f57ac6b0c9f5b5721
[INFO] Reloading complete
.:5353
[INFO] plugin/reload: Running configuration MD5 = d587e9b1a9e78220deb55ca080b8f672
CoreDNS-1.6.6
linux/amd64, go1.15.7, 
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 20s
[INFO] plugin/reload: Running configuration MD5 = 9ce9e29fc6a4173f57ac6b0c9f5b5721
[INFO] Reloading complete
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:47949->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:51175->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:55914->172.30.249.112:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:41773->172.30.249.112:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:45510->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:35381->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:47768->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:59455->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:45804->172.30.249.112:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:50049->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:47915->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:50498->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:37584->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:55970->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:52351->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:53309->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:35827->172.30.249.112:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:54926->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:55623->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:44810->172.30.249.112:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:38616->172.30.249.112:5353: read: connection refused
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:49047->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:56183->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. AAAA: read udp 10.131.0.12:59278->172.30.249.112:5353: i/o timeout
[ERROR] plugin/errors: 2 www.redhat.com. A: read udp 10.131.0.12:54129->172.30.249.112:5353: i/o timeout
<--snip-->
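
The raw log is long, so a per-name summary can be easier to read. The sketch below assumes the line layout seen above ("[ERROR] plugin/errors: <rcode> <name> <type>: <error>", where the leading 2 appears to be the SERVFAIL response code) and counts error lines per query name and type. summarize_dns_errors is a hypothetical helper name:

```shell
# Hypothetical helper: read CoreDNS logs on stdin and print
# "<count> <query name> <query type>" for each errors-plugin line group.
summarize_dns_errors() {
  grep -F '[ERROR] plugin/errors:' \
    | awk '{ sub(/:$/, "", $5); print $4, $5 }' \
    | sort | uniq -c \
    | awk '{ print $1, $2, $3 }'
}

# Usage against a live cluster (not run here):
#   for pod in $(oc -n openshift-dns get pods -o name); do
#     oc -n openshift-dns logs -c dns "$pod"
#   done | summarize_dns_errors
```

Before the fix, no such lines would be expected for the custom zone, because the "errors" plugin was missing from that server block.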


7. Verified that the server block for the custom upstream in the cluster DNS Corefile now includes the "errors" plugin
$ oc -n openshift-dns get configmaps/dns-default -o yaml
apiVersion: v1
data:
  Corefile: |
    # mydns
    redhat.com:5353 {
        forward . 172.30.249.112:5353
        errors    # <-- expected result: "errors" is now present; fixed by https://github.com/openshift/cluster-dns-operator/pull/271
    }
    .:5353 {
        errors
        health {
            lameduck 20s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            policy sequential
        }
        cache 900 {
            denial 9984 30
        }
        reload
    }
<--snip-->
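
The check in step 7 was done by eye. A rough sketch of automating it: the function below reads a Corefile on stdin and prints the header of every top-level server block that has no "errors" directive. It assumes the simple layout shown above (one directive per line, "{" at end of line, "}" on its own line) and is not a full Corefile parser. corefile_blocks_missing_errors is a hypothetical helper name:

```shell
# Hypothetical helper: list server blocks that lack the "errors" plugin.
corefile_blocks_missing_errors() {
  awk '
    {
      sub(/#.*/, "")          # drop Corefile comments
      sub(/^[ \t]+/, "")      # trim leading whitespace
      sub(/[ \t]+$/, "")      # trim trailing whitespace
    }
    $0 == "" { next }
    { opens = (substr($0, length($0)) == "{") }
    depth == 0 && opens {
      # Start of a top-level server block, e.g. "redhat.com:5353 {"
      header = substr($0, 1, length($0) - 1)
      sub(/[ \t]+$/, "", header)
      has_errors = 0
      depth = 1
      next
    }
    depth >= 1 {
      if (depth == 1 && $1 == "errors") has_errors = 1
      if (opens) depth++
      if ($0 == "}") {
        depth--
        if (depth == 0 && !has_errors) print header
      }
    }
  '
}

# Usage against a live cluster (not run here):
#   oc -n openshift-dns get configmaps/dns-default \
#     -o jsonpath='{.data.Corefile}' | corefile_blocks_missing_errors
```

Empty output means every server block has the plugin. On a cluster without the fix, the block for the custom zone (redhat.com:5353 in this test) would be expected in the output.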

Comment 5 errata-xmlrpc 2021-05-20 11:52:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.29 bug fix update) and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1521