Cause:
The cluster's upstream resolver returns a DNS response larger than 512 bytes over UDP.
Consequence:
CoreDNS may return SERVFAIL and/or log various error messages, sometimes forcing the client to retry over TCP.
Fix:
Enable the CoreDNS bufsize plugin with a UDP buffer size of 1232 bytes.
Result:
CoreDNS is less likely to return SERVFAIL or log runtime errors when handling large DNS responses over UDP. UDP packet fragmentation is also less likely to occur.
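The text above does not say why 1232 bytes was chosen. One common rationale, given here as an assumption and not as the fix author's stated reasoning, is the DNS Flag Day 2020 recommendation: take the IPv6 minimum MTU and subtract the IPv6 and UDP header sizes, so that a full-size response still fits in one unfragmented packet. The variable names are illustrative only.

```shell
# Assumed rationale for the 1232-byte buffer (not stated in this bug report).
ipv6_min_mtu=1280   # minimum MTU every IPv6 link must support
ipv6_header=40      # fixed IPv6 header size in bytes
udp_header=8        # UDP header size in bytes

# Largest DNS payload that fits in one minimum-size IPv6 packet.
edns_payload=$((ipv6_min_mtu - ipv6_header - udp_header))
echo "$edns_payload"   # prints 1232
```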
Verified in 4.7.0-0.nightly-2021-05-01-081439
$ oc get clusterversions.config.openshift.io
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.0-0.nightly-2021-05-01-081439 True False 2m26s Cluster version is 4.7.0-0.nightly-2021-05-01-081439
# verified bufsize in default DNS
[jechen@jechen ~]$ oc -n openshift-dns get cm/dns-default -oyaml
apiVersion: v1
data:
Corefile: |
.:5353 {
bufsize 1232 <-- verified fix by https://github.com/openshift/cluster-dns-operator/pull/267
errors
health {
lameduck 20s
}
<--snip-->
# created a test pod as a custom DNS server, added custom forwarding to CoreDNS, and verified bufsize in the custom DNS server block
$ oc create -f https://raw.githubusercontent.com/lihongan/test-dnsmasq/master/test-dnsmasq.json
pod/test-dnsmasq created
$ oc get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-dnsmasq 1/1 Running 0 58s 10.131.0.31 ci-ln-0bskzdt-f76d1-fptks-worker-b-rk66z <none> <none>
$ oc edit dns.operator/default
<--snip-->
spec:
servers:
- forwardPlugin:
upstreams:
- 10.131.0.31
name: test
zones:
- mytest.ocp
<--snip-->
$ oc get cm dns-default -n openshift-dns -o yaml
apiVersion: v1
data:
Corefile: |
# test
mytest.ocp:5353 {
forward . 10.131.0.31
errors
bufsize 1232 <-- verified fix by https://github.com/openshift/cluster-dns-operator/pull/267
}
.:5353 {
bufsize 1232 <-- verified fix by https://github.com/openshift/cluster-dns-operator/pull/267
errors
health {
lameduck 20s
}
<--snip-->
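The manual check above (reading the Corefile and confirming that each server block carries `bufsize 1232`) could be scripted roughly as follows. This is a minimal sketch, not part of `oc` or CoreDNS: `check_bufsize` is a hypothetical helper, and it assumes the Corefile has no braces inside comments and puts each opening brace at the end of its line, as in the output shown here.

```shell
# check_bufsize: read a Corefile on stdin and list server blocks that lack
# "bufsize <expected>" as a direct child. Hypothetical helper for illustration.
check_bufsize() {
  expected="${1:-1232}"
  awk -v want="$expected" '
    BEGIN { bad = 0; depth = 0; name = "" }
    # A server block opens at nesting depth 0 on a line ending in "{".
    depth == 0 && /[{][ \t]*$/ { name = $1; found = 0 }
    # Only count bufsize directly inside the server block, not in nested plugins.
    depth == 1 && $1 == "bufsize" && $2 == want { found = 1 }
    {
      opens = gsub(/[{]/, "{"); closes = gsub(/[}]/, "}")
      depth += opens - closes
      # The server block just closed: report it if bufsize was not seen.
      if (depth == 0 && closes > 0 && name != "") {
        if (!found) { print "missing bufsize " want ": " name; bad = 1 }
        name = ""
      }
    }
    END { exit bad }
  '
}
```

Assuming the same ConfigMap as above, it could be fed with `oc -n openshift-dns get cm/dns-default -o jsonpath='{.data.Corefile}' | check_bufsize 1232`. A non-zero exit status means at least one server block lacks the setting.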
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.7.11 bug fix update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:1550