Verified with 4.5.0-0.nightly-2021-03-15-055437:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.5.0-0.nightly-2021-03-15-055437 True False 34m Cluster version is 4.5.0-0.nightly-2021-03-15-055437
## In the first terminal, create a test pod, rsh into it, and run an infinite loop that measures DNS lookup time
$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/aosqe-pod-for-ping.json
pod/hello-pod created
$ oc rsh hello-pod
/ # while (true); do time getent hosts kubernetes.default.svc.cluster.local; sleep 1; done
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.02s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
<----snip----->
## In the second terminal, reboot one of the master nodes
$ oc debug node/ci-ln-7y7qydt-f76d1-8qx7l-master-0
Starting pod/ci-ln-7y7qydt-f76d1-8qx7l-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.5
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# reboot
Terminated
## In the first terminal, confirm that DNS lookups show no delay while the master reboots (about 3-5 min)
<----snip----->
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.02s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
<----snip----->
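Scanning the loop output by eye over a 3-5 minute window is easy to get wrong. A minimal sketch, assuming the busybox `time` format shown above (lines of the form `real 0m 0.02s`), that prints only the lookups slower than a threshold. The function name `slow_lookups` and the 1-second default are illustrative and were not part of the original verification:

```shell
# Hypothetical helper: read captured loop output on stdin and report every
# "real" timing line whose duration exceeds a threshold given in seconds.
# Assumes busybox-style lines such as "real 0m 0.02s".
slow_lookups() {
    threshold="${1:-1}"
    awk -v limit="$threshold" '
        $1 == "real" {
            minutes = $2; sub(/m$/, "", minutes)   # "0m"    -> "0"
            seconds = $3; sub(/s$/, "", seconds)   # "0.02s" -> "0.02"
            total = minutes * 60 + seconds
            if (total > limit) {
                printf "slow lookup: %.2fs\n", total
                slow++
            }
        }
        END { printf "%d slow lookup(s) over %ss\n", slow, limit }
    '
}
```

If the loop output is saved to a file (for example with `tee`), it could be checked with `slow_lookups 1 < lookups.log`, where `lookups.log` is a placeholder file name.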
## Confirm the readinessProbe values on the dns-default daemonset
$ oc -n openshift-dns get ds/dns-default -oyaml
<---snip--->
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 3    <-- verified the fix with https://github.com/openshift/cluster-dns-operator/pull/242
  successThreshold: 1
  timeoutSeconds: 3   <-- verified the fix with https://github.com/openshift/cluster-dns-operator/pull/242
<---snip--->
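As a rough guide to what these probe values imply: the kubelet marks a container unready after `failureThreshold` consecutive failed probes, so the detection window is approximately `failureThreshold × periodSeconds`. A small sketch of that arithmetic, using the values from the readinessProbe above. It is only an approximation that ignores probe timeouts and endpoint propagation delay, and the function name is illustrative:

```shell
# Hypothetical helper: approximate seconds until a failing container is
# marked unready, as failureThreshold * periodSeconds.
probe_detection_window() {
    failure_threshold="$1"
    period_seconds="$2"
    echo $(( failure_threshold * period_seconds ))
}

# Values from the dns-default readinessProbe: failureThreshold 3, periodSeconds 3.
probe_detection_window 3 3   # prints 9
```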
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 4.5.36 bug fix update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:0840