Bug 1935297

Summary: hostname lookup delays when master node down
Product: OpenShift Container Platform
Component: DNS
Version: 4.5
Target Release: 4.5.z
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Hardware: x86_64
OS: Linux
Reporter: Miciah Dashiel Butler Masters <mmasters>
Assignee: Miciah Dashiel Butler Masters <mmasters>
QA Contact: jechen <jechen>
CC: aarapov, abodhe, agabriel, amcdermo, aos-bugs, bbennett, bjarolim, dsquirre, hongli, juqiao, kelly.brown1, mfuruta, mmasters, mzali, openshift-bugs-escalate, openshift-bugzilla-robot, rbolling, rh-container, rhowe, sgaikwad
Clone Of: 1928773
Bug Depends On: 1928773
Last Closed: 2021-03-25 12:39:34 UTC

Comment 3 jechen 2021-03-15 19:10:36 UTC
Verified with 4.5.0-0.nightly-2021-03-15-055437.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2021-03-15-055437   True        False         34m     Cluster version is 4.5.0-0.nightly-2021-03-15-055437

## In the first terminal, create a test pod, rsh into it, and run an infinite loop that times each DNS lookup
$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/aosqe-pod-for-ping.json
pod/hello-pod created
$ oc rsh hello-pod
/ # while (true); do time getent hosts kubernetes.default.svc.cluster.local; sleep 1; done
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.02s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
<----snip----->
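When the loop runs for several minutes, it can be hard to tell which samples overlap the reboot. The following is a sketch of a variant, assuming the same busybox shell that `oc rsh hello-pod` opens above; the function name `watch_lookup` and its optional iteration count are hypothetical additions, not part of the original procedure. It prefixes each lookup with a UTC timestamp so that any slow sample can be lined up against the moment the master was rebooted.

```shell
# watch_lookup NAME [COUNT]
# Hypothetical helper: time one lookup of NAME per second, like the loop above,
# but prefix each result with a UTC timestamp. COUNT=0 (the default) loops forever.
watch_lookup() {
  local name=$1 count=${2:-0} i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    printf '%s ' "$(date -u +%H:%M:%S)"
    # `time` reports on stderr; merge it so the timing lines follow each lookup
    { time getent hosts "$name"; } 2>&1
    i=$((i + 1))
    sleep 1
  done
}
```

After pasting the function into the pod shell, `watch_lookup kubernetes.default.svc.cluster.local` behaves like the original loop with timestamps added.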


## In the second terminal, reboot one of the master nodes
$ oc debug node/ci-ln-7y7qydt-f76d1-8qx7l-master-0 
Starting pod/ci-ln-7y7qydt-f76d1-8qx7l-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.5
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# reboot
Terminated
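Knowing when the master is back helps bound how long the lookup loop needs to be watched. The following is a sketch under assumptions: `wait_node_ready` is a hypothetical helper that only uses `oc get` with a standard jsonpath filter on the node's `Ready` condition, and it should be started after `oc get nodes` shows the rebooted master as NotReady, because the node can keep reporting Ready for a short while after `reboot` is issued.

```shell
# wait_node_ready NODE [TIMEOUT_SECONDS]
# Hypothetical helper: poll every 5s until the node's Ready condition is "True",
# then print a UTC timestamp; give up (exit status 1) after TIMEOUT_SECONDS (default 600).
wait_node_ready() {
  local node=$1 timeout=${2:-600} waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    # The jsonpath filter selects the status ("True", "False" or "Unknown") of the Ready condition
    status=$(oc get "node/$node" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
    if [ "$status" = "True" ]; then
      echo "$(date -u +%H:%M:%S) node/$node Ready=True after ${waited}s"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out after ${timeout}s waiting for node/$node to become Ready" >&2
  return 1
}
```

For the node rebooted above, the call would be `wait_node_ready ci-ln-7y7qydt-f76d1-8qx7l-master-0`.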

## In the first terminal, keep watching the loop while the master reboots (about 3-5 minutes) to confirm that DNS lookups are not delayed
<----snip----->
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.02s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
<----snip----->
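Checking several minutes of output by eye is error-prone. If the loop output is saved to a file (the name `lookups.log` below is only an example), a small awk filter can report every lookup whose `real` time exceeds a threshold. This is a sketch that assumes the busybox `time` format shown above (`real 0m 0.02s`) and also accepts the bash format (`real 0m0.020s`); `flag_slow_lookups` is a hypothetical name.

```shell
# flag_slow_lookups [THRESHOLD_SECONDS] < saved-loop-output
# Hypothetical helper: report each `real` timing above the threshold (default 1s)
# and print a summary line at the end.
flag_slow_lookups() {
  awk -v limit="${1:-1}" '
    $1 == "real" {
      n++
      t = $2 $3                # busybox "0m 0.02s" or bash "0m0.020s" -> "0m0.02s"
      split(t, parts, "m")     # parts[1] = minutes, parts[2] = seconds with trailing "s"
      sub(/s$/, "", parts[2])
      secs = parts[1] * 60 + parts[2]
      if (secs > limit + 0) {
        slow++
        printf "slow lookup #%d: %.2fs\n", n, secs
      }
    }
    END { printf "%d lookups, %d slower than %ss\n", n, slow, limit }
  '
}
```

Running `flag_slow_lookups 1 < lookups.log` over the reboot window should list no slow lookups if the fix behaves as in the transcript above.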

## Confirm the readiness probe settings of the dns-default daemonset
$ oc -n openshift-dns get ds/dns-default -oyaml
<---snip--->
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 3     <--verified the fix with https://github.com/openshift/cluster-dns-operator/pull/242
          successThreshold: 1
          timeoutSeconds: 3    <--verified the fix with https://github.com/openshift/cluster-dns-operator/pull/242
<---snip--->
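As a rough guide to what these values mean, assume standard Kubernetes probe semantics: the kubelet runs the probe every periodSeconds ($p$), allows each attempt timeoutSeconds ($\tau$), and marks the container unready after failureThreshold ($f$) consecutive failures. The longest time between CoreDNS no longer answering `/health` and the pod being marked unready is then approximately the following. This is an estimate that assumes $\tau \le p$, ignores the time for endpoint changes to propagate, and applies only while the kubelet on that node is still running.

```latex
t_{\text{unready}} \lesssim \underbrace{p}_{\text{wait for first failing probe}} + (f-1)\,p + \tau
  = f\,p + \tau = 3 \cdot 3 + 3 = 12\ \text{s}
```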

Comment 7 errata-xmlrpc 2021-03-25 12:39:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.36 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0840