Bug 1935297 - hostname lookup delays when master node down
Summary: hostname lookup delays when master node down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: DNS
Version: 4.5
Hardware: x86_64
OS: Linux
high
urgent
Target Milestone: ---
Target Release: 4.5.z
Assignee: Miciah Dashiel Butler Masters
QA Contact: jechen
URL:
Whiteboard:
Depends On: 1928773
Blocks:
Reported: 2021-03-04 15:30 UTC by Miciah Dashiel Butler Masters
Modified: 2021-04-09 09:07 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1928773
Environment:
Last Closed: 2021-03-25 12:39:34 UTC
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-dns-operator pull 242 | 0 | None | open | [release-4.5] Bug 1935297: Set CoreDNS readiness probe period and timeout each to 3 seconds | 2021-03-05 16:24:31 UTC
Github | openshift kubernetes pull 604 | 0 | None | open | [sdn-4.5-kubernetes-1.18.2] Bug 1935297: UPSTREAM: <carry>: Prefer local endpoint for cluster DNS service | 2021-03-09 01:01:07 UTC
Github | openshift sdn pull 268 | 0 | None | closed | Bug 1935297: Prefer local endpoint for cluster DNS service | 2021-03-15 14:12:37 UTC
Red Hat Product Errata | RHBA-2021:0840 | 0 | None | None | None | 2021-03-25 12:39:41 UTC

Comment 3 jechen 2021-03-15 19:10:36 UTC
Verified with 4.5.0-0.nightly-2021-03-15-055437.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2021-03-15-055437   True        False         34m     Cluster version is 4.5.0-0.nightly-2021-03-15-055437

## In the first terminal, create a test pod, rsh into it, and run an infinite loop that times each DNS lookup
$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/aosqe-pod-for-ping.json
pod/hello-pod created
$ oc rsh hello-pod
/ # while (true); do time getent hosts kubernetes.default.svc.cluster.local; sleep 1; done
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.02s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
<----snip----->
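The kubernetes and sdn pull requests linked above are titled "Prefer local endpoint for cluster DNS service". Their intent is that lookups from hello-pod go to a DNS pod on the same node when one exists. Before rebooting a master, it may be worth confirming that such a local DNS pod exists. The following is a minimal sketch that assumes every pod in openshift-dns is a dns-default pod. The function name local_dns_pods is hypothetical and is not part of oc or OpenShift.

```shell
#!/bin/sh
# local_dns_pods NODE  (hypothetical helper, not an oc subcommand)
# Reads "<pod-name> <node-name>" lines on stdin (one per DNS pod) and prints
# the pods scheduled on NODE, or "none" if no DNS pod runs there.
local_dns_pods() {
  awk -v node="$1" '
    $2 == node { print $1; found = 1 }
    END { if (!found) print "none" }
  '
}
```

Example usage from the second terminal, with the pod/node pairs taken from a standard kubectl/oc JSONPath query:

$ node=$(oc get pod hello-pod -o jsonpath='{.spec.nodeName}')
$ oc -n openshift-dns get pods -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}' | local_dns_pods "$node"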


## In the second terminal, reboot one of the master nodes
$ oc debug node/ci-ln-7y7qydt-f76d1-8qx7l-master-0 
Starting pod/ci-ln-7y7qydt-f76d1-8qx7l-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.5
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# reboot
Terminated

## In the first terminal, watch the loop output for about 3-5 minutes while the master reboots, to confirm that DNS lookups are not delayed
<----snip----->
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.02s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
<----snip----->
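Several minutes of loop output are hard to check by eye. One option is to save the first terminal's output to a file and summarize it. This minimal sketch assumes the `time` lines look like the BusyBox output above (real<TAB>0m 0.02s) or like bash's built-in `time` (real<TAB>0m0.020s). The function name summarize_lookups and the 1-second default threshold are hypothetical choices, not part of the verification procedure.

```shell
#!/bin/sh
# summarize_lookups [THRESHOLD_SECONDS]  (hypothetical helper)
# Reads the output of the getent/time loop on stdin and prints how many
# lookups were timed, how many took at least THRESHOLD_SECONDS (default 1),
# and the slowest one. Handles both the BusyBox "real\t0m 0.02s" format and
# the bash "real\t0m0.020s" format.
summarize_lookups() {
  threshold="${1:-1}"
  awk -v thr="$threshold" '
    $1 == "real" {
      # Join the remaining fields so both formats become e.g. "0m0.02s".
      t = ""
      for (i = 2; i <= NF; i++) t = t $i
      split(t, parts, "m")          # parts[1] = minutes, parts[2] = "0.02s"
      sub(/s$/, "", parts[2])       # strip the trailing "s"
      secs = parts[1] * 60 + parts[2]
      n++
      if (secs > max) max = secs
      if (secs >= thr) slow++
    }
    END { printf "lookups=%d slow=%d max=%.2fs\n", n, slow, max }
  '
}
```

For example, if the output was saved to a hypothetical lookups.log, `summarize_lookups 1 < lookups.log` prints the totals. A nonzero slow count would indicate the kind of lookup delay this bug describes.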

$  oc -n openshift-dns get ds/dns-default -oyaml
<---snip--->
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 3     <--verified the fix with https://github.com/openshift/cluster-dns-operator/pull/242
          successThreshold: 1
          timeoutSeconds: 3    <--verified the fix with https://github.com/openshift/cluster-dns-operator/pull/242
<---snip--->
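The verified readiness probe settings (periodSeconds 3, timeoutSeconds 3, failureThreshold 3) bound how quickly the kubelet can mark a CoreDNS container unready. The following is a rough back-of-the-envelope sketch, not a model of the kubelet's exact scheduling. It assumes the kubelet on that node keeps running and probes on a fixed period. The best case is a health endpoint that fails fast just before a probe is due. The worst case is one that starts failing just after a successful probe, where the last probe also waits out its full timeout. The function name readiness_window is hypothetical.

```shell
#!/bin/sh
# readiness_window PERIOD FAILURE_THRESHOLD TIMEOUT  (hypothetical helper)
# Prints rough bounds, in seconds, on how long after a container's health
# endpoint starts failing the kubelet marks it unready.
readiness_window() {
  period=$1 failures=$2 timeout=$3
  # Best case: failures are immediate (e.g. connection refused) and the first
  # failing probe runs right away, so the threshold is hit
  # (failures - 1) periods later.
  low=$(( (failures - 1) * period ))
  # Worst case: the failure starts just after a successful probe, and the
  # final failing probe also has to wait out its full timeout.
  high=$(( failures * period + timeout ))
  echo "unready after ~${low}-${high}s"
}
```

With the values above, `readiness_window 3 3 3` prints "unready after ~6-12s". For comparison, the Kubernetes defaults (periodSeconds 10, timeoutSeconds 1, failureThreshold 3) give "unready after ~20-31s" under the same assumptions.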

Comment 7 errata-xmlrpc 2021-03-25 12:39:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.5.36 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0840

