Bug 1928304 - hostname lookup delays when master node down
Summary: hostname lookup delays when master node down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: DNS
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.z
Assignee: Miciah Dashiel Butler Masters
QA Contact: jechen
URL:
Whiteboard:
Duplicates: 1930913
Depends On: 1919737
Blocks: 1928773 1930917
Reported: 2021-02-12 23:08 UTC by OpenShift BugZilla Robot
Modified: 2021-04-09 09:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1928773
Environment:
Last Closed: 2021-03-10 11:24:00 UTC
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-dns-operator pull 235 | 0 | None | closed | [release-4.7] Bug 1928304: Set CoreDNS readiness probe period and timeout each to 3 seconds | 2021-03-02 05:29:03 UTC
Github | openshift sdn pull 259 | 0 | None | closed | [release-4.7] Bug 1928304: Prefer local endpoint for cluster DNS service | 2021-02-27 05:15:32 UTC
Red Hat Product Errata | RHBA-2021:0678 | 0 | None | None | None | 2021-03-10 11:24:31 UTC

Comment 1 Miciah Dashiel Butler Masters 2021-02-23 03:46:53 UTC
*** Bug 1930913 has been marked as a duplicate of this bug. ***

Comment 2 Miciah Dashiel Butler Masters 2021-02-25 22:20:26 UTC
Waiting for https://github.com/openshift/cluster-dns-operator/pull/235 to be approved.

Comment 6 jechen 2021-03-02 18:08:41 UTC
Attempted to verify with 4.7.0-0.nightly-2021-03-01-085007; could only verify pull 259. Pull 235 is missing from this build, so waiting for the next image.

Comment 7 jechen 2021-03-03 16:29:12 UTC
No new 4.7 nightly build is available today.

Comment 8 jechen 2021-03-04 03:10:34 UTC
Verified with 4.7.0-0.nightly-2021-03-04-004412; test passed.


$ oc get clusterversions.config.openshift.io
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-03-04-004412   True        False         20m     Cluster version is 4.7.0-0.nightly-2021-03-04-004412


### In the first terminal, create a test pod, rsh into it, and run an infinite loop that times each DNS lookup

[jechen@jechen ~]$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/aosqe-pod-for-ping.json
pod/hello-pod created
[jechen@jechen ~]$ oc rsh hello-pod
/ # while (true); do time getent hosts kubernetes.default.svc.cluster.local; sleep 1; done
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.08s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s

<----snip----->

### In a second terminal, reboot a master node
$ oc get node
NAME                                            STATUS   ROLES    AGE   VERSION
hongli-47bv-hv7k6-master-0                      Ready    master   43m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-master-1                      Ready    master   43m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-master-2                      Ready    master   43m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-worker-northcentralus-fg28q   Ready    worker   28m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-worker-northcentralus-frggf   Ready    worker   34m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-worker-northcentralus-wk6ls   Ready    worker   34m   v1.20.0+5fbfd19
[jechen@jechen ~]$ oc debug node/hongli-47bv-hv7k6-master-1
Starting pod/hongli-47bv-hv7k6-master-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.5
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# reboot



### In the first terminal, confirm that DNS lookups show no delay while the master is rebooting (about 3-5 min)
<----snip----->
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.01s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
172.30.0.1        kubernetes.default.svc.cluster.local  kubernetes.default.svc.cluster.local
real	0m 0.00s
user	0m 0.00s
sys	0m 0.00s
<----snip----->
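
Scanning several minutes of loop output by eye is error-prone, so the check above could be automated. The following is a minimal sketch, not part of the original verification: `slow_lookups` is a hypothetical helper, and it assumes the loop output (including the `time` lines, which are written to stderr) has been captured to a file in the busybox format shown above (`real<TAB>0m 0.08s`).

```shell
# Hypothetical helper: report every "real" timing above a threshold (in seconds).
# Reads captured loop output on stdin; assumes lines of the form "real<TAB>0m 0.08s".
slow_lookups() {
    threshold="${1:-1}"
    awk -v limit="$threshold" '
        $1 == "real" {
            minutes = $2; sub(/m$/, "", minutes)   # "0m"    -> "0"
            seconds = $3; sub(/s$/, "", seconds)   # "0.08s" -> "0.08"
            total = minutes * 60 + seconds
            count++
            if (total > limit) {
                slow++
                printf "slow lookup #%d: %.2fs\n", count, total
            }
        }
        END { printf "%d lookups, %d slower than %ss\n", count, slow, limit }
    '
}
```

For example, `slow_lookups 1 < lookups.log` (where `lookups.log` is an assumed capture file) lists every lookup that took longer than one second and ends with a summary line. For the output shown in this comment, the summary would report zero slow lookups.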



### Verified that the DNS readiness probe was changed by pull 235
$ oc -n openshift-dns get ds/dns-default -oyaml


        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 3       <------verified the change with https://github.com/openshift/cluster-dns-operator/pull/235/
          successThreshold: 1
          timeoutSeconds: 3      <------verified the change with https://github.com/openshift/cluster-dns-operator/pull/235/
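
To see why the shorter probe settings matter, it may help to estimate how long a failed CoreDNS pod can stay marked ready. The following is a rough sketch under stated assumptions, not a description of the kubelet's exact behavior: it assumes one probe per `periodSeconds`, each probe taking up to `timeoutSeconds` to fail, and the pod being marked unready after `failureThreshold` consecutive failures. It ignores scheduling jitter and the time for endpoint changes to propagate. `probe_unready_bound` is a hypothetical helper name.

```shell
# Hypothetical helper: rough upper bound (seconds) before a failing pod is marked unready.
# Arguments: periodSeconds timeoutSeconds failureThreshold
probe_unready_bound() {
    period="$1"
    timeout="$2"
    failures="$3"
    # Up to one full period may pass before the first failing probe starts,
    # then (failures - 1) more periods, then the final probe's timeout.
    echo $(( failures * period + timeout ))
}

probe_unready_bound 3 3 3   # values from the readinessProbe above; prints 12
```

Under these assumptions, the settings verified above bound detection at roughly 12 seconds. Passing larger period and timeout values to the same function shows how quickly that window grows.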

Comment 10 errata-xmlrpc 2021-03-10 11:24:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.1 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0678

