Bug 1928304
Summary: | hostname lookup delays when master node down | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot> | |
Component: | Networking | Assignee: | Miciah Dashiel Butler Masters <mmasters> | |
Networking sub component: | DNS | QA Contact: | jechen <jechen> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | urgent | |||
Priority: | urgent | CC: | aarapov, abodhe, alchan, amcdermo, aos-bugs, bjarolim, hongli, juqiao, mfuruta, mmasters, mzali, openshift-bugs-escalate, rbolling, rh-container, rhowe, sgaikwad | |
Version: | 4.6 | |||
Target Milestone: | --- | |||
Target Release: | 4.7.z | |||
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1928773 | Environment: | |
Last Closed: | 2021-03-10 11:24:00 UTC | Type: | --- | |
Bug Depends On: | 1919737 | |||
Bug Blocks: | 1928773, 1930917 |
Comment 1
Miciah Dashiel Butler Masters
2021-02-23 03:46:53 UTC
Waiting for https://github.com/openshift/cluster-dns-operator/pull/235 to be approved.

Attempted to verify with 4.7.0-0.nightly-2021-03-01-085007, but could only verify pull 259; pull 235 is not in that build yet. Waiting for the next image.

No new 4.7 nightly build is available today.

Verified with 4.7.0-0.nightly-2021-03-04-004412; the test passed.

$ oc get clusterversions.config.openshift.io
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-03-04-004412   True        False         20m     Cluster version is 4.7.0-0.nightly-2021-03-04-004412

## In the first terminal, create a test pod, rsh into it, and run an infinite loop that measures the time of each DNS lookup:
[jechen@jechen ~]$ oc create -f https://raw.githubusercontent.com/openshift/verification-tests/master/testdata/networking/aosqe-pod-for-ping.json
pod/hello-pod created
[jechen@jechen ~]$ oc rsh hello-pod
/ # while (true); do time getent hosts kubernetes.default.svc.cluster.local; sleep 1; done
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.08s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
<----snip----->

### In a second terminal, reboot a master node:
$ oc get node
NAME                                            STATUS   ROLES    AGE   VERSION
hongli-47bv-hv7k6-master-0                      Ready    master   43m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-master-1                      Ready    master   43m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-master-2                      Ready    master   43m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-worker-northcentralus-fg28q   Ready    worker   28m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-worker-northcentralus-frggf   Ready    worker   34m   v1.20.0+5fbfd19
hongli-47bv-hv7k6-worker-northcentralus-wk6ls   Ready    worker   34m   v1.20.0+5fbfd19
[jechen@jechen ~]$ oc debug node/hongli-47bv-hv7k6-master-1
Starting pod/hongli-47bv-hv7k6-master-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.5
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# reboot

### In the first terminal, monitor the loop to make sure there is no DNS lookup delay while the master is rebooting (about 3-5 minutes):
<----snip----->
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.01s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
172.30.0.1 kubernetes.default.svc.cluster.local kubernetes.default.svc.cluster.local
real 0m 0.00s
user 0m 0.00s
sys 0m 0.00s
<----snip----->

### Verified that the DNS readiness probe has been changed by pull 235:
$ oc -n openshift-dns get ds/dns-default -oyaml
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 3          <------ verified the change with https://github.com/openshift/cluster-dns-operator/pull/235/
  successThreshold: 1
  timeoutSeconds: 3         <------ verified the change with https://github.com/openshift/cluster-dns-operator/pull/235/

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.1 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0678
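The verification loop above prints `time` output for every lookup, so a delay has to be spotted by eye during the 3-5 minute reboot window. A minimal sketch of a variant that prints only slow or failed lookups follows. It is not part of the original test: it assumes bash and a `date` that supports `%N` (GNU coreutils; the BusyBox `date` in the test pod may not), and the helper names `lookup_ms` and `watch_lookups` and the 1000 ms default threshold are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch only: time repeated DNS lookups and report the slow or failed ones.
# Assumptions (hypothetical, not from the original test):
#   - `date +%s%N` prints nanoseconds (GNU coreutils; BusyBox may differ)
#   - a lookup slower than the threshold (default 1000 ms) counts as a delay
#   - lookup_ms and watch_lookups are illustrative helper names

# lookup_ms NAME: resolve NAME once via getent and print the elapsed time in
# milliseconds. The exit status is getent's, so failed lookups are visible.
lookup_ms() {
  local start end rc
  start=$(date +%s%N)
  getent hosts "$1" > /dev/null
  rc=$?
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
  return "$rc"
}

# watch_lookups NAME [COUNT] [THRESHOLD_MS]: run COUNT lookups one second
# apart (COUNT=0 loops forever, like the original while loop) and print a
# timestamped line for each lookup that failed or exceeded THRESHOLD_MS.
# Returns non-zero if any such lookup was seen.
watch_lookups() {
  local name="$1" count="${2:-0}" threshold="${3:-1000}"
  local i=0 bad=0 ms
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    if ms=$(lookup_ms "$name"); then
      if [ "$ms" -gt "$threshold" ]; then
        echo "$(date -u +%H:%M:%S) SLOW ${ms}ms $name"
        bad=$((bad + 1))
      fi
    else
      echo "$(date -u +%H:%M:%S) FAILED after ${ms}ms $name"
      bad=$((bad + 1))
    fi
    i=$((i + 1))
    sleep 1
  done
  [ "$bad" -eq 0 ]
}
```

After loading these functions into a shell, `watch_lookups kubernetes.default.svc.cluster.local 0 1000` runs until interrupted and stays silent unless a lookup fails or takes longer than one second, which makes a reboot-induced delay easy to spot.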
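The verified probe settings bound how long a DNS pod that stops answering can remain ready: the kubelet starts a readiness probe every `periodSeconds`, gives each probe up to `timeoutSeconds`, and marks the pod unready after `failureThreshold` consecutive failures. A back-of-the-envelope sketch of that window follows. The helper name `probe_unready_window` is hypothetical and the model is simplified: it assumes the worst case where the pod breaks just after a successful probe and every later probe hangs until its timeout, it assumes the kubelet on the pod's node is still running probes, and it ignores probe scheduling jitter and the time for the endpoint change to reach the service proxy.

```shell
#!/usr/bin/env bash
# Rough worst-case estimate (simplified model, hypothetical helper):
# seconds from a DNS pod going silent until its readiness probe has
# failed failureThreshold times in a row. Model assumptions:
#   - the pod breaks right after a successful probe
#   - a new probe starts every periodSeconds
#   - each failing probe hangs for the full timeoutSeconds
# Kubelet jitter and endpoint propagation delay are ignored.

# probe_unready_window FAILURE_THRESHOLD PERIOD_SECONDS TIMEOUT_SECONDS
probe_unready_window() {
  local failure_threshold="$1" period="$2" timeout="$3"
  # The last required failure starts failure_threshold periods after the
  # final success and is only reported once its timeout expires.
  echo $(( failure_threshold * period + timeout ))
}

# dns-default values verified above: failureThreshold 3, periodSeconds 3,
# timeoutSeconds 3.
echo "worst case: about $(probe_unready_window 3 3 3)s until unready"
```

With the verified values this gives roughly 12 seconds (3 × 3 + 3). If the pod refuses connections outright instead of hanging, each probe fails immediately and the estimate drops to about 9 seconds (3 × 3).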