Bug 1579917
| Summary: | Abbreviated cluster domain names result in upstream DNS query | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Robert Bost <rbost> |
| Component: | RFE | Assignee: | Ben Bennett <bbennett> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Xiaoli Tian <xtian> |
| Severity: | urgent | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.7.0 | CC: | aos-bugs, bbennett, czinda, erich, jokerman, knakai, mmccomas, openshift-bugs-escalate, rbost, rhowe |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-04-17 17:18:56 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1267746 | | |
If the pod's dnsPolicy is set to ClusterFirst then we should add ndots:5 to the pod's resolv.conf. If that is not working, please grab the pod's definition (oc get po ... -o yaml), then rsh into the pod and grab the contents of its /etc/resolv.conf.

Note that this area is in flux: it is settable on a per-pod basis in 3.9 (alpha; beta in 3.10): https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/

It's been brought to my attention that this concerns the node, not the pod. Sorry. You can change the ndots setting in your node's resolv.conf if you want. But if you are just trying to reduce the DNS load, it's probably not worth it.

Reopening this under the installer.
We would add the following to the dispatcher script, so that when we use the registry URL "docker-registry.default.svc" the search domains are tried first, rather than waiting for the upstream servers to respond that they do not have this record.
echo "options ndots:3" >> /etc/resolv.conf
https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/files/networkmanager/99-origin-dns.sh#L111
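One caveat with a plain append: if the dispatcher script were to run more than once against the same file, `>>` would add a duplicate line each time. A minimal sketch of an idempotent variant follows. The function name `ensure_ndots` is hypothetical (it is not part of 99-origin-dns.sh), and the demonstration writes to a scratch file with assumed contents rather than the real /etc/resolv.conf.

```shell
#!/bin/sh
# Hypothetical helper, not taken from 99-origin-dns.sh.
# Appends "options ndots:N" to a resolv.conf-style file unless an
# options line that already sets ndots is present.
# Usage: ensure_ndots FILE N
ensure_ndots() (
  file=$1
  ndots=$2
  # Leave the file alone if some options line already sets ndots.
  if grep -Eq '^[[:space:]]*options[[:space:]].*ndots:' "$file"; then
    exit 0
  fi
  printf 'options ndots:%s\n' "$ndots" >> "$file"
)

# Demonstrate on a scratch file; the contents below are assumed.
scratch=$(mktemp)
printf 'search cluster.local\nnameserver 10.74.157.61\n' > "$scratch"
ensure_ndots "$scratch" 3
ensure_ndots "$scratch" 3   # second call changes nothing
cat "$scratch"
rm -f "$scratch"
```

Whether this guard is needed depends on how the dispatcher script regenerates the file, which this sketch does not attempt to model.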
(In reply to Ryan Howe from comment #7)
> Reopening this under the installer.
>
> Where we would add the following to the dispatcher script, so that when we
> use the registry URL of "docker-registry.default.svc" we first use the
> search domains rather than waiting of the upstream servers to respond not
> having this record.
>
> echo "options ndots:3" >> /etc/resolv.conf
>
> https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_node/files/networkmanager/99-origin-dns.sh#L111

Should this be reclassified as an RFE to allow a user to set ndots at install/upgrade time of the cluster?

Pods can have custom DNS options: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/ I believe this addresses the concern. Please reopen with more details if it does not address your needs.
Description of problem:
A DNS query that is made without .cluster.local appended (e.g. docker-registry.default.svc) from an OpenShift service (or from a Pod using a Linux namespace that links to the host's /etc/resolv.conf) will result in upstream/external DNS servers being tried first. This will typically fail because upstream DNS servers are not aware of domain names internal to OpenShift. A workaround is to include 'options ndots:3' in the host's /etc/resolv.conf; however, this has the side effect of automatically appending .cluster.local to domain names that would normally be served by upstream DNS (e.g. google.com).

Version-Release number of selected component (if applicable):
Tested on OpenShift 3.7. Would apply to other versions too.

Steps to Reproduce:
Here is strace output showing the failing upstream query and then the successful DNS query with .cluster.local appended. Further below is the same nslookup command but with 'options ndots:3' in /etc/resolv.conf.

========
# strace -Tttvs 1024 -e sendmsg,recvmsg nslookup docker-registry.default.svc
Server: 10.74.157.61
Address: 10.74.157.61#53
Name: docker-registry.default.svc.cluster.local
Address: 172.30.190.238
12:29:22.080359 --- SIGTERM {si_signo=SIGTERM, si_code=SI_TKILL, si_pid=14367, si_uid=0} ---
12:29:22.082120 +++ exited with 0 +++
[root@master-0 ~]# strace -Tttvfs 1024 -e sendmsg,recvmsg nslookup docker-registry.default.svc
strace: Process 14400 attached
strace: Process 14401 attached
strace: Process 14402 attached
[pid 14400] 12:29:29.063019 recvmsg(20, 0x7f6c1aa33ab0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000028>
[pid 14400] 12:29:29.063383 sendmsg(20, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.74.157.61")}, msg_iov(1)=[{"u\317\1\0\0\1\0\0\0\0\0\0\17docker-registry\7default\3svc\0\0\1\0\1", 45}], msg_controllen=0, msg_flags=0}, 0) = 45 <0.000258>
[pid 14400] 12:29:29.064794 recvmsg(20, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.74.157.61")}, msg_iov(1)=[{"u\317\201\203\0\1\0\0\0\1\0\0\17docker-registry\7default\3svc\0\0\1\0\1\0\0\6\0\1\0\0 \353\0@\1a\froot-servers\3net\0\5nstld\fverisign-grs\3com\0xI\6\330\0\0\7\10\0\0\3\204\0\t:\200\0\1Q\200", 65535}], msg_controllen=32, [{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */}], msg_flags=0}, 0) = 120 <0.000020>
[pid 14400] 12:29:29.065817 recvmsg(21, 0x7f6c1aa333f0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000016>
[pid 14400] 12:29:29.065925 sendmsg(21, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.74.157.61")}, msg_iov(1)=[{"r\342\1\0\0\1\0\0\0\0\0\0\17docker-registry\7default\3svc\7cluster\5local\0\0\1\0\1", 59}], msg_controllen=0, msg_flags=0}, 0) = 59 <0.000058>
[pid 14400] 12:29:29.066751 recvmsg(21, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.74.157.61")}, msg_iov(1)=[{"r\342\205\200\0\1\0\1\0\0\0\0\17docker-registry\7default\3svc\7cluster\5local\0\0\1\0\1\300\f\0\1\0\1\0\0\0\36\0\4\254\36\276\356", 65535}], msg_controllen=32, [{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */}], msg_flags=0}, 0) = 75 <0.000019>
Server: 10.74.157.61
Address: 10.74.157.61#53
Name: docker-registry.default.svc.cluster.local
Address: 172.30.190.238
[pid 14399] 12:29:29.067573 --- SIGTERM {si_signo=SIGTERM, si_code=SI_TKILL, si_pid=14399, si_uid=0} ---
[pid 14400] 12:29:29.068053 +++ exited with 0 +++
[pid 14402] 12:29:29.068495 +++ exited with 0 +++
[pid 14401] 12:29:29.068995 +++ exited with 0 +++
12:29:29.069952 +++ exited with 0 +++
========

Same nslookup command but with 'options ndots:3' in /etc/resolv.conf.

========
# echo 'options ndots:3' >> /etc/resolv.conf
# strace -Tttvfs 1024 -e sendmsg,recvmsg nslookup docker-registry.default.svc
strace: Process 14589 attached
strace: Process 14590 attached
strace: Process 14591 attached
[pid 14589] 12:31:26.234661 recvmsg(20, 0x7fe090940ab0, 0) = -1 EAGAIN (Resource temporarily unavailable) <0.000031>
[pid 14589] 12:31:26.235052 sendmsg(20, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.74.157.61")}, msg_iov(1)=[{"\3025\1\0\0\1\0\0\0\0\0\0\17docker-registry\7default\3svc\7cluster\5local\0\0\1\0\1", 59}], msg_controllen=0, msg_flags=0}, 0) = 59 <0.000108>
[pid 14589] 12:31:26.236035 recvmsg(20, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.74.157.61")}, msg_iov(1)=[{"\3025\205\200\0\1\0\1\0\0\0\0\17docker-registry\7default\3svc\7cluster\5local\0\0\1\0\1\300\f\0\1\0\1\0\0\0\36\0\4\254\36\276\356", 65535}], msg_controllen=32, [{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */}], msg_flags=0}, 0) = 75 <0.000058>
Server: 10.74.157.61
Address: 10.74.157.61#53
Name: docker-registry.default.svc.cluster.local
Address: 172.30.190.238
[pid 14588] 12:31:26.237068 --- SIGTERM {si_signo=SIGTERM, si_code=SI_TKILL, si_pid=14588, si_uid=0} ---
[pid 14589] 12:31:26.237559 +++ exited with 0 +++
[pid 14591] 12:31:26.237916 +++ exited with 0 +++
[pid 14590] 12:31:26.238454 +++ exited with 0 +++
12:31:26.239443 +++ exited with 0 +++
========

Additional Info:
The problem with adding 'options ndots:3' is that it will automatically append .cluster.local to domains that would normally need upstream DNS to resolve them, for example google.com or example.com. More precisely, nslookup runs through the entire search list from /etc/resolv.conf before trying just "google.com". This tradeoff seems to work for some customers, but it is not clear whether 'options ndots:3' would suit every situation.
I'm hoping engineering can help decide whether the current setup is still correct or whether adding ndots:3 would result in better DNS performance.
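
To make the tradeoff concrete, here is a rough sketch of the candidate ordering that the ndots rule in resolv.conf(5) implies: a name with at least ndots dots is tried as-is first, otherwise the search list is tried first and the bare name last. The function name `lookup_order` is hypothetical, and the single search domain cluster.local is an assumption; the real search list on a node will usually contain more entries. It only prints names and sends no queries.

```shell
#!/bin/sh
# Hypothetical helper: print the order in which a stub resolver would
# try candidate names under the ndots rule from resolv.conf(5).
# Usage: lookup_order NAME NDOTS SEARCH_DOMAIN...
lookup_order() (
  name=$1
  ndots=$2
  shift 2
  # A trailing dot marks a fully qualified name; no search list applies.
  case $name in
    *.) printf '%s\n' "$name"; exit 0 ;;
  esac
  # Count the dots by deleting every other character.
  count=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  # Enough dots: the name is tried as-is before the search list.
  if [ "$count" -ge "$ndots" ]; then
    printf '%s\n' "$name"
  fi
  for domain in "$@"; do
    printf '%s\n' "$name.$domain"
  done
  # Too few dots: the bare name is only tried after the search list.
  if [ "$count" -lt "$ndots" ]; then
    printf '%s\n' "$name"
  fi
)

echo "ndots:1 (default)"
lookup_order docker-registry.default.svc 1 cluster.local
echo "ndots:3"
lookup_order docker-registry.default.svc 3 cluster.local
lookup_order google.com 3 cluster.local
```

With ndots:1 the bare docker-registry.default.svc comes first, matching the failing first query in the strace output above. With ndots:3 the .cluster.local form comes first, and google.com is likewise tried as google.com.cluster.local before the bare name, which is the side effect described under Additional Info.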