Description of problem:
After stopping one or two of the three atomic-openshift-master-api services running on the three masters in an HA setup, service hostnames resolve only intermittently.

Related PR: https://github.com/kubernetes/kubernetes/pull/20975

Version-Release number of selected component (if applicable):
3.1.1

How reproducible:
Always

Steps to Reproduce:
1. In a 3-master HA setup, stop 2 API services.
2. Run: dig @<kubernetes service cluster ip> kubernetes.default.svc.cluster.local
3. Check the results.

Actual results:
While the API server services are stopped, resolving a service hostname through the kubernetes cluster IP sometimes succeeds and sometimes times out.

Expected results:
Stopping an API server service should not affect resolution of service hostnames.

Additional info:
Related PR: https://github.com/kubernetes/kubernetes/pull/20975

┌─[✗]─[root@rnode5]─[~]
└──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
^C┌─[root@rnode5]─[~]
└──> dig @172.30.0.1 kubernetes.default.svc.cluster.local

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7261
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 30 IN A 172.30.0.1

;; Query time: 12 msec
;; SERVER: 172.30.0.1#53(172.30.0.1)
;; WHEN: Thu Mar 24 10:56:27 EDT 2016
;; MSG SIZE rcvd: 70

┌─[root@rnode5]─[~]
└──> ^C
┌─[✗]─[root@rnode5]─[~]
└──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
^C┌─[root@rnode5]─[~]
└──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
^C┌─[root@rnode5]─[~]
└──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
^C┌─[root@rnode5]─[~]
└──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
^C┌─[root@rnode5]─[~]
└──> dig @172.30.0.1 kubernetes.default.svc.cluster.local

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

┌─[✗]─[root@rnode5]─[~]
└──> dig @172.30.0.1 kubernetes.default.svc.cluster.local

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46153
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 30 IN A 172.30.0.1

;; Query time: 1 msec
;; SERVER: 172.30.0.1#53(172.30.0.1)
;; WHEN: Thu Mar 24 10:57:05 EDT 2016
;; MSG SIZE rcvd: 70
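
The manual retries above can be turned into a repeatable count. The following is only a sketch, not part of the original report: `probe_dns` and the `DIG_CMD` override are hypothetical names introduced here, and the script assumes that dig exits non-zero when no server replies.

```shell
# Hypothetical helper: send the same DNS query several times and count
# how many attempts got a reply and how many did not.
# DIG_CMD may be set to substitute another command for dig (e.g. in tests).
probe_dns() {
    server="$1"          # DNS server to query, e.g. the service cluster IP
    name="$2"            # hostname to resolve
    attempts="${3:-10}"  # number of queries to send (default 10)
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # +time=2 +tries=1: a single attempt with a 2-second timeout,
        # so an unreachable backend fails fast instead of hanging.
        if ${DIG_CMD:-dig} +time=2 +tries=1 +short "@$server" "$name" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok failed=$failed"
}
```

After sourcing the function, `probe_dns 172.30.0.1 kubernetes.default.svc.cluster.local 20` prints a single line of the form `ok=N failed=M`, which makes it easier to compare behavior before and after stopping an API service.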
#### REPRODUCER ####

master1.example.com 172.17.28.10 atomic-openshift-master-api.service OFF
master2.example.com 172.17.28.12 atomic-openshift-master-api.service OFF
master3.example.com 172.17.28.18

TCP DUMP FROM NODE 172.17.28.3
# tcpdump udp port 53 -n -n -i any

RAN 3 times from node:
# dig @172.30.0.1 kubernetes.default.svc.cluster.local

FAILED
11:21:07.449719 IP 172.17.28.30.55177 > 172.17.28.10.53: 64+ [1au] A? kubernetes.default.svc.cluster.local. (65)
11:21:21.133499 IP 172.17.28.30.41090 > 172.17.28.12.53: 18420+ [1au] A? kubernetes.default.svc.cluster.local. (65)

SUCCESS
11:21:48.549492 IP 172.17.28.30.54937 > 172.17.28.18.53: 55387+ [1au] A? kubernetes.default.svc.cluster.local. (65)
11:21:48.556604 IP 172.17.28.18.53 > 172.17.28.30.54937: 55387* 1/0/0 A 172.30.0.1 (70)

SUCCESS COMMAND
# dig @172.30.0.1 kubernetes.default.svc.cluster.local

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55387
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 30 IN A 172.30.0.1

;; Query time: 7 msec
;; SERVER: 172.30.0.1#53(172.30.0.1)
;; WHEN: Thu Mar 24 11:21:48 EDT 2016
;; MSG SIZE rcvd: 70

FAILED COMMAND
# dig @172.30.0.1 kubernetes.default.svc.cluster.local

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

# oc get service kubernetes -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2016-02-23T22:36:58Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "11"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: f2c22c58-da7d-11e5-8e68-fa163e2dcef7
spec:
  clusterIP: 172.30.0.1
  portalIP: 172.30.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

# oc get endpoints kubernetes -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2016-02-23T22:36:58Z
  name: kubernetes
  namespace: default
  resourceVersion: "2151105"
  selfLink: /api/v1/namespaces/default/endpoints/kubernetes
  uid: f2c49969-da7d-11e5-8e68-fa163e2dcef7
subsets:
- addresses:
  - ip: 172.17.28.10
  - ip: 172.17.28.12
  - ip: 172.17.28.18
  ports:
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: https
    port: 443
    protocol: TCP
  - name: dns
    port: 53
    protocol: UDP
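
The endpoints object still lists all three master addresses, including the two whose API service is stopped. One way to confirm which backends actually answer DNS is to query each endpoint address directly instead of going through the service IP. This is a sketch only: `check_endpoints` and the `DIG_CMD` override are hypothetical names, and it assumes dig exits non-zero when the queried server does not reply.

```shell
# Hypothetical helper: query each endpoint address directly, bypassing the
# service cluster IP, and report which backends answer DNS.
# DIG_CMD may be set to substitute another command for dig (e.g. in tests).
check_endpoints() {
    name="$1"   # hostname to resolve
    shift       # remaining arguments are endpoint IP addresses
    for ip in "$@"; do
        # One attempt with a 2-second timeout per endpoint.
        if ${DIG_CMD:-dig} +time=2 +tries=1 +short "@$ip" "$name" >/dev/null 2>&1; then
            echo "$ip answered"
        else
            echo "$ip no reply"
        fi
    done
}
```

For the addresses in this report the call would be `check_endpoints kubernetes.default.svc.cluster.local 172.17.28.10 172.17.28.12 172.17.28.18`. Any address reported as `no reply` that is still present in the endpoints list is a candidate cause of the intermittent timeouts.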
*** This bug has been marked as a duplicate of bug 1300028 ***