Bug 1321093 - After stopping one or two of the 3 api-server services running on three masters in HA, services queried by hostname through the kubernetes service cluster IP resolve only intermittently.
Keywords:
Status: CLOSED DUPLICATE of bug 1300028
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Andy Goldstein
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-03-24 16:09 UTC by Miheer Salunke
Modified: 2019-11-14 07:40 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-24 16:12:08 UTC
Target Upstream Version:
Embargoed:



Description Miheer Salunke 2016-03-24 16:09:14 UTC
Description of problem:

After stopping one or two of the three atomic-openshift-master-api services running on three masters in HA, service hostnames resolve only intermittently: some lookups succeed and others time out.
Related PR: https://github.com/kubernetes/kubernetes/pull/20975

Version-Release number of selected component (if applicable):
3.1.1

How reproducible:
Always

Steps to Reproduce:
1. In a 3-master HA setup, stop 2 of the API services.
2. Run dig @<kubernetes service cluster ip> kubernetes.default.svc.cluster.local
3. Check the results.
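
Because the failure is intermittent, a single dig run can miss it. The sketch below repeats the lookup from step 2 and tallies the outcomes. The function name probe_dns and the 2-second timeout are illustrative choices, not part of the original report; it relies on dig exiting non-zero when no server replies.

```shell
# Repeat a DNS lookup and count how many attempts are answered.
# probe_dns is a hypothetical helper name; it is not an OpenShift tool.
probe_dns() {
    # $1 = DNS server, $2 = name to resolve, $3 = number of attempts
    ok=0
    failed=0
    i=0
    while [ "$i" -lt "$3" ]; do
        # +tries=1 +time=2: a single attempt with a 2-second timeout, so an
        # unanswered query fails quickly instead of being retried.
        if dig +tries=1 +time=2 +short "@$1" "$2" >/dev/null 2>&1; then
            ok=$((ok + 1))
        else
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=$ok failed=$failed"
}
```

For example, `probe_dns 172.30.0.1 kubernetes.default.svc.cluster.local 20` run from a node after stopping two API services would be expected to report a mix of answered and failed attempts if the bug is present.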

Actual results:

Resolving a service hostname through the kubernetes service cluster IP sometimes succeeds and sometimes fails while API server services are stopped.

Expected results:
Stopping an API server service should not affect resolution of services by hostname.

Additional info:

Related PR: https://github.com/kubernetes/kubernetes/pull/20975


    ┌─[✗]─[root@rnode5]─[~]
    └──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
    ^C┌─[root@rnode5]─[~]
    └──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
     
    ; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7261
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
     
    ;; QUESTION SECTION:
    ;kubernetes.default.svc.cluster.local. IN A
     
    ;; ANSWER SECTION:
    kubernetes.default.svc.cluster.local. 30 IN A   172.30.0.1
     
    ;; Query time: 12 msec
    ;; SERVER: 172.30.0.1#53(172.30.0.1)
    ;; WHEN: Thu Mar 24 10:56:27 EDT 2016
    ;; MSG SIZE  rcvd: 70
     
    ┌─[root@rnode5]─[~]
    └──> ^C
    ┌─[✗]─[root@rnode5]─[~]
    └──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
     
    ^C┌─[root@rnode5]─[~]
    └──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
    ^C┌─[root@rnode5]─[~]
    └──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
    ^C┌─[root@rnode5]─[~]
    └──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
    ^C┌─[root@rnode5]─[~]
    └──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
     
    ; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
    ; (1 server found)
    ;; global options: +cmd
    ;; connection timed out; no servers could be reached
    ┌─[✗]─[root@rnode5]─[~]
    └──> dig @172.30.0.1 kubernetes.default.svc.cluster.local
     
    ; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46153
    ;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
     
    ;; QUESTION SECTION:
    ;kubernetes.default.svc.cluster.local. IN A
     
    ;; ANSWER SECTION:
    kubernetes.default.svc.cluster.local. 30 IN A   172.30.0.1
     
    ;; Query time: 1 msec
    ;; SERVER: 172.30.0.1#53(172.30.0.1)
    ;; WHEN: Thu Mar 24 10:57:05 EDT 2016
    ;; MSG SIZE  rcvd: 70

Comment 1 Miheer Salunke 2016-03-24 16:10:29 UTC
#### REPRODUCER ####

    master1.example.com  172.17.28.10       atomic-openshift-master-api.service OFF
    master2.example.com  172.17.28.12       atomic-openshift-master-api.service OFF
    master3.example.com  172.17.28.18  
     
     
    TCP DUMP FROM NODE 172.17.28.3
    # tcpdump  udp port 53 -n -n -i any

     
    RAN 3 times from node: 
    # dig @172.30.0.1 kubernetes.default.svc.cluster.local
     
    FAILED
    11:21:07.449719 IP 172.17.28.30.55177 > 172.17.28.10.53: 64+ [1au] A? kubernetes.default.svc.cluster.local. (65)
    11:21:21.133499 IP 172.17.28.30.41090 > 172.17.28.12.53: 18420+ [1au] A? kubernetes.default.svc.cluster.local. (65)
     
     
    SUCCESS
    11:21:48.549492 IP 172.17.28.30.54937 > 172.17.28.18.53: 55387+ [1au] A? kubernetes.default.svc.cluster.local. (65)
    11:21:48.556604 IP 172.17.28.18.53 > 172.17.28.30.54937: 55387* 1/0/0 A 172.30.0.1 (70)
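
In the capture above, the two failed queries went to 172.17.28.10 and 172.17.28.12 (the masters whose API service is off) and got no reply, while the query to 172.17.28.18 was answered. A minimal sketch for pulling that out of a longer capture follows. The function name unanswered_dns is hypothetical, and it assumes the text format printed by tcpdump with -n for UDP port 53, as shown above.

```shell
# Print the server address of every DNS query that has no captured response.
# unanswered_dns is a hypothetical helper; input is tcpdump text on stdin.
unanswered_dns() {
    awk '
        # Query: destination ends in port 53. Key on client address + query ID.
        $2 == "IP" && $5 ~ /\.53:$/ {
            id = $6; sub(/[^0-9].*/, "", id)
            server = $5; sub(/\.53:$/, "", server)
            pending[$3 " " id] = server
        }
        # Response: source ends in port 53. Clear the matching pending query.
        $2 == "IP" && $3 ~ /\.53$/ {
            id = $6; sub(/[^0-9].*/, "", id)
            client = $5; sub(/:$/, "", client)
            delete pending[client " " id]
        }
        END { for (k in pending) print pending[k] }
    ' | sort
}
```

It could be fed a saved capture, for example `unanswered_dns < capture.txt`, where capture.txt is an assumed file name holding the tcpdump output.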


SUCCESS COMMAND

# dig @172.30.0.1 kubernetes.default.svc.cluster.local

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55387
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;kubernetes.default.svc.cluster.local. IN A

;; ANSWER SECTION:
kubernetes.default.svc.cluster.local. 30 IN A	172.30.0.1

;; Query time: 7 msec
;; SERVER: 172.30.0.1#53(172.30.0.1)
;; WHEN: Thu Mar 24 11:21:48 EDT 2016
;; MSG SIZE  rcvd: 70


FAILED COMMAND 

# dig @172.30.0.1 kubernetes.default.svc.cluster.local

; <<>> DiG 9.9.4-RedHat-9.9.4-29.el7_2.3 <<>> @172.30.0.1 kubernetes.default.svc.cluster.local
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached


# oc get service kubernetes -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: 2016-02-23T22:36:58Z
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "11"
  selfLink: /api/v1/namespaces/default/services/kubernetes
  uid: f2c22c58-da7d-11e5-8e68-fa163e2dcef7
spec:
  clusterIP: 172.30.0.1
  portalIP: 172.30.0.1
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}


# oc get endpoints kubernetes -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: 2016-02-23T22:36:58Z
  name: kubernetes
  namespace: default
  resourceVersion: "2151105"
  selfLink: /api/v1/namespaces/default/endpoints/kubernetes
  uid: f2c49969-da7d-11e5-8e68-fa163e2dcef7
subsets:
- addresses:
  - ip: 172.17.28.10
  - ip: 172.17.28.12
  - ip: 172.17.28.18
  ports:
  - name: dns-tcp
    port: 53
    protocol: TCP
  - name: https
    port: 443
    protocol: TCP
  - name: dns
    port: 53
    protocol: UDP
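
The endpoints object above lists all three masters, including 172.17.28.10 and 172.17.28.12, which the reproducer marks as OFF. Assuming that output was captured while those services were stopped, queries to the cluster IP that get forwarded to either address would time out, which matches the tcpdump. The sketch below queries each listed endpoint directly to see which ones answer. The helper names endpoint_ips and check_endpoints are hypothetical, and the YAML is parsed with sed only because the layout shown above is simple.

```shell
# Print each address listed as "- ip: ..." in Endpoints YAML read from stdin.
endpoint_ips() {
    sed -n 's/^[[:space:]]*-[[:space:]]*ip:[[:space:]]*//p'
}

# Query every endpoint IP (one per line on stdin) for the name given in $1.
check_endpoints() {
    while read -r ip; do
        # </dev/null keeps the query command from consuming the IP list.
        if dig +tries=1 +time=2 +short "@$ip" "$1" </dev/null >/dev/null 2>&1; then
            echo "$ip answers"
        else
            echo "$ip no-answer"
        fi
    done
}
```

A possible invocation is `oc get endpoints kubernetes -o yaml | endpoint_ips | check_endpoints kubernetes.default.svc.cluster.local`.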

Comment 2 Andy Goldstein 2016-03-24 16:12:08 UTC

*** This bug has been marked as a duplicate of bug 1300028 ***

