Bug 1698210 - Service type NodePort not exposing service on all nodes
Summary: Service type NodePort not exposing service on all nodes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.1.0
Assignee: Casey Callendrello
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-09 19:56 UTC by Anurag saxena
Modified: 2019-06-04 10:47 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:47:18 UTC
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:47:25 UTC

Description Anurag saxena 2019-04-09 19:56:17 UTC
Description of problem: A traffic listener pod is created and exposed via a service of type NodePort. Traffic is sent from a client pod to the exposed NodePort on each node's IP, one by one. Every node except the one the client pod runs on (the node the traffic is sent from) shows an UNREPLIED entry in the conntrack table. The client pod is just a ping pod used to send the traffic.

All nodes are supposed to proxy the exposed service, since the service type is NodePort.

$ oc get pods
NAME                READY   STATUS    RESTARTS   AGE
hello-pod           1/1     Running   0          22h  <<<<Ping pod
udp-rc-lcbst        1/1     Running   0          51m  <<<<Traffic listener pod

$ oc get svc
NAME           TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
udp-rc-lcbst   NodePort   172.30.154.219   <none>        8080:31963/UDP   105m


$ sudo podman run --rm --network host --privileged docker.io/aosqe/conntrack-tool conntrack -L | grep 31963
udp      17 5 src=172.31.130.146 dst=172.31.139.127 sport=34999 dport=31963 [UNREPLIED] src=172.31.139.127 dst=172.31.130.146 sport=31963 dport=34999 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 12 src=172.31.130.146 dst=172.31.159.254 sport=52167 dport=31963 [UNREPLIED] src=172.31.159.254 dst=172.31.130.146 sport=31963 dport=52167 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 20 src=172.31.130.146 dst=172.128.159.64 sport=37556 dport=31963 [UNREPLIED] src=172.128.159.64 dst=172.31.130.146 sport=31963 dport=37556 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 149 src=172.31.130.146 dst=172.31.130.146 sport=58178 dport=31963 src=10.129.2.23 dst=10.128.2.1 sport=8080 dport=58178 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

The sudo podman command above just runs the conntrack utility in a container and removes the container once the command finishes.
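The [UNREPLIED] flag in the dump above is the key signal. As an editorial illustration (not part of the original report), a short Python sketch that scans conntrack -L output for the node port and reports which destination nodes never replied; the sample lines are abridged from the dump above:

```python
import re

# Two sample lines abridged from the conntrack dump above:
# one [UNREPLIED] (node never answered) and one [ASSURED] (node replied).
CONNTRACK_DUMP = """\
udp      17 5 src=172.31.130.146 dst=172.31.139.127 sport=34999 dport=31963 [UNREPLIED] src=172.31.139.127 dst=172.31.130.146 sport=31963 dport=34999 mark=0 use=1
udp      17 149 src=172.31.130.146 dst=172.31.130.146 sport=58178 dport=31963 src=10.129.2.23 dst=10.128.2.1 sport=8080 dport=58178 [ASSURED] mark=0 use=1
"""

def unreplied_nodes(dump: str, node_port: int) -> set:
    """Return destination IPs that received traffic on node_port but never replied."""
    nodes = set()
    for line in dump.splitlines():
        if f"dport={node_port}" in line and "[UNREPLIED]" in line:
            m = re.search(r"dst=(\S+)", line)  # first dst= is the original direction
            if m:
                nodes.add(m.group(1))
    return nodes

print(unreplied_nodes(CONNTRACK_DUMP, 31963))  # → {'172.31.139.127'}
```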

Version-Release number of selected component (if applicable): 4.0.0-0.nightly-2019-04-05-165550

$ oc version --short
Client Version: v4.0.22
Server Version: v1.13.4+ab11434


How reproducible: Always

Steps to Reproduce:
1. Create a traffic listener pod and a ping pod. See additional info below
2. oc expose pod <traffic_listener_pod> --type=NodePort --port=8080 --protocol=UDP
3. Send traffic from the client pod to each node IP and the NodePort, one by one
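The probe in step 3 can be sketched in Python. This is an illustration only: the local UDP echo server stands in for the real listener pod, and on a real cluster you would loop probe() over every node IP with the service's nodePort (31963 here).

```python
import socket
import threading

def start_udp_echo() -> int:
    """Local stand-in for the ncat -u -l --keep-open --exec /bin/cat listener pod.
    Returns the port it is listening on."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", 0))           # let the OS pick a free port
    port = srv.getsockname()[1]

    def serve():
        data, addr = srv.recvfrom(1024)  # echo a single datagram back
        srv.sendto(data, addr)
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return port

def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Send one UDP datagram and report whether host:port replied in time."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.sendto(b"ping", (host, port))
        s.recvfrom(1024)
        return True
    except (socket.timeout, ConnectionRefusedError):
        return False
    finally:
        s.close()

# On the cluster this would be: for ip in node_ips: probe(ip, 31963)
port = start_udp_echo()
print(probe("127.0.0.1", port))  # → True
```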

Actual results: Only the node the client pod runs on replies to the client; the other nodes do not.

Expected results: All nodes should reply to the client, since the service type is NodePort, which exposes the service on every node.

Additional info: 

traffic listener pod template
-----------------------------

{
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        {
            "apiVersion": "v1",
            "kind": "ReplicationController",
            "metadata": {
                "labels": {
                    "name": "udp-rc"
                },
                "name": "udp-rc"
            },
            "spec": {
                "replicas": 1,
                "template": {
                    "metadata": {
                        "labels": {
                            "name": "udp-pods"
                        }
                    },
                    "spec": {
                        "containers": [
                            {
                              "command": [ "/usr/bin/ncat", "-u", "-l", "8080","--keep-open", "--exec", "/bin/cat"],
                              "name": "udp-pod",
                              "image": "aosqe/pod-for-ping"
                            }
                        ],
                        "restartPolicy": "Always"
                    }
                }
            }
        }
    ]
}


$ oc get svc -oyaml
-----------------------
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: 2019-04-09T18:02:22Z
    labels:
      name: udp-pods
    name: udp-rc-lcbst
    namespace: test
    resourceVersion: "880960"
    selfLink: /api/v1/namespaces/test/services/udp-rc-lcbst
    uid: 9fddbc9b-5af1-11e9-82b2-02302f122dd4
  spec:
    clusterIP: 172.30.154.219
    externalTrafficPolicy: Cluster
    ports:
    - nodePort: 31963
      port: 8080
      protocol: UDP
      targetPort: 8080
    selector:
      name: udp-pods
    sessionAffinity: None
    type: NodePort
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 1 Anurag saxena 2019-04-09 23:06:26 UTC
OK, further experiments tell me that this might be due to the absence of node-to-node network connectivity in 4.x. I am not able to ping one node from another, or vice versa. Is this a restriction of CoreOS on 4.x?
Please advise.

Comment 2 Meng Bo 2019-04-10 03:15:43 UTC
Looks like an AWS security group issue; from the console I can see we only opened the port range 30000 to 32767 for the TCP protocol. Maybe we need to open it for UDP as well.

To Anurag,
Can you help get the output of iptables and netstat for your UDP node port?
Eg,
iptables-save | grep 31963
netstat -lnpu | grep 31963

I think all the related entries should be there.

Comment 3 Casey Callendrello 2019-04-10 10:49:30 UTC
Yup, we need to open this range for UDP as well, I'll file a PR.
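For illustration only (this is not the actual installer change, which lives in the openshift/installer repo), the ingress rule being added has roughly this shape in boto3 terms; the CIDR here is a placeholder:

```python
# Illustrative sketch of the AWS security-group rule needed: the same
# 30000-32767 NodePort range that is already open for TCP, but for UDP.
# The CidrIp value is a placeholder, not taken from the real installer.
UDP_NODEPORT_RULE = {
    "IpProtocol": "udp",
    "FromPort": 30000,   # Kubernetes default NodePort range start
    "ToPort": 32767,     # Kubernetes default NodePort range end
    "IpRanges": [{"CidrIp": "10.0.0.0/16", "Description": "UDP NodePorts"}],
}

# With boto3 this would be applied roughly as:
# ec2.authorize_security_group_ingress(GroupId=group_id,
#                                      IpPermissions=[UDP_NODEPORT_RULE])
print(UDP_NODEPORT_RULE["IpProtocol"],
      UDP_NODEPORT_RULE["FromPort"], UDP_NODEPORT_RULE["ToPort"])
```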

Comment 4 Casey Callendrello 2019-04-10 13:18:34 UTC
Filed https://github.com/openshift/installer/pull/1577

Comment 5 Anurag saxena 2019-04-10 15:37:27 UTC
(In reply to Meng Bo from comment #2)
> Looks like an AWS security group issue, from the console I can see we only
> opened the port range from 30000 to 32767 for TCP protocol. Maybe we need
> also open them for UDP.
> 
> To Anurag,
> Can you help get the output about iptables and netstat for your udp node
> port?
> Eg,
> iptables-save | grep 31963
> netstat -lnpu | grep 31963
> 
> I think all the related entries should be there.

The iptables-save entries seem to be correct.

$ sudo iptables-save | grep 31326
-A KUBE-NODEPORTS -p udp -m comment --comment "test/udp-rc-ctsj7:" -m udp --dport 31326 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p udp -m comment --comment "test/udp-rc-ctsj7:" -m udp --dport 31326 -j KUBE-SVC-J5HIX5PZU2ZRSTD5

While netstat doesn't show the expected port range opened

$ netstat -lnpu | grep "Proto\|31326"
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
udp6       0      0 :::31326                :::*                                -
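Note that the udp6 :::31326 line above is not by itself a problem: on Linux, a socket bound to :: with IPV6_V6ONLY off (the default) also accepts IPv4 traffic, which shows up as a v4-mapped address. A quick local demonstration, illustrative only and not from the original report:

```python
import socket

# Server bound to the IPv6 wildcard, like the `udp6 :::31326` entry above.
srv = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
srv.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
srv.bind(("::", 0))
srv.settimeout(2.0)
port = srv.getsockname()[1]

# Plain IPv4 sender: the datagram still reaches the udp6 socket.
cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
cli.sendto(b"ping", ("127.0.0.1", port))

data, addr = srv.recvfrom(1024)
print(data, addr[0])  # → b'ping' ::ffff:127.0.0.1
```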

Comment 7 Anurag saxena 2019-04-18 14:55:31 UTC
Will have to verify this on the next good build. We have not had a green build on 4.1 for 8 days. Thanks.

Comment 8 Anurag saxena 2019-04-18 21:07:45 UTC
Verified on 4.1.0-0.nightly-2019-04-18-170154.
Port range 30000-32767 is now allowed for UDP for NodePort services. The test steps from the description now work fine.

Comment 10 errata-xmlrpc 2019-06-04 10:47:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

