Description of problem:

A traffic listener pod is created and exposed via a service of type NodePort. Traffic is sent from a client pod to the exposed NodePort on each node's IP, one by one. All nodes show an UNREPLIED entry in the conntrack table except the one the client pod runs on (from where the traffic is sent). The client pod is just a ping pod used to send traffic. All nodes are supposed to proxy the exposed service because the service type is NodePort.

$ oc get pods
NAME           READY   STATUS    RESTARTS   AGE
hello-pod      1/1     Running   0          22h   <<<< Ping pod
udp-rc-lcbst   1/1     Running   0          51m   <<<< Traffic listener pod

$ oc get svc
NAME           TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
udp-rc-lcbst   NodePort   172.30.154.219   <none>        8080:31963/UDP   105m

$ sudo podman run --rm --network host --privileged docker.io/aosqe/conntrack-tool conntrack -L | grep 31963
udp 17 5   src=172.31.130.146 dst=172.31.139.127 sport=34999 dport=31963 [UNREPLIED] src=172.31.139.127 dst=172.31.130.146 sport=31963 dport=34999 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 12  src=172.31.130.146 dst=172.31.159.254 sport=52167 dport=31963 [UNREPLIED] src=172.31.159.254 dst=172.31.130.146 sport=31963 dport=52167 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 20  src=172.31.130.146 dst=172.128.159.64 sport=37556 dport=31963 [UNREPLIED] src=172.128.159.64 dst=172.31.130.146 sport=31963 dport=37556 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 149 src=172.31.130.146 dst=172.31.130.146 sport=58178 dport=31963 src=10.129.2.23 dst=10.128.2.1 sport=8080 dport=58178 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

(The sudo podman command just runs the conntrack utility in a container and removes the container after the command finishes.)

Version-Release number of selected component (if applicable):
4.0.0-0.nightly-2019-04-05-165550

$ oc version --short
Client Version: v4.0.22
Server Version: v1.13.4+ab11434

How reproducible:
Always

Steps to Reproduce:
1. Create a traffic listener pod and a ping pod. See additional info below.
2. oc expose pod <traffic_listener_pod> --type=NodePort --port=8080 --protocol=UDP
3. Send traffic via the client pod to every node IP and the NodePort, one by one (a sketch is included at the end of this comment).

Actual results:
Not all nodes respond to the client; only the node the client pod runs on does.

Expected results:
All nodes should reply to the client, since a NodePort service is supposed to expose the service on every node.

Additional info:

Traffic listener pod template
-----------------------------
{
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        {
            "apiVersion": "v1",
            "kind": "ReplicationController",
            "metadata": {
                "labels": {
                    "name": "udp-rc"
                },
                "name": "udp-rc"
            },
            "spec": {
                "replicas": 1,
                "template": {
                    "metadata": {
                        "labels": {
                            "name": "udp-pods"
                        }
                    },
                    "spec": {
                        "containers": [
                            {
                                "command": ["/usr/bin/ncat", "-u", "-l", "8080", "--keep-open", "--exec", "/bin/cat"],
                                "name": "udp-pod",
                                "image": "aosqe/pod-for-ping"
                            }
                        ],
                        "restartPolicy": "Always"
                    }
                }
            }
        }
    ]
}

$ oc get svc -oyaml
-------------------
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: 2019-04-09T18:02:22Z
    labels:
      name: udp-pods
    name: udp-rc-lcbst
    namespace: test
    resourceVersion: "880960"
    selfLink: /api/v1/namespaces/test/services/udp-rc-lcbst
    uid: 9fddbc9b-5af1-11e9-82b2-02302f122dd4
  spec:
    clusterIP: 172.30.154.219
    externalTrafficPolicy: Cluster
    ports:
    - nodePort: 31963
      port: 8080
      protocol: UDP
      targetPort: 8080
    selector:
      name: udp-pods
    sessionAffinity: None
    type: NodePort
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
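For reference, a minimal sketch of step 3 above (sending the UDP traffic from the client pod to every node) could look like the following; it assumes ncat is available inside the aosqe/pod-for-ping image and uses the node port shown above.

$ # send one UDP datagram from the client pod to each node's InternalIP on the node port
$ for ip in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
      echo "hello from hello-pod" | oc exec -i hello-pod -- ncat -u -w 2 "$ip" 31963
  done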
OK, further experiments tell me that it might be due to an absence of node-to-node network connectivity in 4.x. I am not able to ping one node from another node or vice versa. Is this a restriction of CoreOS on 4.x? Please advise.
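For what it's worth, a quick way to repeat that node-to-node ping test without SSH access to the RHCOS nodes is via oc debug; the node name and target IP below are placeholders.

$ oc get nodes -o wide                       # list node names and InternalIPs
$ oc debug node/<node-name> -- chroot /host ping -c 3 <other-node-internal-ip>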
Looks like an AWS security group issue; from the console I can see we only opened the port range 30000 to 32767 for the TCP protocol. Maybe we also need to open it for UDP.

To Anurag,
Can you help get the output of iptables and netstat for your UDP node port? E.g.:

iptables-save | grep 31963
netstat -lnpu | grep 31963

I think all the related entries should be there.
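The same security group check can be done from the CLI instead of the console; this is only a sketch, and the tag filter assumes the installer-created worker security group is tagged with a name containing "worker" (adjust to your cluster's naming).

$ aws ec2 describe-security-groups \
      --filters "Name=tag:Name,Values=*worker*" \
      --query 'SecurityGroups[].IpPermissions[?ToPort==`32767`].[IpProtocol,FromPort,ToPort]' \
      --output text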
Yup, we need to open this range for UDP as well, I'll file a PR.
Filed https://github.com/openshift/installer/pull/1577
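Until a build with that installer change is available, a manual workaround along these lines should work; the security group ID and source CIDR are placeholders for the cluster's worker security group and machine network, and this mirrors what the fix is expected to do rather than quoting the PR.

$ aws ec2 authorize-security-group-ingress \
      --group-id <worker-security-group-id> \
      --protocol udp \
      --port 30000-32767 \
      --cidr 172.31.0.0/16        # machine network CIDR of this cluster; adjust as needed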
(In reply to Meng Bo from comment #2)
> Looks like an AWS security group issue; from the console I can see we only
> opened the port range 30000 to 32767 for the TCP protocol. Maybe we also
> need to open it for UDP.
>
> To Anurag,
> Can you help get the output of iptables and netstat for your UDP node port?
> E.g.:
> iptables-save | grep 31963
> netstat -lnpu | grep 31963
>
> I think all the related entries should be there.

The iptables-save entries seem to be correct:

$ sudo iptables-save | grep 31326
-A KUBE-NODEPORTS -p udp -m comment --comment "test/udp-rc-ctsj7:" -m udp --dport 31326 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p udp -m comment --comment "test/udp-rc-ctsj7:" -m udp --dport 31326 -j KUBE-SVC-J5HIX5PZU2ZRSTD5

While netstat doesn't show the expected port range opened:

$ netstat -lnpu | grep "Proto\|31326"
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp6       0      0 :::31326                :::*                                -
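Since the host-side kube-proxy rules are in place, one more data point would be to capture on a non-client node while the test traffic is sent, to confirm whether the UDP packets reach the node at all (which would point at the cloud firewall rather than the host). Sketch only; RHCOS does not ship tcpdump in the host OS, so this assumes the toolbox container is used.

On the node receiving the test traffic:
$ toolbox
# tcpdump -ni any udp port 31326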
Will have to verify this on the next good build; we have not had a green 4.1 build for 8 days. Thanks.
Verified on 4.1.0-0.nightly-2019-04-18-170154. The port range 30000-32767 is now allowed for UDP for NodePort services, and the test steps from the description now work as expected.
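For completeness, the conntrack check from the description can be re-run after the fix; with the security group opened for UDP, the entries for the non-client node IPs should now show reply traffic ([ASSURED]) instead of [UNREPLIED]. The node port below is the one from the original description and will differ if the service was recreated.

$ sudo podman run --rm --network host --privileged docker.io/aosqe/conntrack-tool conntrack -L | grep 31963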
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758