Previously, for pods that used the `hostPort` definition to expose UDP ports to the host, the kubelet did not remove stale conntrack entries when a pod was deleted. As a result, those ports became unreachable when the pod was restarted. With this update, stale conntrack entries are removed, and the exposed UDP ports are reachable when the pods are restarted.
Created attachment 1769568
sdn-log
Description of problem:
The customer has a service that deploys pods through a daemonset with hostPort to listen on some TCP and UDP ports. Their external service connects to these pods on those same TCP and UDP ports, using the same source and destination port (sport == dport). TCP works as expected, but for the UDP ports, every time these pods are restarted or deleted, the UDP connections get stuck and no new entries are created in conntrack. This is only solved when the external service making the connection is restarted as well.
While researching this, I found the following upstream Kubernetes issues that seem to have fixed this problem, but I was unable to find the same code change on our side:
https://github.com/kubernetes/kubernetes/issues/58336
https://github.com/kubernetes/kubernetes/pull/59286
https://github.com/kubernetes/kubernetes/issues/59033
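For context, the release-note text at the top describes what the fix amounts to: when a UDP hostPort mapping is torn down, the stale conntrack entries for that port are flushed so that new flows can be NATed to the replacement pod. A rough, illustrative sketch of that cleanup in Python, shelling out to the conntrack CLI (this is not the actual kubelet code, and it assumes conntrack-tools is available on the node):

import subprocess

def flush_udp_hostport_conntrack(port):
    """Delete conntrack entries for a UDP hostPort so that the next datagram
    creates a fresh entry pointing at the new pod IP. Illustrative only --
    the real fix lives in the kubelet/hostport code, not in a script like this."""
    # conntrack may exit non-zero when there is nothing to delete, so the
    # return code is deliberately not checked here.
    subprocess.run(["conntrack", "-D", "-p", "udp", "--dport", str(port)],
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

# e.g. flush_udp_hostport_conntrack(8888) after deleting the pod in step 4 below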
Version-Release number of selected component (if applicable):
Tested this on OCP 3.11.394 as well as 4.5.35 and 4.6.21
How reproducible:
This can be reproduced every time, even on the latest OCP 4.5 and 4.6 releases.
Steps to Reproduce:
1. Create a daemonset for a service that listens on a UDP hostPort (a sketch of the server.py it runs follows the pod listing below):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2021-04-06T12:16:56Z"
  generation: 1
  labels:
    app: udp-server
  name: test-udp-conntrack
  namespace: test-network
  resourceVersion: "774848"
  selfLink: /apis/apps/v1/namespaces/test-network/daemonsets/test-udp-conntrack
  uid: fadbd0c5-96d1-11eb-84c5-525400ea502a
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: test-udp-conntrack
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: udp-server
        name: test-udp-conntrack
    spec:
      containers:
      - args:
        - python /etc/test/server.py $POD_IP 8888
        command:
        - /bin/bash
        - -c
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: registry.redhat.io/rhscl/python-36-rhel7:latest
        imagePullPolicy: IfNotPresent
        name: udp-server
        ports:
        - containerPort: 8888
          hostPort: 8888
          name: udpcon
          protocol: UDP
        - containerPort: 8888
          hostPort: 8888
          name: tcpcon
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 20m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/test
          name: server-script
      dnsPolicy: ClusterFirst
      nodeSelector:
        hostport: "true"
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 10
      volumes:
      - configMap:
          defaultMode: 420
          name: server-script
test-udp-conntrack-7gn7m 1/1 Running 0 16m 10.128.6.96 ocp-infra-node2.openshift3.redhatrules.local <none>
test-udp-conntrack-99f44 1/1 Running 0 16m 10.128.3.94 ocp-infra-node1.openshift3.redhatrules.local <none>
test-udp-conntrack-lz8n6 1/1 Running 0 16m 10.128.5.113 ocp-infra-node4.openshift3.redhatrules.local <none>
test-udp-conntrack-md8dq 1/1 Running 0 16m 10.128.4.156 ocp-infra-node3.openshift3.redhatrules.local <none>
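The server-script ConfigMap contents are not attached to this bug. As an approximation (not the customer's actual script), server.py could be a minimal UDP echo server bound to the pod IP, matching the `python /etc/test/server.py $POD_IP 8888` invocation above (only the UDP side is shown):

#!/usr/bin/env python3
"""Approximation of server.py: a minimal UDP echo server bound to the pod IP.
The customer's real script was not attached to this report."""
import socket
import sys

def main():
    host, port = sys.argv[1], int(sys.argv[2])
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    print("listening on {}:{}/udp".format(host, port), flush=True)
    while True:
        data, peer = sock.recvfrom(4096)
        print("received {} bytes from {}".format(len(data), peer), flush=True)
        # Reply so the client sees two-way traffic and the conntrack entry
        # becomes [ASSURED], as in the outputs below.
        sock.sendto(data, peer)

if __name__ == "__main__":
    main()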
2. On the nodes, I see that the hostport NAT rules are created (a small helper for pulling out the DNAT target follows the rule listing):
# iptables -t nat -nvL | grep 8888
0 0 KUBE-HP-LGATJ4G47OMMBLH3 tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* test-udp-conntrack-7gn7m_test-network hostport 8888 */ tcp dpt:8888
0 0 KUBE-HP-2XVMR6Y6WGJIEYMN udp -- * * 0.0.0.0/0 0.0.0.0/0 /* test-udp-conntrack-7gn7m_test-network hostport 8888 */ udp dpt:8888
0 0 KUBE-MARK-MASQ all -- * * 10.128.6.96 0.0.0.0/0 /* test-udp-conntrack-7gn7m_test-network hostport 8888 */
0 0 DNAT udp -- * * 0.0.0.0/0 0.0.0.0/0 /* test-udp-conntrack-7gn7m_test-network hostport 8888 */ udp to:10.128.6.96:8888
0 0 KUBE-MARK-MASQ all -- * * 10.128.6.96 0.0.0.0/0 /* test-udp-conntrack-7gn7m_test-network hostport 8888 */
0 0 DNAT tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* test-udp-conntrack-7gn7m_test-network hostport 8888 */ tcp to:10.128.6.96:8888
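As a convenience when repeating this check after a pod restart (step 4), here is a small, hypothetical helper that pulls the DNAT target of the UDP hostport rule out of `iptables -t nat -S`, so it can be compared against the pod IP from `oc get pods -o wide`. It must run as root on the node and is not part of the customer's reproducer:

import re
import subprocess

def udp_hostport_dnat_target(port):
    """Return the --to-destination value (e.g. '10.128.6.96:8888') of the UDP
    DNAT rule for the given hostPort, or None if no such rule is present."""
    out = subprocess.run(["iptables", "-t", "nat", "-S"], check=True,
                         stdout=subprocess.PIPE, universal_newlines=True).stdout
    for line in out.splitlines():
        # The hostport rules carry a comment like "... hostport 8888".
        if ("hostport %d" % port) in line and "-p udp" in line:
            match = re.search(r"--to-destination (\S+)", line)
            if match:
                return match.group(1)
    return None

# e.g. print(udp_hostport_dnat_target(8888))   # should match the current pod IP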
3. Start a UDP connection to a node IP on port 8888 and monitor the conntrack entries (a sketch of the client script follows the output):
$ python3 udp_test-client.py 172.23.190.40 8888
# watch -n2 conntrack -L -p udp --dport=8888
udp 17 179 src=172.23.188.1 dst=172.23.190.40 sport=8888 dport=8888 src=10.128.6.96 dst=172.23.188.1 sport=8888 dport=8888 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
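The actual udp_test-client.py was also supplied by the customer and is not attached. The sketch below is an approximation that assumes the client binds its local port to 8888 (so sport == dport, as the conntrack entry above shows) and keeps sending from the same long-lived socket:

#!/usr/bin/env python3
"""Approximation of udp_test-client.py: send datagrams to <node IP> <port>
from a socket bound to the same local port, so sport == dport as in the
conntrack output above. The real client script was not attached."""
import socket
import sys
import time

def main():
    host, port = sys.argv[1], int(sys.argv[2])
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))          # fixed source port -> sport == dport
    sock.settimeout(2)
    while True:
        sock.sendto(b"ping", (host, port))
        try:
            data, _ = sock.recvfrom(4096)
            print("reply: {!r}".format(data), flush=True)
        except socket.timeout:
            print("no reply", flush=True)
        time.sleep(2)

if __name__ == "__main__":
    main()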
4. Delete the pod on that node and notice that when the timeout countdown reaches 0, the entry disappears and no new one is created. Checking iptables, we also see that it was updated as expected:
$ oc delete pod test-udp-conntrack-7gn7m
$ oc get pods -o wide
test-udp-conntrack-99f44 1/1 Running 0 44m 10.128.3.94 ocp-infra-node1.openshift3.redhatrules.local <none>
test-udp-conntrack-j5sd4 1/1 Running 0 9s 10.128.6.97 ocp-infra-node2.openshift3.redhatrules.local <none>
test-udp-conntrack-lz8n6 1/1 Running 0 44m 10.128.5.113 ocp-infra-node4.openshift3.redhatrules.local <none>
test-udp-conntrack-md8dq 1/1 Running 0 44m 10.128.4.156 ocp-infra-node3.openshift3.redhatrules.local <none>
# iptables -t nat -nvL | grep 8888
0 0 KUBE-HP-5RALKNMEBO6Q4WVO udp -- * * 0.0.0.0/0 0.0.0.0/0 /* test-udp-conntrack-j5sd4_test-network hostport 8888 */ udp dpt:8888
0 0 KUBE-HP-MJ2HYBIHG6LVBSHG tcp -- * * 0.0.0.0/0 0.0.0.0/0 /* test-udp-conntrack-j5sd4_test-network hostport 8888 */ tcp dpt:8888
0 0 KUBE-MARK-MASQ all -- * * 10.128.6.97 0.0.0.0/0 /* test-udp-c
# watch -n2 conntrack -L -p udp --dport=8888
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been shown.
5. When the client is restarted, the connection is established again:
$ python3 udp_test-client.py 172.23.190.40 8888
CTRL+C
$ python3 udp_test-client.py 172.23.190.40 8888
# watch -n2 conntrack -L -p udp --dport=8888
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
udp 17 179 src=172.23.188.1 dst=172.23.190.40 sport=8888 dport=8888 src=10.128.6.97 dst=172.23.188.1 sport=8888 dport=8888 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
Actual results:
UDP conntrack entries are not recreated when the pod is deleted and a new pod IP is assigned.
Expected results:
The node should clean up the old entries so that new entries can be created, without needing to restart the external services.
Additional info:
The Python scripts on my side were kindly provided by the customer; they only create a server that listens on a UDP socket on the pod IP and a client that establishes the connection to the UDP port on the node IP.
$ oc exec test-udp-conntrack-j5sd4 -- ps auxwww
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000090+ 1 0.0 0.0 32072 6664 ? Ss 13:01 0:00 python /etc/test/server.py 10.128.6.97 8888
This reproducer mimics what their application does. They are implementing HashiCorp Consul: there are 3 Consul servers external to OCP, and on OCP a daemonset deploys the Consul agents. These services connect to each other through a specific port over both UDP and TCP.
Another detail that might be interesting: this behaviour happens even if we set hostNetwork: true on the daemonset and open the ports in the OS_FIREWALL_ALLOW chain on the nodes. The same issue occurs even though the pods always get the host IP and no NAT is involved.
This bug was reported on 3.11.z and reproduces all the way up to 4.6.z. Setting the source release to 3.11 and the target to 4.8.0 to get a fix into our development branch, and then clone/backport as far as needed/requested.
I continue to be unable to reproduce, but I still believe the bug exists.
@andcosta would you be able to use `rpm-ostree install` to install `conntrack-tools`, restart the node, and see if this issue persists? If adding conntrack fixes the issue, I'll work with the CoreOS team to get it into RHCOS.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (OpenShift Container Platform 3.11.487 bug fix and enhancement update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:2928