Bug 1946593

Summary: Daemonset hostPort UDP conntrack entries are not updated when pods are recreated
Product: OpenShift Container Platform Reporter: Andre Costa <andcosta>
Component: Node    Assignee: Ryan Phillips <rphillips>
Node sub component: Kubelet QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: aconstan, anbhat, aojeagar, aos-bugs, awallenb, bjarolim, ctauchen, jtanenba, mtapsonj, openshift-bugs-escalate, pehunt, rcarrier, schoudha, scuppett, vlaad, wzheng, zzhao
Version: 3.11.0   
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Previously, for pods that used the `hostPort` definition to expose UDP ports to the host, the kubelet did not remove stale conntrack entries when a pod was deleted. As a result, those ports became unreachable when the pod was restarted. With this update, stale conntrack entries are removed, and the exposed UDP ports are reachable when the pods are restarted.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-08-04 11:18:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments: sdn-log

Description Andre Costa 2021-04-06 13:21:15 UTC
Created attachment 1769568 [details]
sdn-log

Description of problem:
The customer has a service deployed as a DaemonSet that uses hostPort to listen on some TCP and UDP ports. Their external service connects to these pods on those same TCP and UDP ports, using a source port equal to the destination port (sport == dport). TCP works as expected, but for the UDP ports, every time these pods are restarted or deleted the UDP connections get stuck and no new conntrack entries are created. This is only resolved when the external service making the connection is restarted as well.
While researching, I found the following upstream Kubernetes issues that seem to fix this problem, but I was unable to find the same code change on our side:

https://github.com/kubernetes/kubernetes/issues/58336
https://github.com/kubernetes/kubernetes/pull/59286
https://github.com/kubernetes/kubernetes/issues/59033

Version-Release number of selected component (if applicable):
Tested this on OCP 3.11.394 as well as 4.5.35 and 4.6.21

How reproducible:
This can be reproduced every time, even on the latest OCP 4.5 and 4.6 releases.

Steps to Reproduce:
1. Create a DaemonSet for a service that listens on a UDP hostPort:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2021-04-06T12:16:56Z"
  generation: 1
  labels:
    app: udp-server
  name: test-udp-conntrack
  namespace: test-network
  resourceVersion: "774848"
  selfLink: /apis/apps/v1/namespaces/test-network/daemonsets/test-udp-conntrack
  uid: fadbd0c5-96d1-11eb-84c5-525400ea502a
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      name: test-udp-conntrack
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: udp-server
        name: test-udp-conntrack
    spec:
      containers:
      - args:
        - python /etc/test/server.py $POD_IP 8888
        command:
        - /bin/bash
        - -c
        env:
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: registry.redhat.io/rhscl/python-36-rhel7:latest
        imagePullPolicy: IfNotPresent
        name: udp-server
        ports:
        - containerPort: 8888
          hostPort: 8888
          name: udpcon
          protocol: UDP
        - containerPort: 8888
          hostPort: 8888
          name: tcpcon
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 256Mi
          requests:
            cpu: 20m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/test
          name: server-script
      dnsPolicy: ClusterFirst
      nodeSelector:
        hostport: "true"
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 10
      volumes:
      - configMap:
          defaultMode: 420
          name: server-script
        name: server-script

test-udp-conntrack-7gn7m   1/1       Running   0          16m       10.128.6.96    ocp-infra-node2.openshift3.redhatrules.local   <none>
test-udp-conntrack-99f44   1/1       Running   0          16m       10.128.3.94    ocp-infra-node1.openshift3.redhatrules.local   <none>
test-udp-conntrack-lz8n6   1/1       Running   0          16m       10.128.5.113   ocp-infra-node4.openshift3.redhatrules.local   <none>
test-udp-conntrack-md8dq   1/1       Running   0          16m       10.128.4.156   ocp-infra-node3.openshift3.redhatrules.local   <none>
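
For reference, here is a minimal sketch of what the server.py shipped in the server-script ConfigMap might look like. The customer's actual script was not attached, so this is only an assumption matching the "python /etc/test/server.py $POD_IP 8888" invocation above, and it covers only the UDP side of the reproducer:

#!/usr/bin/env python3
# Hypothetical reconstruction of /etc/test/server.py (the customer's actual script
# was not attached). It binds a UDP socket on the pod IP passed as argv[1] and
# echoes every datagram back to its sender.
import socket
import sys

def main():
    host, port = sys.argv[1], int(sys.argv[2])  # invoked as: python server.py $POD_IP 8888
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    print("UDP server listening on {}:{}".format(host, port), flush=True)
    while True:
        data, addr = sock.recvfrom(4096)
        print("received {} bytes from {}".format(len(data), addr), flush=True)
        sock.sendto(data, addr)  # echo back so the client gets a reply

if __name__ == "__main__":
    main()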

2. On the nodes, I see that the hostport NAT rules are created:
# iptables -t nat -nvL | grep 8888
    0     0 KUBE-HP-LGATJ4G47OMMBLH3  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* test-udp-conntrack-7gn7m_test-network hostport 8888 */ tcp dpt:8888
    0     0 KUBE-HP-2XVMR6Y6WGJIEYMN  udp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* test-udp-conntrack-7gn7m_test-network hostport 8888 */ udp dpt:8888
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.128.6.96          0.0.0.0/0            /* test-udp-conntrack-7gn7m_test-network hostport 8888 */
    0     0 DNAT       udp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* test-udp-conntrack-7gn7m_test-network hostport 8888 */ udp to:10.128.6.96:8888
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.128.6.96          0.0.0.0/0            /* test-udp-conntrack-7gn7m_test-network hostport 8888 */
    0     0 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* test-udp-conntrack-7gn7m_test-network hostport 8888 */ tcp to:10.128.6.96:8888

3. Start a UDP connection to a node IP on port 8888 and monitor the conntrack entries:

 $ python3 udp_test-client.py 172.23.190.40 8888

 # watch -n2 conntrack -L -p udp --dport=8888
udp      17 179 src=172.23.188.1 dst=172.23.190.40 sport=8888 dport=8888 src=10.128.6.96 dst=172.23.188.1 sport=8888 dport=8888 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.

4. Delete the pod on that node and notice that when the conntrack timeout reaches 0, the entry disappears and no new one is created. Checking iptables, we see that the rules were updated as expected:

 $ oc delete pod test-udp-conntrack-7gn7m
 $ oc get pods -o wide
test-udp-conntrack-99f44   1/1       Running   0          44m       10.128.3.94    ocp-infra-node1.openshift3.redhatrules.local   <none>
test-udp-conntrack-j5sd4   1/1       Running   0          9s        10.128.6.97    ocp-infra-node2.openshift3.redhatrules.local   <none>
test-udp-conntrack-lz8n6   1/1       Running   0          44m       10.128.5.113   ocp-infra-node4.openshift3.redhatrules.local   <none>
test-udp-conntrack-md8dq   1/1       Running   0          44m       10.128.4.156   ocp-infra-node3.openshift3.redhatrules.local   <none>

  # iptables -t nat -nvL | grep 8888
    0     0 KUBE-HP-5RALKNMEBO6Q4WVO  udp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* test-udp-conntrack-j5sd4_test-network hostport 8888 */ udp dpt:8888
    0     0 KUBE-HP-MJ2HYBIHG6LVBSHG  tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            /* test-udp-conntrack-j5sd4_test-network hostport 8888 */ tcp dpt:8888
    0     0 KUBE-MARK-MASQ  all  --  *      *       10.128.6.97          0.0.0.0/0            /* test-udp-c 

  # watch -n2 conntrack -L -p udp --dport=8888
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been shown.

5. When the client is restarted, the connection is established again:

 $ python3 udp_test-client.py 172.23.190.40 8888
 CTRL+C
 $ python3 udp_test-client.py 172.23.190.40 8888

 # watch -n2 conntrack -L -p udp --dport=8888
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been shown.
udp	 17 179 src=172.23.188.1 dst=172.23.190.40 sport=8888 dport=8888 src=10.128.6.97 dst=172.23.188.1 sport=8888 dport=8888 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

Actual results:
UDP conntrack entries are not recreated when the pod is deleted and a new pod IP is assigned.

Expected results:
The node should clean up the stale conntrack entries so that new entries can be created without needing to restart the external services.

Additional info:
The Python scripts on my side were kindly provided by the customer; they just create a server that listens on a UDP socket on the pod IP and a client that establishes the connection to the UDP port on the node IP.

 $ oc exec test-udp-conntrack-j5sd4 -- ps auxwww
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000090+      1  0.0  0.0  32072  6664 ?        Ss   13:01   0:00 python /etc/test/server.py 10.128.6.97 8888
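
For completeness, a minimal sketch of what udp_test-client.py might look like (again an assumption, not the customer's actual script). The detail that matters for this bug is that the client binds a fixed local source port equal to the destination port and keeps reusing the same socket, which is why the stale conntrack entry only goes away once the client is restarted:

#!/usr/bin/env python3
# Hypothetical reconstruction of udp_test-client.py (not the customer's actual script).
# It sends datagrams to <node_ip>:<port> from a fixed local source port equal to the
# destination port (sport == dport) and keeps reusing the same socket, so the node's
# conntrack entry stays pinned until the client is restarted.
import socket
import sys
import time

def main():
    node_ip, port = sys.argv[1], int(sys.argv[2])  # e.g. 172.23.190.40 8888
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # fixed source port, same value as the destination port
    sock.settimeout(2)
    while True:
        sock.sendto(b"ping", (node_ip, port))
        try:
            data, addr = sock.recvfrom(4096)
            print("reply {!r} from {}".format(data, addr), flush=True)
        except socket.timeout:
            print("no reply (connection appears stuck)", flush=True)
        time.sleep(1)

if __name__ == "__main__":
    main()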

This reproducer mimics what their application does. They are implementing HashiCorp Consul: there are 3 Consul servers external to OCP, and on OCP a DaemonSet deploys the Consul agents. These services connect to each other through a specific port over UDP and TCP.

Another piece of information that might be interesting: this behaviour happens even if we set hostNetwork: true on the DaemonSet and open the ports in the OS_FIREWALL_ALLOW chain on the nodes. The same issue occurs even though the pods always get the host IP and no NAT is involved.

Comment 3 Stephen Cuppett 2021-04-22 12:48:05 UTC
This bug was reported on 3.11.z and reproduces all the way up to 4.6.z. Setting the source release to 3.11 and the target to 4.8.0 to get a fix into our development branch, then clone/backport as far as needed/requested.

Comment 14 Andre Costa 2021-04-30 15:14:43 UTC
Hi Everyone.

Because neither RHCOS nor the toolbox image ships conntrack-tools, I was able to do that with the procedure in this article:

https://access.redhat.com/articles/5929341

Comment 15 Antonio Ojea 2021-04-30 15:28:52 UTC
and did it solve the issue?

Comment 18 Peter Hunt 2021-05-05 20:09:03 UTC
I continue to be unable to reproduce, but I still believe the bug exists.

@andcosta would you be able to use `rpm-ostree install` to install `conntrack-tools` and restart the node, and see if this issue persists? If adding conntrack-tools fixes the issue, I'll work with the CoreOS team to get it into RHCOS.

Comment 48 Antonio Ojea 2021-06-07 14:18:44 UTC
It turns out that the openshift/dockershim hostport implementation doesn't implement the conntrack deletion; CRI-O does.

Is this something we can test with the steps in https://bugzilla.redhat.com/show_bug.cgi?id=1946593#c37 ?
https://github.com/openshift/origin/pull/26206
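
For illustration only (this is not the code in the PR above, which changes the Go hostport manager): the cleanup the fix performs boils down to deleting the stale UDP conntrack entries for the pod's host ports when the pod is torn down. A manual workaround on an affected node amounts to something like the following sketch, assuming conntrack-tools is installed as discussed in comment 14:

#!/usr/bin/env python3
# Illustration only: the actual fix is Go code in the hostport manager, not this script.
# This shows the equivalent manual cleanup on an affected node: deleting the stale
# UDP conntrack entries for a hostPort after the pod behind it has been recreated.
# Requires conntrack-tools on the node (see comment 14).
import subprocess
import sys

def flush_udp_conntrack(port):
    # Same effect as running: conntrack -D -p udp --dport <port>
    subprocess.run(
        ["conntrack", "-D", "-p", "udp", "--dport", str(port)],
        check=False,  # don't fail the script if conntrack finds no matching entries
    )

if __name__ == "__main__":
    flush_udp_conntrack(int(sys.argv[1]) if len(sys.argv) > 1 else 8888)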

Comment 67 errata-xmlrpc 2021-08-04 11:18:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.487 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2928