Description of problem:

The problem occurs when we delete the CRD instance, which in turn deletes the Service and the Deployment -> ReplicaSet -> Pod, and then create it again: the old conntrack entry is still there, still being used, and it is never removed, still pointing to a pod that no longer exists.

We have a service of type LoadBalancer:

sgw1-s4s11   LoadBalancer   198.230.109.36   2123:31645/UDP

This points to a pod created by a deployment chain: sgw1-gtpctrl -> ReplicaSet -> Pod. Both the pod and the service are created by our own controller based on a CRD.

The traffic works and the conntrack tables on the nodes look like this:

node-1
conntrack v1.4.4 (conntrack-tools): 21606 flow entries have been shown.
udp 17 80 src=20.130.0.98 dst=172.30.0.10 sport=52123 dport=53 src=20.130.0.14 dst=20.130.0.98 sport=5353 dport=52123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-2
conntrack v1.4.4 (conntrack-tools): 9303 flow entries have been shown.
udp 17 116 src=20.129.0.196 dst=172.30.102.94 sport=32426 dport=2123 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 114 src=20.129.0.111 dst=20.128.0.133 sport=2123 dport=2123 src=20.128.0.133 dst=20.129.0.111 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 116 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.133 dst=20.129.0.1 sport=2123 dport=53891 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 85 src=20.129.0.196 dst=172.30.102.94 sport=32433 dport=2123 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32433 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-3
conntrack v1.4.4 (conntrack-tools): 3458 flow entries have been shown.
udp 17 113 src=20.128.0.133 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.128.0.133 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 115 src=20.128.0.133 dst=172.30.160.132 sport=12226 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=12226 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 84 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32433 src=20.129.0.196 dst=20.128.0.130 sport=32433 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 115 src=20.128.0.133 dst=20.129.0.1 sport=2123 dport=53891 src=20.129.0.1 dst=20.128.0.133 sport=53891 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 105 src=20.128.0.133 dst=20.128.0.129 sport=4248 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=4248 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 84 src=20.128.0.133 dst=172.30.160.132 sport=12233 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=12233 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 115 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 src=20.129.0.196 dst=20.128.0.130 sport=32426 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

When we delete the pod, it is recreated and the traffic still works; the conntrack entries then look like this:

node-1
conntrack v1.4.4 (conntrack-tools): 11950 flow entries have been shown.

node-2
conntrack v1.4.4 (conntrack-tools): 6114 flow entries have been shown.
udp 17 113 src=20.129.0.196 dst=172.30.102.94 sport=32426 dport=2123 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 70 src=20.129.0.111 dst=20.128.0.133 sport=2123 dport=2123 src=20.128.0.133 dst=20.129.0.111 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=2
udp 17 119 src=20.129.0.226 dst=172.30.160.132 sport=12226 dport=2123 src=20.128.0.129 dst=20.129.0.226 sport=2123 dport=12226 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 119 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.129.0.226 dst=20.129.0.1 sport=2123 dport=62030 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-3
conntrack v1.4.4 (conntrack-tools): 3081 flow entries have been shown.
udp 17 69 src=20.128.0.133 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.128.0.133 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 88 src=20.128.0.133 dst=172.30.160.132 sport=12226 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=12226 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 118 src=20.128.0.129 dst=20.129.0.226 sport=2123 dport=12226 src=20.129.0.226 dst=20.128.0.129 sport=12226 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 88 src=20.128.0.133 dst=20.129.0.1 sport=2123 dport=53891 src=20.129.0.1 dst=20.128.0.133 sport=53891 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 78 src=20.128.0.133 dst=20.128.0.129 sport=4248 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=4248 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 112 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 src=20.129.0.196 dst=20.128.0.130 sport=32426 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

Now, the problem occurs when we delete the CRD instance, which in turn deletes the Service and the Deployment -> ReplicaSet -> Pod, and then create it again: the old conntrack entry is still there, still being used, and it is never removed, still pointing to a pod that no longer exists:

udp 17 119 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.145 dst=20.129.0.1 sport=2123

node-1
conntrack v1.4.4 (conntrack-tools): 16397 flow entries have been shown.

node-2
conntrack v1.4.4 (conntrack-tools): 7611 flow entries have been shown.
udp 17 119 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.145 dst=20.129.0.1 sport=2123 dport=60424 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=2
udp 17 98 src=20.129.0.228 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.129.0.228 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-3
conntrack v1.4.4 (conntrack-tools): 3332 flow entries have been shown.

When we delete the conntrack entries for the service:

sudo conntrack -D -d 198.230.109.36

the traffic resumes normally and new entries are created:

udp 17 118 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.145 dst=20.129.0.1 sport=2123 dport=60424 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-1
conntrack v1.4.4 (conntrack-tools): 18168 flow entries have been shown.

node-2
conntrack v1.4.4 (conntrack-tools): 8467 flow entries have been shown.
udp 17 118 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.145 dst=20.129.0.1 sport=2123 dport=60424 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 119 src=20.129.0.228 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.129.0.228 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-3
conntrack v1.4.4 (conntrack-tools): 3406 flow entries have been shown.

[kni@provisioner pcrf 2022-03-02 21:06:09]$ for i in 1 2 3; do tmpssh core@node-$i "sudo conntrack -L | grep 2123" ; done
Warning: Permanently added 'node-1,10.62.1.3' (ECDSA) to the list of known hosts.
conntrack v1.4.4 (conntrack-tools): 17665 flow entries have been shown.
udp 17 21 src=20.130.0.98 dst=172.30.0.10 sport=42123 dport=53 src=20.130.0.14 dst=20.130.0.98 sport=5353 dport=42123 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
Warning: Permanently added 'node-2,10.62.1.4' (ECDSA) to the list of known hosts.
conntrack v1.4.4 (conntrack-tools): 7736 flow entries have been shown.
udp 17 114 src=20.129.0.196 dst=172.30.102.94 sport=32426 dport=2123 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 24 src=20.129.0.228 dst=20.129.0.227 sport=4248 dport=2123 src=20.129.0.227 dst=20.129.0.228 sport=2123 dport=4248 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 118 src=20.129.0.228 dst=172.30.193.249 sport=12226 dport=2123 src=20.129.0.227 dst=20.129.0.228 sport=2123 dport=12226 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 118 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.129.0.228 dst=20.129.0.1 sport=2123 dport=45074 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp 17 95 src=20.129.0.228 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.129.0.228 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
Warning: Permanently added 'node-3,10.62.1.5' (ECDSA) to the list of known hosts.
conntrack v1.4.4 (conntrack-tools): 3338 flow entries have been shown.
udp 17 113 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 src=20.129.0.196 dst=20.128.0.130 sport=32426 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

Version-Release number of selected component (if applicable):
4.8.29

How reproducible:
See below.

Steps to Reproduce:
Our CRDs are too complex and depend on our own operator for deployment, so having their definitions would not help you in this case.
But I can give you a run-down on how to reproduce something similar.

We have a LoadBalancer service using UDP port 2123:

$ oc get services -n casa sgw1-s4s11
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)          AGE
sgw1-s4s11   LoadBalancer   172.30.75.27   198.230.109.36   2123:32505/UDP   101m

This service has been created by our operator based on an sgw1 instance of our own CRD:

$ oc get service -n casa sgw1-s4s11 -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: axyom.casa.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: AxyomSGW
    name: sgw1

The service points to a pod that has been deployed with a Deployment -> ReplicaSet -> Pod chain, daisy-chained in ownership all the way up to that CRD instance:

$ oc get AxyomService -n casa sgw1-gtpctrl -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: axyom.casa.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: AxyomSGW
    name: sgw1

$ oc get deployment -n casa sgw1-gtpctrl -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: axyom.casa.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: AxyomService
    name: sgw1-gtpctrl

$ oc get replicaset -n casa sgw1-gtpctrl-6d787d979f -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: sgw1-gtpctrl

$ oc get pod -n casa sgw1-gtpctrl-6d787d979f-gzbcf -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: sgw1-gtpctrl-6d787d979f

Once this is set up, a client pod sends UDP traffic on port 2123 and gets responses from the pod sgw1-gtpctrl-6d787d979f-gzbcf.

On every node we can check the conntrack tables with, for example:

for i in 1 2 3; do ssh core@node-$i "sudo conntrack -L | grep 2123" ; done

and we see a healthy list of conntrack entries.

If we now delete the base instance, it deletes the whole chain of objects, but the conntrack entries remain and slowly expire, except for the one the client is still trying to hit:

oc delete AxyomSGW sgw1

If we check the conntrack entries after a few minutes, we will still see the one the client is using alive on the node where the client is running, something like this:

[kni@provisioner pcrf 2022-03-02 21:06:04]$ for i in 1 2 3; do ssh core@node-$i "sudo conntrack -L | grep 2123" ; done
udp 17 118 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.129.0.228 dst=20.129.0.1 sport=2123 dport=45074 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

If we now redeploy our objects, conntrack will still show only that entry until we delete it manually, with something like:

for i in 1 2 3; do ssh core@node-$i "sudo conntrack -D -d 198.230.109.36" ; done

At that point new entries are created correctly for the new traffic flow.

Actual results:
Conntrack entries are not removed after the CRD instance is deleted.

Expected results:
Conntrack entries are removed when the CRD instance is deleted.

Additional info:
Two must-gathers are available; they were created before the test and at the point the stale conntrack entry was found. See case 03163431.
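As a stopgap, a narrower manual cleanup than flushing every entry towards the load balancer IP should also work. This is only a sketch using standard conntrack-tools filters and the addresses from this report; restricting the deletion to UDP and port 2123 is an assumption that only the GTP-C flows need clearing:

for i in 1 2 3; do
  # delete only UDP entries whose original destination is the LB IP and port 2123
  ssh core@node-$i "sudo conntrack -D -p udp --orig-dst 198.230.109.36 --orig-port-dst 2123"
done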
It looks like cleaning up stale conntrack entries for a UDP load balancer IP was fixed by: https://github.com/kubernetes/kubernetes/pull/104009
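In other words, with that fix kube-proxy should clear stale UDP conntrack entries towards the service's load balancer IP itself once the backing endpoints are deleted, so the manual conntrack -D workaround should no longer be needed. A rough way to confirm this on a node is to watch teardown events for the LB IP while the service/CR is deleted; this is just a sketch, assuming conntrack-tools on the host accepts the same filters for -E as for -L/-D, and the IP is the one from this report:

# watch conntrack DESTROY events for UDP flows towards the load balancer IP
conntrack -E -e DESTROY -p udp -d 198.230.109.36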
Already fixed in master by rebase to kube 1.23.4
@jechen could you help verify this bug, thanks
I am not sure I see the expected result either. I tried on a cluster with the SDN plugin and the latest 4.11 nightly image.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-27-140854   True        False         14m     Cluster version is 4.11.0-0.nightly-2022-03-27-140854

$ oc get network -o jsonpath='{.items[*].status.networkType}'
OpenShiftSDN

$ oc get node -owide
NAME                                 STATUS   ROLES    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP     OS-IMAGE                                                         KERNEL-VERSION                 CONTAINER-RUNTIME
jechen-0329a-ltcz8-compute-0         Ready    worker   73m   v1.23.3+54654d2   172.31.248.53   172.31.248.53   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
jechen-0329a-ltcz8-compute-1         Ready    worker   73m   v1.23.3+54654d2   172.31.248.60   172.31.248.60   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
jechen-0329a-ltcz8-control-plane-0   Ready    master   85m   v1.23.3+54654d2   172.31.248.31   172.31.248.31   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
jechen-0329a-ltcz8-control-plane-1   Ready    master   85m   v1.23.3+54654d2   172.31.248.49   172.31.248.49   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
jechen-0329a-ltcz8-control-plane-2   Ready    master   85m   v1.23.3+54654d2   172.31.248.63   172.31.248.63   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8

# pick a random port number 9151; before creating the service with LB, check conntrack entries on the worker nodes
$ oc debug node/jechen-0329a-ltcz8-compute-0
Starting pod/jechen-0329a-ltcz8-compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.53
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep port=9151
conntrack v1.4.4 (conntrack-tools): 417 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit
Removing debug pod ...

[jechen@jechen ~]$ oc debug node/jechen-0329a-ltcz8-compute-1
Starting pod/jechen-0329a-ltcz8-compute-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.60
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep port=9151
conntrack v1.4.4 (conntrack-tools): 410 flow entries have been shown.
sh-4.4#
sh-4.4#
sh-4.4# exit
exit
sh-4.4# exit
exit

# create project test1, create pod/rc/service in it
$ oc new-project test1
$ cat pods_with_service_LB.yaml
---
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ReplicationController
  metadata:
    labels:
      name: test-rc
    name: test-rc
  spec:
    replicas: 4
    template:
      metadata:
        labels:
          name: test-pods
      spec:
        containers:
        - command:
          - "/usr/bin/ncat"
          - "-u"
          - "-l"
          - '8080'
          - "--keep-open"
          - "--exec"
          - "/bin/cat"
          image: quay.io/openshifttest/hello-sdn@sha256:2af5b5ec480f05fda7e9b278023ba04724a3dd53a296afcd8c13f220dec52197
          name: test-pod
          imagePullPolicy: Always
          resources:
            limits:
              memory: 340Mi
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      name: test-service
    name: test-service
  spec:
    ports:
    - name: http
      port: 9151
      protocol: UDP
      targetPort: 8080
    externalIPs:
    - 172.31.248.53
    selector:
      name: test-pods
    type: LoadBalancer

$ oc get all
NAME                READY   STATUS    RESTARTS   AGE
pod/test-rc-bppts   1/1     Running   0          29s
pod/test-rc-hfflw   1/1     Running   0          29s
pod/test-rc-sghcz   1/1     Running   0          29s
pod/test-rc-z9p6n   1/1     Running   0          29s

NAME                            DESIRED   CURRENT   READY   AGE
replicationcontroller/test-rc   4         4         4       29s

NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
service/test-service   LoadBalancer   172.30.63.117   172.31.248.53   9151:31351/UDP   29s

# access the service from another test pod in project test2
$ for i in {1..4} ; do oc exec -n test2 test-rc-4ngc4 -i -- bash -c \(echo\ test\ \;\ sleep\ 1\ \;\ echo\ test\)\ \|\ /usr/bin/ncat\ -u\ 172.31.248.53\ 9151; done
test
test
test
test

# delete svc/rc/pods, check conntrack entries again on each node
$ oc delete svc --all -n test1
service "test-service" deleted
$ oc delete rc --all -n test1
replicationcontroller "test-rc" deleted
$ oc get all -n test1
No resources found in test1 namespace.

$ oc debug node/jechen-0329a-ltcz8-compute-0
Starting pod/jechen-0329a-ltcz8-compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.53
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 425 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 425 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit
Removing debug pod ...

$ oc debug node/jechen-0329a-ltcz8-compute-1
Starting pod/jechen-0329a-ltcz8-compute-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.60
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 423 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 422 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit
Removing debug pod ...

$ oc debug node/jechen-0329a-ltcz8-control-plane-0
Starting pod/jechen-0329a-ltcz8-control-plane-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.31
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1309 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1311 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit
Removing debug pod ...

$ oc debug node/jechen-0329a-ltcz8-control-plane-1
Starting pod/jechen-0329a-ltcz8-control-plane-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.49
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1456 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1459 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit
Removing debug pod ...

$ oc debug node/jechen-0329a-ltcz8-control-plane-2
Starting pod/jechen-0329a-ltcz8-control-plane-2-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.63
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1684 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1687 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit
Removing debug pod ...
Verified in 4.11.0-0.nightly-2022-03-29-152521

# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-29-152521   True        False         8h      Error while reconciling 4.11.0-0.nightly-2022-03-29-152521: some cluster operators have not yet rolled out

# created test service with MetalLB setup on a BM machine
# cat list.yaml
---
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ReplicationController
  metadata:
    labels:
      name: test-rc
    name: test-rc
  spec:
    replicas: 7
    template:
      metadata:
        labels:
          name: test-pods
      spec:
        containers:
        - command:
          - "/usr/bin/ncat"
          - "-u"
          - "-l"
          - '8080'
          - "--keep-open"
          - "--exec"
          - "/bin/cat"
          image: quay.io/openshifttest/hello-sdn@sha256:2af5b5ec480f05fda7e9b278023ba04724a3dd53a296afcd8c13f220dec52197
          name: test-pod
          imagePullPolicy: Always
          resources:
            limits:
              memory: 340Mi
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      name: test-service
    name: test-service
  spec:
    ports:
    - name: http
      port: 8080
      protocol: UDP
      targetPort: 8080
    selector:
      name: test-pods
    type: LoadBalancer

# oc get all
NAME                READY   STATUS    RESTARTS   AGE
pod/test-rc-7npzr   1/1     Running   0          2m17s
pod/test-rc-fj6cc   1/1     Running   0          2m17s
pod/test-rc-nmw4h   1/1     Running   0          2m17s
pod/test-rc-p28rg   1/1     Running   0          2m17s
pod/test-rc-s7qhp   1/1     Running   0          2m17s
pod/test-rc-ssxbh   1/1     Running   0          2m17s
pod/test-rc-ts62j   1/1     Running   0          2m17s

NAME                            DESIRED   CURRENT   READY   AGE
replicationcontroller/test-rc   7         7         7       2m17s

NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
service/test-service   LoadBalancer   172.30.189.202   10.73.116.58   8080:30357/UDP   2m17s

# from a test pod in another project
oc -n j1 rsh test-rc-n9wht
~ $ (while true ; sleep 1; do echo "hello"; done) | ncat -u 10.73.116.58 8080
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello
3:44

# check conntrack entry from node where pod resides
# oc debug node/dell-per740-14.rhts.eng.pek2.redhat.com
Starting pod/dell-per740-14rhtsengpek2redhatcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.73.116.62
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L | grep 8080 | grep 10.73.116.58
udp 17 119 src=10.128.2.24 dst=10.73.116.58 sport=38860 dport=8080 src=10.128.2.29 dst=10.128.2.1 sport=8080 dport=28019 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp 6 297 ESTABLISHED src=10.73.116.50 dst=10.73.116.58 sport=58080 dport=2379 [UNREPLIED] src=10.73.116.58 dst=10.73.116.50 sport=2379 dport=58080 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1277 flow entries have been shown.
sh-4.4#

# then delete the service and rc; the pods are terminated afterwards
[root@dell-per740-36 ~]# oc delete service/test-service
service "test-service" deleted
[root@dell-per740-36 ~]# oc delete replicationcontroller/test-rc
replicationcontroller "test-rc" deleted
[root@dell-per740-36 ~]#
[root@dell-per740-36 ~]#
[root@dell-per740-36 ~]# oc get all
No resources found in j2 namespace.

# wait a little, then check conntrack entry again on the node
sh-4.4# conntrack -L | grep 8080 | grep 10.73.116.58
tcp 6 297 ESTABLISHED src=10.73.116.50 dst=10.73.116.58 sport=58080 dport=2379 [UNREPLIED] src=10.73.116.58 dst=10.73.116.50 sport=2379 dport=58080 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1294 flow entries have been shown.
sh-4.4#
sh-4.4#
sh-4.4#

==> The conntrack entry for this UDP test-service is removed correctly.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069