Bug 2061002 - Conntrack entry is not removed for LoadBalancer IP
Summary: Conntrack entry is not removed for LoadBalancer IP
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: x86_64
OS: Linux
Severity: high
Priority: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Dan Winship
QA Contact: jechen
URL:
Whiteboard:
Depends On:
Blocks: 2063885
 
Reported: 2022-03-04 21:47 UTC by Craig Robinson
Modified: 2022-08-10 10:52 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Conntrack entries for LoadBalancer IPs were not removed when the service endpoints were removed.
Consequence: Connections to the LoadBalancer IP might fail.
Fix: Conntrack entries are now cleaned up properly.
Result: Connections do not fail.
Clone Of:
: 2063885
Environment:
Last Closed: 2022-08-10 10:52:11 UTC
Target Upstream Version:
Embargoed:




Links
Github kubernetes/kubernetes pull 104009 (Merged): delete stale UDP conntrack entries for loadbalancer IPs (last updated 2022-03-10 20:29:45 UTC)
Github openshift/sdn pull 399 (Merged): Rebase SDN k8 1.23.4 (last updated 2022-03-14 14:59:14 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:52:30 UTC)

Description Craig Robinson 2022-03-04 21:47:55 UTC
Description of problem:

The problem occurs when we delete the CRD, which in turn deletes the Service and the Deployment -> ReplicaSet -> Pod, and then create everything again: the old conntrack entry is still there, still being used, and it is never removed, even though it points to a pod which no longer exists.

We have a service of type LoadBalancer:


sgw1-s4s11                LoadBalancer      198.230.109.36   2123:31645/UDP


This points to a certain pod created by a deployment: sgw1-gtpctrl -> ReplicaSet -> Pod.
Both the pod and the service are created by our own controller based on a CRD.
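
For reference, a quick way to confirm which pod the service currently targets (assuming the casa namespace used in the commands further below) is to list its endpoints:

$ oc get endpoints -n casa sgw1-s4s11

The endpoint IP should match the current pod IP; after the pod is recreated the endpoint changes, while a stale conntrack entry keeps pointing at the old pod IP.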


The traffic works and we have a conntrack table in the nodes like this:


node-1

conntrack v1.4.4 (conntrack-tools): 21606 flow entries have been shown.
udp      17 80 src=20.130.0.98 dst=172.30.0.10 sport=52123 dport=53 src=20.130.0.14 dst=20.130.0.98 sport=5353 dport=52123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-2

conntrack v1.4.4 (conntrack-tools): 9303 flow entries have been shown.
udp      17 116 src=20.129.0.196 dst=172.30.102.94 sport=32426 dport=2123 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 114 src=20.129.0.111 dst=20.128.0.133 sport=2123 dport=2123 src=20.128.0.133 dst=20.129.0.111 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 116 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.133 dst=20.129.0.1 sport=2123 dport=53891 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 85 src=20.129.0.196 dst=172.30.102.94 sport=32433 dport=2123 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32433 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-3

conntrack v1.4.4 (conntrack-tools): 3458 flow entries have been shown.
udp      17 113 src=20.128.0.133 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.128.0.133 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 115 src=20.128.0.133 dst=172.30.160.132 sport=12226 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=12226 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 84 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32433 src=20.129.0.196 dst=20.128.0.130 sport=32433 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 115 src=20.128.0.133 dst=20.129.0.1 sport=2123 dport=53891 src=20.129.0.1 dst=20.128.0.133 sport=53891 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 105 src=20.128.0.133 dst=20.128.0.129 sport=4248 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=4248 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 84 src=20.128.0.133 dst=172.30.160.132 sport=12233 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=12233 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 115 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 src=20.129.0.196 dst=20.128.0.130 sport=32426 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

When we delete the pod, it is recreated and the traffic still works; the conntrack entries then look like this:


node-1

conntrack v1.4.4 (conntrack-tools): 11950 flow entries have been shown.

node-2

conntrack v1.4.4 (conntrack-tools): 6114 flow entries have been shown.
udp      17 113 src=20.129.0.196 dst=172.30.102.94 sport=32426 dport=2123 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 70 src=20.129.0.111 dst=20.128.0.133 sport=2123 dport=2123 src=20.128.0.133 dst=20.129.0.111 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=2
udp      17 119 src=20.129.0.226 dst=172.30.160.132 sport=12226 dport=2123 src=20.128.0.129 dst=20.129.0.226 sport=2123 dport=12226 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 119 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.129.0.226 dst=20.129.0.1 sport=2123 dport=62030 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-3

conntrack v1.4.4 (conntrack-tools): 3081 flow entries have been shown.
udp      17 69 src=20.128.0.133 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.128.0.133 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 88 src=20.128.0.133 dst=172.30.160.132 sport=12226 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=12226 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 118 src=20.128.0.129 dst=20.129.0.226 sport=2123 dport=12226 src=20.129.0.226 dst=20.128.0.129 sport=12226 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 88 src=20.128.0.133 dst=20.129.0.1 sport=2123 dport=53891 src=20.129.0.1 dst=20.128.0.133 sport=53891 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 78 src=20.128.0.133 dst=20.128.0.129 sport=4248 dport=2123 src=20.128.0.129 dst=20.128.0.133 sport=2123 dport=4248 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 112 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 src=20.129.0.196 dst=20.128.0.130 sport=32426 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

Now, the problem occurs when we delete the CRD, which in turn deletes the Service and the Deployment -> ReplicaSet -> Pod, and we create it again: the old conntrack entry is still there, still being used, and it is never removed, still pointing to a pod which no longer exists:


udp      17 119 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.145 dst=20.129.0.1 sport=2123 


node-1
conntrack v1.4.4 (conntrack-tools): 16397 flow entries have been shown.

node-2
conntrack v1.4.4 (conntrack-tools): 7611 flow entries have been shown.
udp      17 119 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.145 dst=20.129.0.1 sport=2123 dport=60424 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=2
udp      17 98 src=20.129.0.228 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.129.0.228 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-3
conntrack v1.4.4 (conntrack-tools): 3332 flow entries have been shown.
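
A simple sanity check that the reply destination in the stale entry no longer maps to a live pod (using the 20.128.0.145 address seen in the stale entry above) is:

$ oc get pods -A -o wide | grep 20.128.0.145

which should return nothing, because the pod that owned that IP is gone.
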
When we delete the conntrack entries for the service:


sudo conntrack -D -d 198.230.109.36
The traffic resumes normally and new entries are created:


udp      17 118 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.145 dst=20.129.0.1 sport=2123 dport=60424 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1


node-1
conntrack v1.4.4 (conntrack-tools): 18168 flow entries have been shown.

node-2
conntrack v1.4.4 (conntrack-tools): 8467 flow entries have been shown.
udp      17 118 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.128.0.145 dst=20.129.0.1 sport=2123 dport=60424 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 119 src=20.129.0.228 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.129.0.228 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

node-3

conntrack v1.4.4 (conntrack-tools): 3406 flow entries have been shown.
[kni@provisioner pcrf 2022-03-02 21:06:09]$ for i in 1 2 3; do ssh core@node-$i "sudo conntrack -L | grep 2123" ; done
Warning: Permanently added 'node-1,10.62.1.3' (ECDSA) to the list of known hosts.
conntrack v1.4.4 (conntrack-tools): 17665 flow entries have been shown.
udp      17 21 src=20.130.0.98 dst=172.30.0.10 sport=42123 dport=53 src=20.130.0.14 dst=20.130.0.98 sport=5353 dport=42123 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
Warning: Permanently added 'node-2,10.62.1.4' (ECDSA) to the list of known hosts.
conntrack v1.4.4 (conntrack-tools): 7736 flow entries have been shown.
udp      17 114 src=20.129.0.196 dst=172.30.102.94 sport=32426 dport=2123 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 24 src=20.129.0.228 dst=20.129.0.227 sport=4248 dport=2123 src=20.129.0.227 dst=20.129.0.228 sport=2123 dport=4248 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 118 src=20.129.0.228 dst=172.30.193.249 sport=12226 dport=2123 src=20.129.0.227 dst=20.129.0.228 sport=2123 dport=12226 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 118 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.129.0.228 dst=20.129.0.1 sport=2123 dport=45074 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
udp      17 95 src=20.129.0.228 dst=10.62.2.44 sport=2123 dport=2123 src=20.129.0.111 dst=20.129.0.228 sport=2123 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
Warning: Permanently added 'node-3,10.62.1.5' (ECDSA) to the list of known hosts.
conntrack v1.4.4 (conntrack-tools): 3338 flow entries have been shown.
udp      17 113 src=20.128.0.130 dst=20.129.0.196 sport=2123 dport=32426 src=20.129.0.196 dst=20.128.0.130 sport=32426 dport=2123 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1


Version-Release number of selected component (if applicable): 4.8.29


How reproducible:  See below.


Steps to Reproduce:

Our CRDs are too complex and depend on our own operator for deployment, so having the definitions would not help you in this case.

But I can give you a rundown of how to reproduce something similar:


We have a Load balancer service, using UDP port 2123:


$ oc get services -n casa  sgw1-s4s11
NAME         TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)          AGE
sgw1-s4s11   LoadBalancer   172.30.75.27   198.230.109.36   2123:32505/UDP   101m

This service has been created by our operator based on an sgw1 instance of our own CRD:


$ oc get service -n casa sgw1-s4s11 -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: axyom.casa.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: AxyomSGW
    name: sgw1

This service points to a pod that was deployed via a Deployment -> ReplicaSet -> Pod chain, daisy-chained in ownership all the way up to that CRD:


$ oc get AxyomService  -n casa sgw1-gtpctrl -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: axyom.casa.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: AxyomSGW
    name: sgw1
$ oc get deployment -n casa sgw1-gtpctrl -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: axyom.casa.io/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: AxyomService
    name: sgw1-gtpctrl
$ oc get replicaset  -n casa sgw1-gtpctrl-6d787d979f -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: sgw1-gtpctrl
$ oc get pod -n casa sgw1-gtpctrl-6d787d979f-gzbcf -o yaml | grep ownerReferences: -A5
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: sgw1-gtpctrl-6d787d979f

Once this is set up, a client pod is sending UDP traffic on port 2123 and getting responses from the pod sgw1-gtpctrl-6d787d979f-gzbcf.
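
A minimal client-side probe looks roughly like this (a hedged example; the real client is our own application rather than ncat, and whether anything is echoed back depends on the server):

$ echo test | ncat -u 198.230.109.36 2123
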
On every node we can check the conntrack tables by using, for example:


 for i in 1 2 3; do ssh core@node-$i "sudo conntrack -L | grep 2123" ; done

And we can see a healthy list of conntrack entries.
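
To narrow the listing down to the LoadBalancer IP only, the same loop can filter on the original destination, for example:

 for i in 1 2 3; do ssh core@node-$i "sudo conntrack -L -d 198.230.109.36" ; done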

If we now delete the base instance, it deletes the whole chain of objects, but the conntrack entries remain and they slowly expire, except for the one the client is still trying to hit:


oc delete AxyomSGW sgw1

If we check the conntrack entries after a few minutes, we will still see the one the client is using, alive on the node where the client is running, something like this:


[kni@provisioner pcrf 2022-03-02 21:06:04]$ for i in 1 2 3; do ssh core@node-$i "sudo conntrack -L | grep 2123" ; done
udp      17 118 src=20.129.0.111 dst=198.230.109.36 sport=2123 dport=2123 src=20.129.0.228 dst=20.129.0.1 sport=2123 dport=45074 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

If we now redeploy our objects, conntrack will still show only that entry until we manually delete it, with something like:


for i in 1 2 3; do ssh core@node-$i "sudo conntrack -D -d 198.230.109.36" ; done 

At that point new entries will be created correctly for the new traffic flow.
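
A more targeted variant of the manual cleanup, limited to the UDP flows of this service rather than everything destined to the LoadBalancer IP, would be something like this (assuming conntrack's -p/--dport filters):

for i in 1 2 3; do ssh core@node-$i "sudo conntrack -D -p udp -d 198.230.109.36 --dport 2123" ; done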



Actual results:
Conntrack entries are not removed after the CRD instance is deleted.

Expected results:
Conntrack entries are removed when CRD instance is deleted.

Additional info: Two must-gathers are available; they were created before the test and at the point the stale conntrack entry was found. See case 03163431.

Comment 3 Tim Rozet 2022-03-10 20:29:14 UTC
It looks like cleaning up stale conntrack for UDP load balancer IP was fixed by: https://github.com/kubernetes/kubernetes/pull/104009
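
For context, the effect of that upstream change is that kube-proxy now includes LoadBalancer ingress IPs when it flushes stale UDP conntrack state after an endpoint change, roughly the same way it already handled ClusterIPs and external IPs. Conceptually the cleanup is equivalent to running the following on each node (a simplification, not the literal implementation):

conntrack -D --orig-dst <loadbalancer-ip> -p udp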

Comment 4 Dan Winship 2022-03-14 14:59:14 UTC
Already fixed in master by rebase to kube 1.23.4

Comment 6 zhaozhanqi 2022-03-21 07:35:44 UTC
@jechen could you help verify this bug, thanks

Comment 8 jechen 2022-03-29 14:35:28 UTC
I am not sure if I see the expected result either; I tried on a cluster with the SDN plugin and the latest 4.11 nightly image.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-27-140854   True        False         14m     Cluster version is 4.11.0-0.nightly-2022-03-27-140854


$ oc get network -o jsonpath='{.items[*].status.networkType}'
OpenShiftSDN

$ oc get node -owide
NAME                                 STATUS   ROLES    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP     OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
jechen-0329a-ltcz8-compute-0         Ready    worker   73m   v1.23.3+54654d2   172.31.248.53   172.31.248.53   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
jechen-0329a-ltcz8-compute-1         Ready    worker   73m   v1.23.3+54654d2   172.31.248.60   172.31.248.60   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
jechen-0329a-ltcz8-control-plane-0   Ready    master   85m   v1.23.3+54654d2   172.31.248.31   172.31.248.31   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
jechen-0329a-ltcz8-control-plane-1   Ready    master   85m   v1.23.3+54654d2   172.31.248.49   172.31.248.49   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8
jechen-0329a-ltcz8-control-plane-2   Ready    master   85m   v1.23.3+54654d2   172.31.248.63   172.31.248.63   Red Hat Enterprise Linux CoreOS 411.85.202203242008-0 (Ootpa)   4.18.0-348.20.1.el8_5.x86_64   cri-o://1.24.0-5.rhaos4.11.gitd020fdb.el8


# pick a random port number 9151; before creating the service with LB, check conntrack entries on the worker nodes; I am getting:
$ oc debug node/jechen-0329a-ltcz8-compute-0
Starting pod/jechen-0329a-ltcz8-compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.53
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep port=9151
conntrack v1.4.4 (conntrack-tools): 417 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...
[jechen@jechen ~]$ oc debug node/jechen-0329a-ltcz8-compute-1
Starting pod/jechen-0329a-ltcz8-compute-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.60
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep port=9151
conntrack v1.4.4 (conntrack-tools): 410 flow entries have been shown.
sh-4.4# 
sh-4.4# 
sh-4.4# exit
exit
sh-4.4# exit
exit


# create project test1, create pod/rc/service in it

$ oc new-project test1

$ cat pods_with_service_LB.yaml
---
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ReplicationController
  metadata:
    labels:
      name: test-rc
    name: test-rc
  spec:
    replicas: 4
    template:
      metadata:
        labels:
          name: test-pods
      spec:
        containers:
        - command:
          - "/usr/bin/ncat"
          - "-u"
          - "-l"
          - '8080'
          - "--keep-open"
          - "--exec"
          - "/bin/cat"
          image: quay.io/openshifttest/hello-sdn@sha256:2af5b5ec480f05fda7e9b278023ba04724a3dd53a296afcd8c13f220dec52197
          name: test-pod
          imagePullPolicy: Always
          resources:
            limits:
              memory: 340Mi
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      name: test-service
    name: test-service
  spec:
    ports:
    - name: http
      port: 9151
      protocol: UDP
      targetPort: 8080
    externalIPs:
    - 172.31.248.53 
    selector:
      name: test-pods
    type: LoadBalancer
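
For completeness, the list above can be applied to the test1 project with something like:

$ oc create -f pods_with_service_LB.yaml -n test1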


$ oc get all
NAME                READY   STATUS    RESTARTS   AGE
pod/test-rc-bppts   1/1     Running   0          29s
pod/test-rc-hfflw   1/1     Running   0          29s
pod/test-rc-sghcz   1/1     Running   0          29s
pod/test-rc-z9p6n   1/1     Running   0          29s

NAME                            DESIRED   CURRENT   READY   AGE
replicationcontroller/test-rc   4         4         4       29s

NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)          AGE
service/test-service   LoadBalancer   172.30.63.117   172.31.248.53   9151:31351/UDP   29s


# access the service from another test pod in project test2

$ for i in {1..4} ; do oc exec -n test2 test-rc-4ngc4 -i -- bash -c \(echo\ test\ \;\ sleep\ 1\ \;\ echo\ test\)\ \|\ /usr/bin/ncat\ -u\ 172.31.248.53\ 9151; done
test
test
test
test


# delete svc/rc/pods, check conntrack entries again on each node
$ oc delete svc --all -n test1
service "test-service" deleted
$ oc delete rc --all -n test1
replicationcontroller "test-rc" deleted

$ oc get all -n test1
No resources found in test1 namespace.

$ oc debug node/jechen-0329a-ltcz8-compute-0
Starting pod/jechen-0329a-ltcz8-compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.53
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 425 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 425 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...

$ oc debug node/jechen-0329a-ltcz8-compute-1
Starting pod/jechen-0329a-ltcz8-compute-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.60
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 423 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 422 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...

$ oc debug node/jechen-0329a-ltcz8-control-plane-0
Starting pod/jechen-0329a-ltcz8-control-plane-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.31
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1309 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1311 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...

$ oc debug node/jechen-0329a-ltcz8-control-plane-1
Starting pod/jechen-0329a-ltcz8-control-plane-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.49
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1456 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1459 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...

$ oc debug node/jechen-0329a-ltcz8-control-plane-2
Starting pod/jechen-0329a-ltcz8-control-plane-2-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.63
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1684 flow entries have been shown.
sh-4.4# conntrack -L |grep 9151
conntrack v1.4.4 (conntrack-tools): 1687 flow entries have been shown.
sh-4.4# exit
exit
sh-4.4# exit
exit

Removing debug pod ...

Comment 11 jechen 2022-04-01 01:03:25 UTC
Verified in 4.11.0-0.nightly-2022-03-29-152521

# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-29-152521   True        False         8h      Error while reconciling 4.11.0-0.nightly-2022-03-29-152521: some cluster operators have not yet rolled out

# created test service with metalLB setup on a BM machine 

# cat list.yaml
---
apiVersion: v1
kind: List
items:
- apiVersion: v1
  kind: ReplicationController
  metadata:
    labels:
      name: test-rc
    name: test-rc
  spec:
    replicas: 7
    template:
      metadata:
        labels:
          name: test-pods
      spec:
        containers:
        - command:
          - "/usr/bin/ncat"
          - "-u"
          - "-l"
          - '8080'
          - "--keep-open"
          - "--exec"
          - "/bin/cat"
          image: quay.io/openshifttest/hello-sdn@sha256:2af5b5ec480f05fda7e9b278023ba04724a3dd53a296afcd8c13f220dec52197
          name: test-pod
          imagePullPolicy: Always
          resources:
            limits:
              memory: 340Mi
- apiVersion: v1
  kind: Service
  metadata:
    labels:
      name: test-service
    name: test-service
  spec:
    ports:
    - name: http
      port: 8080
      protocol: UDP
      targetPort: 8080
    selector:
      name: test-pods
    type: LoadBalancer



# oc get all
NAME                READY   STATUS    RESTARTS   AGE
pod/test-rc-7npzr   1/1     Running   0          2m17s
pod/test-rc-fj6cc   1/1     Running   0          2m17s
pod/test-rc-nmw4h   1/1     Running   0          2m17s
pod/test-rc-p28rg   1/1     Running   0          2m17s
pod/test-rc-s7qhp   1/1     Running   0          2m17s
pod/test-rc-ssxbh   1/1     Running   0          2m17s
pod/test-rc-ts62j   1/1     Running   0          2m17s

NAME                            DESIRED   CURRENT   READY   AGE
replicationcontroller/test-rc   7         7         7       2m17s

NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)          AGE
service/test-service   LoadBalancer   172.30.189.202   10.73.116.58   8080:30357/UDP   2m17s



# from a test pod in another project 
oc -n j1 rsh test-rc-n9wht 
~ $  (while true ; sleep 1;  do echo "hello"; done) | ncat -u 10.73.116.58 8080
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello
hello


# check conntrack entry from node where pod resides
# oc debug node/dell-per740-14.rhts.eng.pek2.redhat.com
Starting pod/dell-per740-14rhtsengpek2redhatcom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.73.116.62
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4#  conntrack -L | grep 8080 | grep 10.73.116.58
udp      17 119 src=10.128.2.24 dst=10.73.116.58 sport=38860 dport=8080 src=10.128.2.29 dst=10.128.2.1 sport=8080 dport=28019 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
tcp      6 297 ESTABLISHED src=10.73.116.50 dst=10.73.116.58 sport=58080 dport=2379 [UNREPLIED] src=10.73.116.58 dst=10.73.116.50 sport=2379 dport=58080 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1277 flow entries have been shown.
sh-4.4#


# Then, delete the service and rc, pods would be terminated after
[root@dell-per740-36 ~]#  oc delete service/test-service 
service "test-service" deleted
[root@dell-per740-36 ~]# oc delete replicationcontroller/test-rc 
replicationcontroller "test-rc" deleted
[root@dell-per740-36 ~]# 
[root@dell-per740-36 ~]# 
[root@dell-per740-36 ~]# oc get all
No resources found in j2 namespace.


# wait a little, then check conntrack entry again on the node
sh-4.4#  conntrack -L | grep 8080 | grep 10.73.116.58
tcp      6 297 ESTABLISHED src=10.73.116.50 dst=10.73.116.58 sport=58080 dport=2379 [UNREPLIED] src=10.73.116.58 dst=10.73.116.50 sport=2379 dport=58080 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1294 flow entries have been shown.
sh-4.4# 
sh-4.4# 
sh-4.4# 



==> conntrack entry for this UDP test-service is removed correctly.

Comment 13 errata-xmlrpc 2022-08-10 10:52:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

