Bug 1537780 - Conntrack rule for UDP traffic is not removed when using NodePort
Summary: Conntrack rule for UDP traffic is not removed when using NodePort
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Ben Bennett
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1659194 1659204
 
Reported: 2018-01-23 20:29 UTC by Ryan Howe
Modified: 2020-02-21 01:40 UTC
23 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1659194 (view as bug list)
Environment:
Last Closed: 2019-06-04 10:40:18 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
GitHub kubernetes/kubernetes pull 71573 None None None 2020-02-25 16:23:32 UTC
Origin (Github) 21655 None None None 2018-12-13 19:45:55 UTC
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:40:28 UTC

Description Ryan Howe 2018-01-23 20:29:05 UTC
Description of problem:

A pod is restarted and gets a new pod IP. There is a long-lived stream of UDP packets to the NodePort created for this pod. The old conntrack table entry, which still points to the old pod IP, is never cleaned up.

This pull request added conntrack cleanup for newly created NodePort services, but I do not think it takes into account when an endpoint is deleted.
https://github.com/kubernetes/kubernetes/pull/32561

I believe we might need to add something here to clean up entries on endpoint changes for pods behind NodePorts, not just on endpoint changes for services.

https://github.com/kubernetes/kubernetes/blob/release-1.7/pkg/proxy/iptables/proxier.go#L964-L980
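The endpoint-change path could flush these stale entries the same way the service path does. A minimal sketch of that logic, with hypothetical names (`conntrack_flush_commands`, the `stale_services` shape): it only builds the `conntrack -D` invocations rather than executing them, since the real proxier shells out to the conntrack binary as root.

```python
# Sketch (not the actual kube-proxy code): when UDP endpoints change,
# stale conntrack entries for the affected node ports should be flushed.
# We only build the command lines here so the logic is testable
# without root privileges.

def conntrack_flush_commands(stale_services):
    """stale_services: list of dicts with 'protocol' and 'node_port'.
    Returns the conntrack invocations that would flush stale entries.
    Only UDP needs this: TCP entries die with the connection, while
    UDP entries linger until they time out."""
    cmds = []
    for svc in stale_services:
        if svc["protocol"] != "udp":
            continue  # conntrack cleanup only matters for UDP
        cmds.append(
            ["conntrack", "-D", "-p", "udp",
             "--orig-port-dst", str(svc["node_port"])]
        )
    return cmds


print(conntrack_flush_commands(
    [{"protocol": "udp", "node_port": 31803},
     {"protocol": "tcp", "node_port": 30080}]
))
# -> [['conntrack', '-D', '-p', 'udp', '--orig-port-dst', '31803']]
```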


Version-Release number of selected component (if applicable):
 3.X  

How reproducible:
100%


Info:

App 11.11.2.139 
NodePort: 31803

Continuous UDP traffic, test A: 22.22.100.123:5060 to 22.22.113.140:31803.

Non-continuous traffic, test B: 22.22.113.58:2000 to 22.22.113.140:31803.
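For reference, the continuous test A client boils down to sending UDP datagrams from a fixed source port, so every probe maps to the same conntrack entry. A minimal stand-in, with hypothetical names and localhost placeholder addresses instead of the 22.22.x.x addresses above:

```python
import socket

# Minimal stand-in for the continuous "test A" client: UDP datagrams
# from a fixed source port, so all probes share one conntrack 5-tuple.
# The report used 22.22.100.123:5060 -> 22.22.113.140:31803; the
# addresses below are placeholders for local testing.

def make_probe_socket(src_port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", src_port))  # fixed source port keeps one flow entry
    return sock

def send_probe(sock, dst_host, dst_port, payload=b"ping"):
    # UDP sendto returns the number of bytes queued; it does not
    # block waiting for a reply, matching the one-way probe traffic.
    return sock.sendto(payload, (dst_host, dst_port))

sock = make_probe_socket(0)  # 0 = ephemeral port for local testing
sent = send_probe(sock, "127.0.0.1", 31803)
print(sent)  # -> 4
```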



# conntrack -L | grep 31803
Test B:
udp 17 46 src=22.22.113.58 dst=22.22.113.140 sport=2000 dport=31803 src=11.11.2.139 dst=11.11.2.1 sport=5060 dport=2000 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

Test A: 
udp 17 177 src=22.22.100.123 dst=22.22.113.140 sport=5060 dport=31803 src=11.11.2.139 dst=11.11.2.1 sport=5060 dport=5060 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=2
conntrack v1.4.4 (conntrack-tools): 653 flow entries have been shown.


- The app is restarted and is now running on 11.11.2.142. 
- Test A has been sending UDP packets every 5 seconds and is getting no replies.  
- Conntrack shows an entry indicating that traffic from the client (22.22.100.123) is still being sent to 11.11.2.139.

- Test B, sourcing from 22.22.113.58:2000 and destined for 22.22.113.140:31803, works because the traffic was not continuous; the conntrack entry shows this traffic is now sent to 11.11.2.142.

# conntrack -L | grep 31803
TEST B
udp 17 178 src=22.22.113.58 dst=22.22.113.140 sport=2000 dport=31803 src=11.11.2.142 dst=11.11.2.1 sport=5060 dport=2000 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1

TEST A
udp 17 177 src=22.22.100.123 dst=22.22.113.140 sport=5060 dport=31803 src=11.11.2.139 dst=11.11.2.1 sport=5060 dport=5060 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=2
conntrack v1.4.4 (conntrack-tools): 674 flow entries have been shown.



After the conntrack entry for 22.22.100.123 is manually deleted, a new entry appears with the correct destination IP of 11.11.2.142, and test A traffic flows again.

# conntrack -D -s 22.22.100.123
udp 17 176 src=22.22.100.123 dst=22.22.113.140 sport=5060 dport=31803 src=11.11.2.139 dst=11.11.2.1 sport=5060 dport=5060 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 1 flow entries have been deleted.

# conntrack -L | grep 31803
udp 17 28 src=22.22.100.123 dst=22.22.113.140 sport=5060 dport=31803 src=11.11.2.142 dst=11.11.2.1 sport=5060 dport=5060 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
conntrack v1.4.4 (conntrack-tools): 648 flow entries have been shown.
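The manual deletion above can be generalized: scan `conntrack -L` output for UDP entries whose reply-side source is still the old pod IP. A sketch under that assumption, with hypothetical names (`stale_entries`) and the v1.4.4 output format shown in this report:

```python
import re

def stale_entries(conntrack_lines, old_pod_ip):
    """Return (client_src, dport) pairs for entries whose reply-side
    source is still the old pod IP, i.e. flows that must be deleted
    so they can be re-NATed to the new endpoint."""
    stale = []
    for line in conntrack_lines:
        fields = re.findall(r"(\w+)=([\w.:]+)", line)
        # conntrack -L prints the original tuple first and the reply
        # tuple second; the second src= is the backend pod the flow
        # is currently DNATed to.
        srcs = [v for k, v in fields if k == "src"]
        dports = [v for k, v in fields if k == "dport"]
        if len(srcs) >= 2 and srcs[1] == old_pod_ip:
            stale.append((srcs[0], dports[0]))
    return stale

line = ("udp 17 177 src=22.22.100.123 dst=22.22.113.140 sport=5060 "
        "dport=31803 src=11.11.2.139 dst=11.11.2.1 sport=5060 dport=5060 "
        "[ASSURED] mark=0 use=2")
print(stale_entries([line], "11.11.2.139"))
# -> [('22.22.100.123', '31803')]
```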

Comment 22 Greg Rodriguez II 2018-11-12 21:50:35 UTC
Customer confirmed they will be moving to OCP v3.11 in the next few weeks and would like to know whether there will be an errata for OCP v3.11.x in the near future that addresses this issue.

Comment 24 jspahr 2018-11-20 02:30:25 UTC
With the short release cycles of k8s and OpenShift, a backport is likely overkill unless the effort is trivial.

We would like to see this fixed upstream as soon as possible so we have a clear target on what OpenShift release this will be fixed in.

Comment 25 jtanenba 2018-12-04 15:57:51 UTC
I attached a link to the upstream PR 

https://github.com/kubernetes/kubernetes/pull/71573

Comment 27 jtanenba 2018-12-13 19:02:18 UTC
Posted the Origin backport of the upstream fix

https://github.com/openshift/origin/pull/21655

Comment 32 Anurag saxena 2019-03-29 19:55:28 UTC
Verified it on 4.0.0-0.nightly-2019-03-28-210640. Conntrack entries look as expected.

The steps taken are the same as those defined in the attached test case, except that conntrack was run from a container image via podman run, because of restrictions on installing packages on CoreOS.

1) Set up an OCP 4.x cluster
2) Created a UDP listener pod listening on port 8080 with an assigned IP, say 10.128.2.12
3) Sent traffic for 2-3 seconds via a client pod to the node hosting the UDP listener
4) An ASSURED entry gets created for the 10.128.2.12 pod in the conntrack table
5) Checked to make sure that the ASSURED entry corresponds to 10.128.2.12
6) Deleted the UDP listener pod (it gets recreated with new IP 10.128.2.13 due to the replica count)
7) Noticed the ASSURED entry for the old pod 10.128.2.12 gets erased
8) Repeated step 3 for the new UDP listener pod 10.128.2.13 and noticed a conntrack entry exists for the new pod only
   
Thanks!

Comment 34 errata-xmlrpc 2019-06-04 10:40:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

