2080069 – LoadBalancer SCTP service leaves stale conntrack entry that causes issues if service is recreated

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.10.z
Assignee:	Surya Seetharaman
QA Contact:	Weibin Liang
Docs Contact:
URL:
Whiteboard:
Depends On:	2053609
Blocks:	2090315

Reported:	2022-04-28 22:04 UTC by Surya Seetharaman
Modified:	2022-09-03 12:26 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2090315
Environment:
Last Closed:	2022-05-18 11:51:03 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift ovn-kubernetes pull 1061	0	None	open	Bug 2080069: Cleanup conntrack entries when services are deleted	2022-04-28 22:19:55 UTC
Red Hat Product Errata	RHBA-2022:2178	0	None	None	None	2022-05-18 11:51:17 UTC

Description Surya Seetharaman 2022-04-28 22:04:29 UTC

This bug was initially created as a copy of Bug #2053609

I am copying this bug because: 



Description of problem:

If an SCTP LoadBalancer service is deleted and later re-created with the same load balancer IP but a different cluster IP, a stale conntrack entry causes packets to still be DNATed to the old cluster IP instead of the new one.

The iptables rules are correct; only the stale entry in the conntrack table seems to cause the problem. If that entry is manually removed, everything starts working.
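
To illustrate what "stale" means here, the following is a minimal sketch that scans conntrack listing text for SCTP entries whose original destination is the load balancer IP but whose reply source is not the current cluster IP. The function name, the sample lines, and all IP addresses are hypothetical, and the line layout is an assumption based on typical `conntrack -L` output; it is not taken from this bug or from the fix.

```python
import re


def find_stale_entries(conntrack_output, lb_ip, current_cluster_ip):
    """Return SCTP conntrack lines still DNATed to a cluster IP other than the current one.

    Assumes each line carries two src=/dst= pairs: the first pair describes the
    original direction, the second pair the reply direction.
    """
    stale = []
    for line in conntrack_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "sctp":
            continue  # only SCTP entries are of interest here
        srcs = re.findall(r"\bsrc=(\S+)", line)
        dsts = re.findall(r"\bdst=(\S+)", line)
        if len(srcs) < 2 or len(dsts) < 2:
            continue  # not in the assumed two-tuple layout
        orig_dst, reply_src = dsts[0], srcs[1]
        # Traffic sent to the LB IP but answered from a different cluster IP
        # than the service currently has points at a leftover entry.
        if orig_dst == lb_ip and reply_src != current_cluster_ip:
            stale.append(line)
    return stale


# Hypothetical sample: the first entry replies from an old cluster IP,
# the second from the current one.
SAMPLE = (
    "sctp 132 100 ESTABLISHED src=192.0.2.10 dst=198.51.100.5 sport=40000 dport=38412 "
    "src=172.30.10.1 dst=192.0.2.10 sport=38412 dport=40000 [ASSURED] mark=0 use=1\n"
    "sctp 132 100 ESTABLISHED src=192.0.2.11 dst=198.51.100.5 sport=40001 dport=38412 "
    "src=172.30.20.2 dst=192.0.2.11 sport=38412 dport=40001 [ASSURED] mark=0 use=1\n"
)

stale_lines = find_stale_entries(SAMPLE, "198.51.100.5", "172.30.20.2")
```

With the sample above, only the entry replying from 172.30.10.1 is reported, which matches the symptom described: the rules point at the new cluster IP while the tracked connection still maps to the old one.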

Version-Release number of selected component (if applicable):

4.8

How reproducible:

Consistently in the customer's environment

Steps to Reproduce:
1. Install a LoadBalancer service with an SCTP port
2. Use it normally
3. Delete everything (in our case, it is done with helm uninstall, but that's not relevant)
4. Wait some time
5. Re-create the same LoadBalancer service such that it has the same load balancer IP, but let a different cluster IP be chosen at random.

Actual results:

Traffic is DNATed to the old cluster IP and does not reach its destination.

Expected results:

Traffic is DNATed to the correct cluster IP and reaches its destination.

Additional info:

Running conntrack -D -r $OLD_CLUSTER_IP on the affected node works around the issue.
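
If the workaround needs to be scripted, a minimal sketch might look like the following. It only wraps the same `conntrack -D -r` invocation quoted above; the function names are hypothetical, the IP validation is an added safety assumption, and running it requires root privileges and the conntrack tool on the node.

```python
import ipaddress
import subprocess


def build_delete_command(old_cluster_ip):
    """Build the conntrack invocation from the workaround for one old cluster IP."""
    # Reject anything that is not a plain IP address before it reaches the command line.
    ipaddress.ip_address(old_cluster_ip)  # raises ValueError on bad input
    # -D deletes matching entries; -r matches on the reply-direction source address.
    return ["conntrack", "-D", "-r", old_cluster_ip]


def delete_stale_entries(old_cluster_ip):
    """Run the workaround command; the caller inspects returncode and stderr."""
    return subprocess.run(
        build_delete_command(old_cluster_ip),
        capture_output=True,
        text=True,
        check=False,
    )
```

Keeping command construction separate from execution lets the argument list be checked without touching the node's conntrack table.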

More information in comments.

Comment 7 Weibin Liang 2022-05-11 15:00:26 UTC

Testing passed in 4.10.0-0.nightly-2022-05-10-182617 by following the verifying steps from https://github.com/ovn-org/ovn-kubernetes/pull/2829

Comment 9 errata-xmlrpc 2022-05-18 11:51:03 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.14
bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2178
