Bug 1985336

Summary: OpenShift SDN doesn't add NOTRACK rule to raw iptables table to prevent vxlan from reaching conntrack
Product: OpenShift Container Platform Reporter: Pablo Alonso Rodriguez <palonsor>
Component: Networking Assignee: Andrew Stoycos <astoycos>
Networking sub component: openshift-sdn QA Contact: Ying Wang <yingwang>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: astoycos, jeharris, ktenzer, palonsor, rcernin, rhowe, snetting, surya, tmanor, yingwang
Version: 4.6   
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-18 17:40:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1995871    

Description Pablo Alonso Rodriguez 2021-07-23 12:09:05 UTC
Description of problem:

OpenShift SDN uses vxlan to encapsulate node-to-node traffic, but it doesn't add NOTRACK rules to the iptables raw table, so conntrack still processes the encapsulated OCP traffic even though that is not necessary.

In a heavily loaded environment this can have severe performance implications, to the point that the environment stops working.

Version-Release number of selected component (if applicable):

Found on 4.6, but it should apply to any OCP version, including 3.11 and any 4.y, as long as openshift-sdn is in use.

How reproducible:

Always

Steps to Reproduce:
1. iptables -t raw -S


Actual results:

-P PREROUTING ACCEPT
-P OUTPUT ACCEPT

Expected results:

-P PREROUTING ACCEPT
-P OUTPUT ACCEPT
-A PREROUTING -p udp -m udp --dport 4789 -j NOTRACK
-A OUTPUT -p udp -m udp --dport 4789 -j NOTRACK
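
The comparison between the actual and expected results can be sketched as a small check over the text printed by `iptables -t raw -S`. This is only an illustration, not part of openshift-sdn: the helper name `has_direct_notrack` is hypothetical, and it only recognizes rules attached directly to the given chain, in the form shown under "Expected results".

```python
import shlex

VXLAN_PORT = 4789  # UDP port taken from the expected rules above


def _has_pair(tokens, flag, value):
    """True if `flag` is immediately followed by `value` in the token list."""
    return any(
        tokens[i] == flag and tokens[i + 1] == value
        for i in range(len(tokens) - 1)
    )


def has_direct_notrack(rules_text, chain, port=VXLAN_PORT):
    """Check `iptables -t raw -S` output for a direct NOTRACK rule.

    Hypothetical helper: returns True if `chain` contains a rule that
    matches UDP traffic to `port` and jumps to NOTRACK.
    """
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if tokens[:2] != ["-A", chain]:
            continue  # policy lines (-P) and rules for other chains
        if (
            _has_pair(tokens, "-p", "udp")
            and _has_pair(tokens, "--dport", str(port))
            and _has_pair(tokens, "-j", "NOTRACK")
        ):
            return True
    return False
```

Feeding it the two listings above, the "Actual results" text yields False for both PREROUTING and OUTPUT, while the "Expected results" text yields True for both.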

Additional info:

More context on performance impact in comments.

In addition, had these rules been in place, OpenShift might not have been affected by the multiple conntrack bugs that impacted OCP in the past.

Comment 2 Pablo Alonso Rodriguez 2021-07-26 12:47:28 UTC
I have opened a pull request with a fix for the master branch: https://github.com/openshift/sdn/pull/324

Can you please review?

Thanks in advance and regards.

Comment 3 Stephen Cuppett 2021-07-28 01:27:57 UTC
*** Bug 1973864 has been marked as a duplicate of this bug. ***

Comment 4 Ying Wang 2021-07-29 06:10:28 UTC
I deployed a cluster from cluster-bot using https://github.com/openshift/sdn/pull/324 and checked the nodes; the iptables rules are added.

sh-4.4# iptables -t raw -S
-P PREROUTING ACCEPT
-P OUTPUT ACCEPT
-N OPENSHIFT-VXLAN-NOTRACK
-A PREROUTING -m comment --comment "disable conntrack for vxlan" -j OPENSHIFT-VXLAN-NOTRACK
-A OUTPUT -m comment --comment "disable conntrack for vxlan" -j OPENSHIFT-VXLAN-NOTRACK
-A OPENSHIFT-VXLAN-NOTRACK -p udp -m udp --dport 4789 -j NOTRACK


lilia@liliadeMacBook-Pro Downloads % oc version
Client Version: 4.7.5
Server Version: 4.8.0-0.ci.test-2021-07-29-030151-ci-ln-210rw9t-latest
Kubernetes Version: v1.21.1-1394+051ac4f6786868-dirty

Comment 5 Surya Seetharaman 2021-08-03 11:48:01 UTC
@trozet: Checked in OVN-K. We are already disabling conntrack for geneve. All good (https://github.com/openshift/cluster-network-operator/blob/f202ceea725fc3cf315b3883206462a2b4defadd/bindata/network/ovn-kubernetes/ovnkube-node.yaml#L215).

2021-06-04T19:21:11.970092353Z + echo 'I0604 19:21:11.969676557 - disable conntrack on geneve port'
2021-06-04T19:21:11.970104775Z I0604 19:21:11.969676557 - disable conntrack on geneve port
2021-06-04T19:21:11.970127952Z + iptables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK
2021-06-04T19:21:11.983552565Z + iptables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK

Comment 13 Ying Wang 2021-08-23 09:11:41 UTC
Checked on version 4.9.0-0.nightly-2021-08-22-070405 with the SDN network; the iptables rules on the nodes are as below.


sh-4.4# iptables -t raw -S
-P PREROUTING ACCEPT
-P OUTPUT ACCEPT
-N OPENSHIFT-NOTRACK
-A PREROUTING -m comment --comment "disable conntrack for vxlan" -j OPENSHIFT-NOTRACK
-A OUTPUT -m comment --comment "disable conntrack for vxlan" -j OPENSHIFT-NOTRACK
-A OPENSHIFT-NOTRACK -p udp -m udp --dport 4789 -j NOTRACK


% oc version
Client Version: 4.9.0-0.nightly-2021-08-18-144658
Server Version: 4.9.0-0.nightly-2021-08-22-070405
Kubernetes Version: v1.22.0-rc.0+5c2f7cd
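
Because the rules verified here go through a dedicated chain rather than sitting directly in PREROUTING and OUTPUT, an automated check has to follow the jump. A rough sketch of such a check over `iptables -t raw -S` output follows. The function names are hypothetical, and the model is deliberately simplified: it only looks at `-p`, `--dport` and `-j`, and ignores other matches, negation, and earlier terminating rules.

```python
import shlex


def parse_rules(rules_text):
    """Group the `-A` rules from `iptables -S` output by chain name."""
    chains = {}
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if len(tokens) >= 2 and tokens[0] == "-A":
            chains.setdefault(tokens[1], []).append(tokens[2:])
    return chains


def option(tokens, flag):
    """Return the value following `flag`, or None if the flag is absent."""
    if flag in tokens:
        i = tokens.index(flag)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def reaches_notrack(chains, chain, port, seen=None):
    """True if UDP traffic to `port` entering `chain` can reach NOTRACK.

    Simplified assumption: a rule applies when its protocol is unset or
    udp and its destination port is unset or equal to `port`.
    """
    seen = set() if seen is None else seen
    if chain in seen:
        return False  # avoid looping on chains that jump to each other
    seen.add(chain)
    for rule in chains.get(chain, []):
        if option(rule, "-p") not in (None, "udp"):
            continue
        if option(rule, "--dport") not in (None, str(port)):
            continue
        target = option(rule, "-j")
        if target == "NOTRACK":
            return True
        if target in chains and reaches_notrack(chains, target, port, seen):
            return True
    return False
```

With the listing above as input, `reaches_notrack(parse_rules(text), "PREROUTING", 4789)` and the same call for "OUTPUT" both come out True, since each built-in chain jumps to OPENSHIFT-NOTRACK and that chain holds the NOTRACK rule for port 4789.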

Comment 15 Jacob Tanenbaum 2021-09-22 17:33:52 UTC
*** Bug 2005733 has been marked as a duplicate of this bug. ***

Comment 17 errata-xmlrpc 2021-10-18 17:40:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759