Description of problem:
OpenShift SDN uses VXLAN to encapsulate node-to-node traffic, but it does not add NOTRACK iptables rules to the raw table, so conntrack still processes the encapsulated OCP traffic even though tracking it is unnecessary. In a heavily loaded environment this can degrade performance severely, to the point where the environment stops working.

Version-Release number of selected component (if applicable):
Found on 4.6, but should apply to any OCP version, including 3.11 and any 4.y, as long as openshift-sdn is in use.

How reproducible:
Always

Steps to Reproduce:
1. iptables -t raw -S

Actual results:
-P PREROUTING ACCEPT
-P OUTPUT ACCEPT

Expected results:
-P PREROUTING ACCEPT
-P OUTPUT ACCEPT
-A PREROUTING -p udp -m udp --dport 4789 -j NOTRACK
-A OUTPUT -p udp -m udp --dport 4789 -j NOTRACK

Additional info:
More context on the performance impact is in the comments. In addition, had these rules been in place, they could also have kept OpenShift from being affected by any of the multiple conntrack bugs that impacted OCP in the past.
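The check in the reproduction steps can be scripted. What follows is a minimal shell sketch, not part of the proposed fix: the helper name has_notrack_for_port is hypothetical, and the pattern assumes the rule is printed either as "-j NOTRACK" (as in this report) or as "-j CT --notrack", which some iptables builds may print for the same target. It only looks for the NOTRACK rule itself and does not verify that the chain holding it is reachable from PREROUTING and OUTPUT.

```shell
#!/bin/sh
# has_notrack_for_port PORT
# Reads `iptables -t raw -S` style output on stdin and succeeds (exit 0)
# if some rule disables conntrack for UDP traffic to PORT.
# Hypothetical helper for illustration; not part of openshift-sdn.
has_notrack_for_port() {
    port="$1"
    # `--` stops option parsing, since the pattern starts with a dash.
    # The trailing space after the port keeps 4789 from matching 47890.
    grep -Eq -- "-p udp .*--dport ${port} .*-j (NOTRACK|CT --notrack)( |\$)"
}

# Example use on a node (needs root):
#   iptables -t raw -S | has_notrack_for_port 4789 \
#       && echo "vxlan traffic bypasses conntrack" \
#       || echo "vxlan traffic is still tracked"
```

With the "Actual results" listing above as input the function fails, and with the "Expected results" listing it succeeds.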
I have opened a pull request with a fix for the master branch: https://github.com/openshift/sdn/pull/324
Could you please review it? Thanks in advance and regards.
*** Bug 1973864 has been marked as a duplicate of this bug. ***
I deployed a cluster from cluster-bot using https://github.com/openshift/sdn/pull/324 and confirmed on the nodes that the iptables rules are added.

sh-4.4# iptables -t raw -S
-P PREROUTING ACCEPT
-P OUTPUT ACCEPT
-N OPENSHIFT-VXLAN-NOTRACK
-A PREROUTING -m comment --comment "disable conntrack for vxlan" -j OPENSHIFT-VXLAN-NOTRACK
-A OUTPUT -m comment --comment "disable conntrack for vxlan" -j OPENSHIFT-VXLAN-NOTRACK
-A OPENSHIFT-VXLAN-NOTRACK -p udp -m udp --dport 4789 -j NOTRACK

lilia@liliadeMacBook-Pro Downloads % oc version
Client Version: 4.7.5
Server Version: 4.8.0-0.ci.test-2021-07-29-030151-ci-ln-210rw9t-latest
Kubernetes Version: v1.21.1-1394+051ac4f6786868-dirty
@trozet: Checked in OVN-K. We are already disabling conntrack for geneve. All good (https://github.com/openshift/cluster-network-operator/blob/f202ceea725fc3cf315b3883206462a2b4defadd/bindata/network/ovn-kubernetes/ovnkube-node.yaml#L215).

2021-06-04T19:21:11.970092353Z + echo 'I0604 19:21:11.969676557 - disable conntrack on geneve port'
2021-06-04T19:21:11.970104775Z I0604 19:21:11.969676557 - disable conntrack on geneve port
2021-06-04T19:21:11.970127952Z + iptables -t raw -A PREROUTING -p udp --dport 6081 -j NOTRACK
2021-06-04T19:21:11.983552565Z + iptables -t raw -A OUTPUT -p udp --dport 6081 -j NOTRACK
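The log above appends the two rules with -A, which would add a duplicate if the same commands ran again against the same table. A minimal sketch of an idempotent variant follows, using "iptables -C" to test for the rule before appending it. The helper name ensure_notrack is hypothetical, and this is not the script that cluster-network-operator ships.

```shell
#!/bin/sh
# ensure_notrack PORT
# Adds a raw-table NOTRACK rule for UDP traffic to PORT on both the
# PREROUTING and OUTPUT chains, unless an identical rule already exists.
# Hypothetical helper for illustration; needs root and an iptables binary.
ensure_notrack() {
    port="$1"
    for chain in PREROUTING OUTPUT; do
        # -C exits 0 only when a matching rule is already present.
        if ! iptables -t raw -C "$chain" -p udp --dport "$port" -j NOTRACK 2>/dev/null; then
            iptables -t raw -A "$chain" -p udp --dport "$port" -j NOTRACK
        fi
    done
}

# Example use (needs root):
#   ensure_notrack 6081   # geneve, as in the OVN-K log
#   ensure_notrack 4789   # vxlan, as requested in this bug
```

Calling the function repeatedly leaves at most one rule per chain and port.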
Checked on version 4.9.0-0.nightly-2021-08-22-070405 with the SDN network; the iptables rules on the nodes are as below.

sh-4.4# iptables -t raw -S
-P PREROUTING ACCEPT
-P OUTPUT ACCEPT
-N OPENSHIFT-NOTRACK
-A PREROUTING -m comment --comment "disable conntrack for vxlan" -j OPENSHIFT-NOTRACK
-A OUTPUT -m comment --comment "disable conntrack for vxlan" -j OPENSHIFT-NOTRACK
-A OPENSHIFT-NOTRACK -p udp -m udp --dport 4789 -j NOTRACK

% oc version
Client Version: 4.9.0-0.nightly-2021-08-18-144658
Server Version: 4.9.0-0.nightly-2021-08-22-070405
Kubernetes Version: v1.22.0-rc.0+5c2f7cd
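The listing above shows a dedicated chain that both PREROUTING and OUTPUT jump to, rather than NOTRACK rules placed directly in the built-in chains. As a rough sketch, the commands below would produce an equivalent layout by hand. They are reconstructed from the -S output only and do not show how openshift-sdn itself programs the rules; the helper name print_notrack_setup and the dry-run approach are assumptions for illustration.

```shell
#!/bin/sh
# print_notrack_setup CHAIN PORT
# Prints (does not run) the iptables commands that would create the
# raw-table layout from the listing: a dedicated chain holding the
# NOTRACK rule, with jumps into it from PREROUTING and OUTPUT.
# Hypothetical helper for illustration only.
print_notrack_setup() {
    chain="$1"
    port="$2"
    printf '%s\n' "iptables -t raw -N $chain"
    for hook in PREROUTING OUTPUT; do
        printf '%s\n' "iptables -t raw -A $hook -m comment --comment \"disable conntrack for vxlan\" -j $chain"
    done
    printf '%s\n' "iptables -t raw -A $chain -p udp -m udp --dport $port -j NOTRACK"
}

# Example use:
#   print_notrack_setup OPENSHIFT-NOTRACK 4789
```

Printing the commands instead of running them lets them be reviewed first; on a cluster node the SDN is expected to manage this chain itself, so applying them manually would only make sense on a test machine.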
*** Bug 2005733 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759