Bug 2078927 - [OVN SCALE] Scalability issues due to port security logical flows
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 22.C
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
: ---
Assignee: Numan Siddique
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-04-26 13:56 UTC by Dumitru Ceara
Modified: 2023-03-13 07:15 UTC (History)

Fixed In Version: ovn22.06-22.06.0-34
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-13 07:15:18 UTC
Target Upstream Version:
Embargoed:


Attachments
density-light-120node NB DB (10.42 MB, text/plain)
2022-04-26 13:56 UTC, Dumitru Ceara


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1914 0 None None None 2022-04-26 14:18:16 UTC

Description Dumitru Ceara 2022-04-26 13:56:17 UTC
Created attachment 1875088
density-light-120node NB DB

Description of problem:

In a large scale deployment, e.g., during a density-light OpenShift
scale test running a cluster of 120 nodes and 13K pods, northd spends
a large amount of time processing and generating logical flows that
implement port security.

With the attached database, focusing on a single logical port that
corresponds to an OCP POD (13b39b78-node-density-20220329_node-density-8311),
8 different logical flows are generated in the port security stages:

  table=0 (ls_in_port_sec_l2  ), priority=50   , match=(inport == "13b39b78-node-density-20220329_node-density-8311" && eth.src == {0a:58:0a:a8:00:4f}), action=(next;)
  table=1 (ls_in_port_sec_ip  ), priority=90   , match=(inport == "13b39b78-node-density-20220329_node-density-8311" && eth.src == 0a:58:0a:a8:00:4f && ((ip4.src == 0.0.0.0 && ip4.dst == 255.255.255.255 && udp.src == 68 && udp.dst == 67) || ip4.src == {10.168.0.79})), action=(next;)
  table=1 (ls_in_port_sec_ip  ), priority=80   , match=(inport == "13b39b78-node-density-20220329_node-density-8311" && eth.src == 0a:58:0a:a8:00:4f && ip), action=(drop;)
  table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "13b39b78-node-density-20220329_node-density-8311" && eth.src == 0a:58:0a:a8:00:4f && arp.sha == 0a:58:0a:a8:00:4f && arp.spa == {10.168.0.79}), action=(next;)
  table=2 (ls_in_port_sec_nd  ), priority=80   , match=(inport == "13b39b78-node-density-20220329_node-density-8311" && (arp || nd)), action=(drop;)
  table=8 (ls_out_port_sec_ip ), priority=90   , match=(outport == "13b39b78-node-density-20220329_node-density-8311" && eth.dst == 0a:58:0a:a8:00:4f && ip4.dst == {255.255.255.255, 224.0.0.0/4, 10.168.0.79}), action=(next;)
  table=8 (ls_out_port_sec_ip ), priority=80   , match=(outport == "13b39b78-node-density-20220329_node-density-8311" && eth.dst == 0a:58:0a:a8:00:4f && ip), action=(drop;)
  table=9 (ls_out_port_sec_l2 ), priority=50   , match=(outport == "13b39b78-node-density-20220329_node-density-8311" && eth.dst == {0a:58:0a:a8:00:4f}), action=(output;)

The NB logical switch port configuration is:
    port 13b39b78-node-density-20220329_node-density-8311
        addresses: ["0a:58:0a:a8:00:4f 10.168.0.79"]

These flows can translate to OpenFlow flows on at most one chassis:
the chassis where the logical port is bound.

We measured the time it takes to build these flows in ovn-northd.
We see that on a test machine, out of a total of ~1422ms spent during
one processing loop iteration, ~560ms are spent on port security flows.
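
To put the measurement in proportion, the arithmetic can be sketched as
follows. The timings and the per-port flow count are the ones quoted in
this report; treating "13K pods" as exactly 13000 ports that each produce
the same 8 flows is an assumption made only for this estimate.

```python
# Figures quoted in this report.
total_loop_ms = 1422   # one ovn-northd processing loop iteration (approx.)
port_sec_ms = 560      # portion spent building port security flows (approx.)
flows_per_port = 8     # port security logical flows listed for one pod port

# Assumption for the estimate: "13K pods" is taken as exactly 13000 ports,
# each generating the same 8 flows as the example port.
pods = 13_000

port_sec_share = port_sec_ms / total_loop_ms
estimated_flows = pods * flows_per_port

print(f"port security share of the loop: {port_sec_share:.1%}")
print(f"estimated port security logical flows: {estimated_flows}")
```

Under these assumptions, port security accounts for roughly 39% of the
loop iteration and on the order of 100K logical flows.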

Given the locality of these flows, there's no real point in generating
them within ovn-northd.  Instead, this work could easily be "offloaded"
to the chassis where the ports are bound.

We already do something similar for generating load balancer hairpin
flows.

A potential solution would be to store in the SB.Port_Binding, e.g., in
the "options" column, the MAC and IP addresses for which port security
should be enforced.
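
A minimal sketch of that idea, in Python rather than ovn-northd's C, is
shown here. It splits an NB "addresses" entry into its MAC and IP parts
and packs them into an options map. The key name "port-security-addrs"
and both function names are hypothetical; this report does not define the
final schema.

```python
import ipaddress
import re

MAC_RE = re.compile(r"[0-9a-f]{2}(:[0-9a-f]{2}){5}")

def parse_lsp_address(entry):
    """Split an NB 'addresses' entry of the form 'MAC [IP ...]'."""
    tokens = entry.split()
    if not tokens:
        raise ValueError("empty address entry")
    mac, ips = tokens[0].lower(), tokens[1:]
    if not MAC_RE.fullmatch(mac):
        raise ValueError(f"invalid MAC address: {tokens[0]}")
    for ip in ips:
        # Raises ValueError if the token is not an IPv4/IPv6 address or prefix.
        ipaddress.ip_interface(ip)
    return mac, ips

def port_binding_options(entries):
    """Build a Port_Binding-style options map (hypothetical key name)."""
    parsed = [parse_lsp_address(e) for e in entries]
    value = ";".join(" ".join([mac] + ips) for mac, ips in parsed)
    return {"port-security-addrs": value}

options = port_binding_options(["0a:58:0a:a8:00:4f 10.168.0.79"])
print(options)
```

With one small string per port instead of eight logical flows, ovn-northd
would only copy addresses, and the per-port match expansion would move to
the chassis.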

The logical flows for the port security tables will then become the
generic:

table=0 (ls_in_port_sec  ), priority=1   , match="1", action=(reg0[14]=enforce_port_security(); next;)
table=1 (ls_in_port_sec_check  ), priority=2   , match="reg0[14] == 1", action=(next;)
table=1 (ls_in_port_sec_check  ), priority=1   , match="1", action=(drop;)
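
The intended behaviour of these three generic flows can be modelled
roughly as below. This is only a sketch: enforce_port_security() is
replaced by a hypothetical Python stand-in that checks the source MAC
alone, whereas the proposed action would also cover the IP, ARP and ND
checks.

```python
PORT_SEC_BIT = 14  # reg0[14], as used by the generic flows

def enforce_port_security(inport, eth_src, allowed):
    """Hypothetical stand-in: 1 if eth_src is permitted on inport, else 0."""
    return 1 if eth_src in allowed.get(inport, set()) else 0

def ingress_port_sec(inport, eth_src, allowed):
    reg0 = 0
    # Stage ls_in_port_sec: store the verdict in reg0[14], then "next;".
    reg0 |= enforce_port_security(inport, eth_src, allowed) << PORT_SEC_BIT
    # Stage ls_in_port_sec_check: priority 2 matches reg0[14] == 1,
    # priority 1 is the catch-all drop.
    if ((reg0 >> PORT_SEC_BIT) & 1) == 1:
        return "next"
    return "drop"

allowed = {"pod-8311": {"0a:58:0a:a8:00:4f"}}  # hypothetical short port name
print(ingress_port_sec("pod-8311", "0a:58:0a:a8:00:4f", allowed))
print(ingress_port_sec("pod-8311", "0a:58:0a:a8:00:50", allowed))
```

The number of logical flows stays constant regardless of how many ports
exist; only the data consulted by the action grows with the port count.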

The enforce_port_security() action would be decoded by ovn-controller
and would translate into port-security-related OpenFlow flows for all
local port bindings.  These flows could be added in dedicated OpenFlow
tables (e.g., 73, 74), in a similar fashion to other features like MAC
bindings or FDB lookup.
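
As an illustration of the locality argument, the following sketch expands
port security matches only for bindings that reside on the local chassis.
It is a Python model, not ovn-controller code: the PortBinding class, the
chassis names and the simplified match string (source MAC and IPv4 source
only) are assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class PortBinding:
    """Hypothetical, reduced view of an SB Port_Binding row."""
    logical_port: str
    chassis: str
    mac: str
    ips: list = field(default_factory=list)

def local_port_sec_matches(bindings, local_chassis):
    """Return (logical_port, match) pairs for locally bound ports only."""
    flows = []
    for pb in bindings:
        if pb.chassis != local_chassis:
            continue  # bound elsewhere: nothing to install on this chassis
        ip_set = ", ".join(pb.ips)
        match = (f'inport == "{pb.logical_port}" && eth.src == {pb.mac}'
                 f' && ip4.src == {{{ip_set}}}')
        flows.append((pb.logical_port, match))
    return flows

bindings = [
    PortBinding("pod-a", "chassis-1", "0a:58:0a:a8:00:4f", ["10.168.0.79"]),
    PortBinding("pod-b", "chassis-2", "0a:58:0a:a8:00:50", ["10.168.0.80"]),
]
for port, match in local_port_sec_matches(bindings, "chassis-1"):
    print(port, "->", match)
```

Each chassis would then compute matches for its own ports only, instead of
every chassis receiving logical flows for all ports in the cluster.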

Some points that need additional care:
- upgrades should be accounted for
- it might not be possible to just change the logical pipeline table
  names, so we might have to use ls_in_port_sec_l2 for calling the
  enforce_port_security() action and ls_in_port_sec_ip for verifying
  its result

