Bug 2078986 - [OVN SCALE] Scalability issues due to arp responder logical flows
Summary: [OVN SCALE] Scalability issues due to arp responder logical flows
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 22.C
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Ales Musil
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On: 2084668
Blocks:
 
Reported: 2022-04-26 16:09 UTC by Dumitru Ceara
Modified: 2023-08-04 14:14 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-04 14:14:09 UTC
Target Upstream Version:
Embargoed:


Attachments
density-light-120node NB DB (10.42 MB, text/plain)
2022-04-26 16:09 UTC, Dumitru Ceara


Links
Red Hat Issue Tracker FD-1915 (last updated 2022-04-26 16:15:50 UTC)

Description Dumitru Ceara 2022-04-26 16:09:41 UTC
Created attachment 1875125 [details]
density-light-120node NB DB

Description of problem:

In a large scale deployment, e.g., during a density-light OpenShift
scale test running a cluster of 120 nodes and 13K pods, northd spends
a large amount of time processing and generating logical flows that
are used to reply to ARP requests.

With the attached database, focusing on a single logical port that
corresponds to an OCP POD (13b39b78-node-density-20220329_node-density-8311):

    port 13b39b78-node-density-20220329_node-density-8311
        addresses: ["0a:58:0a:a8:00:4f 10.168.0.79"]

There are two types of ARP responder flows:

1. In the logical switch pipeline:

  table=18(ls_in_arp_rsp      ), priority=100  , match=(arp.tpa == 10.168.0.79 && arp.op == 1 && inport == "13b39b78-node-density-20220329_node-density-8311"), action=(next;)
  table=18(ls_in_arp_rsp      ), priority=50   , match=(arp.tpa == 10.168.0.79 && arp.op == 1), action=(eth.dst = eth.src; eth.src = 0a:58:0a:a8:00:4f; arp.op = 2; /* ARP reply */ arp.tha = arp.sha; arp.sha = 0a:58:0a:a8:00:4f; arp.tpa = arp.spa; arp.spa = 10.168.0.79; outport = inport; flags.loopback = 1; output;)

The flows above can probably be skipped if all the VIF logical ports
that are part of that logical switch are claimed by the same chassis.
In such cases ARP requests never leave br-int, so there's no point in
optimizing the packet path with an explicit ARP responder flow; we can
just as easily let the VIF that owns the IP reply to the ARP itself.
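To get a sense of how many of these per-port flows exist in a deployment, the southbound logical flow table can be inspected directly. A minimal sketch (the grep patterns match the `ls_in_arp_rsp` stage names shown above; run against a live southbound database):

```shell
# Count the per-port ARP responder flows in the logical switch pipeline.
# Priority-50 flows are the actual responders (one per address), while
# priority-100 flows are the self-originated-ARP exceptions.
ovn-sbctl lflow-list | grep -c 'ls_in_arp_rsp.*priority=50'
ovn-sbctl lflow-list | grep -c 'ls_in_arp_rsp.*priority=100'
```

In a cluster on the scale described here (13K pods), each count is on the order of the pod count, which is where the northd processing time goes.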

2. In the logical router pipeline:

  table=15(lr_in_arp_resolve  ), priority=100  , match=(outport == "rtos-ip-10-0-177-133.us-west-2.compute.internal" && reg0 == 10.168.0.79), action=(eth.dst = 0a:58:0a:a8:00:4f; next;)

These flows can probably be skipped if the logical router is configured
to dynamically resolve unknown next-hops, i.e., if the logical router
is configured with NB.Logical_Router.options:dynamic_neigh_routers=true.

In ovn-kubernetes the ovn_cluster_router does *not* have
dynamic_neigh_routers=true, but there should be no reason not to
enable it.
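Enabling the option is a one-line northbound change. A sketch, assuming an ovn-kubernetes deployment where the distributed router is named ovn_cluster_router (as in this report):

```shell
# Let the router resolve next-hop MACs dynamically (via ARP/ND at runtime)
# instead of relying on pre-generated lr_in_arp_resolve flows:
ovn-nbctl set Logical_Router ovn_cluster_router options:dynamic_neigh_routers=true

# Verify the option took effect:
ovn-nbctl get Logical_Router ovn_cluster_router options:dynamic_neigh_routers
```

The trade-off is that first-packet latency to a previously unseen next hop includes a dynamic resolution step, in exchange for not materializing a resolve flow per destination in northd.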

All in all, when we avoid generating these two types of logical flows
in ovn-northd and run with the attached database, one ovn-northd event
processing loop iteration is reduced by ~300ms (from ~1500ms to
~1200ms).

Comment 3 Dan Williams 2023-08-04 13:50:20 UTC
Upstream patchset for MAC binding aging: http://patchwork.ozlabs.org/project/ovn/list/?series=366554&state=%2A&archive=both

