Bug 2043116 - [OVN SCALE][ovn-northd] Incrementally build logical datapaths.
Summary: [OVN SCALE][ovn-northd] Incrementally build logical datapaths.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 21.K
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: ---
: ---
Assignee: Dumitru Ceara
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks: 2043119 2043128
TreeView+ depends on / blocked
 
Reported: 2022-01-20 16:28 UTC by Dumitru Ceara
Modified: 2023-07-17 09:20 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-17 09:20:43 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Scale test NB database. (1.74 MB, application/gzip)
2022-01-20 16:28 UTC, Dumitru Ceara
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-1727 0 None None None 2022-01-20 16:37:05 UTC

Description Dumitru Ceara 2022-01-20 16:28:20 UTC
Created attachment 1852237 [details]
Scale test NB database.

Description of problem:

In ovn-northd, the build_datapaths() function reconciles the Southbound DB Datapath_Binding records and ensures that there's exactly one record for every Northbound DB Logical_Switch and Logical_Router.

In some deployments (e.g., ovn-kubernetes) the datapaths (logical switches/routers) don't get added/removed often so recomputing the whole set every time something unrelated changes in the NB/SB doesn't make much sense.

Instead, we could extract this into a separate I-P engine node that computes 3 sets of datapaths:
- deleted
- updated
- added

This I-P node would need to have (at least) 3 input nodes:
- NB.Logical_Switch
- NB.Logical_Router
- SB.Datapath_Binding

For example, with the attached NB database (from a scale test run), build_datapaths() takes ~150msec on the test machine, each and every time northd runs to process a change in the NB/SB.

Furthermore, knowing the exact set of datapaths that changed can be the first step in adding incremental processing for other components like load balancers.

Comment 1 Mark Michelson 2022-01-20 18:05:53 UTC
If I understand correctly, the idea here is to have an engine node whose sole job is to output the set of deleted, updated, and added datapaths. It's then outside the scope of the engine node to use those sets of datapaths to determine how to act.

I think this is an ideal use of the I-P engine. It allows for different contexts to use the data produced by the I-P node as they see fit. I also think it's a good idea to limit the scope of this particular issue to reconciliation of the southbound datapaths since it is a more manageable change than trying to incrementally process *everything* involving datapath changes.

Question: Would it make sense to have two I-P engine nodes instead of one?

Node 1 inputs:
- NB.Logical_Switch
- SB.Datapath_Binding

Node 1 outputs the sets of updated, deleted, and added logical switches.

Node 2 inputs:
- NB.Logical_Router
- SB.Datapath Binding

Node 2 outputs the sets of updated, deleted, and added logical routers.

This might make the related load balancer issues (2043119 and 2043128) easier to implement, since each would be able to take distinct inputs of the appropriate logical datapath type.

Comment 2 Dumitru Ceara 2023-07-17 09:20:43 UTC
Closing this as "won't fix".  Recent patches from Han Zhou added support for incrementally processing switch/router port changes without the need of datapaths incremental processing:
https://github.com/ovn-org/ovn/commit/b337750e45be2de370eebd7bc8f1ecd52cb72750
https://github.com/ovn-org/ovn/commit/3b120ccf7f7c4d15e1936ea7a18d1fc047879c99

At the same time Numan is adding I-P for load balancers in a separate patch set:
https://patchwork.ozlabs.org/project/ovn/list/?series=362800&state=*

If we ever decide that we need LR/LS datapath incremental processing we can revisit this issue.


Note You need to log in before you can comment on or make changes to this bug.