The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.
Bug 2139194 - [OVN SCALE] ovn-controller: Inefficient condition updates for Port_Binding table
Summary: [OVN SCALE] ovn-controller: Inefficient condition updates for Port_Binding table
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 22.L
Hardware: Unspecified
OS: Unspecified
high
medium
Target Milestone: ---
: ---
Assignee: Dumitru Ceara
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2022-11-01 19:05 UTC by Ilya Maximets
Modified: 2023-07-31 09:35 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-31 09:35:18 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-2420 0 None None None 2022-11-01 19:14:00 UTC

Description Ilya Maximets 2022-11-01 19:05:10 UTC
If conditional monitoring is in use (ovn-monitor-all = false), every time
ovn-controller is discovering a new local port, it adds several new
clauses to the condition for the Port_Binding table.  For example, when
port 'lp-0-9' is added to OVS, following two clauses will be added to
the condition:

  ["parent_port","==","lp-0-9"]
  ["logical_port","==","lp-0-9"]

ovn-controller sends 'monitor_cond_change' request to the SbDB.  SbDB
has to re-check all the Port_Binding rows to prepare an update for the
client.  As the density increases and the total number of ports in the
cluster grows, these condition change requests start taking significant
amount of time on the database side.
For example, with 250 ports per node, condition will have more than 500
clauses.  With a 120-node cluster at that density each request will
require checking 120 * 250 = 30000 rows against more than 500 clauses.

This doesn't scale well.

Ideally, condition clauses should not depend on port names.  Datapaths
or chassis names might be a better option to explore as they are
more or less limited by the number of physical nodes, which is a much
smaller number and it doesn't really change too often.

Comment 1 Dumitru Ceara 2022-11-02 15:52:08 UTC
A possibility that came to mind was to add a "tag", for example,
SB.Port_Binding.options:monitor-tag.

We could add a new mode to ovn-controller to only monitor port bindings
that have a given (set of) tag(s).  This could be specified as an
external_id in the local OVS DB Open_vSwitch table.  Like this the
monitor condition is constant, regardless of how many OVS ports there
are.

The downside is that the CMS needs to explicitly request this mode to
be enabled on every chassis.  We need some "synchronization" and use
this mode in ovn-controller if and only if ovn-northd has been upgraded
to set this port binding option.

Comment 2 Dan Williams 2022-11-02 17:50:55 UTC
(In reply to Dumitru Ceara from comment #1)
> A possibility that came to mind was to add a "tag", for example,
> SB.Port_Binding.options:monitor-tag.
> 
> We could add a new mode to ovn-controller to only monitor port bindings
> that have a given (set of) tag(s).  This could be specified as an
> external_id in the local OVS DB Open_vSwitch table.  Like this the
> monitor condition is constant, regardless of how many OVS ports there
> are.
> 
> The downside is that the CMS needs to explicitly request this mode to
> be enabled on every chassis.  We need some "synchronization" and use
> this mode in ovn-controller if and only if ovn-northd has been upgraded
> to set this port binding option.

Wasn't one thought that we could use requested-chassis as that tag? eg, we could have a generic tag, but we could also have "field=value" where field=requested-chassis and value=chassis-name that the CMS would already be setting, and northd would already be propagating through.

Comment 3 Dan Williams 2022-11-02 17:52:58 UTC
(In reply to Dan Williams from comment #2)
> (In reply to Dumitru Ceara from comment #1)
> > A possibility that came to mind was to add a "tag", for example,
> > SB.Port_Binding.options:monitor-tag.
> > 
> > We could add a new mode to ovn-controller to only monitor port bindings
> > that have a given (set of) tag(s).  This could be specified as an
> > external_id in the local OVS DB Open_vSwitch table.  Like this the
> > monitor condition is constant, regardless of how many OVS ports there
> > are.
> > 
> > The downside is that the CMS needs to explicitly request this mode to
> > be enabled on every chassis.  We need some "synchronization" and use
> > this mode in ovn-controller if and only if ovn-northd has been upgraded
> > to set this port binding option.
> 
> Wasn't one thought that we could use requested-chassis as that tag? eg, we
> could have a generic tag, but we could also have "field=value" where
> field=requested-chassis and value=chassis-name that the CMS would already be
> setting, and northd would already be propagating through.

I actually mean: field=chassis (not requested-chassis)

This wouldn't require any additional northd or DB schema support, but isn't as flexible as an arbitrary "monitor-tag" of course.

Comment 4 Dumitru Ceara 2022-11-02 20:47:19 UTC
(In reply to Dan Williams from comment #3)
> (In reply to Dan Williams from comment #2)
> > (In reply to Dumitru Ceara from comment #1)
> > > A possibility that came to mind was to add a "tag", for example,
> > > SB.Port_Binding.options:monitor-tag.
> > > 
> > > We could add a new mode to ovn-controller to only monitor port bindings
> > > that have a given (set of) tag(s).  This could be specified as an
> > > external_id in the local OVS DB Open_vSwitch table.  Like this the
> > > monitor condition is constant, regardless of how many OVS ports there
> > > are.
> > > 
> > > The downside is that the CMS needs to explicitly request this mode to
> > > be enabled on every chassis.  We need some "synchronization" and use
> > > this mode in ovn-controller if and only if ovn-northd has been upgraded
> > > to set this port binding option.
> > 
> > Wasn't one thought that we could use requested-chassis as that tag? eg, we
> > could have a generic tag, but we could also have "field=value" where
> > field=requested-chassis and value=chassis-name that the CMS would already be
> > setting, and northd would already be propagating through.
> 
> I actually mean: field=chassis (not requested-chassis)
> 

I'm not sure that would work.  ovn-controller sets the 'chassis' field
after it claims the logical port.  What should work is to change the
monitor condition to only request port bindings that have
Port_Binding.options.requested-chassis=<local-chassis>.

This still has the downside that we need to enable this mode in
ovn-controller through config.  That's because a CMS is not obliged to
set requested-chassis and ovn-controller should still claim ports that
don't have it set.

> This wouldn't require any additional northd or DB schema support, but isn't
> as flexible as an arbitrary "monitor-tag" of course.

After some offline brainstorming with Ilya it seems that there might be
a way to implement this without the need for config knobs or changes to
northd/schema:

The reason why we filter for port bindings whose name matches local OVS
iface-ids is because (in the general case) we need to first bind a
logical port locally before declaring the logical datapath it is
assigned to as "local".

However, once we know that a datapath is local, we should:
a. update the monitor condition to include all port bindings that point
   to a local datapath.
b. remove the iface-ids that correspond to ports bound to local
   datapaths from the monitor condition because "a" already covers these.

"a" is already happening today:
https://github.com/ovn-org/ovn/blob/40df933ce49946c69099a902b154d7bfee4358a3/controller/ovn-controller.c#L260

"b" should be straightforward to implement.

This means that it's possible that new OVS iface-ids end up being part
of the monitor condition briefly but likely we won't have many of these
at once.  Also, as and as soon as a port binding is created for them
they'll be removed from the condition.

Comment 6 Dumitru Ceara 2023-06-21 09:50:08 UTC
V2 posted: https://patchwork.ozlabs.org/project/ovn/list/?series=360676&state=*

Comment 7 OVN Bot 2023-07-20 17:02:50 UTC
ovn23.09 fast-datapath-rhel-9 clone created at https://bugzilla.redhat.com/show_bug.cgi?id=2224400

Comment 8 Dumitru Ceara 2023-07-31 09:35:18 UTC
Will be picked up downstream in ovn23.09 via bug 2224400.  Closing this instance as fixed in next release.


Note You need to log in before you can comment on or make changes to this bug.