Bug 2069108 - [RFE] ovsdb-server: Prepare snapshot JSON in a separate thread
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovsdb2.16
Version: FDP 22.A
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Ilya Maximets
QA Contact: ovs-qe
URL:
Whiteboard:
Depends On: 2069089
Blocks: 2067342
Reported: 2022-03-28 10:03 UTC by Ilya Maximets
Modified: 2022-08-09 15:10 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-07-27 20:38:30 UTC
Target Upstream Version:
Embargoed:


Links
System                 ID       Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  FD-1857  0        None      None    None     2022-03-28 10:08:56 UTC

Description Ilya Maximets 2022-03-28 10:03:02 UTC
Currently ovsdb-server is mostly single-threaded; only some operations
(like disk sync) are performed asynchronously.  At the same time, the
database compaction (a.k.a. snapshot creation) is performed by the main
thread and it's also fairly heavy.  It's not a very frequent operation
in a real-world setup, but it may cause latency spikes of a few seconds
to a dozen seconds for control operations on a fairly big OVN cluster.
And that is inconvenient, especially for performance and scale testing
where lots of operations are executed at a very high rate, forcing the
database to compact itself frequently.

In order to avoid such latency, ovsdb-server in clustered mode transfers
the raft leadership before starting compaction, but that only helps for
leader-only connections such as CMS connections to the NbDB or ovn-northd
connections to databases.  ovn-controllers don't follow the leader and
have to wait for the SbDB to finish the compaction before they can send a
port state update or receive an updated configuration.

The situation can be improved in a few different ways (e.g. by using
leader-only relays as a frontend for SbDB).  This BZ is about one very
specific way: moving some parts of the compaction process out of the
main thread, allowing it to continue to serve clients and execute
transactions.

The main idea is to move database-to-json conversion to a separate thread.

The main problem here is data availability: ovsdb data structures are
not generally thread-safe, so they cannot be accessed by other threads
while the main thread is changing them by executing transactions.

This can be worked around by cloning the required objects first in the
main thread and handing them over to the worker thread.  The key is that
cloning itself should be fast for this to make any sense.
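
The clone-then-hand-over pattern can be sketched in Python as follows.
This is only an illustration, not ovsdb-server code (which is written in
C): the names clone, to_json and start_snapshot are hypothetical, and the
database is modeled as plain nested dicts.  A deep copy is used here
purely for brevity; it is exactly the slow kind of clone that would make
the approach pointless in practice.

```python
import copy
import json
from concurrent.futures import ThreadPoolExecutor


def clone(tables):
    # Main thread: take a private copy the worker can read without locks.
    # deepcopy is used only to keep the sketch short; it is the slow kind
    # of clone that a real implementation would need to avoid.
    return copy.deepcopy(tables)


def to_json(cloned_tables):
    # Worker thread: touches only the clone, never the live tables.
    return json.dumps(cloned_tables, sort_keys=True)


def start_snapshot(tables, executor):
    # Clone synchronously in the caller's thread, convert in the background.
    return executor.submit(to_json, clone(tables))


tables = {"Port": {"p1": {"name": "eth0", "state": "up"}}}

with ThreadPoolExecutor(max_workers=1) as executor:
    pending = start_snapshot(tables, executor)
    # The main thread keeps executing transactions in the meantime.
    tables["Port"]["p1"]["state"] = "down"
    tables["Port"]["p2"] = {"name": "eth1", "state": "up"}
    snapshot_json = pending.result()
```

The snapshot reflects the state at the moment of the clone, regardless of
what the main thread changes afterwards.  In CPython the GIL limits real
parallelism, so the sketch shows only the ownership hand-off, not the
latency benefit.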

Two possible implementations:

1. The RAFT module has a log of all the database operations in a JSON
   format, as well as the previous database snapshot.  And, conveniently,
   JSON objects support shallow copies.  That means that a copy of the raft
   log can be created fairly quickly by the main thread and handed over to
   the worker thread, which will create a new snapshot from that data.

   Pros:
   - all prerequisites are already in place.
   Cons:
   - worker will have to parse all JSON objects back to database objects
     and, basically, re-play all the transactions in order to construct
     a new representation of a database that can be converted back to JSON.
     This is a significant amount of extra work, so compaction itself will
     take much longer, consuming more CPU resources and memory, even though
     all that will take place in a separate thread.
   - RAFT-specific, i.e. cannot be applied to the standalone database model.

2. If shallow copies can be created from database rows (BZ2069089) directly,
   the main thread could create a shallow copy of the current database state
   and hand it over to the worker thread for JSON conversion.

   Pros:
   - No unnecessary work, i.e. ovsdb-server will take roughly the same
     aggregated CPU time for compaction.
   - Implementation is storage-agnostic, so can, in theory, be applied to
     standalone databases (some changes of the DB log module may be needed).
   Cons:
   - Prerequisite: https://bugzilla.redhat.com/show_bug.cgi?id=2069089
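
The extra work that option 1 puts on the worker can be sketched as
follows.  This is a minimal Python illustration under stated assumptions:
the log entry format ("op", "table", "uuid", "row") and the function
names are made up for the example and are not the real OVSDB or raft log
format.

```python
import json


def apply_entry(tables, entry):
    # Replay one logged operation onto the in-memory tables.
    rows = tables.setdefault(entry["table"], {})
    if entry["op"] == "insert":
        rows[entry["uuid"]] = dict(entry["row"])
    elif entry["op"] == "update":
        rows[entry["uuid"]].update(entry["row"])
    elif entry["op"] == "delete":
        del rows[entry["uuid"]]
    else:
        raise ValueError(f"unknown op: {entry['op']}")


def build_snapshot(prev_snapshot_json, log_entries):
    # Worker-side job: parse the old snapshot, replay every log entry,
    # then serialize the result again.
    tables = json.loads(prev_snapshot_json)
    for raw in log_entries:
        apply_entry(tables, json.loads(raw))
    return json.dumps(tables, sort_keys=True)


prev_snapshot = json.dumps({"Port": {"p1": {"name": "eth0", "state": "down"}}})
raft_log = [
    json.dumps({"op": "update", "table": "Port", "uuid": "p1",
                "row": {"state": "up"}}),
    json.dumps({"op": "insert", "table": "Port", "uuid": "p2",
                "row": {"name": "eth1", "state": "up"}}),
    json.dumps({"op": "delete", "table": "Port", "uuid": "p1"}),
]

# Main thread: a list copy shares the entry objects, so it is cheap.
log_copy = list(raft_log)
# In the real design this call would run in the worker thread.
new_snapshot = build_snapshot(prev_snapshot, log_copy)
```

Every entry is parsed and applied before anything can be serialized,
which is the parse-and-replay overhead listed under the cons of option 1.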

Option 2 seems to make more sense performance-wise, so it is preferable.
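
A rough Python sketch of why a shallow copy of the database state can be
both cheap and safe: rows are shared by reference and an update replaces
the row object instead of mutating it.  The Row and Table classes are
hypothetical, and the actual mechanism tracked in BZ2069089 may well
differ.

```python
import json


class Row:
    """Row treated as immutable once stored; updates create a new Row."""

    def __init__(self, columns):
        self.columns = dict(columns)


class Table:
    def __init__(self):
        self.rows = {}

    def update(self, uuid, **changes):
        # Copy-on-write: replace the Row object instead of mutating it,
        # so existing shallow copies keep seeing the old contents.
        old = self.rows.get(uuid)
        merged = {**old.columns, **changes} if old else changes
        self.rows[uuid] = Row(merged)

    def shallow_copy(self):
        clone = Table()
        clone.rows = dict(self.rows)  # copies references, not row data
        return clone

    def to_json(self):
        return json.dumps(
            {uuid: row.columns for uuid, row in self.rows.items()},
            sort_keys=True,
        )


live = Table()
live.update("p1", name="eth0", state="down")

view = live.shallow_copy()                      # what the worker would get
shares_row = view.rows["p1"] is live.rows["p1"]  # no row data was copied

# Later transactions in the main thread do not disturb the view.
live.update("p1", state="up")
live.update("p2", name="eth1", state="up")

view_json = view.to_json()
live_json = live.to_json()
```

The view is converted directly, with no parsing or replay step, so the
total work stays close to what a single-threaded compaction would do.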

The main thread will still need to perform operations related to the file
replacement and the raft log modifications.
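
The step that stays in the main thread might look roughly like the Python
sketch below.  It assumes the worker has already produced the snapshot
JSON; install_snapshot, the list-based log and snapshot_index are all
hypothetical and do not describe ovsdb-server's actual file format or
raft log handling.

```python
import os
import tempfile


def install_snapshot(db_path, snapshot_json, log, snapshot_index):
    """Replace the on-disk file and drop log entries the snapshot covers."""
    directory = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".snapshot-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(snapshot_json)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Rename over the old file; readers see either old or new content.
        os.replace(tmp_path, db_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Entries up to and including snapshot_index are now in the snapshot;
    # entries appended while the worker was running are kept.
    del log[: snapshot_index + 1]
```

Writing to a temporary file in the same directory and renaming it keeps
the replacement itself short, so the main thread is blocked only for the
file swap and the log trim rather than for the JSON conversion.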

Comment 3 Ilya Maximets 2022-07-27 20:38:30 UTC
Patches accepted upstream.  Will be part of OVS 3.0 release.

