1990058 – [RFE] raft: Reduce memory consumption by storing snapshot as a string instead of json object

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux Fast Datapath
Classification:	Red Hat
Component:	ovsdb2.16
Sub Component:
Version:	RHEL 8.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	FDP 21.I
Assignee:	Ilya Maximets
QA Contact:	Rick Alongi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-08-04 16:24 UTC by Ilya Maximets
Modified:	2022-01-10 16:51 UTC
CC List:	4 users
Fixed In Version:	openvswitch2.16-2.16.0-6.el8fdp
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-01-10 16:50:58 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	FD-1468	0	None	None	None	2021-08-22 04:31:17 UTC
Red Hat Product Errata	RHBA-2022:0053	0	None	None	None	2022-01-10 16:51:09 UTC

Description Ilya Maximets 2021-08-04 16:24:11 UTC

The RAFT module inside ovsdb-server holds a static json object
with a database snapshot, and it takes a lot of RAM.
For the 125 MB database from the ovn-k8s cluster-density test,
this json object takes ~612 MB of RAM out of the 1.3 GB total
memory consumption of the process.  For the 270 MB database from
ovn-heater's node-density-heavy 120-node test, this json
object takes ~1.55 GB of the 3.8 GB total for the process.

In most cases this object is used only to serialize it to a string
and store it on disk or send it over the network.  For a short time
it's needed to re-apply changes after compaction.
So it should be possible to serialize this object once, store
the string instead, and not keep this huge json object in memory forever.
Since the size of the serialized string should be the same as the
size of the on-disk database after compaction, this change should
save a significant amount of RAM.

Side quest:  figure out if we can do the same for all the raft
log entries to save more memory.  This might be needed anyway
for implementation clarity, as the snapshot is just another
raft entry.

Comment 1 Ilya Maximets 2021-08-20 16:55:03 UTC

Patches sent for review:
  https://patchwork.ozlabs.org/project/openvswitch/list/?series=259000&state=*

Comment 2 Ilya Maximets 2021-08-24 19:16:18 UTC

v2:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=259477&state=*

Comment 3 OvS team 2021-09-01 22:02:34 UTC

* Tue Aug 31 2021 Ilya Maximets <i.maximets> - 2.16.0-6
- ovsdb: monitor: Store serialized json in a json cache. [RH git: bc20330c85] (#1996152)
commit 43e66fc27659af2a5c976bdd27fe747b442b5554
Author: Ilya Maximets <i.maximets>
Date: Tue Aug 24 21:00:39 2021 +0200

The same json from a json cache is typically sent to all the clients,
e.g., in case of an OVN deployment with ovn-monitor-all=true.

There could be hundreds or thousands of connected clients, and ovsdb
will serialize the same json object for each of them before sending.

Serialize it once before storing it into the json cache to speed up
processing.

This change saves a lot of CPU cycles and a bit of memory,
since we need to store in memory only a string and not the full json
object.
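
The serialize-once idea can be illustrated with a minimal sketch. This is not the ovsdb-server json cache code: every toy_* name, the single-key "update" and the counter are hypothetical, introduced only to show that one serialization can serve any number of clients.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts serializations, only to make the saving visible. */
static unsigned int n_serializations;

/* Toy stand-in for a json object: one key with an integer value. */
struct toy_update {
    const char *table;
    long n_rows;
};

/* Hypothetical cache entry that keeps only the serialized string. */
struct toy_cache_entry {
    char *serialized;
};

static char *
toy_update_serialize(const struct toy_update *u)
{
    /* First call measures the length, second call writes the text. */
    int len = snprintf(NULL, 0, "{\"%s\":%ld}", u->table, u->n_rows);
    char *s;

    assert(len >= 0);
    s = malloc((size_t) len + 1);
    if (!s) {
        abort();
    }
    snprintf(s, (size_t) len + 1, "{\"%s\":%ld}", u->table, u->n_rows);
    n_serializations++;
    return s;
}

/* Serializes 'u' once, at the moment it is stored in the cache. */
void
toy_cache_store(struct toy_cache_entry *e, const struct toy_update *u)
{
    e->serialized = toy_update_serialize(u);
}

/* "Sends" the cached string to 'n_clients' clients and returns the number
 * of bytes queued.  No serialization happens here. */
size_t
toy_cache_send_all(const struct toy_cache_entry *e, size_t n_clients)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < n_clients; i++) {
        total += strlen(e->serialized);   /* Stand-in for a network write. */
    }
    return total;
}

unsigned int
toy_serialization_count(void)
{
    return n_serializations;
}
```

Storing one update and calling toy_cache_send_all() for 1000 clients leaves toy_serialization_count() at 1, whereas serializing per client would perform 1000 serializations of the same object.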

Testing with ovn-heater on 120 nodes using density-heavy scenario
shows reduction of the total CPU time used by Southbound DB processes
from 256 minutes to 147. Duration of unreasonably long poll intervals
also reduced dramatically from 7 to 2 seconds:

         Count   Min   Max  Median    Mean  95 percentile
---------------------------------------------------------
Before    1934  1012  7480  4302.5  4875.3         7034.3
After     1909  1004  2730  1453.0  1532.5         2053.6

Acked-by: Dumitru Ceara <dceara>
Acked-by: Han Zhou <hzhou>
Signed-off-by: Ilya Maximets <i.maximets>

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1996152
Signed-off-by: Ilya Maximets <i.maximets>

* Tue Aug 31 2021 Ilya Maximets <i.maximets> - 2.16.0-5
- raft: Don't keep full json objects in memory if no longer needed. [RH git: 4606423e8b] (#1990058)
commit 0de882954032aa37dc943bafd72c33324aa0c95a
Author: Ilya Maximets <i.maximets>
Date: Tue Aug 24 21:00:38 2021 +0200

raft: Don't keep full json objects in memory if no longer needed.

Raft log entries (and raft database snapshot) contains json objects
of the data. Follower receives append requests with data that gets
parsed and added to the raft log. Leader receives execution requests,
parses data out of them and adds to the log. In both cases, later
ovsdb-server reads the log with ovsdb_storage_read(), constructs
transaction and updates the database. On followers these json objects
in common case are never used again. Leader may use them to send
append requests or snapshot installation requests to followers.
However, all these operations (except for ovsdb_storage_read()) are
just serializing the json in order to send it over the network.

Json objects are significantly larger than their serialized string
representation. For example, the snapshot of the database from one of
the ovn-heater scale tests takes 270 MB as a string, but 1.6 GB as
a json object from the total 3.8 GB consumed by ovsdb-server process.

ovsdb_storage_read() for a given raft entry happens only once in a
lifetime, so after this call, we can serialize the json object, store
the string representation and free the actual json object that ovsdb
will never need again. This can save a lot of memory and can also
save serialization time, because each raft entry for append requests
and snapshot installation requests is serialized only once instead of
every time such a request needs to be sent.
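
The entry lifecycle described above (parsed object, read once, then string only) can be sketched as follows. This is a toy model under stated assumptions, not the raft code: the "json object" is just an array of integers, all toy_* names are hypothetical, and re-parsing after the read is not supported here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a parsed json object: an array of integers. */
struct toy_doc {
    size_t n;
    long *values;
};

/* Hypothetical log entry: holds the parsed form until it is read, and the
 * serialized form from then on. */
struct toy_entry {
    struct toy_doc *parsed;   /* Needed only until the entry is read. */
    char *serialized;         /* Kept for sending and writing to disk. */
};

static char *
toy_doc_serialize(const struct toy_doc *doc)
{
    size_t cap = doc->n * 22 + 3;   /* Enough for 64-bit values and commas. */
    char *s = malloc(cap);
    size_t len = 0;
    size_t i;

    if (!s) {
        abort();
    }
    s[len++] = '[';
    for (i = 0; i < doc->n; i++) {
        len += (size_t) snprintf(s + len, cap - len, "%s%ld",
                                 i ? "," : "", doc->values[i]);
    }
    s[len++] = ']';
    s[len] = '\0';
    return s;
}

void
toy_entry_init(struct toy_entry *e, const long *values, size_t n)
{
    e->parsed = malloc(sizeof *e->parsed);
    if (!e->parsed) {
        abort();
    }
    e->parsed->n = n;
    e->parsed->values = malloc(n ? n * sizeof *values : 1);
    if (!e->parsed->values) {
        abort();
    }
    if (n) {
        memcpy(e->parsed->values, values, n * sizeof *values);
    }
    e->serialized = NULL;
}

/* Returns the serialized form, creating it on first use. */
const char *
toy_entry_serialized(struct toy_entry *e)
{
    if (!e->serialized) {
        assert(e->parsed);
        e->serialized = toy_doc_serialize(e->parsed);
    }
    return e->serialized;
}

/* Consumes the parsed data exactly once (summing stands in for applying a
 * transaction), then keeps only the serialized string. */
long
toy_entry_read(struct toy_entry *e)
{
    long sum = 0;
    size_t i;

    assert(e->parsed);   /* A second read is not supported in this sketch. */
    for (i = 0; i < e->parsed->n; i++) {
        sum += e->parsed->values[i];
    }
    toy_entry_serialized(e);          /* Make sure the string exists. */
    free(e->parsed->values);
    free(e->parsed);
    e->parsed = NULL;                 /* The large object is gone. */
    return sum;
}

void
toy_entry_uninit(struct toy_entry *e)
{
    if (e->parsed) {
        free(e->parsed->values);
        free(e->parsed);
    }
    free(e->serialized);
}
```

After toy_entry_read() the entry holds only the string, and repeated calls to toy_entry_serialized() return the same pointer, so sending the entry to several followers costs no further serialization.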

JSON_SERIALIZED_OBJECT can be used in order to seamlessly integrate
pre-serialized data into raft_header and similar json objects.

One major special case is creation of a database snapshot.
Snapshot installation request received over the network will be parsed
and read by ovsdb-server just like any other raft log entry. However,
snapshots created locally with raft_store_snapshot() will never be
read back, because they reflect the current state of the database,
hence already applied. For this case we can free the json object
right after writing snapshot on disk.

Tests performed with ovn-heater on a 60-node density-light scenario,
where the on-disk database grows up to 97 MB, show that average memory
consumption of ovsdb-server Southbound DB processes decreased by 58%
(from 602 MB to 256 MB per process) and peak memory consumption
decreased by 40% (from 1288 MB to 771 MB).

A test with 120 nodes on the density-heavy scenario with a 270 MB on-disk
database shows a 1.5 GB decrease in memory consumption, as expected.
Also, total CPU time consumed by the Southbound DB process was reduced
from 296 to 256 minutes. The number of unreasonably long poll intervals
was reduced from 2896 to 1934.

Deserialization is also implemented just in case. I didn't see this
function being invoked in practice.

Acked-by: Dumitru Ceara <dceara>
Acked-by: Han Zhou <hzhou>
Signed-off-by: Ilya Maximets <i.maximets>

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990058
Signed-off-by: Ilya Maximets <i.maximets>

* Tue Aug 31 2021 Ilya Maximets <i.maximets> - 2.16.0-4
- json: Add support for partially serialized json objects. [RH git: 885e5ce1b5] (#1990058)
commit b0bca6f27aae845c3ca8b48d66a7dbd3d978162a
Author: Ilya Maximets <i.maximets>
Date: Tue Aug 24 21:00:37 2021 +0200

json: Add support for partially serialized json objects.

Introducing a new json type JSON_SERIALIZED_OBJECT. It's not an
actual type that can be seen in a json message on a wire, but
internal type that is intended to hold a serialized version of
some other json object. For this reason it's defined after the
JSON_N_TYPES to not confuse parsers and other parts of the code
that relies on compliance with RFC 4627.

With this JSON type internal users may construct large JSON objects,
parts of which are already serialized. This way, while serializing
the larger object, data from JSON_SERIALIZED_OBJECT can be added
directly to the result, without additional processing.

This will be used by the next commits to add pre-serialized JSON data
to the raft_header structure, which can be converted to JSON
before writing the file transaction to disk or sending it to other
servers. The same technique can also be used to pre-serialize json_cache
for ovsdb monitors; this should make it possible to avoid serialization
for every client and will save some more memory.

Since serialized JSON is just a string, the 'json->string'
pointer is reused for it.
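
A minimal sketch of the partially serialized object idea, assuming a toy JSON model with only strings and arrays. It is not the OVS json library: the toy_* names, the buffer helper and the lack of string escaping are simplifications made for illustration. The internal type is placed after the type count and reuses the string pointer, mirroring the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum toy_type {
    TOY_STRING,
    TOY_ARRAY,
    TOY_N_TYPES,      /* Number of types that can appear on the wire. */
    TOY_SERIALIZED    /* Internal only: holds already serialized text. */
};

struct toy_json {
    enum toy_type type;
    const char *string;              /* TOY_STRING and TOY_SERIALIZED. */
    const struct toy_json **elems;   /* TOY_ARRAY. */
    size_t n;                        /* Number of 'elems'. */
};

/* Hypothetical growable buffer for the output text. */
struct buf {
    char *data;
    size_t len;
    size_t cap;
};

static void
buf_put(struct buf *b, const char *src, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;

        while (cap < b->len + n + 1) {
            cap *= 2;
        }
        b->data = realloc(b->data, cap);
        if (!b->data) {
            abort();
        }
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
}

static void
toy_serialize(const struct toy_json *j, struct buf *out)
{
    size_t i;

    switch (j->type) {
    case TOY_STRING:
        /* Assumption: the string needs no escaping. */
        buf_put(out, "\"", 1);
        buf_put(out, j->string, strlen(j->string));
        buf_put(out, "\"", 1);
        break;
    case TOY_ARRAY:
        buf_put(out, "[", 1);
        for (i = 0; i < j->n; i++) {
            if (i) {
                buf_put(out, ",", 1);
            }
            toy_serialize(j->elems[i], out);
        }
        buf_put(out, "]", 1);
        break;
    case TOY_SERIALIZED:
        /* Already serialized: splice the text in without processing. */
        buf_put(out, j->string, strlen(j->string));
        break;
    default:
        abort();
    }
}

/* Returns a newly allocated string that the caller must free. */
char *
toy_to_string(const struct toy_json *j)
{
    struct buf out = { NULL, 0, 0 };

    assert(j != NULL);
    toy_serialize(j, &out);
    return out.data;
}
```

An array holding the string "snapshot" and a TOY_SERIALIZED node with the text {"a":1} serializes to ["snapshot",{"a":1}]: the pre-serialized part is copied verbatim while the rest of the tree is walked as usual.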

Acked-by: Dumitru Ceara <dceara>
Acked-by: Han Zhou <hzhou>
Signed-off-by: Ilya Maximets <i.maximets>

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990058
Signed-off-by: Ilya Maximets <i.maximets>

* Tue Aug 31 2021 Ilya Maximets <i.maximets> - 2.16.0-3
- json: Optimize string serialization. [RH git: bb1654da63] (#1990069)
commit 748010ff304b7cd2c43f4eb98a554433f0df07f9
Author: Ilya Maximets <i.maximets>
Date: Tue Aug 24 23:07:22 2021 +0200

json: Optimize string serialization.

The current string serialization code puts all characters one by one.
This is slow because the dynamic string needs to perform length checks
on every ds_put_char(), and it also doesn't allow the compiler to use
better memory copy operations, i.e. doesn't allow copying several bytes
at once.

Special symbols are rare in a typical database. Quotes are frequent,
but not too frequent. In databases created by ovn-kubernetes, for
example, there are usually at least 10 to 50 chars between quotes.
So it's better to count the characters that don't require escaping
and use a fast data copy for the whole sequential block.
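
The block-copy approach can be sketched like this. It is an illustration of the technique, not the code from the patch: struct buf, buf_put(), escape_for() and json_escape_blocks() are hypothetical names, and the buffer is a simplified stand-in for the dynamic string.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, standing in for the dynamic string. */
struct buf {
    char *data;
    size_t len;
    size_t cap;
};

/* Appends 'n' bytes with one capacity check and one memcpy(). */
static void
buf_put(struct buf *b, const char *src, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap ? b->cap : 64;

        while (cap < b->len + n + 1) {
            cap *= 2;
        }
        b->data = realloc(b->data, cap);
        if (!b->data) {
            abort();
        }
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Returns the escape sequence for 'c', or NULL if 'c' is copied as is. */
static const char *
escape_for(unsigned char c, char tmp[8])
{
    switch (c) {
    case '"':  return "\\\"";
    case '\\': return "\\\\";
    case '\b': return "\\b";
    case '\f': return "\\f";
    case '\n': return "\\n";
    case '\r': return "\\r";
    case '\t': return "\\t";
    default:
        if (c < 32) {
            snprintf(tmp, 8, "\\u%04x", c);
            return tmp;
        }
        return NULL;
    }
}

/* Escapes 's' as a JSON string, copying each run of ordinary characters
 * with a single buf_put() call instead of one call per character.
 * Returns a newly allocated string that the caller must free. */
char *
json_escape_blocks(const char *s)
{
    struct buf b = { NULL, 0, 0 };
    const char *start = s;   /* Start of the current unescaped run. */
    const char *p;
    char tmp[8];

    assert(s != NULL);
    buf_put(&b, "\"", 1);
    for (p = s; *p; p++) {
        const char *esc = escape_for((unsigned char) *p, tmp);

        if (esc) {
            buf_put(&b, start, (size_t) (p - start));   /* Flush the run. */
            buf_put(&b, esc, strlen(esc));
            start = p + 1;
        }
    }
    buf_put(&b, start, (size_t) (p - start));           /* Flush the tail. */
    buf_put(&b, "\"", 1);
    return b.data;
}
```

With 10 to 50 ordinary characters between quotes, most of the input goes through a few large copies, and the per-character work shrinks to the escape check alone.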

Testing with a synthetic benchmark (included) on my laptop shows
following performance improvement:

     Size   Q  S        Before       After     Diff
---------------------------------------------------
   100000   0  0 :    0.227 ms    0.142 ms  -37.4 %
   100000   2  1 :    0.277 ms    0.186 ms  -32.8 %
   100000  10  1 :    0.361 ms    0.309 ms  -14.4 %
 10000000   0  0 :   22.720 ms   12.160 ms  -46.4 %
 10000000   2  1 :   27.470 ms   19.300 ms  -29.7 %
 10000000  10  1 :   37.950 ms   31.250 ms  -17.6 %
100000000   0  0 :  239.600 ms  126.700 ms  -47.1 %
100000000   2  1 :  292.400 ms  188.600 ms  -35.4 %
100000000  10  1 :  387.700 ms  321.200 ms  -17.1 %

Here Q - probability (%) for a character to be a '\"' and
S - probability (%) to be a special character ( < 32).

Testing with a scenario closer to the real world shows an overall decrease
of ~5-10 % in the time needed for database compaction. This
change also decreases CPU consumption in general, because string
serialization is used in many different places, including ovsdb
monitors and raft.

Signed-off-by: Ilya Maximets <i.maximets>
Acked-by: Numan Siddique <numans>
Acked-by: Dumitru Ceara <dceara>

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1990069
Signed-off-by: Ilya Maximets <i.maximets>

Comment 6 Rick Alongi 2022-01-05 18:31:11 UTC

Per email with Dev (i.maximets), no specific test case is feasible for this change; general testing performed during the release is sufficient for regression testing.  A full test suite was run for FDP 21.J using openvswitch2.16-2.16.0-32.el8fdp.

Marking as Verified.

Comment 7 Rick Alongi 2022-01-06 12:55:48 UTC

SanityOnly info:

[ralongi@ralongi openvswitch2.16]$ git log --oneline --grep=1990058
4606423e8b raft: Don't keep full json objects in memory if no longer needed.
885e5ce1b5 json: Add support for partially serialized json objects.
[ralongi@ralongi openvswitch2.16]$ git show 4606423e8b
commit 4606423e8b9bd399c1639fb9b00e13d3870adce9
Author: Ilya Maximets <i.maximets>
Date:   Tue Aug 24 21:00:38 2021 +0200

    raft: Don't keep full json objects in memory if no longer needed.
    
    commit 0de882954032aa37dc943bafd72c33324aa0c95a
    Author: Ilya Maximets <i.maximets>
    Date:   Tue Aug 24 21:00:38 2021 +0200
    
        raft: Don't keep full json objects in memory if no longer needed.
    
        Raft log entries (and raft database snapshot) contains json objects
        of the data.  Follower receives append requests with data that gets
        parsed and added to the raft log.  Leader receives execution requests,
        parses data out of them and adds to the log.  In both cases, later
        ovsdb-server reads the log with ovsdb_storage_read(), constructs
        transaction and updates the database.  On followers these json objects
        in common case are never used again.  Leader may use them to send
        append requests or snapshot installation requests to followers.
        However, all these operations (except for ovsdb_storage_read()) are
        just serializing the json in order to send it over the network.
    
[ralongi@ralongi openvswitch2.16]$ git show 885e5ce1b5
commit 885e5ce1b5a646185dcb653dd7d608a84ab43f53
Author: Ilya Maximets <i.maximets>
Date:   Tue Aug 24 21:00:37 2021 +0200

    json: Add support for partially serialized json objects.
    
    commit b0bca6f27aae845c3ca8b48d66a7dbd3d978162a
    Author: Ilya Maximets <i.maximets>
    Date:   Tue Aug 24 21:00:37 2021 +0200
    
        json: Add support for partially serialized json objects.
    
        Introducing a new json type JSON_SERIALIZED_OBJECT.  It's not an
        actual type that can be seen in a json message on a wire, but
        internal type that is intended to hold a serialized version of
        some other json object.  For this reason it's defined after the
        JSON_N_TYPES to not confuse parsers and other parts of the code
        that relies on compliance with RFC 4627.
    
        With this JSON type internal users may construct large JSON objects,
        parts of which are already serialized.  This way, while serializing
        the larger object, data from JSON_SERIALIZED_OBJECT can be added
        directly to the result, without additional processing.

Comment 9 errata-xmlrpc 2022-01-10 16:50:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch2.16 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0053
