Assume we have a row with a column that contains a set of UUIDs, and these UUIDs are strong or weak references to rows in some other database table. If a client requests the addition of one new UUID to the set, ovsdb-server currently re-counts references for all UUIDs in that set. If a referenced row happens to have references to other rows, those are re-counted too. This is done to keep the referential integrity of the database and for garbage collection, i.e. to delete all non-root rows that no longer have any references. Because of this behavior, the time to execute a transaction grows linearly with the number of elements in the set.

In reality, however, references only need to be re-counted for the UUIDs that were added to or removed from the set, and not for the others. OVSDB knows both the new state of a row and the old one, so the difference can be calculated. Moreover, we currently store column diffs in the database storage, so the 'new' row is already calculated from the 'old' row and the 'diff'. We can simply keep the diff after reading from storage and use it during transaction commit to avoid unnecessary work. This should make transaction commit far less dependent on the number of elements in the set, by re-calculating only what the transaction really changed.

Related functions:
- ovsdb_txn_precommit()
- update_ref_counts()
- assess_weak_refs()
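As a minimal sketch of the idea (illustrative Python only, not OVS code; the `apply_ref_diff` helper and the dict-based counters are invented for this example), only the UUIDs in the difference between the old and new sets need their reference counters adjusted:

```python
def apply_ref_diff(ref_counts, old_refs, new_refs):
    """Adjust per-row reference counters using only the diff between
    the old and new sets of referenced UUIDs, instead of re-counting
    every element of the new set."""
    added = new_refs - old_refs      # UUIDs that gained a reference
    removed = old_refs - new_refs    # UUIDs that lost a reference
    for uuid in added:
        ref_counts[uuid] = ref_counts.get(uuid, 0) + 1
    for uuid in removed:
        ref_counts[uuid] -= 1
    return added, removed
```

With this, adding one UUID to a 100-element set touches one counter rather than 100, so the commit cost no longer scales with the size of the set.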
Patch for strong references sent for review: https://patchwork.ozlabs.org/project/openvswitch/patch/20210916201522.3693567-1-i.maximets@ovn.org/
Restricting the scope of this BZ to strong references only (update_ref_counts()). Separate BZ 2005958 created for weak references, as they are a bit harder to optimize and mostly independent from the strong references.
Patch accepted in upstream: https://github.com/openvswitch/ovs/commit/b2712d026eae2d9a5150c2805310eaf506e1f162
* Wed Sep 29 2021 Dumitru Ceara <dceara> - 2.16.0-13
- ovsdb-data: Optimize subtraction of sets. [RH git: 5bace82405] (#2005483)

commit bb12b63176389e516ddfefce20dfa165f24430fb
Author: Ilya Maximets <i.maximets>
Date:   Thu Sep 23 01:47:23 2021 +0200

    The current algorithm for ovsdb_datum_subtract looks like this:

        for-each atom in a:
            if atom in b:
                swap(atom, <last atom in 'a'>)
                destroy(atom)
        quicksort(a)

    Complexity:

        Na * log2(Nb)    +    (Na - Nb) * log2(Na - Nb)
        [search               [comparisons for
         comparisons]          quicksort]

    It's not optimal, especially because Nb << Na in a vast majority of
    cases.

    Reversing the search phase to look up atoms from 'b' in 'a', and
    closing the gaps from deleted elements in 'a' by plain memory copy,
    avoids the quicksort.

    Resulting complexity:

        Nb * log2(Na)    +    (Na - Nb)
        [search               [memory
         comparisons]          copies]

    Subtraction is heavily used while executing database transactions,
    for example, to remove one port from a logical switch in OVN.  The
    complexity of such an operation, if the original logical switch had
    100 ports, goes down from

        100 * log2(1)  = 100 comparisons for search
         99 * log2(99) = 656 comparisons for quicksort
        ---------------------------------------------
                         756 comparisons in total

    to only

        1 * log2(100) = 7 comparisons for search
        + memory copy of 99 * sizeof(union ovsdb_atom) bytes.

    We could use memmove to close the gaps after removing atoms, but
    that would lead to 2 memory copies inside the call, while we can
    perform only one copy to the temporary 'result' and swap pointers.

    Performance in cases where the sizes of 'a' and 'b' are comparable
    should not change.  Cases with Nb >> Na should not happen in
    practice.

    All in all, this change allows ovsdb-server to perform several
    times more transactions that remove elements from sets per second.

    Signed-off-by: Ilya Maximets <i.maximets>
    Acked-by: Han Zhou <hzhou>
    Acked-by: Mark D. Gray <mark.d.gray>
    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
    Signed-off-by: Dumitru Ceara <dceara>

* Wed Sep 29 2021 Dumitru Ceara <dceara> - 2.16.0-12
- ovsdb-data: Optimize union of sets. [RH git: e2a4c7d794] (#2005483)

commit 51946d22274cd591dc061358fb507056fbd91420
Author: Ilya Maximets <i.maximets>
Date:   Thu Sep 23 01:47:22 2021 +0200

    The current algorithm of ovsdb_datum_union looks like this:

        for-each atom in b:
            if not bin_search(a, atom):
                push(a, clone(atom))
        quicksort(a)

    So, the complexity looks like this:

        Nb * log2(Na)    +    Nb       +    (Na + Nb) * log2(Na + Nb)
        [comparisons          [clones]      [comparisons for
         for search]                         quicksort]

    ovsdb_datum_union() is heavily used in database transactions when a
    new element is added to a set, for example, when a new logical
    switch port is added to a logical switch in OVN.  This is a very
    common use case where the CMS adds one new port to an existing
    switch that already has, let's say, 100 ports.  For this case
    ovsdb-server will have to perform:

        1 * log2(100)    +    1 clone    +    101 * log2(101)
        [comparisons                          [comparisons for
         for search]                           quicksort]
        ~7                    1               ~707

    Roughly 714 comparisons of atoms and 1 clone.

    Since binary search can give us the position where the new atom
    should go (it's the 'low' index after the search completion) for
    free, the logic can be re-worked like this:

        copied = 0
        for-each atom in b:
            desired_position = bin_search(a, atom)
            push(result, a[copied : desired_position - 1])
            copied = desired_position
            push(result, clone(atom))
        push(result, a[copied : Na])
        swap(a, result)

    Complexity of this scheme:

        Nb * log2(Na)    +    Nb       +    Na
        [comparisons          [clones]      [memory copy
         for search]                         on push]

    'swap' is just a swap of a few pointers.  'push' is not a 'clone',
    but a simple memory copy of 'union ovsdb_atom'.

    In general, this scheme substitutes the complexity of a quicksort
    with the complexity of a memory copy of Na atom structures, where
    we're not even copying the strings that these atoms point to.
    Complexity in the example above goes down from 714 comparisons to
    7 comparisons and a memcpy of 100 * sizeof(union ovsdb_atom) bytes.

    The general complexity of a memory copy should always be lower than
    the complexity of a quicksort, especially because these copies are
    usually performed in bulk, so this new scheme should work faster
    for any input.

    All in all, this change allows several times more transactions per
    second to be executed for transactions that add new entries to
    sets.

    Alternatively, union could be implemented as a linear merge of two
    sorted arrays, but this would result in O(Na) comparisons, which is
    more than Nb * log2(Na) in the common case, since Na is usually far
    bigger than Nb.  Linear merge would also mean per-atom memory
    copies instead of copying in bulk.

    The 'replace' functionality of ovsdb_datum_union() had no users, so
    it is simply removed.  But it can easily be added back if needed in
    the future.

    Signed-off-by: Ilya Maximets <i.maximets>
    Acked-by: Han Zhou <hzhou>
    Acked-by: Mark D. Gray <mark.d.gray>
    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
    Signed-off-by: Dumitru Ceara <dceara>

* Wed Sep 29 2021 Dumitru Ceara <dceara> - 2.16.0-11
- ovsdb: transaction: Use diffs for strong reference counting. [RH git: 85da133eaa] (#2003203)

commit b2712d026eae2d9a5150c2805310eaf506e1f162
Author: Ilya Maximets <i.maximets>
Date:   Tue Sep 14 00:19:57 2021 +0200

    Currently, even if one reference is added to the set of strong
    references or removed from it, ovsdb-server will walk through the
    whole set and re-count references to other rows.  These referenced
    rows will also be added to the transaction in order to re-count
    their references.

    For example, every time a Logical Switch Port is added to a Logical
    Switch, the OVN Northbound database server will walk through all
    ports of this Logical Switch, clone their rows, and re-count
    references.

    This is not very efficient.  Instead, it can increase reference
    counters only for added references and decrease them only for
    removed ones.  In many cases only one row in the
    Logical_Switch_Port table will be affected.

    A new function is introduced that generates a diff of two datum
    objects but stores added and removed atoms separately, so they can
    be used to increase or decrease row reference counters accordingly.

    This change allows several times more transactions that add or
    remove strong references to/from sets to be performed per second,
    because ovsdb-server no longer clones and re-counts rows that are
    irrelevant to the current transaction.

    Acked-by: Dumitru Ceara <dceara>
    Signed-off-by: Ilya Maximets <i.maximets>
    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2003203
    Signed-off-by: Dumitru Ceara <dceara>
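The reworked union and the reversed subtraction described in the commit messages above can be modeled with sorted lists and binary search. This is an illustrative Python sketch, not OVS code: `datum_union` and `datum_subtract` are invented names, plain sorted lists stand in for the C arrays of `union ovsdb_atom`, and `list.extend` stands in for the bulk memory copies.

```python
import bisect

def datum_union(a, b):
    """Merge sorted list 'b' (typically small) into sorted list 'a'.

    Each atom of 'b' is located with one binary search, and the
    untouched runs of 'a' are copied in bulk, so no final sort is
    needed.  Both inputs must be sorted and duplicate-free.
    """
    result = []
    copied = 0  # how much of 'a' has been copied to 'result' so far
    for atom in b:
        pos = bisect.bisect_left(a, atom, copied)
        if pos < len(a) and a[pos] == atom:
            continue  # already present; sets keep atoms unique
        result.extend(a[copied:pos])  # bulk copy of the untouched run
        copied = pos
        result.append(atom)           # the 'clone' of the new atom
    result.extend(a[copied:])         # copy the tail of 'a'
    return result

def datum_subtract(a, b):
    """Remove the atoms of sorted list 'b' from sorted list 'a'.

    Atoms of 'b' are looked up in 'a' (Nb * log2(Na) comparisons) and
    the surviving runs of 'a' are copied in bulk, avoiding quicksort.
    """
    result = []
    copied = 0
    for atom in b:
        pos = bisect.bisect_left(a, atom, copied)
        if pos == len(a) or a[pos] != atom:
            continue  # atom not present in 'a'; nothing to remove
        result.extend(a[copied:pos])  # keep the untouched run
        copied = pos + 1              # skip the removed atom
    result.extend(a[copied:])
    return result
```

Adding one element to a 100-element set then costs one binary search plus bulk copies instead of a full re-sort, and removing one element costs one search instead of a scan over all 100 followed by quicksort.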
Per email with Dev (i.maximets), no specific test case is feasible for this change; general testing performed during the release is sufficient for regression testing. A full test suite was run for FDP 21.J using openvswitch2.16-2.16.0-32.el8fdp. Marking as Verified.
SanityOnly info:

[ralongi@ralongi openvswitch2.16]$ git log --oneline --grep=2003203
85da133eaa ovsdb: transaction: Use diffs for strong reference counting.

[ralongi@ralongi openvswitch2.16]$ git show 85da133eaa
commit 85da133eaa0f7d3dfda836055bf6aaa7d180e7e8
Author: Ilya Maximets <i.maximets>
Date:   Tue Sep 14 00:19:57 2021 +0200

    ovsdb: transaction: Use diffs for strong reference counting.

    commit b2712d026eae2d9a5150c2805310eaf506e1f162
    Author: Ilya Maximets <i.maximets>
    Date:   Tue Sep 14 00:19:57 2021 +0200

    Currently, even if one reference added to the set of strong
    references or removed from it, ovsdb-server will walk through the
    whole set and re-count references to other rows.  These referenced
    rows will also be added to the transaction in order to re-count
    their references.

    For example, every time Logical Switch Port added to a Logical
    Switch, OVN Northbound database server will walk through all ports
    of this Logical Switch, clone their rows, and re-count references.
    This is not very efficient.  Instead, it can only increase
    reference counters for added references and reduce for removed
    ones.  In many cases this will be only one row affected in the
    Logical_Switch_Port table.

    Introducing new function that generates a diff of two datum
    objects,
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (openvswitch2.16 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0053