Bug 2003203

Summary:	[RFE] ovsdb-server: Use column diffs for strong reference counting during transaction precommit
Product:	Red Hat Enterprise Linux Fast Datapath	Reporter:	Ilya Maximets <i.maximets>
Component:	ovsdb2.16	Assignee:	Ilya Maximets <i.maximets>
Status:	CLOSED ERRATA	QA Contact:	Rick Alongi <ralongi>
Severity:	high	Docs Contact:
Priority:	high
Version:	RHEL 8.0	CC:	ctrautma, jhsiao, kfida, ralongi, trozet
Target Milestone:	---	Keywords:	FutureFeature
Target Release:	FDP 21.I
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openvswitch2.16-2.16.0-13.el8fdp	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	2005958 (view as bug list)		Environment:
Last Closed:	2022-01-10 16:50:58 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Ilya Maximets 2021-09-10 15:13:12 UTC

Assuming we have a row with a column that contains a set of UUIDs.
And these UUIDs are strong or weak references to rows in some other
database table.

If client will request addition of one new UUID to the set, currently,
ovsdb-server will re-count references for all UUIDs in that set.
If the referenced row will happen to have references to some other
rows, they will be re-counted too.  This is done to keep referential
integrity of the database and for garbage collection, i.e. to delete
all non-root rows that no longer has any references.
Because of this behavior time to execute transaction linearly grows
with increase of the number of elements in the set.

However, in reality, it only need to re-count references for UUIDs
that was added/removed to/from a set, and not for others.
OVSDB knows the new state of a row and the old one, so the difference
could be calculated.  Even more, currently we're storing column
diffs in a database storage, so the 'new' row is already calculated
from the 'old' and 'diff'.  We can just try to keep the diff after
reading from the storage and use it during transaction commit to avoid
unnecessary work.

This should make transaction commit less dependent on the number of
elements in the set by re-calculating only what really changed by the
transaction.

Related functions:
- ovsdb_txn_precommit()
- update_ref_counts()
- assess_weak_refs()

Comment 1 Ilya Maximets 2021-09-17 18:31:28 UTC

Patch for strong references sent for review:
  https://patchwork.ozlabs.org/project/openvswitch/patch/20210916201522.3693567-1-i.maximets@ovn.org/

Comment 2 Ilya Maximets 2021-09-20 14:52:47 UTC

Restricting the scope of this BZ to strong references only (update_ref_count()).
Separate BZ 2005958 created for weak references, as they are a bit harder to
optimize and mostly independent from the strong references.

Comment 3 Ilya Maximets 2021-09-24 14:13:29 UTC

Patch accepted in upstream:
 https://github.com/openvswitch/ovs/commit/b2712d026eae2d9a5150c2805310eaf506e1f162

Comment 4 OvS team 2021-09-29 15:49:53 UTC

* Wed Sep 29 2021 Dumitru Ceara <dceara> - 2.16.0-13
- ovsdb-data: Optimize subtraction of sets. [RH git: 5bace82405] (#2005483)
    commit bb12b63176389e516ddfefce20dfa165f24430fb
    Author: Ilya Maximets <i.maximets>
    Date:   Thu Sep 23 01:47:23 2021 +0200
    
        Current algorithm for ovsdb_datum_subtract looks like this:
    
          for-each atom in a:
              if atom in b:
                  swap(atom, <last atom in 'a'>)
                  destroy(atom)
          quicksort(a)
    
        Complexity:
    
          Na * log2(Nb)  +  (Na - Nb) * log2(Na - Nb)
            Search          Comparisons for quicksort
    
        It's not optimal, especially because Nb << Na in a vast majority of
        cases.
    
        Reversing the search phase to look up atoms from 'b' in 'a', and
        closing gaps from deleted elements in 'a' by plain memory copy to
        avoid quicksort.
    
        Resulted complexity:
    
          Nb * log2(Na)  +    (Na - Nb)
            Search          Memory copies
    
        Subtraction is heavily used while executing database transactions.
        For example, to remove one port from a logical switch in OVN.
        Complexity of such operation if original logical switch had 100 ports
        goes down from
    
         100 * log2(1)   = 100 comparisons for search and
          99 * log2(99)  = 656 comparisons for quicksort
                           ------------------------------
                           756 comparisons in total
        to only
    
           1 * log2(100) = 7 comparisons for search
           + memory copy of 99 * sizeof (union ovsdb_atom) bytes.
    
        We could use memmove to close the gaps after removing atoms, but
        it will lead to 2 memory copies inside the call, while we can perform
        only one to the temporary 'result' and swap pointers.
    
        Performance in cases, where sizes of 'a' and 'b' are comparable,
        should not change.  Cases with Nb >> Na should not happen in practice.
    
        All in all, this change allows ovsdb-server to perform several times
        more transactions, that removes elements from sets, per second.
    
        Signed-off-by: Ilya Maximets <i.maximets>
        Acked-by: Han Zhou <hzhou>
        Acked-by: Mark D. Gray <mark.d.gray>
    
    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
    Signed-off-by: Dumitru Ceara <dceara>


* Wed Sep 29 2021 Dumitru Ceara <dceara> - 2.16.0-12
- ovsdb-data: Optimize union of sets. [RH git: e2a4c7d794] (#2005483)
    commit 51946d22274cd591dc061358fb507056fbd91420
    Author: Ilya Maximets <i.maximets>
    Date:   Thu Sep 23 01:47:22 2021 +0200
    
        Current algorithm of ovsdb_datum_union looks like this:
    
          for-each atom in b:
              if not bin_search(a, atom):
                  push(a, clone(atom))
          quicksort(a)
    
        So, the complexity looks like this:
    
           Nb * log2(Na)   +    Nb     +   (Na + Nb) * log2(Na + Nb)
           Comparisons        clones       Comparisons for quicksort
           for search
    
        ovsdb_datum_union() is heavily used in database transactions while
        new element is added to a set.  For example, if new logical switch
        port is added to a logical switch in OVN.  This is a very common
        use case where CMS adds one new port to an existing switch that
        already has, let's say, 100 ports.  For this case ovsdb-server will
        have to perform:
    
           1 * log2(100)  + 1 clone + 101 * log2(101)
           Comparisons                Comparisons for
           for search                   quicksort.
               ~7           1            ~707
           Roughly 714 comparisons of atoms and 1 clone.
    
        Since binary search can give us position, where new atom should go
        (it's the 'low' index after the search completion) for free, the
        logic can be re-worked like this:
    
          copied = 0
          for-each atom in b:
              desired_position = bin_search(a, atom)
              push(result, a[ copied : desired_position - 1 ])
              copied = desired_position
              push(result, clone(atom))
          push(result, a[ copied : Na ])
          swap(a, result)
    
        Complexity of this schema:
    
           Nb * log2(Na)   +    Nb     +         Na
           Comparisons        clones       memory copy on push
           for search
    
        'swap' is just a swap of a few pointers.  'push' is not a 'clone',
        but a simple memory copy of 'union ovsdb_atom'.
    
        In general, this schema substitutes complexity of a quicksort
        with complexity of a memory copy of Na atom structures, where we're
        not even copying strings that these atoms are pointing to.
    
        Complexity in the example above goes down from 714 comparisons
        to 7 comparisons and memcpy of 100 * sizeof (union ovsdb_atom) bytes.
    
        General complexity of a memory copy should always be lower than
        complexity of a quicksort, especially because these copies usually
        performed in bulk, so this new schema should work faster for any input.
    
        All in all, this change allows to execute several times more
        transactions per second for transactions that adds new entries to sets.
    
        Alternatively, union can be implemented as a linear merge of two
        sorted arrays, but this will result in O(Na) comparisons, which
        is more than Nb * log2(Na) in common case, since Na is usually
        far bigger than Nb.  Linear merge will also mean per-atom memory
        copies instead of copying in bulk.
    
        'replace' functionality of ovsdb_datum_union() had no users, so it
        just removed.  But it can easily be added back if needed in the future.
    
        Signed-off-by: Ilya Maximets <i.maximets>
        Acked-by: Han Zhou <hzhou>
        Acked-by: Mark D. Gray <mark.d.gray>
    
    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2005483
    Signed-off-by: Dumitru Ceara <dceara>


* Wed Sep 29 2021 Dumitru Ceara <dceara> - 2.16.0-11
- ovsdb: transaction: Use diffs for strong reference counting. [RH git: 85da133eaa] (#2003203)
    commit b2712d026eae2d9a5150c2805310eaf506e1f162
    Author: Ilya Maximets <i.maximets>
    Date:   Tue Sep 14 00:19:57 2021 +0200
    
        Currently, even if one reference added to the set of strong references
        or removed from it, ovsdb-server will walk through the whole set and
        re-count references to other rows.  These referenced rows will also be
        added to the transaction in order to re-count their references.
    
        For example, every time Logical Switch Port added to a Logical Switch,
        OVN Northbound database server will walk through all ports of this
        Logical Switch, clone their rows, and re-count references.  This is
        not very efficient.  Instead, it can only increase reference counters
        for added references and reduce for removed ones.  In many cases this
        will be only one row affected in the Logical_Switch_Port table.
    
        Introducing new function that generates a diff of two datum objects,
        but stores added and removed atoms separately, so they can be used
        to increase or decrease row reference counters accordingly.
    
        This change allows to perform several times more transactions that
        adds or removes strong references to/from sets per second, because
        ovsdb-server no longer clones and re-counts rows that are irrelevant
        to current transaction.
    
        Acked-by: Dumitru Ceara <dceara>
        Signed-off-by: Ilya Maximets <i.maximets>
    
    Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2003203
    Signed-off-by: Dumitru Ceara <dceara>

Comment 8 Rick Alongi 2022-01-05 18:30:36 UTC

Per email with Dev (i.maximets), no specific test case is feasible for this change; general testing performed during the release is sufficient for regression testing.  A full test suite was run for FDP 21.J using openvswitch2.16-2.16.0-32.el8fdp.

Marking as Verified.

Comment 9 Rick Alongi 2022-01-06 12:52:20 UTC

SanityOnly info:

[ralongi@ralongi openvswitch2.16]$ git log --oneline --grep=2003203
85da133eaa ovsdb: transaction: Use diffs for strong reference counting.
[ralongi@ralongi openvswitch2.16]$ git show 85da133eaa
commit 85da133eaa0f7d3dfda836055bf6aaa7d180e7e8
Author: Ilya Maximets <i.maximets>
Date:   Tue Sep 14 00:19:57 2021 +0200

    ovsdb: transaction: Use diffs for strong reference counting.
    
    commit b2712d026eae2d9a5150c2805310eaf506e1f162
    Author: Ilya Maximets <i.maximets>
    Date:   Tue Sep 14 00:19:57 2021 +0200
    
        Currently, even if one reference added to the set of strong references
        or removed from it, ovsdb-server will walk through the whole set and
        re-count references to other rows.  These referenced rows will also be
        added to the transaction in order to re-count their references.
    
        For example, every time Logical Switch Port added to a Logical Switch,
        OVN Northbound database server will walk through all ports of this
        Logical Switch, clone their rows, and re-count references.  This is
        not very efficient.  Instead, it can only increase reference counters
        for added references and reduce for removed ones.  In many cases this
        will be only one row affected in the Logical_Switch_Port table.
    
        Introducing new function that generates a diff of two datum objects,

Comment 11 errata-xmlrpc 2022-01-10 16:50:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch2.16 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0053