Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.

Bug 2012949

Summary: ovsdb-server: Transaction history grows larger than database itself.
Product: Red Hat Enterprise Linux Fast Datapath Reporter: Ilya Maximets <i.maximets>
Component: ovsdb2.16Assignee: Ilya Maximets <i.maximets>
Status: CLOSED ERRATA QA Contact: Rick Alongi <ralongi>
Severity: high Docs Contact:
Priority: high    
Version: RHEL 8.0CC: ctrautma, fleitner, jhsiao, ralongi
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openvswitch2.16-2.16.0-27.el8fdp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-01-10 16:50:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Ilya Maximets 2021-10-11 17:14:18 UTC
If user frequently changes a lot of rows in a database, transaction
history could grow way larger than the database itself.  This wastes
a lot of memory and also makes monitor_cond_since slower than
usual monotor_cond if the transaction id is old enough, because
re-construction of the changes from a history is slower than just
creation of initial database snapshot.  This is also the case if
user deleted a lot of data, so transaction history still holds all of
it while the database itself doesn't.

In case of current lb-per-service model in ovn-kubernetes, each
load-balancer is added to every logical switch/router.  Such a
transaction touches more than a half of a OVN_Northbound database.
And each of these transactions is added to the transaction history.
Since transaction history depth is 100, in worst case scenario,
it will hold 100 copies of a database increasing memory consumption
dramatically.  In tests with 3000 LBs, memory goes up to 3 GB, while
holding at 30 MB if transaction history disabled in the code.
The output of 'valgrind --tool=massif' looks like this:

--------------------------------------------------------------------------------
  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
--------------------------------------------------------------------------------
 28 449,170,662,901    2,936,870,344    2,921,291,202    15,579,142            0
99.47% (2,921,291,202B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->92.02% (2,702,473,479B) 0x456A8D: xmalloc__ (util.c:137)
| ->92.02% (2,702,473,479B) 0x456A8D: xmalloc (util.c:172)
|   ->90.99% (2,672,386,864B) 0x442D9B: clone_atoms (ovsdb-data.c:990)
|   | ->90.99% (2,672,386,864B) 0x442D9B: clone_atoms (ovsdb-data.c:984)
|   |   ->90.99% (2,672,386,864B) 0x442D9B: ovsdb_datum_clone (ovsdb-data.c:1012)
|   |     ->90.99% (2,672,386,160B) 0x415AB4: ovsdb_row_clone (row.c:130)
|   |     | ->45.51% (1,336,446,000B) 0x41AD6A: ovsdb_txn_clone (transaction.c:1059)
|   |     | | ->45.51% (1,336,446,000B) 0x41AD6A: ovsdb_txn_add_to_history (transaction.c:1097)
|   |     | |   ->45.51% (1,336,446,000B) 0x41AD6A: ovsdb_txn_replay_commit (transaction.c:1129)
|   |     | |     ->45.51% (1,336,446,000B) 0x4091E8: parse_txn (ovsdb-server.c:622)
|   |     | |       ->45.51% (1,336,446,000B) 0x4091E8: read_db (ovsdb-server.c:652)
|   |     | |         ->45.51% (1,336,446,000B) 0x4070D0: main_loop (ovsdb-server.c:236)
|   |     | |         | ->45.51% (1,336,446,000B) 0x4070D0: main (ovsdb-server.c:490)
|   |     | |         |   
|   |     | |         ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
|   |     | |         
|   |     | ->45.49% (1,335,916,800B) 0x41AD55: ovsdb_txn_clone (transaction.c:1058)
|   |     | | ->45.49% (1,335,916,800B) 0x41AD55: ovsdb_txn_add_to_history (transaction.c:1097)
|   |     | |   ->45.49% (1,335,916,800B) 0x41AD55: ovsdb_txn_replay_commit (transaction.c:1129)
|   |     | |     ->45.49% (1,335,916,800B) 0x4091E8: parse_txn (ovsdb-server.c:622)
|   |     | |       ->45.49% (1,335,916,800B) 0x4091E8: read_db (ovsdb-server.c:652)
|   |     | |         ->45.49% (1,335,916,800B) 0x4070D0: main_loop (ovsdb-server.c:236)
|   |     | |         | ->45.49% (1,335,916,800B) 0x4070D0: main (ovsdb-server.c:490)
|   |     | |         |   
|   |     | |         ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%)
|   |     | |         
|   |     | ->00.00% (23,360B) in 1+ places, all below ms_print's threshold (01.00%)
|   |     | 
|   |     ->00.00% (704B) in 1+ places, all below ms_print's threshold (01.00%)
|   |     
|   ->01.02% (30,086,615B) in 97 places, all below massif's threshold (1.00%)
|     
->04.62% (135,617,176B) 0x456CDE: xcalloc__ (util.c:121)

Here we can see that ovsdb_txn_add_to_history() is responsible for more
than 90% of allocated memory in the process.

The issue can be reproduced with a script like this:
---
nsw=120
nlb=3000

export OVN_NB_DAEMON=$(ovn-nbctl --detach --pidfile --log-file=ovn-nbctl.log)

sleep 1

echo "Adding switches/routers.."
for ((i = 0; i < $nsw; i++)); do
    ovn-nbctl ls-add ls$i
    ovn-nbctl lr-add lr$i
done

echo "Adding load balancers.."
for ((j = 0; j < $nlb; j++)); do
    echo "Adding LB $j.."
    cmd="-- lb-add lb$j 42.42.42.1:8080 42.42.42.2:8081 tcp"
    for ((i = 0; i < $nsw; i++)); do
        cmd="$cmd -- ls-lb-add ls$i lb$j -- lr-lb-add lr$i lb$j"
    done
    ovn-nbctl $cmd
done
---

Solution for this could be to limit the size of transaction history to
some sane value, e.g. size of the database itself.

Preliminary implementation is available here:
  https://github.com/igsilya/ovs/commit/b0df9c0b5f8321001044c5c6b0a59d7fe9746202

Comment 2 OvS team 2021-11-10 01:27:43 UTC
* Tue Nov 09 2021 Ilya Maximets <i.maximets> - 2.16.0-27
- ovsdb: Don't let transaction history grow larger than the database. [RH git: 93d1fa0bdf] (#2012949)
    commit 317b1bfd7dd315e241c158e6d4095002ff391ee3
    Author: Ilya Maximets <i.maximets>
    Date:   Tue Sep 28 13:17:21 2021 +0200
    
        ovsdb: Don't let transaction history grow larger than the database.
    
        If user frequently changes a lot of rows in a database, transaction
        history could grow way larger than the database itself.  This wastes
        a lot of memory and also makes monitor_cond_since slower than
        usual monotor_cond if the transaction id is old enough, because
        re-construction of the changes from a history is slower than just
        creation of initial database snapshot.  This is also the case if
        user deleted a lot of data, so transaction history still holds all of
        it while the database itself doesn't.
    
        In case of current lb-per-service model in ovn-kubernetes, each
        load-balancer is added to every logical switch/router.  Such a
        transaction touches more than a half of a OVN_Northbound database.
        And each of these transactions is added to the transaction history.
        Since transaction history depth is 100, in worst case scenario,
        it will hold 100 copies of a database increasing memory consumption
        dramatically.  In tests with 3000 LBs and 120 LSs, memory goes up
        to 3 GB, while holding at 30 MB if transaction history disabled in
        the code.
    
        Fixing that by keeping count of the number of ovsdb_atom's in the
        database and not allowing the total number of atoms in transaction
        history to grow larger than this value.  Counting atoms is fairly
        cheap because we don't need to iterate over them, so it doesn't have
        significant performance impact.  It would be ideal to measure the
        size of individual atoms, but that will hit the performance.
        Counting cells instead of atoms is not sufficient, because OVN
        users are adding hundreds or thousands of atoms to a single cell,
        so they are largely different in size.
    
        Signed-off-by: Ilya Maximets <i.maximets>
        Acked-by: Han Zhou <hzhou>
        Acked-by: Dumitru Ceara <dceara>
    
    Reported-at: https://bugzilla.redhat.com/2012949
    Signed-off-by: Ilya Maximets <i.maximets>

Comment 3 Rick Alongi 2021-12-08 17:49:29 UTC
Hi Ilya,

Can you provide further details on how to configure Valgrind for this scenario?

Thanks,
Rick

Comment 4 Ilya Maximets 2021-12-09 12:59:35 UTC
(In reply to Rick Alongi from comment #3)
> Can you provide further details on how to configure Valgrind for this
> scenario?

Hi, Rick.  In general, it should be as simple as prefixing the ovsdb-server
command line with 'valgrind --tool=massif'.  When I investigated the issue
what I did is that I run an ovn sandbox and looked at running ovsdb-server
process with 'ps -aux'.  Copied the current command line, then I killed the
current process and started a new one under valgrind.  E.g. if the original
process was:
   ovsdb-server <some args> nb1.db
The valgrind one will be:
   valgrind --tool=massif ovsdb-server <some args> nb1.db

ovn-ctl supports running the daemons under valgrind, but it doesn't
support the massif tool, so the process is a bit manual here.

At the end of a test you need to ask the process to exit with ovs-appctl.
SIGTERM might also work.  After that valgrind will start composing the report.
After a few seconds you should get a 'massif.out.<pid>' file that can be
interpreted with 'ms_print' utility.

Hope this helps.

Comment 7 Rick Alongi 2021-12-16 15:06:41 UTC
Test details below:

Reproduced issue using openvswitch2.16-2.16.0-16.el8fdp:

yum -y install http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch-selinux-extra-policy/1.0/28.el8fdp/noarch/openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch.rpm http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.16/2.16.0/16.el8fdp/x86_64/openvswitch2.16-2.16.0-16.el8fdp.x86_64.rpm http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.16/2.16.0/16.el8fdp/x86_64/openvswitch2.16-debuginfo-2.16.0-16.el8fdp.x86_64.rpm

yum -y install valgrind

yum -y install http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn-2021/21.09.0/12.el8fdp/x86_64/ovn-2021-21.09.0-12.el8fdp.x86_64.rpm http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn-2021/21.09.0/12.el8fdp/x86_64/ovn-2021-central-21.09.0-12.el8fdp.x86_64.rpm http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn-2021/21.09.0/12.el8fdp/x86_64/ovn-2021-host-21.09.0-12.el8fdp.x86_64.rpm 

systemctl enable openvswitch.service && systemctl start openvswitch.service
systemctl enable ovn-controller
systemctl enable ovn-northd
systemctl start ovn-controller
systemctl start ovn-northd

ovsnb_pid=$(ps -aux | grep ovsdb-server | grep 'ovnnb_db' | grep openvsw | awk '{print $2}')
for i in $ovsnb_pid; do kill -9 $i; done

schema_file=$(find / -name ovn-nb.ovsschema)
rm -f /var/lib/ovn/ovnnb_db.db

ovsdb-tool create-cluster /var/lib/ovn/ovnnb_db.db $schema_file unix:/run/ovn/ovnnb_db.raft

# Confirm that database is clustered:
[root@netqe9 ovn]# ovsdb-tool db-cid /var/lib/ovn/ovnnb_db.db
d04d8bf8-4b6c-411d-bcc9-1f0047683e6c

[root@netqe9 ovn]# ovsdb-tool db-is-clustered /var/lib/ovn/ovnnb_db.db; echo $?
0

setenforce 0
valgrind --tool=massif --massif-out-file=/run/ovn/massif.%p ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/run/ovn/ovnnb_db.sock --pidfile=/run/ovn/ovnnb_db.pid --unixctl=/run/ovn/ovnnb_db.ctl --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db

vi bz2012949.sh (add test steps from comment #0)

nsw=120
nlb=3000

export OVN_NB_DAEMON=$(ovn-nbctl --detach --pidfile --log-file=ovn-nbctl.log)

sleep 1

echo "Adding switches/routers.."
for ((i = 0; i < $nsw; i++)); do
    ovn-nbctl ls-add ls$i
    ovn-nbctl lr-add lr$i
done

echo "Adding load balancers.."
for ((j = 0; j < $nlb; j++)); do
    echo "Adding LB $j.."
    cmd="-- lb-add lb$j 42.42.42.1:8080 42.42.42.2:8081 tcp"
    for ((i = 0; i < $nsw; i++)); do
        cmd="$cmd -- ls-lb-add ls$i lb$j -- lr-lb-add lr$i lb$j"
    done
    ovn-nbctl $cmd
done

chmod +x bz2012949.sh
./bz2012949.sh (takes a long time to complete)
ovs-appctl -t /run/ovn/ovnnb_db.ctl exit

[root@netqe9 ovn]# ls /run/ovn/massif*
/run/ovn/massif.39374  /run/ovn/massif.39375  /run/ovn/massif.39376

Note: Per Dev (i.maximets), due to ovsdb_txn_add_to_history function being optimized by a compiler, 'UnknownInlinedFun' is the ovsdb_txn_add_to_history in these traces trace.

Problem is indicated below:
[root@netqe9 ~]# ms_print massif.39376 | grep UnknownInlinedFun | grep transaction | grep '| | |   | |'
| | |   | | ->37.44% (70,594,960B) 0x1A8C98: UnknownInlinedFun (transaction.c:983)
| | |   | | ->37.23% (70,199,040B) 0x1A8C83: UnknownInlinedFun (transaction.c:983)
| | |   | | ->44.30% (628,696,720B) 0x1A8C98: UnknownInlinedFun (transaction.c:983)
| | |   | | ->44.27% (628,300,800B) 0x1A8C83: UnknownInlinedFun (transaction.c:983)
| | |   | | ->45.09% (708,203,920B) 0x1A8C98: UnknownInlinedFun (transaction.c:983)
| | |   | | ->45.06% (707,808,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983)
| | |   | | ->41.12% (899,912,000B) 0x1A8C98: UnknownInlinedFun (transaction.c:983)
| | |   | | ->41.11% (899,520,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983)
| | |   | | ->45.04% (995,593,360B) 0x1A8C98: UnknownInlinedFun (transaction.c:983)
| | |   | | ->45.02% (995,197,440B) 0x1A8C83: UnknownInlinedFun (transaction.c:983)
| | |   | | ->45.71% (1,051,592,000B) 0x1A8C98: UnknownInlinedFun (transaction.c:983)
| | |   | | ->45.70% (1,051,200,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983)
| | |   | | ->41.25% (1,118,792,000B) 0x1A8C98: UnknownInlinedFun (transaction.c:983)
| | |   | | ->41.24% (1,118,400,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983)
| | |   | | ->46.34% (1,131,080,000B) 0x1A8C98: UnknownInlinedFun (transaction.c:983)
| | |   | | ->46.32% (1,130,688,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983)


Verified that problem is no longer present in fixed version (openvswitch2.16-2.16.0-32.el8fdp)

With fix:
[root@netqe21 ovn]# ms_print massif.40261 | grep UnknownInlinedFun | grep transaction | grep '| | |   | |'
| | |   | | ->02.61% (748,880B) 0x1A9570: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.59% (744,960B) 0x1A955B: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.70% (3,075,920B) 0x1A9570: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.70% (3,072,000B) 0x1A955B: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.71% (5,894,480B) 0x1A9570: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.71% (5,890,560B) 0x1A955B: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.71% (7,384,400B) 0x1A9570: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.71% (7,380,480B) 0x1A955B: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.72% (7,538,000B) 0x1A9570: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.71% (7,534,080B) 0x1A955B: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.72% (9,530,960B) 0x1A9570: UnknownInlinedFun (transaction.c:1122)
| | |   | | ->02.72% (9,527,040B) 0x1A955B: UnknownInlinedFun (transaction.c:1122)

Confirmed above findings with i.maximets.  Marking BZ Verified.

Comment 9 errata-xmlrpc 2022-01-10 16:50:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (openvswitch2.16 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0053