Bug 2012949
| Summary: | ovsdb-server: Transaction history grows larger than database itself. | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux Fast Datapath | Reporter: | Ilya Maximets <i.maximets> |
| Component: | ovsdb2.16 | Assignee: | Ilya Maximets <i.maximets> |
| Status: | CLOSED ERRATA | QA Contact: | Rick Alongi <ralongi> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | RHEL 8.0 | CC: | ctrautma, fleitner, jhsiao, ralongi |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openvswitch2.16-2.16.0-27.el8fdp | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-01-10 16:50:58 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Patch sent for review: https://patchwork.ozlabs.org/project/openvswitch/patch/20211014145822.841717-1-i.maximets@ovn.org/ * Tue Nov 09 2021 Ilya Maximets <i.maximets> - 2.16.0-27
- ovsdb: Don't let transaction history grow larger than the database. [RH git: 93d1fa0bdf] (#2012949)
commit 317b1bfd7dd315e241c158e6d4095002ff391ee3
Author: Ilya Maximets <i.maximets>
Date: Tue Sep 28 13:17:21 2021 +0200
ovsdb: Don't let transaction history grow larger than the database.
If user frequently changes a lot of rows in a database, transaction
history could grow way larger than the database itself. This wastes
a lot of memory and also makes monitor_cond_since slower than
usual monotor_cond if the transaction id is old enough, because
re-construction of the changes from a history is slower than just
creation of initial database snapshot. This is also the case if
user deleted a lot of data, so transaction history still holds all of
it while the database itself doesn't.
In case of current lb-per-service model in ovn-kubernetes, each
load-balancer is added to every logical switch/router. Such a
transaction touches more than a half of a OVN_Northbound database.
And each of these transactions is added to the transaction history.
Since transaction history depth is 100, in worst case scenario,
it will hold 100 copies of a database increasing memory consumption
dramatically. In tests with 3000 LBs and 120 LSs, memory goes up
to 3 GB, while holding at 30 MB if transaction history disabled in
the code.
Fixing that by keeping count of the number of ovsdb_atom's in the
database and not allowing the total number of atoms in transaction
history to grow larger than this value. Counting atoms is fairly
cheap because we don't need to iterate over them, so it doesn't have
significant performance impact. It would be ideal to measure the
size of individual atoms, but that will hit the performance.
Counting cells instead of atoms is not sufficient, because OVN
users are adding hundreds or thousands of atoms to a single cell,
so they are largely different in size.
Signed-off-by: Ilya Maximets <i.maximets>
Acked-by: Han Zhou <hzhou>
Acked-by: Dumitru Ceara <dceara>
Reported-at: https://bugzilla.redhat.com/2012949
Signed-off-by: Ilya Maximets <i.maximets>
Hi Ilya, Can you provide further details on how to configure Valgrind for this scenario? Thanks, Rick (In reply to Rick Alongi from comment #3) > Can you provide further details on how to configure Valgrind for this > scenario? Hi, Rick. In general, it should be as simple as prefixing the ovsdb-server command line with 'valgrind --tool=massif'. When I investigated the issue what I did is that I run an ovn sandbox and looked at running ovsdb-server process with 'ps -aux'. Copied the current command line, then I killed the current process and started a new one under valgrind. E.g. if the original process was: ovsdb-server <some args> nb1.db The valgrind one will be: valgrind --tool=massif ovsdb-server <some args> nb1.db ovn-ctl supports running the daemons under valgrind, but it doesn't support the massif tool, so the process is a bit manual here. At the end of a test you need to ask the process to exit with ovs-appctl. SIGTERM might also work. After that valgrind will start composing the report. After a few seconds you should get a 'massif.out.<pid>' file that can be interpreted with 'ms_print' utility. Hope this helps. Test details below: Reproduced issue using openvswitch2.16-2.16.0-16.el8fdp: yum -y install http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch-selinux-extra-policy/1.0/28.el8fdp/noarch/openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch.rpm http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.16/2.16.0/16.el8fdp/x86_64/openvswitch2.16-2.16.0-16.el8fdp.x86_64.rpm http://download-node-02.eng.bos.redhat.com/brewroot/packages/openvswitch2.16/2.16.0/16.el8fdp/x86_64/openvswitch2.16-debuginfo-2.16.0-16.el8fdp.x86_64.rpm yum -y install valgrind yum -y install http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn-2021/21.09.0/12.el8fdp/x86_64/ovn-2021-21.09.0-12.el8fdp.x86_64.rpm http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn-2021/21.09.0/12.el8fdp/x86_64/ovn-2021-central-21.09.0-12.el8fdp.x86_64.rpm http://download-node-02.eng.bos.redhat.com/brewroot/packages/ovn-2021/21.09.0/12.el8fdp/x86_64/ovn-2021-host-21.09.0-12.el8fdp.x86_64.rpm systemctl enable openvswitch.service && systemctl start openvswitch.service systemctl enable ovn-controller systemctl enable ovn-northd systemctl start ovn-controller systemctl start ovn-northd ovsnb_pid=$(ps -aux | grep ovsdb-server | grep 'ovnnb_db' | grep openvsw | awk '{print $2}') for i in $ovsnb_pid; do kill -9 $i; done schema_file=$(find / -name ovn-nb.ovsschema) rm -f /var/lib/ovn/ovnnb_db.db ovsdb-tool create-cluster /var/lib/ovn/ovnnb_db.db $schema_file unix:/run/ovn/ovnnb_db.raft # Confirm that database is clustered: [root@netqe9 ovn]# ovsdb-tool db-cid /var/lib/ovn/ovnnb_db.db d04d8bf8-4b6c-411d-bcc9-1f0047683e6c [root@netqe9 ovn]# ovsdb-tool db-is-clustered /var/lib/ovn/ovnnb_db.db; echo $? 0 setenforce 0 valgrind --tool=massif --massif-out-file=/run/ovn/massif.%p ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/run/ovn/ovnnb_db.sock --pidfile=/run/ovn/ovnnb_db.pid --unixctl=/run/ovn/ovnnb_db.ctl --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db vi bz2012949.sh (add test steps from comment #0) nsw=120 nlb=3000 export OVN_NB_DAEMON=$(ovn-nbctl --detach --pidfile --log-file=ovn-nbctl.log) sleep 1 echo "Adding switches/routers.." for ((i = 0; i < $nsw; i++)); do ovn-nbctl ls-add ls$i ovn-nbctl lr-add lr$i done echo "Adding load balancers.." for ((j = 0; j < $nlb; j++)); do echo "Adding LB $j.." cmd="-- lb-add lb$j 42.42.42.1:8080 42.42.42.2:8081 tcp" for ((i = 0; i < $nsw; i++)); do cmd="$cmd -- ls-lb-add ls$i lb$j -- lr-lb-add lr$i lb$j" done ovn-nbctl $cmd done chmod +x bz2012949.sh ./bz2012949.sh (takes a long time to complete) ovs-appctl -t /run/ovn/ovnnb_db.ctl exit [root@netqe9 ovn]# ls /run/ovn/massif* /run/ovn/massif.39374 /run/ovn/massif.39375 /run/ovn/massif.39376 Note: Per Dev (i.maximets), due to ovsdb_txn_add_to_history function being optimized by a compiler, 'UnknownInlinedFun' is the ovsdb_txn_add_to_history in these traces trace. Problem is indicated below: [root@netqe9 ~]# ms_print massif.39376 | grep UnknownInlinedFun | grep transaction | grep '| | | | |' | | | | | ->37.44% (70,594,960B) 0x1A8C98: UnknownInlinedFun (transaction.c:983) | | | | | ->37.23% (70,199,040B) 0x1A8C83: UnknownInlinedFun (transaction.c:983) | | | | | ->44.30% (628,696,720B) 0x1A8C98: UnknownInlinedFun (transaction.c:983) | | | | | ->44.27% (628,300,800B) 0x1A8C83: UnknownInlinedFun (transaction.c:983) | | | | | ->45.09% (708,203,920B) 0x1A8C98: UnknownInlinedFun (transaction.c:983) | | | | | ->45.06% (707,808,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983) | | | | | ->41.12% (899,912,000B) 0x1A8C98: UnknownInlinedFun (transaction.c:983) | | | | | ->41.11% (899,520,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983) | | | | | ->45.04% (995,593,360B) 0x1A8C98: UnknownInlinedFun (transaction.c:983) | | | | | ->45.02% (995,197,440B) 0x1A8C83: UnknownInlinedFun (transaction.c:983) | | | | | ->45.71% (1,051,592,000B) 0x1A8C98: UnknownInlinedFun (transaction.c:983) | | | | | ->45.70% (1,051,200,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983) | | | | | ->41.25% (1,118,792,000B) 0x1A8C98: UnknownInlinedFun (transaction.c:983) | | | | | ->41.24% (1,118,400,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983) | | | | | ->46.34% (1,131,080,000B) 0x1A8C98: UnknownInlinedFun (transaction.c:983) | | | | | ->46.32% (1,130,688,000B) 0x1A8C83: UnknownInlinedFun (transaction.c:983) Verified that problem is no longer present in fixed version (openvswitch2.16-2.16.0-32.el8fdp) With fix: [root@netqe21 ovn]# ms_print massif.40261 | grep UnknownInlinedFun | grep transaction | grep '| | | | |' | | | | | ->02.61% (748,880B) 0x1A9570: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.59% (744,960B) 0x1A955B: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.70% (3,075,920B) 0x1A9570: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.70% (3,072,000B) 0x1A955B: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.71% (5,894,480B) 0x1A9570: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.71% (5,890,560B) 0x1A955B: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.71% (7,384,400B) 0x1A9570: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.71% (7,380,480B) 0x1A955B: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.72% (7,538,000B) 0x1A9570: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.71% (7,534,080B) 0x1A955B: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.72% (9,530,960B) 0x1A9570: UnknownInlinedFun (transaction.c:1122) | | | | | ->02.72% (9,527,040B) 0x1A955B: UnknownInlinedFun (transaction.c:1122) Confirmed above findings with i.maximets. Marking BZ Verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (openvswitch2.16 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:0053 |
If user frequently changes a lot of rows in a database, transaction history could grow way larger than the database itself. This wastes a lot of memory and also makes monitor_cond_since slower than usual monotor_cond if the transaction id is old enough, because re-construction of the changes from a history is slower than just creation of initial database snapshot. This is also the case if user deleted a lot of data, so transaction history still holds all of it while the database itself doesn't. In case of current lb-per-service model in ovn-kubernetes, each load-balancer is added to every logical switch/router. Such a transaction touches more than a half of a OVN_Northbound database. And each of these transactions is added to the transaction history. Since transaction history depth is 100, in worst case scenario, it will hold 100 copies of a database increasing memory consumption dramatically. In tests with 3000 LBs, memory goes up to 3 GB, while holding at 30 MB if transaction history disabled in the code. The output of 'valgrind --tool=massif' looks like this: -------------------------------------------------------------------------------- n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B) -------------------------------------------------------------------------------- 28 449,170,662,901 2,936,870,344 2,921,291,202 15,579,142 0 99.47% (2,921,291,202B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->92.02% (2,702,473,479B) 0x456A8D: xmalloc__ (util.c:137) | ->92.02% (2,702,473,479B) 0x456A8D: xmalloc (util.c:172) | ->90.99% (2,672,386,864B) 0x442D9B: clone_atoms (ovsdb-data.c:990) | | ->90.99% (2,672,386,864B) 0x442D9B: clone_atoms (ovsdb-data.c:984) | | ->90.99% (2,672,386,864B) 0x442D9B: ovsdb_datum_clone (ovsdb-data.c:1012) | | ->90.99% (2,672,386,160B) 0x415AB4: ovsdb_row_clone (row.c:130) | | | ->45.51% (1,336,446,000B) 0x41AD6A: ovsdb_txn_clone (transaction.c:1059) | | | | ->45.51% (1,336,446,000B) 0x41AD6A: ovsdb_txn_add_to_history (transaction.c:1097) | | | | ->45.51% (1,336,446,000B) 0x41AD6A: ovsdb_txn_replay_commit (transaction.c:1129) | | | | ->45.51% (1,336,446,000B) 0x4091E8: parse_txn (ovsdb-server.c:622) | | | | ->45.51% (1,336,446,000B) 0x4091E8: read_db (ovsdb-server.c:652) | | | | ->45.51% (1,336,446,000B) 0x4070D0: main_loop (ovsdb-server.c:236) | | | | | ->45.51% (1,336,446,000B) 0x4070D0: main (ovsdb-server.c:490) | | | | | | | | | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%) | | | | | | | ->45.49% (1,335,916,800B) 0x41AD55: ovsdb_txn_clone (transaction.c:1058) | | | | ->45.49% (1,335,916,800B) 0x41AD55: ovsdb_txn_add_to_history (transaction.c:1097) | | | | ->45.49% (1,335,916,800B) 0x41AD55: ovsdb_txn_replay_commit (transaction.c:1129) | | | | ->45.49% (1,335,916,800B) 0x4091E8: parse_txn (ovsdb-server.c:622) | | | | ->45.49% (1,335,916,800B) 0x4091E8: read_db (ovsdb-server.c:652) | | | | ->45.49% (1,335,916,800B) 0x4070D0: main_loop (ovsdb-server.c:236) | | | | | ->45.49% (1,335,916,800B) 0x4070D0: main (ovsdb-server.c:490) | | | | | | | | | ->00.00% (0B) in 1+ places, all below ms_print's threshold (01.00%) | | | | | | | ->00.00% (23,360B) in 1+ places, all below ms_print's threshold (01.00%) | | | | | ->00.00% (704B) in 1+ places, all below ms_print's threshold (01.00%) | | | ->01.02% (30,086,615B) in 97 places, all below massif's threshold (1.00%) | ->04.62% (135,617,176B) 0x456CDE: xcalloc__ (util.c:121) Here we can see that ovsdb_txn_add_to_history() is responsible for more than 90% of allocated memory in the process. The issue can be reproduced with a script like this: --- nsw=120 nlb=3000 export OVN_NB_DAEMON=$(ovn-nbctl --detach --pidfile --log-file=ovn-nbctl.log) sleep 1 echo "Adding switches/routers.." for ((i = 0; i < $nsw; i++)); do ovn-nbctl ls-add ls$i ovn-nbctl lr-add lr$i done echo "Adding load balancers.." for ((j = 0; j < $nlb; j++)); do echo "Adding LB $j.." cmd="-- lb-add lb$j 42.42.42.1:8080 42.42.42.2:8081 tcp" for ((i = 0; i < $nsw; i++)); do cmd="$cmd -- ls-lb-add ls$i lb$j -- lr-lb-add lr$i lb$j" done ovn-nbctl $cmd done --- Solution for this could be to limit the size of transaction history to some sane value, e.g. size of the database itself. Preliminary implementation is available here: https://github.com/igsilya/ovs/commit/b0df9c0b5f8321001044c5c6b0a59d7fe9746202