Bug 2011468

Summary: ovn-dbchecker/ovsdb-tool consuming ~1GiB of RSS
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Joe Talerico <jtaleric>
Component: OVN
Assignee: OVN Team <ovnteam>
Status: NEW ---
QA Contact: Jianlin Shi <jishi>
Severity: unspecified
Docs Contact:
Priority: low
Version: RHEL 8.0
CC: ctrautma, dceara, i.maximets, jiji, mmichels, trozet
Target Milestone: ---
Flags: jtaleric: needinfo? (trozet)
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: perfscale-ovn
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Joe Talerico 2021-10-06 16:41:36 UTC
Description of problem:
In OpenShift 4.10, we noticed the following:

https://files.slack.com/files-pri/T027F3GAJ-F02GFJ8FDJT/image.png

The ovn-dbchecker pod periodically consumes around 1GiB of RSS.

Looking in the pod:
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.2  0.0 2378492 48684 ?       Ssl  Oct05   2:16 /usr/bin/ovndbchecker --config-file=/run/ovnkube-config/ovnkube.conf --loglevel 4 --sb-address ssl:10.0.131.75:9642,ssl:10.0.189.132:9642,ssl:10.0.217.253:9642 --sb-client-privkey /ovn-cert/tls.key --sb-client-cert
root        6089  0.0  0.0  12052  3340 pts/0    Ss   16:21   0:00 /bin/sh
root        6102  0.0  0.0  11768  3012 pts/0    S+   16:21   0:00 watch ps aux
root        6136 88.4  1.9 1283036 1244116 ?     R    16:22   0:04 /usr/bin/ovsdb-tool db-sid /etc/ovn/ovnsb_db.db
root        6143  0.0  0.0  11768  1132 pts/0    S+   16:22   0:00 watch ps aux
root        6144  0.0  0.0  44668  3288 pts/0    R+   16:22   0:00 ps aux

root        6202 91.8  1.9 1307588 1268660 ?     R    16:23   0:04 /usr/bin/ovsdb-tool db-sid /etc/ovn/ovnsb_db.db

Running that command manually, with strace:

sh-4.4# strace -c /usr/bin/ovsdb-tool db-sid /etc/ovn/ovnsb_db.db
e594ea23-511b-4d22-afc2-b7dd62694c43
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 75.09    0.140175           2     55018           read
 24.12    0.045032           2     19085           brk
  0.19    0.000358           8        44           mmap
  0.14    0.000253           9        28           mprotect
  0.12    0.000231          11        21           openat
  0.08    0.000146           6        21           close
  0.08    0.000144           6        21           fstat
  0.05    0.000088           6        14           lseek
  0.04    0.000077           5        13           rt_sigaction
  0.02    0.000037           9         4           getdents64
  0.02    0.000031           4         7           munmap
  0.02    0.000031           6         5           fcntl
  0.01    0.000017           8         2         2 access
  0.00    0.000009           9         1           pipe
  0.00    0.000009           4         2         1 arch_prctl
  0.00    0.000007           7         1           sched_getaffinity
  0.00    0.000007           7         1           set_tid_address
  0.00    0.000007           7         1           getrandom
  0.00    0.000006           6         1           rt_sigprocmask
  0.00    0.000006           6         1           set_robust_list
  0.00    0.000006           6         1           prlimit64
  0.00    0.000000           0         1           write
  0.00    0.000000           0         2           mremap
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ------------------
100.00    0.186677           2     74296         3 total

Comment 1 Ilya Maximets 2021-10-11 15:17:10 UTC
Since the database file is incremental, ovsdb-tool has to read and reconstruct
the whole database in order to find the current server_id.  If the database file
is large, it will take a lot of time and memory.

In general, ovsdb-tool is intended to be used when the database server is offline.
But in this case the server is up and running, IIUC, so ovsdb-client can be used
to request this information from the running server instead, e.g.:

 ovsdb-client dump ssl:10.0.131.75:9642 _Server Database name sid | grep OVN_Southbound

This should be a fast and cheap operation.
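
If a caller needs only the bare sid rather than the whole matching row, the grep above could be swapped for a small filter. The following is a minimal sketch, not what ovn-kubernetes does: extract_sid is a hypothetical helper name, and it assumes the dump prints one row per database with the name in the first column and the sid in the second.

```shell
#!/bin/sh
# extract_sid DB_NAME (hypothetical helper)
# Reads the table printed by
#   ovsdb-client dump <server> _Server Database name sid
# on stdin and prints the sid column of the row whose name is DB_NAME.
# Assumption: name is the first whitespace-separated field, sid the second.
extract_sid() {
    awk -v db="$1" '
        {
            name = $1
            gsub(/"/, "", name)   # tolerate a quoted database name
        }
        name == db { print $2; found = 1 }
        END { exit !found }       # non-zero status if no row matched
    '
}

# Intended use against a running server (address taken from this report;
# the TLS key/certificate options are left out, as in the command above):
#   ovsdb-client dump ssl:10.0.131.75:9642 _Server Database name sid \
#       | extract_sid OVN_Southbound
```

The helper returns a non-zero exit status when the database name is not found, so a caller can tell "no such database" apart from an empty sid.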

Comment 2 Dumitru Ceara 2021-10-11 15:24:19 UTC
There's also "ovs-appctl -t ... cluster/sid" which does the same thing.

ovn-kubernetes is moving to ovs-appctl:
https://github.com/ovn-org/ovn-kubernetes/pull/2554

I guess we can probably close this BZ.