Description of problem:

In OpenShift 4.10, we noticed the following: https://files.slack.com/files-pri/T027F3GAJ-F02GFJ8FDJT/image.png

The ovsdb-checker pod is periodically consuming around 1 GiB of RSS.

Looking in the pod:

USER   PID %CPU %MEM     VSZ     RSS TTY   STAT START TIME COMMAND
root     1  0.2  0.0 2378492   48684 ?     Ssl  Oct05 2:16 /usr/bin/ovndbchecker --config-file=/run/ovnkube-config/ovnkube.conf --loglevel 4 --sb-address ssl:10.0.131.75:9642,ssl:10.0.189.132:9642,ssl:10.0.217.253:9642 --sb-client-privkey /ovn-cert/tls.key --sb-client-cert
root  6089  0.0  0.0   12052    3340 pts/0 Ss   16:21 0:00 /bin/sh
root  6102  0.0  0.0   11768    3012 pts/0 S+   16:21 0:00 watch ps aux
root  6136 88.4  1.9 1283036 1244116 ?     R    16:22 0:04 /usr/bin/ovsdb-tool db-sid /etc/ovn/ovnsb_db.db
root  6143  0.0  0.0   11768    1132 pts/0 S+   16:22 0:00 watch ps aux
root  6144  0.0  0.0   44668    3288 pts/0 R+   16:22 0:00 ps aux
root  6202 91.8  1.9 1307588 1268660 ?     R    16:23 0:04 /usr/bin/ovsdb-tool db-sid /etc/ovn/ovnsb_db.db

Running that command manually, with strace:

sh-4.4# strace -c /usr/bin/ovsdb-tool db-sid /etc/ovn/ovnsb_db.db
e594ea23-511b-4d22-afc2-b7dd62694c43
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 75.09    0.140175           2     55018           read
 24.12    0.045032           2     19085           brk
  0.19    0.000358           8        44           mmap
  0.14    0.000253           9        28           mprotect
  0.12    0.000231          11        21           openat
  0.08    0.000146           6        21           close
  0.08    0.000144           6        21           fstat
  0.05    0.000088           6        14           lseek
  0.04    0.000077           5        13           rt_sigaction
  0.02    0.000037           9         4           getdents64
  0.02    0.000031           4         7           munmap
  0.02    0.000031           6         5           fcntl
  0.01    0.000017           8         2         2 access
  0.00    0.000009           9         1           pipe
  0.00    0.000009           4         2         1 arch_prctl
  0.00    0.000007           7         1           sched_getaffinity
  0.00    0.000007           7         1           set_tid_address
  0.00    0.000007           7         1           getrandom
  0.00    0.000006           6         1           rt_sigprocmask
  0.00    0.000006           6         1           set_robust_list
  0.00    0.000006           6         1           prlimit64
  0.00    0.000000           0         1           write
  0.00    0.000000           0         2           mremap
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ------------------
100.00    0.186677           2     74296         3 total
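A quick way to confirm the peak memory use of that standalone read is to run it under GNU time. This is only a sketch: it assumes /usr/bin/time is available in the image, and reuses the database path from the ps output above.

sh-4.4# /usr/bin/time -v /usr/bin/ovsdb-tool db-sid /etc/ovn/ovnsb_db.db
# "Maximum resident set size (kbytes)" in the output should roughly match
# the ~1.2 GiB RSS seen for the ovsdb-tool processes in ps aux.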
Since the database file is incremental, ovsdb-tool has to read and reconstruct the whole database in order to find the current server_id. If the database file is large, that takes a lot of time and memory. In general, ovsdb-tool is intended to be used when the database server is offline. But in this case the server is up and running, IIUC, so ovsdb-client can be used to request this information from the running server instead, e.g.:

ovsdb-client dump ssl:10.0.131.75:9642 _Server Database name sid | grep OVN_Southbound

This should be a fast and cheap operation.
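For reference, a sketch of that lookup from inside the pod over SSL. The private-key path comes from the ovndbchecker command line in the description; the certificate and CA paths are assumptions based on a typical ovn-kubernetes layout, not taken from this report, so adjust them to match the pod.

sh-4.4# ovsdb-client dump \
          --private-key=/ovn-cert/tls.key \
          --certificate=/ovn-cert/tls.crt \
          --ca-cert=/ovn-ca/ca-bundle.crt \
          ssl:10.0.131.75:9642 _Server Database name sid | grep OVN_Southbound
# Queries the running ovsdb-server's _Server database for the name and
# server id (sid) of each hosted database, instead of re-reading and
# replaying the on-disk transaction log with ovsdb-tool.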
There's also "ovs-appctl -t ... cluster/sid", which does the same thing. ovn-kubernetes is moving to ovs-appctl: https://github.com/ovn-org/ovn-kubernetes/pull/2554

I guess we can probably close this BZ.
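For completeness, a sketch of the appctl variant. The control socket path is an assumption (a common OVN layout); point -t at wherever ovnsb_db.ctl lives in the pod.

sh-4.4# ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/sid OVN_Southbound
# Asks the running ovsdb-server for its cluster server ID over the unix
# control socket, so no database file is read at all.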