Bug 1608749

Summary: high cpu consumption by trickle thread
Product: Red Hat Enterprise Linux 7
Reporter: Ludwig <lkrispen>
Component: libdb
Assignee: Petr Kubat <pkubat>
Status: CLOSED ERRATA
QA Contact: Vaclav Danek <vdanek>
Severity: unspecified
Docs Contact: Lenka Špačková <lkuprova>
Priority: unspecified
Version: 7.4
CC: databases-maint, gparente, hhorak, jamills, jaredl, mmuzila, pkubat, tmihinto, vashirov, vdanek, yoliynyk
Target Milestone: rc
Keywords: Patch
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libdb-5.3.21-25.el7
Doc Type: Bug Fix
Doc Text:
.Optimized CPU consumption by `libdb`
A previous update to the `libdb` database caused excessive CPU consumption in the trickle thread. With this update, the CPU usage has been optimized.
Story Points: ---
Clone Of:
Clones: 1670768 (view as bug list)
Environment:
Last Closed: 2019-08-06 12:50:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1630906, 1630914, 1661173, 1670768

Description Ludwig 2018-07-26 09:07:50 UTC
Description of problem:

In 389-ds, the trickle thread takes a lot of CPU, even when there is no activity in the database and there are no dirty pages.


Version-Release number of selected component (if applicable):

 libdb-5.3.21-24.el7.x86_64 

How reproducible:

always

Steps to Reproduce:
1. Install and start 389-ds.
2. Wait and check the process with top or perf.

Actual results:

The trickle thread shows high CPU consumption.

Expected results:

If there are no dirty pages to trickle, there should be no activity.

Additional info:


After one night without activity on the database, `top -H` shows that the trickle thread was consuming significant CPU:

Threads: 298 total,   0 running, 298 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 12746944+total, 10545320+free,   940692 used, 21075548 buff/cache
KiB Swap:  4194300 total,  4194300 free,        0 used. 12161929+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                         
49442 dirsrv    20   0 13.756g 145100 131856 S  1.0  0.1   0:56.30 ns-slapd                                                                                                                                        
49294 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:01.47 ns-slapd                                                                                                                                        
49295 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:00.00 ns-slapd                                                                                                                                        
49296 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:00.00 ns-slapd                                                                                                                                        
49297 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:00.00 


and thread 49442 was the trickle thread:


Thread 149 (Thread 0x7fab212e1700 (LWP 49442)):
#0  0x00007fabf5e66a37 in select () at /lib64/libc.so.6
#1  0x00007fabf8d1d0b4 in DS_Sleep () at /usr/lib64/dirsrv/libslapd.so.0
#2  0x00007fabebd8d677 in trickle_threadmain (param=<optimized out>) at ldap/servers/slapd/back-ldbm/dblayer.c:4539
#3  0x00007fabf6a8e318 in _pt_root () at /lib64/libnspr4.so
#4  0x00007fabf662f594 in start_thread () at /lib64/libpthread.so.0
#5  0x00007fabf5e6f0df in clone () at /lib64/libc.so.6

Comment 2 Ludwig 2018-07-26 09:10:27 UTC
Viktor ran perf on the process, which showed:


  25.75%  libdb-5.3.so           [.] __memp_purge_dead_files
  24.25%  libdb-5.3.so           [.] __memp_stat_hash

Comment 3 Petr Kubat 2018-07-30 11:31:50 UTC
This was introduced by the fix for bug 1277887, which adds a call to `__memp_purge_dead_files` to the trickle code:

diff -U 5 -r db-5.3.21.old/src/mp/mp_trickle.c db-5.3.21/src/mp/mp_trickle.c
--- db-5.3.21.old/src/mp/mp_trickle.c   2012-05-12 01:57:53.000000000 +0800
+++ db-5.3.21/src/mp/mp_trickle.c   2016-10-25 17:27:57.000000000 +0800
@@ -65,10 +65,14 @@
        "DB_ENV->memp_trickle: %d: percent must be between 1 and 100",
            "%d"), pct);
        return (EINVAL);
    }
 
+   /* First we purge all dead files and their buffers. */
+   if ((ret = __memp_purge_dead_files(env)) != 0)
+       return (ret);
+
    /*
     * Loop through the caches counting total/dirty buffers.
     *
     * XXX
     * Using hash_page_dirty is our only choice at the moment, but it's not

So it makes sense that you are seeing a performance degradation, as the `__memp_purge_dead_files` operation is quite expensive (it checks every buffer that could possibly be freed).

This is then followed by a loop over all the caches counting up total/dirty buffers (using `__memp_stat_hash`) - basically the same loop that `__memp_purge_dead_files` performs, which explains why the two take up similar amounts of CPU.

I guess we could try to combine these two loops (which should roughly halve the CPU usage), but it would in any case result in some duplicate code...

Comment 4 Petr Kubat 2018-07-30 11:49:42 UTC
That said, if the customer is experiencing CPU usage more than double what they were experiencing before (since the `__memp_stat_hash` loop is not new), it could simply be that the trickle function is called too often...

Comment 21 errata-xmlrpc 2019-08-06 12:50:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2121