Bug 1608749

Summary: high cpu consumption by trickle thread
Product: Red Hat Enterprise Linux 7
Reporter: Ludwig <lkrispen>
Component: libdb
Assignee: Petr Kubat <pkubat>
Status: CLOSED ERRATA
QA Contact: Vaclav Danek <vdanek>
Severity: unspecified
Docs Contact: Lenka Špačková <lkuprova>
Priority: unspecified
Version: 7.4
CC: databases-maint, gparente, hhorak, jamills, jaredl, mmuzila, pkubat, tmihinto, vashirov, vdanek, yoliynyk
Target Milestone: rc
Keywords: Patch
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: libdb-5.3.21-25.el7
Doc Type: Bug Fix
Doc Text:
.Optimized CPU consumption by `libdb`
A previous update to the `libdb` database caused excessive CPU consumption in the trickle thread. With this update, the CPU usage has been optimized.
Story Points: ---
Clone Of:
Clones: 1670768 (view as bug list)
Environment:
Last Closed: 2019-08-06 12:50:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1630906, 1630914, 1661173, 1670768

Description Ludwig 2018-07-26 09:07:50 UTC
Description of problem:

In 389-ds, the trickle thread takes a lot of CPU, even when there is no activity in the database and there are no dirty pages.


Version-Release number of selected component (if applicable):

 libdb-5.3.21-24.el7.x86_64 

How reproducible:

always

Steps to Reproduce:
1. Install and start 389-ds.
2. Wait and check the process with top or perf.

Actual results:

The trickle thread shows high CPU consumption.

Expected results:

If there are no dirty pages to trickle, there should be no activity.

Additional info:


After one night without activity on the database, `top -H` shows that the trickle thread was consuming significant CPU:

Threads: 298 total,   0 running, 298 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 12746944+total, 10545320+free,   940692 used, 21075548 buff/cache
KiB Swap:  4194300 total,  4194300 free,        0 used. 12161929+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                         
49442 dirsrv    20   0 13.756g 145100 131856 S  1.0  0.1   0:56.30 ns-slapd                                                                                                                                        
49294 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:01.47 ns-slapd                                                                                                                                        
49295 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:00.00 ns-slapd                                                                                                                                        
49296 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:00.00 ns-slapd                                                                                                                                        
49297 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:00.00 


and thread 49442 was the trickle thread:


Thread 149 (Thread 0x7fab212e1700 (LWP 49442)):
#0  0x00007fabf5e66a37 in select () at /lib64/libc.so.6
#1  0x00007fabf8d1d0b4 in DS_Sleep () at /usr/lib64/dirsrv/libslapd.so.0
#2  0x00007fabebd8d677 in trickle_threadmain (param=<optimized out>) at ldap/servers/slapd/back-ldbm/dblayer.c:4539
#3  0x00007fabf6a8e318 in _pt_root () at /lib64/libnspr4.so
#4  0x00007fabf662f594 in start_thread () at /lib64/libpthread.so.0
#5  0x00007fabf5e6f0df in clone () at /lib64/libc.so.6

Comment 2 Ludwig 2018-07-26 09:10:27 UTC
Viktor ran perf on the process, which showed:


  25.75%  libdb-5.3.so           [.] __memp_purge_dead_files
  24.25%  libdb-5.3.so           [.] __memp_stat_hash

Comment 3 Petr Kubat 2018-07-30 11:31:50 UTC
This was introduced by the fix for bug 1277887, which adds a call to `__memp_purge_dead_files` to the trickle code:

diff -U 5 -r db-5.3.21.old/src/mp/mp_trickle.c db-5.3.21/src/mp/mp_trickle.c
--- db-5.3.21.old/src/mp/mp_trickle.c   2012-05-12 01:57:53.000000000 +0800
+++ db-5.3.21/src/mp/mp_trickle.c   2016-10-25 17:27:57.000000000 +0800
@@ -65,10 +65,14 @@
        "DB_ENV->memp_trickle: %d: percent must be between 1 and 100",
            "%d"), pct);
        return (EINVAL);
    }
 
+   /* First we purge all dead files and their buffers. */
+   if ((ret = __memp_purge_dead_files(env)) != 0)
+       return (ret);
+
    /*
     * Loop through the caches counting total/dirty buffers.
     *
     * XXX
     * Using hash_page_dirty is our only choice at the moment, but it's not

So it makes sense that you are seeing a performance degradation, as the `__memp_purge_dead_files` operation is quite expensive (it checks every buffer that could possibly be freed).

This is then followed by a loop over all the caches counting up total/dirty buffers (using `__memp_stat_hash`) - basically the same loop that `__memp_purge_dead_files` performs, which explains why the two take up similar amounts of CPU.

I guess we could try to combine these two loops (which should roughly halve the CPU usage), but it would in any case result in some duplicate code...

Comment 4 Petr Kubat 2018-07-30 11:49:42 UTC
That said, if the customer is experiencing CPU usage more than double what they were experiencing before (since the `__memp_stat_hash` loop is not new), it could simply be that the trickle function is called too often...

Comment 21 errata-xmlrpc 2019-08-06 12:50:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2121