Bug 1608749 - high cpu consumption by trickle thread
Summary: high cpu consumption by trickle thread
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libdb
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Petr Kubat
QA Contact: Vaclav Danek
Docs Contact: Lenka Špačková
URL:
Whiteboard:
Depends On:
Blocks: 1630906 1630914 1661173 1670768
 
Reported: 2018-07-26 09:07 UTC by Ludwig
Modified: 2022-03-13 15:18 UTC
CC List: 11 users

Fixed In Version: libdb-5.3.21-25.el7
Doc Type: Bug Fix
Doc Text:
.Optimized CPU consumption by `libdb`
A previous update to the `libdb` database caused excessive CPU consumption in the trickle thread. With this update, the CPU usage has been optimized.
Clone Of:
: 1670768
Environment:
Last Closed: 2019-08-06 12:50:50 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:2121  0        None      None    None     2019-08-06 12:50:53 UTC

Description Ludwig 2018-07-26 09:07:50 UTC
Description of problem:

In 389-ds the trickle thread takes a lot of cpu, even if there is no activity in the database and there are no dirty pages.


Version-Release number of selected component (if applicable):

 libdb-5.3.21-24.el7.x86_64 

How reproducible:

always

Steps to Reproduce:
1. Install and start 389-ds.
2. Wait and check the process with top or perf.

Actual results:

trickle thread shows high cpu consumption

Expected results:

If there are no dirty pages to trickle, there should be no activity.

Additional info:


After one night without any activity on the database, top -H shows that the trickle thread was consuming a lot of CPU:

Threads: 298 total,   0 running, 298 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 12746944+total, 10545320+free,   940692 used, 21075548 buff/cache
KiB Swap:  4194300 total,  4194300 free,        0 used. 12161929+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                                                         
49442 dirsrv    20   0 13.756g 145100 131856 S  1.0  0.1   0:56.30 ns-slapd                                                                                                                                        
49294 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:01.47 ns-slapd                                                                                                                                        
49295 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:00.00 ns-slapd                                                                                                                                        
49296 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:00.00 ns-slapd                                                                                                                                        
49297 dirsrv    20   0 13.756g 145100 131856 S  0.0  0.1   0:00.00 


and thread 49442 was the trickle thread:


Thread 149 (Thread 0x7fab212e1700 (LWP 49442)):
#0  0x00007fabf5e66a37 in select () at /lib64/libc.so.6
#1  0x00007fabf8d1d0b4 in DS_Sleep () at /usr/lib64/dirsrv/libslapd.so.0
#2  0x00007fabebd8d677 in trickle_threadmain (param=<optimized out>) at ldap/servers/slapd/back-ldbm/dblayer.c:4539
#3  0x00007fabf6a8e318 in _pt_root () at /lib64/libnspr4.so
#4  0x00007fabf662f594 in start_thread () at /lib64/libpthread.so.0
#5  0x00007fabf5e6f0df in clone () at /lib64/libc.so.6
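
For context, frame #2 above is 389-ds's background trickle thread, which wakes up periodically and asks the libdb memory pool to write out dirty pages. Below is a minimal sketch of such a loop against the public libdb C API; the function name, sleep interval, and trickle percentage are illustrative only and are not the actual dblayer.c code:

#include <stdio.h>
#include <unistd.h>
#include <db.h>

/* Illustrative only: a simplified background trickle loop in the spirit of
 * trickle_threadmain() from the backtrace.  The real code lives in
 * ldap/servers/slapd/back-ldbm/dblayer.c and uses DS_Sleep() and its own
 * configuration; the 5-second interval and 40% target here are made up. */
static void
trickle_loop(DB_ENV *env)
{
    int ret, nwrote;

    for (;;) {
        sleep(5);  /* stands in for the DS_Sleep() call seen in frame #1 */

        /* Ask the memory pool to keep at least 40% of the cache clean.
         * Since the fix for bug 1277887, this call also runs
         * __memp_purge_dead_files(), i.e. a full scan of the cache,
         * even when there is nothing dirty to write out. */
        if ((ret = env->memp_trickle(env, 40, &nwrote)) != 0) {
            fprintf(stderr, "memp_trickle: %s\n", db_strerror(ret));
            break;
        }
    }
}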

Comment 2 Ludwig 2018-07-26 09:10:27 UTC
Viktor had run perf on a process showing:


  25.75%  libdb-5.3.so           [.] __memp_purge_dead_files
  24.25%  libdb-5.3.so           [.] __memp_stat_hash

Comment 3 Petr Kubat 2018-07-30 11:31:50 UTC
This was introduced by the fix for bug 1277887, which adds a call to `__memp_purge_dead_files` into the trickle code:

diff -U 5 -r db-5.3.21.old/src/mp/mp_trickle.c db-5.3.21/src/mp/mp_trickle.c
--- db-5.3.21.old/src/mp/mp_trickle.c   2012-05-12 01:57:53.000000000 +0800
+++ db-5.3.21/src/mp/mp_trickle.c   2016-10-25 17:27:57.000000000 +0800
@@ -65,10 +65,14 @@
        "DB_ENV->memp_trickle: %d: percent must be between 1 and 100",
            "%d"), pct);
        return (EINVAL);
    }
 
+   /* First we purge all dead files and their buffers. */
+   if ((ret = __memp_purge_dead_files(env)) != 0)
+       return (ret);
+
    /*
     * Loop through the caches counting total/dirty buffers.
     *
     * XXX
     * Using hash_page_dirty is our only choice at the moment, but it's not

So it makes sense that you are seeing a drop in performance, as the `__memp_purge_dead_files` operation is quite expensive (it checks all buffers that could possibly be freed).

This is then followed by a loop over all the caches counting up total/dirty buffers (using `__memp_stat_hash`) - basically the same loop that `__memp_purge_dead_files` does, which explains why they take up a similar amount of CPU.

I guess we could try to combine these two loops (which should likely halve the CPU usage), but it would in any case result in some duplicate code...
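
To make the "same loop twice" observation concrete, here is a simplified, self-contained model; the structures and function names below are invented for illustration and are not the real libdb internals:

#include <stddef.h>

/* Toy model of a buffer cache -- NOT the real libdb structures. */
struct buf {
    int dead;    /* buffer belongs to a removed ("dead") file */
    int dirty;   /* buffer has unwritten changes */
};

/* What __memp_purge_dead_files() conceptually does: walk every buffer
 * looking for ones whose backing file is gone. */
static size_t purge_dead(struct buf *bufs, size_t n)
{
    size_t purged = 0;
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].dead) {
            bufs[i].dead = 0;
            purged++;
        }
    }
    return purged;
}

/* What the __memp_stat_hash() loop conceptually does: walk every buffer
 * again, just to count the dirty ones. */
static size_t count_dirty(const struct buf *bufs, size_t n)
{
    size_t dirty = 0;
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].dirty)
            dirty++;
    }
    return dirty;
}

/* The combined loop suggested above: one walk does both jobs, which is
 * where the "should likely halve the CPU usage" estimate comes from. */
static size_t purge_and_count_dirty(struct buf *bufs, size_t n, size_t *dirty)
{
    size_t purged = 0;
    *dirty = 0;
    for (size_t i = 0; i < n; i++) {
        if (bufs[i].dead) {
            bufs[i].dead = 0;
            purged++;
        }
        if (bufs[i].dirty)
            (*dirty)++;
    }
    return purged;
}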

Comment 4 Petr Kubat 2018-07-30 11:49:42 UTC
That said, if the customer is experiencing additional CPU usage that is more than 100% of what they were experiencing before (since the `__memp_stat_hash` loop is not new), it could just be that the trickle function is called too often...
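
One possible caller-side mitigation for the "called too often" case is to rate-limit how often the background thread actually calls the trickle function. A rough sketch, assuming the public `DB_ENV->memp_trickle` API; the 60-second interval and the helper name are hypothetical, not anything 389-ds currently ships:

#include <time.h>
#include <db.h>

#define TRICKLE_INTERVAL 60   /* hypothetical minimum seconds between trickles */

/* Call this from the background thread on every wakeup; it only pays for
 * the expensive cache scans at most once per TRICKLE_INTERVAL seconds. */
static int
maybe_trickle(DB_ENV *env, time_t *last_run)
{
    time_t now = time(NULL);
    int nwrote = 0;

    if (now - *last_run < TRICKLE_INTERVAL)
        return 0;   /* nothing to do yet, skip the scans */

    *last_run = now;
    return env->memp_trickle(env, 40, &nwrote);
}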

Comment 21 errata-xmlrpc 2019-08-06 12:50:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2121

