Bug 983091 - Memory leak in 389-ds-base 1.2.11.15
Summary: Memory leak in 389-ds-base 1.2.11.15
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: 389-ds-base
Version: 6.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Rich Megginson
QA Contact: Sankar Ramalingam
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-07-10 13:21 UTC by Rich Megginson
Modified: 2020-09-13 20:38 UTC
CC List: 8 users

Fixed In Version: 389-ds-base-1.2.11.15-22.el6
Doc Type: Bug Fix
Doc Text:
Cause: Opening and closing many hundreds of connections to the directory server per minute for a long period of time.
Consequence: The directory server memory growth will be unbounded until killed by the OOM killer.
Fix: The directory server was not properly cleaning up and freeing the memory used by old connections. The fix is to clean up this memory properly.
Result: The directory server memory usage should level off and not increase any more once the entry/dn/db caches are full.
Clone Of:
Environment:
Last Closed: 2013-11-21 21:10:39 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
Github: 389ds/389-ds-base issue 765 (last updated 2020-09-13 20:38:37 UTC)
Red Hat Product Errata: RHBA-2013:1653, 389-ds-base bug fix update, SHIPPED_LIVE (last updated 2013-11-20 21:53:19 UTC)

Description Rich Megginson 2013-07-10 13:21:48 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/47428

We are experiencing a memory leak when 389 (compiled with openldap support) runs for a very long period of time with many searches and binds.  Running valgrind (two valgrind traces attached), I see the leak appears in liblber.  Specifically, connection_table_new calls ber_sockbuf_alloc, but each new connection also calls ber_sockbuf_add_io which appears to malloc memory.  ber_sockbuf_remove_io is never called, and when 389 shuts down it calls ber_sockbuf_free.  I hackily added a ber_sockbuf_remove_io call to connection_cleanup in connection.c and the memory leak no longer shows.  The attached diff is meant to illustrate the change, definitely not the best way to fix the problem.

3266.log - a run with about 8k searches/binds
3479.log - a run with about 16k searches/binds

Note the:
==3479== 683,920 bytes in 17,098 blocks are still reachable in loss record 3,066 of 3,069
==3479==    at 0x4C279EE: malloc (vg_replace_malloc.c:270)
==3479==    by 0x5EDBFCB: ber_memalloc_x (in /lib64/liblber-2.4.so.2.5.6)
==3479==    by 0x5EDE7C1: ber_sockbuf_add_io (in /lib64/liblber-2.4.so.2.5.6)
==3479==    by 0x418A68: slapd_daemon (daemon.c:2636)
==3479==    by 0x41F16E: main (main.c:1253)

4904.log - run of about 16k connections, no longer containing the above trace
sockbuf.patch - diff adding the remove_io call.
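
For anyone wanting to capture a similar trace, the sketch below shows one way the server could be run under valgrind. It is illustrative only and is not the procedure used for the attached logs; the instance name "EXAMPLE", the log path, and the debug level are placeholders, so check ns-slapd(8) and your installation before using it.

# Stop the normally managed instance first (RHEL 6 init script shown).
service dirsrv stop

# Run ns-slapd in the foreground under valgrind so that leaked and
# still-reachable allocations (such as the liblber blocks above) are
# reported when the server exits.
valgrind --leak-check=full --show-reachable=yes --num-callers=20 \
         --log-file=/tmp/ns-slapd-valgrind.%p.log \
         /usr/sbin/ns-slapd -D /etc/dirsrv/slapd-EXAMPLE -d 0

# Drive searches/binds from another shell, shut the server down, and then
# look for ber_memalloc_x / ber_sockbuf_add_io records in the log file.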

Comment 1 Sankar Ramalingam 2013-07-15 16:38:47 UTC
Is it possible to reproduce this issue on the QE side? If yes, could you add the steps in the bugzilla?

Comment 2 Rich Megginson 2013-07-23 18:02:35 UTC
Steps to reproduce:

1) run top to monitor directory server memory usage
2) run dbmon.sh to monitor server cache usage
3) run ldapsearch/ldclt to open/close connections as quickly as possible

Once cache usage levels off (you don't see any changes to cache usage as shown by dbmon.sh) then monitor the mem% as shown by top.  You should not see the mem% of ns-slapd increase after that.
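
As a rough illustration of steps 1 and 3 (this is not from the original comment; the host name, suffix, and credentials below are placeholders, and the ldclt invocation shown in comment 8 is an equivalent way to drive step 3):

# 1) Watch the resident memory of the running ns-slapd process.
top -p "$(pidof ns-slapd)"

# 2) dbmon.sh runs in a separate shell to watch entry/dn/db cache usage
#    (the download link is given in comment 7).

# 3) Open and close connections as fast as possible: each ldapsearch below
#    connects, binds, performs a single base search, and disconnects.
while true; do
    ldapsearch -x -h ldap.example.com -p 389 \
        -D "cn=tuser1,ou=people,dc=example,dc=com" -w Secret123 \
        -b "dc=example,dc=com" -s base "(objectclass=*)" > /dev/null
done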

Comment 6 Ján Rusnačko 2013-10-17 15:37:52 UTC
(In reply to Rich Megginson from comment #2)
> Steps to reproduce:
> 
> 1) run top to monitor directory server memory usage
> 2) run dbmon.sh to monitor server cache usage
> 3) run ldapsearch/ldclt to open/close connections as quickly as possible
> 
> Once cache usage levels off (you don't see any changes to cache usage as
> shown by dbmon.sh) then monitor the mem% as shown by top.  You should not
> see the mem% of ns-slapd increase after that.
Where can I find dbmon.sh? It's not installed with DS out of the box. Thank you.

Comment 7 Rich Megginson 2013-10-17 15:44:30 UTC
https://github.com/richm/scripts/wiki/dbmon.sh

Comment 8 Ján Rusnačko 2013-10-18 07:50:39 UTC
With 389-ds-base-1.2.11.15-11.el6.x86_64 on the server and a RHEL 7 client running ldclt:

[root@localhost ~]# ldclt -h 192.168.122.135 -p 389 -D "cn=tuser1,ou=people,dc=example,dc=com" -w Secret123 -b "ou=people,dc=example,dc=com" -f "cn=tuser2" -e bindeach -e bindonly -e esearch -n 8
ldclt version 4.23
ldclt[2326]: Starting at Fri Oct 18 08:44:24 2013

ldclt[2326]: Average rate: 1598.12/thr  (1278.50/sec), total:  12785
ldclt[2326]: Average rate: 1620.62/thr  (1296.50/sec), total:  12965
ldclt[2326]: Average rate: 1619.88/thr  (1295.90/sec), total:  12959
ldclt[2326]: Average rate: 1589.38/thr  (1271.50/sec), total:  12715
ldclt[2326]: Average rate: 1632.25/thr  (1305.80/sec), total:  13058
ldclt[2326]: Average rate: 1549.62/thr  (1239.70/sec), total:  12397
ldclt[2326]: Average rate: 1562.88/thr  (1250.30/sec), total:  12503
ldclt[2326]: Average rate: 1660.12/thr  (1328.10/sec), total:  13281
^C
ldclt[2326]: Global average rate: 12832.88/thr  (1283.29/sec), total: 102663
ldclt[2326]: Global number times "no activity" reports: never
ldclt[2326]: Global no error occurs during this session.
Catch SIGINT - exit...
ldclt[2326]: Ending at Fri Oct 18 08:45:51 2013
ldclt[2326]: Exit status 0 - No problem during execution.

[jrusnack@dstet ~]$ while true; do sudo pmap -x 2527 | awk '/total/ { print $4 "K" }'; sleep 1; done
39604K
39740K
39804K
39828K
39900K
39972K
40032K
40164K
40248K

Cache hasn't changed (as there was almost nothing to cache).

After upgrading to 389-ds-base-1.2.11.15-29.el6.x86_64:

[root@localhost ~]# ldclt -h 192.168.122.135 -p 389 -D "cn=tuser1,ou=people,dc=example,dc=com" -w Secret123 -b "ou=people,dc=example,dc=com" -f "cn=tuser2" -e bindeach -e bindonly -e esearch -n 8
ldclt version 4.23
ldclt[2338]: Starting at Fri Oct 18 08:48:40 2013

ldclt[2338]: Average rate: 1554.62/thr  (1243.70/sec), total:  12437
ldclt[2338]: Average rate: 1507.75/thr  (1206.20/sec), total:  12062
ldclt[2338]: Average rate: 1551.25/thr  (1241.00/sec), total:  12410
^C
ldclt[2338]: Global average rate: 4613.62/thr  (1230.30/sec), total:  36909
ldclt[2338]: Global number times "no activity" reports: never
ldclt[2338]: Global no error occurs during this session.
Catch SIGINT - exit...
ldclt[2338]: Ending at Fri Oct 18 08:49:20 2013
ldclt[2338]: Exit status 0 - No problem during execution.

[jrusnack@dstet ~]$ while true; do sudo pmap -x 3752 | awk '/total/ { print $4 "K" }'; sleep 1; done
50500K
50500K
50500K
50500K
50500K
50500K
50500K
50500K
50500K
50500K

Verified.

Comment 9 errata-xmlrpc 2013-11-21 21:10:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1653.html

