Bug 707462

Summary: Memory leak when using Export tool
Product: [Retired] 389
Reporter: Aaron Roots <aaron.roots>
Component: Directory Server
Assignee: Rich Megginson <rmeggins>
Status: CLOSED DUPLICATE
QA Contact: Chandrasekar Kannan <ckannan>
Severity: high
Docs Contact:
Priority: high
Version: 1.2.8
CC: aaron.roots, benl, daniel.appleby, nhosoi
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-06-29 16:42:43 UTC
Bug Blocks: 434915, 708096    
Attachments:
Description Flags
Valgrind output none

Description Aaron Roots 2011-05-25 07:23:35 UTC
I am using the export utility to dump the directory (containing 60000 users) about every 5 minutes. This seems to accelerate a memory leak that we also see on the replicas, although it is much slower there because the export tool is not run on them. The server crashes once a day after running out of memory.

We have rolled back our production system to our previous version, 1.2.6.1, which does not appear to have this issue. The latest version, 1.2.8.3, is still installed in our Dev environment for further testing.

db2ldif.pl -1 -u -N -D "cn=Directory Manager" -n $LDAPINSTANCE -w $DIRMANAGERPWD -a $LDAPEXPORTFILE

Comment 1 Rich Megginson 2011-05-26 14:50:26 UTC
What are your cache settings on that machine?  32-bit or 64-bit?  How much RAM do you have?

Comment 2 Aaron Roots 2011-05-27 03:13:35 UTC
64-bit with 2GB of RAM.

The memory available for cache is set to 50 MB (52428800 bytes) for each database and in the LDBM Plug-in settings.

Comment 3 Rich Megginson 2011-05-27 15:03:48 UTC
grep nsslapd-cachememsize /etc/dirsrv/slapd-INST/dse.ldif

ls -al /var/lib/dirsrv/slapd-INST/db/*/id2entry.db4

Comment 4 Aaron Roots 2011-05-27 22:14:16 UTC
egrep "^dn|nsslapd-cachememsize" /etc/dirsrv/slapd-INST/dse.ldif
dn: cn=accessgroupsData,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 52428800
dn: cn=automountData,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 52428800
dn: cn=groupData,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 52428800
dn: cn=netgroupData,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 52428800
dn: cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 10485760
dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 52428800


ls -al /var/lib/dirsrv/slapd-INST/db/*/id2entry.db4
-rw------- 1 nobody nobody 1179648 May 27 13:51 /var/lib/dirsrv/slapd-INST/db/netgroupData/id2entry.db4
-rw------- 1 nobody nobody 12386304 Mar 18 10:34 /var/lib/dirsrv/slapd-INST/db/groupData/id2entry.db4
-rw------- 1 nobody nobody 60645376 May 27 15:49 /var/lib/dirsrv/slapd-INST/db/automountData/id2entry.db4
-rw------- 1 nobody nobody 139264 May 27 11:56 /var/lib/dirsrv/slapd-INST/db/NetscapeRoot/id2entry.db4
-rw------- 1 nobody nobody 860864512 May 28 07:18 /var/lib/dirsrv/slapd-INST/db/userRoot/id2entry.db4
-rw------- 1 nobody nobody 41025536 May 28 03:12 /var/lib/dirsrv/slapd-INST/db/accessgroupsData/id2entry.db4

Comment 5 Rich Megginson 2011-05-27 22:34:22 UTC
The nsslapd-cachememsize should be at least twice the size of the corresponding id2entry.db4.  You can monitor the cache usage as described here: http://docs.redhat.com/docs/en-US/Red_Hat_Directory_Server/8.2/html-single/Administration_Guide/index.html#Monitoring_Server_and_Database_Activity-Monitoring_Database_Activity

Keep an eye on the sizes in bytes (ignore the "in entries" items).

Once the cache is warmed up, your cache hit ratio should approach 100 percent.

I suspect this is related to https://bugzilla.redhat.com/show_bug.cgi?id=697701
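
The sizing rule from comment 5 can be checked against the figures posted in comment 4. The following is only a sketch: the function name `undersized` and the dictionary layout are illustrative assumptions, and the byte counts are copied from the `egrep` and `ls` output above.

```python
# Sizes in bytes, copied from comment 4:
# backend name -> (nsslapd-cachememsize, id2entry.db4 size)
backends = {
    "accessgroupsData": (52428800, 41025536),
    "automountData": (52428800, 60645376),
    "groupData": (52428800, 12386304),
    "netgroupData": (52428800, 1179648),
    "NetscapeRoot": (10485760, 139264),
    "userRoot": (52428800, 860864512),
}


def undersized(backends, factor=2):
    """Return {backend: minimum cache size} for every backend whose
    cache setting is below factor * the size of its id2entry.db4.
    (Hypothetical helper, not part of 389 Directory Server.)"""
    result = {}
    for name, (cachememsize, id2entry_size) in backends.items():
        minimum = factor * id2entry_size
        if cachememsize < minimum:
            result[name] = minimum
    return result


for name, minimum in undersized(backends).items():
    print(f"{name}: raise nsslapd-cachememsize to at least {minimum}")
```

With these numbers, accessgroupsData, automountData and userRoot fall below the suggested minimum, and userRoot is short by the widest margin.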

Comment 6 Aaron Roots 2011-06-03 07:32:33 UTC
I've increased the cache settings, but we are still experiencing the memory leak and crashes. The crashes are now a few days apart. However, because we had to add more memory to the box to be able to increase the cache, I am not sure whether the longer delay is related to these settings.

dn: cn=accessgroupsData,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 125829120
dn: cn=automountData,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 209715200
dn: cn=groupData,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 52428800
dn: cn=netgroupData,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 52428800
dn: cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 10485760
dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
nsslapd-cachememsize: 3221225472


-rw------- 1 nobody nobody  41025536 Jun  3 03:11 /var/lib/dirsrv/slapd-auth-master-dev/db/accessgroupsData/id2entry.db4
-rw------- 1 nobody nobody  60645376 Jun  3 09:49 /var/lib/dirsrv/slapd-auth-master-dev/db/automountData/id2entry.db4
-rw------- 1 nobody nobody  12386304 Jun  3 03:11 /var/lib/dirsrv/slapd-auth-master-dev/db/groupData/id2entry.db4
-rw------- 1 nobody nobody   1253376 Jun  3 09:41 /var/lib/dirsrv/slapd-auth-master-dev/db/netgroupData/id2entry.db4
-rw------- 1 nobody nobody    139264 Jun  2 11:37 /var/lib/dirsrv/slapd-auth-master-dev/db/NetscapeRoot/id2entry.db4
-rw------- 1 nobody nobody 860864512 Jun  3 09:52 /var/lib/dirsrv/slapd-auth-master-dev/db/userRoot/id2entry.db4

Comment 7 Rich Megginson 2011-06-03 16:02:36 UTC
(In reply to comment #6)
> I've increased the cache settings, but we are still experiencing the memory
> leak and crashes. The crashes are now a few days apart. However, because we
> had to add more memory to the box to be able to increase the cache, I am not
> sure whether the longer delay is related to these settings.
> 
> dn: cn=accessgroupsData,cn=ldbm database,cn=plugins,cn=config
> nsslapd-cachememsize: 125829120
> dn: cn=automountData,cn=ldbm database,cn=plugins,cn=config
> nsslapd-cachememsize: 209715200
> dn: cn=groupData,cn=ldbm database,cn=plugins,cn=config
> nsslapd-cachememsize: 52428800
> dn: cn=netgroupData,cn=ldbm database,cn=plugins,cn=config
> nsslapd-cachememsize: 52428800
> dn: cn=NetscapeRoot,cn=ldbm database,cn=plugins,cn=config
> nsslapd-cachememsize: 10485760
> dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
> nsslapd-cachememsize: 3221225472

Try increasing this some more - by multiples of the size of the id2entry database.  Start with 3221225472+860864512, see if that makes the problem better, then 3221225472+2*860864512, etc.
> 
> 
> -rw------- 1 nobody nobody  41025536 Jun  3 03:11 /var/lib/dirsrv/slapd-auth-master-dev/db/accessgroupsData/id2entry.db4
> -rw------- 1 nobody nobody  60645376 Jun  3 09:49 /var/lib/dirsrv/slapd-auth-master-dev/db/automountData/id2entry.db4
> -rw------- 1 nobody nobody  12386304 Jun  3 03:11 /var/lib/dirsrv/slapd-auth-master-dev/db/groupData/id2entry.db4
> -rw------- 1 nobody nobody   1253376 Jun  3 09:41 /var/lib/dirsrv/slapd-auth-master-dev/db/netgroupData/id2entry.db4
> -rw------- 1 nobody nobody    139264 Jun  2 11:37 /var/lib/dirsrv/slapd-auth-master-dev/db/NetscapeRoot/id2entry.db4
> -rw------- 1 nobody nobody 860864512 Jun  3 09:52 /var/lib/dirsrv/slapd-auth-master-dev/db/userRoot/id2entry.db4
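
The step-wise increase suggested in comment 7 works out as follows for userRoot. This is a small sketch of the arithmetic only; `candidate_sizes` is a hypothetical helper name, and the two constants are the userRoot values from comment 6.

```python
CURRENT_CACHE = 3221225472  # userRoot nsslapd-cachememsize (comment 6)
ID2ENTRY_SIZE = 860864512   # userRoot id2entry.db4 size in bytes (comment 6)


def candidate_sizes(current, step, count):
    """Return `count` cache sizes, each one `step` bytes larger than
    the previous, starting at current + step."""
    return [current + k * step for k in range(1, count + 1)]


for size in candidate_sizes(CURRENT_CACHE, ID2ENTRY_SIZE, 3):
    print(f"nsslapd-cachememsize: {size}")
```

The first two values correspond to 3221225472+860864512 and 3221225472+2*860864512; the third continues the same pattern.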

Comment 10 Noriko Hosoi 2011-06-07 23:18:04 UTC
I'm trying to reproduce the problem, but so far no luck.

I created 2 backends:
  backend1:
    nsslapd-cachememsize: 150000000
    id2entry.db4: 136855552 bytes (50K entries)
  backend2:
    nsslapd-cachememsize: 1600000000
    id2entry.db4: 1371119616 bytes (500K entries)

I repeatedly ran ./db2ldif.pl against the 2 backends + ran add/delete/search operations at the same time.

The size of ns-slapd started at 162,322 KB and gradually increased while I monitored the entry cache size.  Once the cache reached the max cache size:
  currententrycachesize: 1599997873
  maxentrycachesize: 1600000000
the growth of the process size stopped.  The size was 1,296,959 KB.

I ran this test with the standalone Directory Server.  I'm wondering if there could be some other configurations/operations/data that trigger the leak(s).  Would it be possible to share your configuration file (dse.ldif) and log files (errors and access) with us?

Also, could there be anything unique to your system?  Which plug-ins have you enabled?  Custom schema?  Any special data, images, certs in entries?

Your help would be greatly appreciated.
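
The check described in comment 10 (compare the current entry cache size with its maximum) could be scripted roughly like this. It is a sketch that assumes the monitor values are available as plain "name: value" text, as quoted above; `cache_fill` is a hypothetical helper, and how the text is fetched from the server is left out.

```python
def cache_fill(monitor_text):
    """Parse 'name: value' lines and return
    currententrycachesize / maxentrycachesize as a fraction.
    Lines without a numeric value are ignored."""
    values = {}
    for line in monitor_text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            values[name.strip().lower()] = int(value)
    return values["currententrycachesize"] / values["maxentrycachesize"]


# The two values reported in comment 10.
sample = """\
currententrycachesize: 1599997873
maxentrycachesize: 1600000000
"""

fill = cache_fill(sample)
print(f"entry cache is {fill:.4%} full")
```

A fraction that stays just under 1.0 while the process size keeps growing would point away from the entry cache as the cause.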

Comment 11 Daniel 2011-06-20 03:37:01 UTC
Hi Noriko,

I am working with Aaron on this issue. We run a fairly stock system with a couple of custom schemas. We don't use any plug-ins. We mainly see the issue with the userRoot database due to the large number of objects it contains.

Our scripts export and immediately start doing updates (if any are required). I have set up a test case which just uses the export tool (no writing afterwards) to try to narrow down the problem.

I am happy to send my custom schemas and dse.ldif but would like them to be kept private. Is there a way I can get these files to you privately?

Comment 12 Daniel 2011-06-21 00:24:36 UTC
Created attachment 505722 [details]
Valgrind output

Comment 13 Daniel 2011-06-21 00:27:47 UTC
Hi,

I have run dirsrv under valgrind and reproduced the issue. See the attached output.

I triggered it by running the export utility on the same database over and over (every minute).

Let me know if you need any more info.

Regards,
Daniel
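
When repeating the export as in comment 13, one way to tell cache warm-up (growth that stops, as in comment 10) from a leak (growth that continues) is to sample the ns-slapd process size between runs and look at the most recent samples. The sketch below is an assumption-laden heuristic, not a tool shipped with 389: `still_growing`, the window and the tolerance are all illustrative, and only the values 162322 and 1296959 come from this report; the other sample values are made up.

```python
def still_growing(rss_kb, window=3, tolerance_kb=1024):
    """Return True if the process size rose by more than tolerance_kb
    across the last `window` samples (sizes in KB, oldest first)."""
    if len(rss_kb) < window:
        raise ValueError("need at least `window` samples")
    recent = rss_kb[-window:]
    return recent[-1] - recent[0] > tolerance_kb


# 162322 and 1296959 are the sizes reported in comment 10;
# every other number here is invented for illustration.
plateau = [162322, 700000, 1296959, 1296959, 1297100]
leaking = [162322, 700000, 1296959, 1500000, 1750000]

print("plateau still growing:", still_growing(plateau))
print("leaking still growing:", still_growing(leaking))
```

The tolerance should be chosen with the workload in mind, since normal operation also causes small fluctuations in process size.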

Comment 14 Noriko Hosoi 2011-06-27 22:11:15 UTC
Thank you for the valgrind output.  It looks like the leak is already fixed in the master tree.

https://bugzilla.redhat.com/show_bug.cgi?id=697027#c6

We are releasing 389-ds-base 1.2.9 alpha soon.  When it's available, could you please run the test on the release?

Comment 15 Noriko Hosoi 2011-06-28 21:51:27 UTC
389-ds-base 1.2.9 alpha 2 is ready.

Could you go to this site and download a package that matches your platform?
http://koji.fedoraproject.org/koji/packageinfo?packageID=8423

  389-ds-base-1.2.9-0.2.a2.el5
  389-ds-base-1.2.9-0.2.a2.fc14
  389-ds-base-1.2.9-0.2.a2.fc15
  389-ds-base-1.2.9-0.2.a2.fc16

We'd greatly appreciate your testing on the new alpha release!

Comment 16 Daniel 2011-06-29 02:10:30 UTC
Hi Noriko,

I have installed 389-ds-base-1.2.9-0.2.a2.el5 and the memory usage appears to be holding. I'll leave it for a few more hours and let you know but it's no longer growing at the rate it was before.

Thanks,
Daniel

Comment 17 Daniel 2011-06-29 07:01:32 UTC
Hi Noriko,

389-ds-base-1.2.9-0.2.a2.el5 has fixed the issue. The cache size no longer exceeds the max cache size.

Do you know when 1.2.9 will move into testing?

Thank you for your assistance with this bug.

Regards,
Daniel

Comment 18 Rich Megginson 2011-06-29 12:50:29 UTC
(In reply to comment #17)
> Hi Noriko,
> 
> 389-ds-base-1.2.9-0.2.a2.el5 has fixed the issue. The cache size no longer
> exceeds the max cache size.
> 
> Do you know when 1.2.9 will move into testing?

It's already in Testing.  It should be in the updates-testing and epel-testing mirrors today or tomorrow.

We don't have an estimate yet of when 1.2.9 will be Stable.  We have some more bug fixes and testing yet to do.

> 
> Thank you for your assistance with this bug.
> 
> Regards,
> Daniel

Comment 19 Noriko Hosoi 2011-06-29 16:42:43 UTC
Daniel, thank you so much for testing 389-ds-base-1.2.9-0.2.a2.  We are glad that the memory leak is no longer observed on the new version.

Let me mark this bug as a duplicate of 697027 for future reference.

*** This bug has been marked as a duplicate of bug 697027 ***