Description of problem: As we scale up the number of JBoss application servers connecting to the RHDS 10 directories, performance begins to degrade on the uniquemember group lookups that determine which groups a user is a member of. With only 1 or 2 JBoss servers connected to the directory, performance is good; when we add a 3rd or 4th, performance quickly degrades on lookups with the LDAP filter (&(uniquemember=userDN)(objectclass=companygroup)).

The directory is split into two databases: userRoot, which contains the root suffix for the directory, and groupRoot, which holds a sub-suffix. The group lookups with the filter above slow down as the number of connections to the directory ramps up. Etimes climb into the 1 to 2 minute range and the CPU load rises. Response times on the userRoot database remain good.

The cache hit ratio on both databases is 95% or above. The file descriptor limit has been increased to 32K with a hard limit of 64K, and the limit on procs has been increased to 32K. Logconv has been run: no unindexed queries show up in the report, and thousands of connections remain available. The databases are on a separate partition mounted on an SSD drive, and the file system is ZFS.

We have isolated this problem to query and connection concurrency on the groupRoot db. We are looking for Red Hat support to provide additional recommendations for remedying it.
As far as what you can do to check: what is your nsslapd-threadnumber? Have you followed the performance tuning guide? Can you use high-resolution (HR) etime to see what's going on there? What is mounted for /var/log? Can you disable COW (copy-on-write) on the userRoot/groupRoot dbs?
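One way to see where the etimes climb is to scan the access log for slow RESULT lines. The following is only a minimal sketch, not a supported tool. It assumes the usual access log layout in which each RESULT line carries conn=, op= and etime= fields. The helper name slow_results, the 1-second threshold, and the sample lines are all hypothetical and made up for illustration.

```python
import re

# Assumed shape of a RESULT line in the access log, for example:
# [21/Apr/2017:10:00:00.123456789 -0400] conn=12 op=3 RESULT err=0 tag=101 nentries=1 etime=0.000123456
RESULT_RE = re.compile(r"conn=(\d+) op=(\d+) RESULT .*?etime=(\d+(?:\.\d+)?)")


def slow_results(lines, threshold=1.0):
    """Return (conn, op, etime) for RESULT lines whose etime exceeds threshold seconds."""
    hits = []
    for line in lines:
        match = RESULT_RE.search(line)
        if match and float(match.group(3)) > threshold:
            hits.append((int(match.group(1)), int(match.group(2)), float(match.group(3))))
    return hits


# Made-up sample lines, only to show the expected input shape.
sample = [
    "[21/Apr/2017:10:00:00.100000000 -0400] conn=12 op=3 RESULT err=0 tag=101 nentries=1 etime=0.000123456",
    "[21/Apr/2017:10:01:15.600000000 -0400] conn=12 op=4 RESULT err=0 tag=101 nentries=5 etime=75.500000000",
    "[21/Apr/2017:10:01:16.000000000 -0400] conn=13 op=1 SRCH base=\"dc=example,dc=com\" scope=2 filter=\"(uid=x)\"",
]

for conn, op, etime in slow_results(sample):
    print(f"conn={conn} op={op} etime={etime:.3f}s")
```

Matching the reported conn/op pairs back to their SRCH lines then shows which filters are slow, and whether they all target the groupRoot suffix.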
Upstream ticket: https://pagure.io/389-ds-base/issue/49330
Build tested: 389-ds-base-1.3.7.5-18.el7.x86_64

My testing server with 48 GB RAM was configured with the following settings:

(default settings)
nsslapd-idlistscanlimit: 4000
nsslapd-dbcachesize: 536870912
nsslapd-cachememsize: 4563402752

I increased ndn-cache-max-size:
nsslapd-ndn-cache-max-size: 2097152000

Directory contains 1 group with 10k members, unindexed component (description).

I see an 8-10x increase on average in search rate:

ldclt -D 'cn=Directory Manager' -w Secret123 -e esearch,random -r0 -R99999 -f "(&(description=*)(objectClass=groupOfUniqueNames)(uniqueMember=uid=uXXXXXX,ou=People,dc=example,dc=com))"

389-ds-base-1.3.6.1-19.el7_4.x86_64 (without the fix):
ldclt[40687]: Average rate: 20.40/thr ( 20.40/sec), total: 204

389-ds-base-1.3.7.5-18.el7.x86_64:
ldclt[39467]: Average rate: 192.90/thr ( 192.90/sec), total: 1929

Marking as VERIFIED.
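As a quick check of the quoted improvement, the two ldclt averages above can be compared directly. This is just arithmetic on the reported numbers; the variable names are illustrative.

```python
# Average search rates reported by ldclt in the two runs (searches/sec).
rate_without_fix = 20.40   # 389-ds-base-1.3.6.1-19.el7_4
rate_with_fix = 192.90     # 389-ds-base-1.3.7.5-18.el7

speedup = rate_with_fix / rate_without_fix
print(f"speedup: {speedup:.1f}x")
```

The ratio for this single pair of runs is about 9.5x, which falls inside the 8-10x range stated in the comment.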
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0811