Bug 1458536

Summary: Performance issues with RHDS 10 - NDN cache investigation.
Product: Red Hat Enterprise Linux 7 Reporter: Ash Westbrook <awestbro>
Component: 389-ds-base Assignee: mreynolds
Status: CLOSED ERRATA QA Contact: Viktor Ashirov <vashirov>
Severity: urgent Docs Contact: Marc Muehlfeld <mmuehlfe>
Priority: urgent    
Version: 7.3 CC: gparente, mreynolds, msauton, nkinder, pasik, rmeggins
Target Milestone: pre-dev-freeze Keywords: ZStream
Target Release: 7.5   
Hardware: All   
OS: Linux   
Fixed In Version: 389-ds-base- Doc Type: Enhancement
Doc Text:
Directory Server now uses separate normalized DN caches for each worker thread

Previously, multiple worker threads used a single normalized Distinguished Name (DN) cache. Consequently, if multiple clients performed operations on Directory Server, performance decreased. With this update, Directory Server creates separate normalized DN caches for each worker thread. As a result, performance no longer decreases in the mentioned scenario.
Story Points: ---
Clone Of:
: 1486128 Environment:
Last Closed: 2018-04-10 14:16:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1420851, 1472344, 1477926, 1486128, 1490412    
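
The Doc Text above describes replacing one normalized DN cache shared by all worker threads with one cache per worker thread. The following is only a minimal Python sketch of that idea, not the 389-ds-base implementation (which is written in C). The names `normalize_dn`, `cached_normalize`, and `cache_size` are hypothetical, and the normalization is a toy stand-in.

```python
import threading

# Each thread sees its own attributes on this object, so no lock is needed.
_local = threading.local()


def normalize_dn(dn: str) -> str:
    # Toy stand-in for DN normalization: trim and lowercase each RDN.
    # The naive split on "," ignores escaped commas; real normalization
    # follows the attribute syntax rules and is far more involved.
    return ",".join(rdn.strip().lower() for rdn in dn.split(","))


def _cache() -> dict:
    # Lazily create the cache the first time a given thread asks for it.
    cache = getattr(_local, "ndn_cache", None)
    if cache is None:
        cache = _local.ndn_cache = {}
    return cache


def cached_normalize(dn: str) -> str:
    # Look up the DN in the calling thread's own cache only.
    cache = _cache()
    if dn not in cache:
        cache[dn] = normalize_dn(dn)
    return cache[dn]


def cache_size() -> int:
    # Number of entries in the calling thread's cache.
    return len(_cache())
```

The trade-off in a design like this is memory for concurrency: the same DN may be cached once per thread, but threads never wait on each other to read or update a cache.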

Description Ash Westbrook 2017-06-04 03:26:45 UTC
Description of problem:

As we scale up the number of JBOSS application servers connecting to the RHDS 10 directories, we see performance begin to degrade on the uniquemember group lookups that determine which groups a user is a member of.  With only 1 or 2 JBOSS servers connected to the directory, performance is good; when we add a 3rd or 4th, performance quickly degrades on the lookups with an LDAP filter of (&(uniquemember=userDN)(objectclass=companygroup)).
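
For reference, a filter of the shape quoted above could be built as in this sketch. The function names and the example DN are hypothetical. The `companygroup` object class comes from the report, and the escaping follows the special characters listed in RFC 4515.

```python
# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    # Replace each special character with its backslash-hex form.
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def group_membership_filter(user_dn: str, group_objectclass: str = "companygroup") -> str:
    # Build (&(uniquemember=<dn>)(objectclass=<class>)) for one user DN.
    return (
        f"(&(uniquemember={escape_filter_value(user_dn)})"
        f"(objectclass={group_objectclass}))"
    )


if __name__ == "__main__":
    # Hypothetical DN, used only for illustration.
    print(group_membership_filter("uid=jdoe,ou=People,dc=example,dc=com"))
```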

The directory is split up into two databases: one is the userRoot database, which contains the root suffix for the directory, and the second is a subsuffix with the groupRoot database.  The group lookups with the filter listed above begin to slow down as the number of connections ramps up on the directory.  Etimes begin to climb into the 1 to 2 minute range and the CPU load rises.  The response times on the userRoot database continue to be good.

The cache hit ratio on both of the databases is 95% or above.  The file descriptor limit has been increased to 32K with a hard limit of 64K, and the limit on procs has also been increased to 32K.

Logconv has been run: no unindexed queries show up in the report, and thousands of connections remain listed as available.

The databases are on a separate partition that is mounted on an SSD drive, and the file system is ZFS.  We have been able to isolate this problem to a query and connection concurrency problem with the groupRoot db. We are looking for Red Hat support to provide additional recommendations for remedying this problem.

Comment 2 wibrown@redhat.com 2017-06-05 02:52:19 UTC
As far as what you can do to check, what's your nsslapd-threadnumber? Have you followed the performance tuning guide? Can you use HR etime to see what's going on there? What is mounted for /var/log? Can you disable COW on the userRoot/groupRoot dbs?

Comment 5 wibrown@redhat.com 2017-07-21 01:13:52 UTC
Upstream ticket:

Comment 11 Viktor Ashirov 2018-02-19 15:15:03 UTC
Build tested:

My testing server with 48 GB of RAM was configured with the following settings:

(default settings)
nsslapd-idlistscanlimit: 4000
nsslapd-dbcachesize: 536870912
nsslapd-cachememsize: 4563402752

I increased ndn-cache-max-size:
nsslapd-ndn-cache-max-size: 2097152000
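
To make the byte values above easier to compare, this small sketch converts them to MiB. It assumes all three settings are expressed in bytes; `nsslapd-idlistscanlimit` is left out because it is a count, not a size.

```python
MIB = 1024 * 1024

# Values copied from the settings listed above (assumed to be bytes).
SETTINGS = {
    "nsslapd-dbcachesize": 536870912,
    "nsslapd-cachememsize": 4563402752,
    "nsslapd-ndn-cache-max-size": 2097152000,
}


def to_mib(size_bytes: int) -> float:
    # Convert a byte count to mebibytes.
    return size_bytes / MIB


if __name__ == "__main__":
    for name, value in SETTINGS.items():
        print(f"{name}: {to_mib(value):.0f} MiB")
```

This works out to 512 MiB for the database cache, 4352 MiB for the entry cache, and 2000 MiB for the NDN cache.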

The directory contains 1 group with 10k members, and the search filter includes an unindexed component (description).

I see an 8-10x increase in search rate on average:
ldclt -D 'cn=Directory Manager' -w Secret123 -e esearch,random -r0 -R99999  -f "(&(description=*)(objectClass=groupOfUniqueNames)(uniqueMember=uid=uXXXXXX,ou=People,dc=example,dc=com))"

389-ds-base- (without the fix):
ldclt[40687]: Average rate:   20.40/thr  (  20.40/sec), total:    204

With the fix:
ldclt[39467]: Average rate:  192.90/thr  ( 192.90/sec), total:   1929
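
As a quick check of the two ldclt results above, this sketch computes the ratio between them. It assumes the second result is the run with the fix, which the comment implies by its ordering.

```python
# Average search rates (operations per second) from the two ldclt runs above.
RATE_WITHOUT_FIX = 20.40
RATE_WITH_FIX = 192.90  # assumed to be the run with the fix


def speedup(before: float, after: float) -> float:
    # Ratio of the new rate to the old rate.
    return after / before


if __name__ == "__main__":
    print(f"speedup: {speedup(RATE_WITHOUT_FIX, RATE_WITH_FIX):.2f}x")
```

The ratio is roughly 9.5x, which falls within the 8-10x range reported.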

Marking as VERIFIED.

Comment 15 errata-xmlrpc 2018-04-10 14:16:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.