Bug 2123525

Summary: perf - slow memberof fixup task for large static groups, high CPU use
Product: Red Hat Directory Server
Component: 389-ds-base
Version: 11.5
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Target Milestone: DS11.7
Target Release: dirsrv-11.7
Fixed In Version: redhat-ds-11-8080020221130182235.022a399e
Keywords: Triaged
Whiteboard: sync-to-jira
Type: Bug
Last Closed: 2023-05-23 09:27:55 UTC
Reporter: Marc Sauton <msauton>
Assignee: thierry bordaz <tbordaz>
QA Contact: LDAP QA Team <idm-ds-qe-bugs>
Docs Contact: Zuzana Zoubkova <zzoubkov>
CC: bsmejkal, emartyny, idm-ds-dev-bugs, jachapma, ldelouw, mreynolds, mrhodes, pasik, snagothu, striker, tbordaz, tmihinto, vvanhaft

Comment 3 thierry bordaz 2022-09-02 10:33:32 UTC
The slow performance is caused by the fixup thread's NDN (normalized DN) cache being too small, so entries keep getting evicted. The NDN cache size limit is set at the server level and shared by the worker threads. With a 2 GB limit and 32 workers, each worker can use at most 2 GB / 32 = 64 MB, which is too small. (2 GB is the largest value accepted: dsconf localhost config replace nsslapd-ndn-cache-max-size=2147483647)
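The per-worker arithmetic can be checked with a small Python sketch. It assumes, as this comment describes, that the server-wide nsslapd-ndn-cache-max-size budget is split evenly across worker threads. The helper name per_worker_ndn_cache is hypothetical, and the exact accounting inside 389-ds-base may differ. The sketch also shows that dividing the same limit among fewer workers leaves each thread a larger share.

```python
# Sketch of the per-thread NDN cache budget described in this comment.
# Assumption: the server-wide limit is divided evenly among worker threads;
# per_worker_ndn_cache() is a hypothetical helper, not a 389-ds-base API.

MIB = 1024 * 1024

def per_worker_ndn_cache(global_max_bytes: int, workers: int) -> int:
    """Return the NDN cache budget, in bytes, available to each worker thread."""
    if workers <= 0:
        raise ValueError("workers must be a positive integer")
    return global_max_bytes // workers

if __name__ == "__main__":
    limit = 2147483647  # nsslapd-ndn-cache-max-size used in this comment (~2 GiB)
    for workers in (32, 16, 8):
        share = per_worker_ndn_cache(limit, workers)
        print(f"{workers:>2} workers -> about {share / MIB:.0f} MiB per worker")
```

With 32 workers each thread gets about 64 MiB. With 16 and 8 workers the share grows to about 128 MiB and 256 MiB.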

One option is to remove the 2 GB limit on the global NDN cache size. This is easy to do.

Another option is to make the NDN cache itself global, shared by all threads. It was made per-thread to reduce lock contention, but I think that contention can be avoided even with a global cache. The main benefit of a global NDN cache is that a DN normalized by one thread can be reused by the others. This is a longer-term improvement.
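The reuse argument can be illustrated with a hypothetical Python simulation. This is not 389-ds-base code. Several worker threads look up the same DNs, first with a private cache each and then with one shared cache, and the sketch counts how many normalizations are performed. normalize_dn is a deliberately simplified stand-in for the server's real DN normalization. The single coarse lock is an assumption made for simplicity and does not show how contention would be avoided.

```python
# Hypothetical simulation, not 389-ds-base code: compare how many DN
# normalizations happen with per-thread caches versus one shared cache.
import threading

def normalize_dn(dn: str) -> str:
    """Simplified stand-in for DN normalization: strip spaces, lowercase.

    Ignores escaping, multi-valued RDNs and attribute syntaxes.
    """
    rdns = []
    for rdn in dn.split(","):
        attr, _, value = rdn.partition("=")
        rdns.append(f"{attr.strip().lower()}={value.strip().lower()}")
    return ",".join(rdns)

class NdnCache:
    """Tiny DN -> NDN cache that counts misses (normalizations performed)."""

    def __init__(self) -> None:
        self._entries = {}
        # One coarse lock for simplicity; it does not address contention.
        self._lock = threading.Lock()
        self.misses = 0

    def get(self, dn: str) -> str:
        with self._lock:
            ndn = self._entries.get(dn)
            if ndn is None:
                ndn = normalize_dn(dn)
                self._entries[dn] = ndn
                self.misses += 1
            return ndn

def count_normalizations(workers: int, dns: list, shared: bool) -> int:
    """Run `workers` threads that each look up every DN; return total misses."""
    shared_cache = NdnCache()
    caches = [shared_cache if shared else NdnCache() for _ in range(workers)]

    def lookup_all(cache: NdnCache) -> None:
        for dn in dns:
            cache.get(dn)

    threads = [threading.Thread(target=lookup_all, args=(c,)) for c in caches]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Count each distinct cache object once (the shared cache appears N times).
    distinct = {id(c): c for c in caches}.values()
    return sum(c.misses for c in distinct)

if __name__ == "__main__":
    dns = ["uid=user1, ou=People, dc=example, dc=com",
           "cn=Group1 ,ou=Groups,dc=example,dc=com"]
    print("per-thread caches:", count_normalizations(4, dns, shared=False))
    print("shared cache:     ", count_normalizations(4, dns, shared=True))
```

With 4 threads and 2 distinct DNs, the per-thread caches perform 8 normalizations (each thread repeats the work). The shared cache performs only 2.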

A possible workaround is to reduce the number of worker threads, so that the fixup task gets a larger NDN cache.

Comment 7 Serisha Nagothu 2022-09-07 18:41:18 UTC
Hi, 

Red Hat Directory Server performance is critical to the customer. The customer is experiencing performance issues while migrating data from Oracle LDAP to RHDS. We recommended several tuning parameters, which gave some improvement, but the process still takes 3 to 4 days to complete. It also still runs single-threaded: although the customer started eight threads, they are processed sequentially rather than in parallel.

The Oracle LDAP subscription is up for renewal in 10 days. The customer wants to retire Oracle LDAP entirely and migrate to RHDS. If performance does not improve, they will be forced to stay with Oracle LDAP, because this delay impacts their manufacturing line. We are requesting help because this affects both the customer and Red Hat's business with them.

Thanks
Serisha

Comment 9 thierry bordaz 2022-09-08 14:51:36 UTC
https://github.com/389ds/389-ds-base/issues/5440

Comment 40 Jamie Chapman 2023-02-16 21:05:50 UTC
Verified with the following environment:

Red Hat Enterprise Linux release 8.6 (Ootpa)
RHDS 11.7 
389-ds-base-1.4.3.32-1.module+el8dsrv+17400+a7f2694e

Steps
Follow the steps in Marc's and Thierry's reproducer notes attached in comment 1.

Result
389-ds-base-1.4.3.28-8.module+el8.6.0+16880+945f9b53.x86_64 (before the fix) - memberof fixup takes ~60 min
389-ds-base-1.4.3.32-1.module+el8dsrv+17400+a7f2694e.x86_64 (with the fix) - memberof fixup takes ~4 min

Setting to verified

Comment 41 mreynolds 2023-05-03 15:28:10 UTC
*** Bug 2141177 has been marked as a duplicate of this bug. ***

Comment 43 errata-xmlrpc 2023-05-23 09:27:55 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (redhat-ds:11 bug fix and enhancement update) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3267