Bug 1974242

Summary: Paged search impacts performance
Product: Red Hat Enterprise Linux 9
Reporter: thierry bordaz <tbordaz>
Component: 389-ds-base
Assignee: Pierre Rogier <progier>
Status: VERIFIED
QA Contact: LDAP QA Team <idm-ds-qe-bugs>
Severity: high
Priority: high
Version: 9.1
CC: bsmejkal, idm-ds-dev-bugs, jonmoore, mreynolds, mrhodes, pasik, progier, tbordaz, tmihinto, vashirov
Target Milestone: rc
Keywords: Reopened, Triaged, ZStream
Target Release: 9.3
Hardware: Unspecified
OS: Unspecified
Whiteboard: sync-to-jira
Fixed In Version: 389-ds-base-2.3.4-3.el9
Doc Type: Bug Fix
Doc Text:
Cause: When paged search results are sent, lock contention occurs with the thread that polls for network events.
Consequence: Performance drops by a factor of 4 to 5 while paged searches are running. In addition, if a network issue occurs while paged results are being sent, the whole server may become unresponsive until nsslapd-ioblocktimeout expires.
Fix: The lock has been split into several locks to avoid the contention.
Result: Paged searches no longer impact performance.
Clones: 2224505, 2224507, 2231841
Last Closed: 2023-06-21 07:28:16 UTC
Type: Bug
Bug Blocks: 2224505, 2224507, 2231841    

Description thierry bordaz 2021-06-21 07:38:40 UTC
Description of problem:
When a server is under search load, a paged search from a regular user significantly degrades performance.
The same paged search requested by Directory Manager (DM) has a much smaller impact.

Version-Release number of selected component (if applicable):
since 7.x

How reproducible:
systematic

Steps to Reproduce:
    Create a database with 40000 users.
    Run a search load using ldclt:
        ldclt -D "uid=test,ou=people,dc=example,dc=com" -w test -e bindeach,esearch -b "ou=people,dc=example,dc=com" -f "uid=00001"
    While the search load is running, run a paged search in a loop that requests all user entries:
        while : ; do ldapsearch -D "uid=test,ou=people,dc=example,dc=com" -w test -b dc=example,dc=com 'uid=*' -E pr=100/noprompt; done
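The reproduction steps do not say how the 40000 users were created. The script below is a minimal sketch of one way to do it, assuming the resulting LDIF is imported into an empty dc=example,dc=com backend. The script name (gen-users.sh), the output file name (users.ldif), the inetOrgPerson entry layout and the five-digit zero-padded uids (chosen so that the ldclt filter uid=00001 matches an entry) are assumptions, not details from the report. It also adds the uid=test bind user with password test that the ldclt and ldapsearch commands use.

```shell
#!/bin/sh
# Hypothetical helper (gen-users.sh): write an LDIF file containing the suffix,
# ou=people, the uid=test bind user (password "test") and COUNT numbered users.
# The entry layout is an assumption; the bug report does not describe it.
set -eu

count=${1:-40000}          # number of users to generate
out=${2:-users.ldif}       # output file (assumed name)
suffix="dc=example,dc=com"
people="ou=people,$suffix"
person_classes='objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson'

{
  printf 'dn: %s\nobjectClass: top\nobjectClass: domain\ndc: example\n\n' "$suffix"
  printf 'dn: %s\nobjectClass: top\nobjectClass: organizationalUnit\nou: people\n\n' "$people"

  # Bind user used by the ldclt and ldapsearch commands in the steps above.
  printf 'dn: uid=test,%s\n%s\nuid: test\ncn: test\nsn: test\nuserPassword: test\n\n' \
    "$people" "$person_classes"

  i=1
  while [ "$i" -le "$count" ]; do
    # %05d zero-pads the uid so that the filter "uid=00001" matches an entry.
    printf 'dn: uid=%05d,%s\n%s\nuid: %05d\ncn: user %05d\nsn: %05d\n\n' \
      "$i" "$people" "$person_classes" "$i" "$i" "$i"
    i=$((i + 1))
  done
} > "$out"
```

For example, run `sh gen-users.sh 40000 users.ldif`, then load the file with `ldapadd -x -D "cn=Directory Manager" -W -f users.ldif`. If the suffix or ou=people entries already exist, ldapadd's -c option makes it report those errors and continue instead of stopping.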


Actual results:
ldclt[1353]: Average rate: 2415.70/thr  (2415.70/sec), total:  24157 -- only ldclt is running 
ldclt[1353]: Average rate: 2342.50/thr  (2342.50/sec), total:  23425
ldclt[1353]: Average rate: 1048.90/thr  (1048.90/sec), total:  10489 -\
ldclt[1353]: Average rate:  413.10/thr  ( 413.10/sec), total:   4131   | paged search from a regular user
ldclt[1353]: Average rate:  461.00/thr  ( 461.00/sec), total:   4610   |
ldclt[1353]: Average rate: 1759.30/thr  (1759.30/sec), total:  17593 -/
ldclt[1353]: Average rate: 2374.70/thr  (2374.70/sec), total:  23747
ldclt[1353]: Average rate: 1952.70/thr  (1952.70/sec), total:  19527 -\
ldclt[1353]: Average rate: 1783.00/thr  (1783.00/sec), total:  17830   | paged search from DM
ldclt[1353]: Average rate: 1749.70/thr  (1749.70/sec), total:  17497 -/
ldclt[1353]: Average rate: 2319.80/thr  (2319.80/sec), total:  23198
ldclt[1353]: Average rate: 2378.80/thr  (2378.80/sec), total:  23788
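The drop can be quantified directly from the ldclt output. The function below is a small sketch, assuming the output has been saved to a file (the file name ldclt.log and the function name rate_drop are hypothetical) with lines formatted exactly as above. It treats the first interval, where only ldclt is running, as the baseline and prints each interval's rate together with its percentage drop from that baseline.

```shell
# Sketch: report each ldclt interval's rate and its drop relative to the
# first interval, which is taken as the baseline (only ldclt running).
# Assumes input lines formatted like the "Actual results" output above.
rate_drop() {
  awk '/Average rate:/ {
    rate = $4                    # 4th field, e.g. "2415.70/thr"
    sub(/\/thr$/, "", rate)      # strip the "/thr" suffix
    rate += 0                    # force numeric
    if (n++ == 0) base = rate    # first matching line is the baseline
    printf "%8.2f/sec  %5.1f%% drop\n", rate, 100 * (1 - rate / base)
  }'
}
```

Run it as `rate_drop < ldclt.log`. On the output above, the two slowest intervals of the regular-user window (413.10/sec and 461.00/sec) come out about 83% and 81% below the 2415.70/sec baseline, while the DM window stays within roughly 19% to 28%.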

Expected results:
There shouldn't be a significant drop in performance.

Additional info:
@wisebaldone, who reported the issue, also mentioned that 1.2.11.15-48 does not have the issue whereas 1.2.11.15-97 does.

Comment 3 RHEL Program Management 2022-12-21 07:27:55 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 10 RHEL Program Management 2023-06-21 07:28:16 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 11 Pierre Rogier 2023-07-17 07:56:29 UTC
Reopening the bug, as the root cause is now understood and a fix is in progress.

Comment 13 Pierre Rogier 2023-07-17 08:10:55 UTC
The issue is due to a very small amount of lock contention, accounting for 0.3% of the server CPU time and 5% of the listening thread's time, but that was enough to decrease performance by 60%.
Even worse, we have seen a case (after a network issue, namely a TCP router restart) where the server was fully unresponsive until nsslapd-ioblocktimeout expired.