Bug 1975930
Summary: | RFE - Add additional connection listening threads | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 9 | Reporter: | mreynolds |
Component: | 389-ds-base | Assignee: | Jamie Chapman <jachapma> |
Status: | CLOSED ERRATA | QA Contact: | LDAP QA Team <idm-ds-qe-bugs> |
Severity: | unspecified | Docs Contact: | Mugdha Soni <musoni> |
Priority: | high | ||
Version: | 9.1 | CC: | bsmejkal, emartyny, idm-ds-dev-bugs, jachapma, musoni, pasik, progier, tbordaz, vashirov |
Target Milestone: | rc | Keywords: | FutureFeature, Reopened, Triaged |
Target Release: | 9.3 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | sync-to-jira | ||
Fixed In Version: | 389-ds-base-2.3.4-1.el9 | Doc Type: | Enhancement |
Doc Text: |
.New `nsslapd-numlisteners` configuration option is now available
The `nsslapd-numlisteners` attribute specifies the number of listener threads Directory Server can use to monitor established connections. When the server handles a large number of client connections, you can improve response times by increasing the attribute value.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2023-11-07 08:25:17 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
mreynolds
2021-06-24 18:43:36 UTC
@Thierry You are mostly correct. At the moment we have the framework upstream; what remains is to add config support so that a user can define the number of listeners they require, and I already have a patch for this. But I feel we need to do more performance testing with this feature before we let it loose on the world. And yes, the documentation will also need updating. So IMHO I would prefer to wait until the next release.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Some notes to help with verification of this RFE.

Overview
This RFE removes a bottleneck we identified when the server is managing a large number of concurrent connections, but we didn't see an improvement in performance because we have now identified another bottleneck. Although we can improve the number of connections the server can handle, the work queue is the next throughput-limiting bottleneck. This makes verification of this RFE rather tricky. Here is what I used when testing this RFE; if you can think of an easier or better way, go with that. Also, the client is a WIP, so improve it if you like.

Test environment
Ideally we would use two physical machines to test this: the server runs on one machine and the other machine functions as the client. But if physical machines are an issue, we could run both the server and the client on one machine.

Server
Install DS with the numlistener RFE included.

Client
You can find the ldapclient script here - https://github.com/jchapma/tools

Get script argument info:
python ldapclient.py -h
usage: ldapclient.py [-h] [-u URL] [-p PROCS] [-c NCONNS] [-s NSRCH] [-n NCONNSPERTHREAD]

ds search load client, used to test multi listener. Default values are a 10k connection throughput test, just set your url. Expect a high cpu load when running this.
optional arguments:
  -h, --help          show this help message and exit
  -u URL              Instance URL. (Default: ldap://localhost:389)
  -p PROCS            Number of load processes to spawn. (Default: 50)
  -c NCONNS           Number of connections per process. (Default: 200)
  -s NSRCH            Number of searches per connection. (Default: 100)
  -n NCONNSPERTHREAD  Number of connections per process thread. (Default: 10)

The defaults are for 10K connections. You will most likely need to define the server URI, as the default is localhost. It takes a while to run, I think around 15 min on a high-powered machine.

Test methodology
1. Use the default value (1). Run the client and record its reported throughput.
2. Configure the server for nsslapd-numlisteners 2. Run the client and record its reported throughput.
3. Configure the server for nsslapd-numlisteners 3. Run the client and record its reported throughput.
4. Configure the server for nsslapd-numlisteners 4. Run the client and record its reported throughput.
Then, I guess, verify that there is no performance drop across the four test runs.

Test commands
Default value for numlisteners (1):
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-numlisteners=2
# dsctl instance_name restart
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-numlisteners=3
# dsctl instance_name restart
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-numlisteners=4
# dsctl instance_name restart
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

Client
python ldapclient.py -u ldap://10.19.41.17:389

The client will report its measured throughput rates.
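The methodology and commands above can be scripted. The following is a minimal sketch, not part of ldapclient.py or lib389: `load_shape` and `sweep_commands` are hypothetical helpers, and `ldap://server.example.com` and `instance_name` are the placeholders from the test commands. It derives the load that the ldapclient.py defaults imply (50 processes × 200 connections = 10,000 connections, × 100 searches = 1,000,000 searches) and prints the dsconf/dsctl sequence for each nsslapd-numlisteners value. The thread count assumes each process splits its connections evenly across threads.

```python
"""Sketch: script the nsslapd-numlisteners test sweep described in this bug.

Hypothetical helpers. The URL and instance name are placeholders taken from
the test commands; the load defaults come from `ldapclient.py -h`.
"""
import shlex

SERVER_URL = "ldap://server.example.com"  # placeholder, replace with your server
INSTANCE = "instance_name"                # placeholder, replace with your instance
BIND_DN = "cn=Directory Manager"
GET_CMD = ["dsconf", "-D", BIND_DN, SERVER_URL, "config", "get", "nsslapd-numlisteners"]


def load_shape(procs=50, nconns=200, nsrch=100, nconnsperthread=10):
    """Totals implied by ldapclient.py arguments (defaults from its -h output)."""
    total_conns = procs * nconns
    return {
        "connections": total_conns,
        "searches": total_conns * nsrch,
        # Assumption: each process splits its connections evenly across threads.
        "threads_per_proc": nconns // nconnsperthread,
    }


def sweep_commands(values=(2, 3, 4)):
    """Return (label, [argv, ...]) pairs: read the default, then set/restart/read each value."""
    plan = [("1 (default)", [GET_CMD])]
    for n in values:
        set_cmd = ["dsconf", "-D", BIND_DN, SERVER_URL, "config", "replace",
                   f"nsslapd-numlisteners={n}"]
        plan.append((str(n), [set_cmd, ["dsctl", INSTANCE, "restart"], GET_CMD]))
    return plan


if __name__ == "__main__":
    print(load_shape())
    # Print rather than execute, so the sketch has no side effects.
    for label, cmds in sweep_commands():
        print(f"# numlisteners = {label}")
        for argv in cmds:
            print("  " + shlex.join(argv))
        # After each step, run the load client and record its reported throughput.
        print("  python ldapclient.py -u " + SERVER_URL)
```

Printing the commands instead of running them keeps the sketch safe to try. Each argv list could instead be passed to `subprocess.run(argv, check=True)` once the placeholders point at a real test instance.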
I verified this using logconv:
logconv.pl /var/log/dirsrv/slapd-inst01/access* | grep -E "Connections|Searches|Average"

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Build tested: 389-ds-base-2.3.4-3.el9.x86_64

I used a VM with 8 CPUs and 32 GB of RAM. With each additional listener thread, up to 4, I see about a 3% decrease in the number of connections per second, as well as in the number of searches per second; between 4 and 8 threads the rates stay roughly level:

# grep "LDAP Connections:" *log
1.log: - LDAP Connections: 10000 (124.24/sec) (7454.50/min)
2.log: - LDAP Connections: 10000 (120.48/sec) (7229.08/min)
3.log: - LDAP Connections: 10000 (115.95/sec) (6956.90/min)
4.log: - LDAP Connections: 10000 (111.59/sec) (6695.33/min)
5.log: - LDAP Connections: 10000 (110.61/sec) (6636.54/min)
6.log: - LDAP Connections: 10000 (108.31/sec) (6498.86/min)
7.log: - LDAP Connections: 10000 (109.14/sec) (6548.36/min)
8.log: - LDAP Connections: 10000 (112.45/sec) (6747.17/min)

# grep ^Searches: *log
1.log:Searches: 1000000 (12424.17/sec) (745450.49/min)
2.log:Searches: 1000000 (12048.46/sec) (722907.69/min)
3.log:Searches: 1000000 (11594.84/sec) (695690.40/min)
4.log:Searches: 1000000 (11158.89/sec) (669533.47/min)
5.log:Searches: 1000000 (11060.90/sec) (663654.16/min)
6.log:Searches: 1000000 (10831.43/sec) (649885.56/min)
7.log:Searches: 1000000 (10913.93/sec) (654835.70/min)
8.log:Searches: 1000000 (11245.28/sec) (674716.66/min)

Is this expected/acceptable? Is it a result of work queue contention? Thanks.

I tested with https://github.com/389ds/389-ds-base/blob/main/dirsrvtests/tests/perf/ltest.py to measure latency instead of throughput.
With 20000 connections:
1 listener thread: 9.22 ms
2 listener threads: 7.88 ms
3 listener threads: 6.33 ms

With 4 listener threads I hit another issue where the test did not proceed: https://bugzilla.redhat.com/show_bug.cgi?id=2231559, but with a smaller number of connections it worked and there was an improvement too. Another minor issue is https://bugzilla.redhat.com/show_bug.cgi?id=2231269

Marking as Verified:Tested

As per comment #c21, marking as VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (389-ds-base bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2023:6350
|
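To put the throughput and latency figures reported in the comments on a common footing, the sketch below computes each run's percent change relative to the 1-listener run. It is a hypothetical helper, not part of logconv.pl or ltest.py, and it assumes that each `N.log` in the grep output is the run with N listener threads. Only numbers quoted in the comments are used.

```python
"""Relative change of the measurements quoted in this bug, versus 1 listener.

Assumption: N.log is the logconv output for the run with N listener threads.
"""
import re

# Matches e.g. '1.log: - LDAP Connections: 10000 (124.24/sec) (7454.50/min)'
# and captures the run number and the first per-second rate on the line.
RATE_RE = re.compile(r"^(\d+)\.log:.*?\((\d+(?:\.\d+)?)/sec\)")


def rates_from_grep(lines):
    """Map listener count -> per-second rate from 'grep ... *log' output lines."""
    rates = {}
    for line in lines:
        m = RATE_RE.match(line.strip())
        if m:
            rates[int(m.group(1))] = float(m.group(2))
    return rates


def percent_change(values):
    """Percent change of each run relative to the 1-listener run, to 0.1%."""
    base = values[1]
    return {n: round(100.0 * (v - base) / base, 1) for n, v in sorted(values.items())}


CONNECTION_LINES = """\
1.log: - LDAP Connections: 10000 (124.24/sec) (7454.50/min)
2.log: - LDAP Connections: 10000 (120.48/sec) (7229.08/min)
3.log: - LDAP Connections: 10000 (115.95/sec) (6956.90/min)
4.log: - LDAP Connections: 10000 (111.59/sec) (6695.33/min)
5.log: - LDAP Connections: 10000 (110.61/sec) (6636.54/min)
6.log: - LDAP Connections: 10000 (108.31/sec) (6498.86/min)
7.log: - LDAP Connections: 10000 (109.14/sec) (6548.36/min)
8.log: - LDAP Connections: 10000 (112.45/sec) (6747.17/min)
"""

# ltest.py latency with 20000 connections, in milliseconds.
LATENCY_MS = {1: 9.22, 2: 7.88, 3: 6.33}

if __name__ == "__main__":
    conn = percent_change(rates_from_grep(CONNECTION_LINES.splitlines()))
    for n, pct in conn.items():
        print(f"{n} listener(s): connections/sec {pct:+.1f}%")
    for n, pct in percent_change(LATENCY_MS).items():
        print(f"{n} listener(s): latency {pct:+.1f}%")
```

For the connection rates this gives -3.0% at 2 listeners and -10.2% at 4. For latency it gives -14.5% at 2 listeners and -31.3% at 3, where a negative value means lower (better) latency. The two experiments measure different things (throughput under the ldapclient.py load versus latency under ltest.py), so the percentages are not directly comparable.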