Bug 1975930

Summary: RFE - Add additional connection listening threads
Product: Red Hat Enterprise Linux 9
Reporter: mreynolds
Component: 389-ds-base
Assignee: Jamie Chapman <jachapma>
Status: CLOSED ERRATA
QA Contact: LDAP QA Team <idm-ds-qe-bugs>
Severity: unspecified
Docs Contact: Mugdha Soni <musoni>
Priority: high
Version: 9.1
CC: bsmejkal, emartyny, idm-ds-dev-bugs, jachapma, musoni, pasik, progier, tbordaz, vashirov
Target Milestone: rc
Keywords: FutureFeature, Reopened, Triaged
Target Release: 9.3
Hardware: Unspecified
OS: Unspecified
Whiteboard: sync-to-jira
Fixed In Version: 389-ds-base-2.3.4-1.el9
Doc Type: Enhancement
Doc Text:
.New `nsslapd-numlisteners` configuration option is now available
The `nsslapd-numlisteners` attribute specifies the number of listener threads Directory Server can use to monitor established connections. You can improve response times when the server experiences a large number of client connections by increasing the attribute value.
Last Closed: 2023-11-07 08:25:17 UTC
Type: Bug

Description mreynolds 2021-06-24 18:43:36 UTC
Issue Description

The listener thread polls established connections. When the number of connections is high and/or incoming traffic is heavy, this single thread becomes a bottleneck: it consumes a lot of CPU walking the huge array of established connections, yet it still cannot keep up with heavy incoming traffic, which ultimately limits the capacity of the server.

The idea is to split the connection table into several portions, each portion being polled by a dedicated listener thread.

This solution would also help the server scale toward the 10K (10,000 concurrent connections) problem.
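The proposed split can be sketched as follows. This is an illustrative Python sketch, not the actual 389-ds-base implementation (which is written in C); `NUM_LISTENERS`, `partition`, and the selector-per-thread layout are my own names for the idea of giving each listener thread its own slice of the connection table.

```python
import selectors
import socket
import threading

NUM_LISTENERS = 4  # illustrative; corresponds to the proposed nsslapd-numlisteners


def partition(fd: int, num_listeners: int) -> int:
    """Pick which listener thread polls this connection (here: by fd)."""
    return fd % num_listeners


def listener_loop(sel: selectors.DefaultSelector) -> None:
    """Each listener polls only the connections registered in its own selector."""
    while True:
        for key, _ in sel.select(timeout=1):
            conn = key.fileobj
            data = conn.recv(4096)  # a real server would hand this to a worker queue
            if not data:
                sel.unregister(conn)
                conn.close()


# One selector per listener thread, i.e. one slice of the connection table each.
listener_selectors = [selectors.DefaultSelector() for _ in range(NUM_LISTENERS)]


def assign(conn: socket.socket) -> None:
    """Spread newly accepted connections across the listener slices."""
    sel = listener_selectors[partition(conn.fileno(), NUM_LISTENERS)]
    sel.register(conn, selectors.EVENT_READ)


for sel in listener_selectors:
    threading.Thread(target=listener_loop, args=(sel,), daemon=True).start()
```

With a single listener, every poll cycle scans the whole connection table; with N listeners, each thread scans only about 1/N of it, which is what removes the single-thread CPU bottleneck described above.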


Steps to Reproduce

Configure the server with a connection table of 5000 slots. Create 5000 clients and hammer the server. The overall throughput is lower than with 100 clients hammering the server (TBC).
One symptom would be high CPU usage on the listener thread.

Expected results

High CPU usage should be on the worker threads, and throughput with a large number of clients should be higher than with a smaller number of clients.

Comment 6 Jamie Chapman 2022-12-08 13:25:28 UTC
@Thierry

You are mostly correct. At the moment we have the framework upstream; what remains is to add config support to allow a user to define the number of listeners they require. I already have a patch for this.

But I feel we need to do more performance testing with this feature before we let it loose on the world. And yes, the documentation will also need updating. So IMHO I would prefer to wait until the next release.

Comment 8 RHEL Program Management 2022-12-24 07:27:49 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 12 Jamie Chapman 2023-05-16 14:06:28 UTC
Some notes to help with verification of this RFE.

Overview
This RFE removes a bottleneck we identified when the server is managing a large number of concurrent connections, but we didn't see an improvement
in performance because we have since identified another bottleneck: although we can improve the number of connections the server can handle, the work queue
is the next throughput-limiting bottleneck. This makes verification of this RFE rather tricky.

Here is what I used when testing this RFE, if you can think of an easier/better way then go with that. Also, the client is a WIP, so improve it if you like.

Test environment
Ideally we would use two physical machines to test this: the server running on one machine while the other machine functions as the client. But if physical
machines are an issue, we can run both the server and client on one machine.

Server
Install DS with the numlistener RFE included.

Client
You can find the ldapclient script here - https://github.com/jchapma/tools
Get script argument info - python ldapclient.py -h
	usage: ldapclient.py [-h] [-u URL] [-p PROCS] [-c NCONNS] [-s NSRCH] [-n NCONNSPERTHREAD]

	ds search load client, used to test multi listener. Default values are a 10k connection throughput test, just set your url. Expect a high cpu load when running this.

	optional arguments:
	  -h, --help          show this help message and exit
	  -u URL              Instance URL. (Default: ldap://localhost:389)
	  -p PROCS            Number of load processes to spawn. (Default: 50)
	  -c NCONNS           Number of connections per process. (Default: 200)
	  -s NSRCH            Number of searches per connection. (Default: 100)
	  -n NCONNSPERTHREAD  Number of connections per process thread. (Default: 10)

The defaults are for 10K connections; you will most likely need to specify the server URI, as the default is localhost.
It takes a while to run, I think around 15 minutes on a high-powered machine.
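As a sanity check, the default argument values multiply out to the 10K-connection, 1M-search load described above:

```python
# Default ldapclient.py arguments, as listed in its -h output above.
procs = 50                # -p, load processes to spawn
conns_per_proc = 200      # -c, connections per process
searches_per_conn = 100   # -s, searches per connection

total_conns = procs * conns_per_proc               # 50 * 200 = 10,000 connections
total_searches = total_conns * searches_per_conn   # 10,000 * 100 = 1,000,000 searches
print(total_conns, total_searches)
```

These totals match the "LDAP Connections: 10000" and "Searches: 1000000" figures reported by logconv in the verification comment below.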

Test methodology
Use the default value (1).
Run the client and record its reported throughput.

Configure the server for nsslapd-numlisteners 2.
Run the client and record its reported throughput.

Configure the server for nsslapd-numlisteners 3.
Run the client and record its reported throughput.

Configure the server for nsslapd-numlisteners 4.
Run the client and record its reported throughput.

Verify that there is no performance drop across the four test runs.
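If it helps, the four runs can be scripted. This is a hypothetical Python wrapper around the dsconf/dsctl commands listed in the Test commands section; "inst01" and the server URL are placeholders for your own instance, and nothing is executed here (swap the print for subprocess.run to actually drive the server):

```python
URL = "ldap://server.example.com"
BIND = "cn=Directory Manager"
INSTANCE = "inst01"  # placeholder instance name


def commands_for(num_listeners: int) -> list[list[str]]:
    """Command sequence for one test run at a given listener count."""
    cmds = []
    if num_listeners > 1:  # 1 is the default, no config change or restart needed
        cmds.append(["dsconf", "-D", BIND, URL, "config", "replace",
                     f"nsslapd-numlisteners={num_listeners}"])
        cmds.append(["dsctl", INSTANCE, "restart"])
    # Always confirm the active value before running the client.
    cmds.append(["dsconf", "-D", BIND, URL, "config", "get",
                 "nsslapd-numlisteners"])
    return cmds


for n in range(1, 5):
    for cmd in commands_for(n):
        print(" ".join(cmd))
    # ...then run ldapclient.py and record its reported throughput.
```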

Test commands
Default value for numlisteners (1)
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-numlisteners=2
# dsctl instance_name restart
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-numlisteners=3
# dsctl instance_name restart
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-numlisteners=4
# dsctl instance_name restart
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

Client
python ldapclient.py -u ldap://10.19.41.17:389
The client will report its measured throughput rates.
I verified this using logconv - logconv.pl /var/log/dirsrv/slapd-inst01/access* | grep -E "Connections|Searches|Average"

Comment 14 RHEL Program Management 2023-07-02 07:27:57 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 18 Viktor Ashirov 2023-08-08 14:07:35 UTC
Build tested: 389-ds-base-2.3.4-3.el9.x86_64

I used a VM with 8 CPUs and 32 GB of RAM.

With each additional listener thread I see about 3% decrease of # of connections per second, as well as the # of searches per second:

# grep "LDAP Connections:" *log
1.log: - LDAP Connections:           10000         (124.24/sec)  (7454.50/min)
2.log: - LDAP Connections:           10000         (120.48/sec)  (7229.08/min)
3.log: - LDAP Connections:           10000         (115.95/sec)  (6956.90/min)
4.log: - LDAP Connections:           10000         (111.59/sec)  (6695.33/min)
5.log: - LDAP Connections:           10000         (110.61/sec)  (6636.54/min)
6.log: - LDAP Connections:           10000         (108.31/sec)  (6498.86/min)
7.log: - LDAP Connections:           10000         (109.14/sec)  (6548.36/min)
8.log: - LDAP Connections:           10000         (112.45/sec)  (6747.17/min)

# grep ^Searches: *log
1.log:Searches:                      1000000       (12424.17/sec)  (745450.49/min)
2.log:Searches:                      1000000       (12048.46/sec)  (722907.69/min)
3.log:Searches:                      1000000       (11594.84/sec)  (695690.40/min)
4.log:Searches:                      1000000       (11158.89/sec)  (669533.47/min)
5.log:Searches:                      1000000       (11060.90/sec)  (663654.16/min)
6.log:Searches:                      1000000       (10831.43/sec)  (649885.56/min)
7.log:Searches:                      1000000       (10913.93/sec)  (654835.70/min)
8.log:Searches:                      1000000       (11245.28/sec)  (674716.66/min)

Is this expected/acceptable? Is it a result of work queue contention?
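For reference, the roughly 3% per-step change works out as follows from the connection rates quoted above (positive means a slowdown relative to the previous run):

```python
# Connections/sec for 1..8 listener threads, from the logconv output above.
rates = [124.24, 120.48, 115.95, 111.59, 110.61, 108.31, 109.14, 112.45]

# Percentage change between consecutive listener counts.
drops = [100 * (a - b) / a for a, b in zip(rates, rates[1:])]
for n, d in enumerate(drops, start=2):
    print(f"{n - 1} -> {n} listeners: {d:+.1f}% change in conn/sec")
```

Note that the last two steps are actually negative, i.e. the rate recovers slightly at 7 and 8 listeners.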

Thanks.

Comment 21 Viktor Ashirov 2023-08-11 21:37:24 UTC
I tested with https://github.com/389ds/389-ds-base/blob/main/dirsrvtests/tests/perf/ltest.py to measure latency instead of throughput.

With 20000 connections:
1 listener thread: 9.22 ms
2 listener threads: 7.88 ms
3 listener threads: 6.33 ms

With 4 listener threads I hit another issue where test was not proceeding: https://bugzilla.redhat.com/show_bug.cgi?id=2231559, but with a smaller number of connections it worked and there was an improvement too.
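Quantifying the improvement from the latencies quoted above (a quick check using only the numbers in this comment):

```python
# Mean latency in ms per listener-thread count, from the ltest.py runs above.
latencies = {1: 9.22, 2: 7.88, 3: 6.33}

base = latencies[1]
for n, ms in latencies.items():
    gain = 100 * (base - ms) / base
    print(f"{n} listener(s): {ms:.2f} ms ({gain:.1f}% lower than 1 listener)")
```

So unlike the throughput test, the latency measurement shows a clear win: roughly 15% lower latency with 2 listeners and over 30% lower with 3.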

Another minor issue is https://bugzilla.redhat.com/show_bug.cgi?id=2231269

Marking as Verified:Tested

Comment 24 bsmejkal 2023-08-14 07:42:46 UTC
As per comment 21, marking as VERIFIED.

Comment 29 errata-xmlrpc 2023-11-07 08:25:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds-base bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6350