Bug 1975930 - RFE - Add additional connection listening threads
Summary: RFE - Add additional connection listening threads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: 389-ds-base
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: 9.3
Assignee: Jamie Chapman
QA Contact: LDAP QA Team
Docs Contact: Mugdha Soni
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks:
 
Reported: 2021-06-24 18:43 UTC by mreynolds
Modified: 2023-11-07 09:09 UTC
CC List: 9 users

Fixed In Version: 389-ds-base-2.3.4-1.el9
Doc Type: Enhancement
Doc Text:
.New `nsslapd-numlisteners` configuration option is now available
The `nsslapd-numlisteners` attribute specifies the number of listener threads Directory Server can use to monitor established connections. You can improve response times when the server experiences a large number of client connections by increasing the attribute value.
Clone Of:
Environment:
Last Closed: 2023-11-07 08:25:17 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github 389ds 389-ds-base issues 4812 0 None open Listener thread does not scale when dealing with a high number of established connections 2022-04-13 14:49:59 UTC
Red Hat Issue Tracker IDMDS-2876 0 None None None 2023-03-22 16:40:38 UTC
Red Hat Issue Tracker IDMDS-3429 0 None None None 2023-07-19 15:13:24 UTC
Red Hat Product Errata RHBA-2023:6350 0 None None None 2023-11-07 08:25:47 UTC

Description mreynolds 2021-06-24 18:43:36 UTC
Issue Description

The listener thread polls established connections. When the number of connections is high and/or incoming traffic is heavy, this single thread becomes a bottleneck: it consumes a lot of CPU walking the huge array of established connections, yet it still cannot run fast enough under heavy incoming traffic, and this ultimately limits the capacity of the server.

The idea is to split the connection table into several portions, with each portion being polled by a dedicated listener thread.

This solution would also help the server scale toward the 10K-connections (C10K) problem.
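
For illustration only (the actual server is C and none of the names below are the real ones), here is a minimal Python sketch of that design: accepted connections are spread across several selectors, and each selector is polled by its own listener thread, so no single thread has to scan the whole connection table.

import selectors, socket, threading

NUM_LISTENERS = 4                                    # analogous to nsslapd-numlisteners
listener_selectors = [selectors.DefaultSelector() for _ in range(NUM_LISTENERS)]

def listener_loop(sel):
    # Each listener thread polls only its own portion of the connection table.
    while True:
        for key, _events in sel.select(timeout=1):
            conn = key.fileobj
            data = conn.recv(4096)                   # the real server hands work off to worker threads
            if not data:                             # peer closed the connection
                sel.unregister(conn)
                conn.close()

def accept_loop(server_sock):
    # Spread new connections round-robin over the listener selectors.
    # (Locking around register() is omitted for brevity.)
    i = 0
    while True:
        conn, _addr = server_sock.accept()
        conn.setblocking(False)
        listener_selectors[i % NUM_LISTENERS].register(conn, selectors.EVENT_READ)
        i += 1

server = socket.create_server(("0.0.0.0", 3890))     # hypothetical port
for sel in listener_selectors:
    threading.Thread(target=listener_loop, args=(sel,), daemon=True).start()
accept_loop(server)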


Steps to Reproduce

Configure the server with a connection table of 5000 slots. Create 5000 clients and hammer the server. The overall throughput is lower than with 100 clients hammering the server (TBC).
One symptom would be high CPU usage on the listener thread.
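(If it helps: the connection table size is governed by the nsslapd-conntablesize attribute in cn=config, assuming this build still sizes the table that way.)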

Expected results

High CPU usage should be on the worker threads, and throughput with a large number of clients should be higher than with a smaller number of clients.

Comment 6 Jamie Chapman 2022-12-08 13:25:28 UTC
@Thierry

You are mostly correct. At the moment we have the framework upstream; what remains is to add config support to allow a user to define the number of listeners they require. I already have a patch for this.

But I feel we need to do more performance testing with this feature before we let it loose on the world. And yes, the documentation will also need updating. So IMHO I would prefer to wait until the next release.

Comment 8 RHEL Program Management 2022-12-24 07:27:49 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 12 Jamie Chapman 2023-05-16 14:06:28 UTC
Some notes to help with verification of this RFE.

Overview
This RFE removes a bottleneck we identified when the server is managing a large number of concurrent connections, but we didn't see an improvement in performance because we have since identified another bottleneck. Although we can improve the number of connections the server can handle, the work queue is the next throughput-limiting bottleneck. This makes verification of this RFE rather tricky.

Here is what I used when testing this RFE; if you can think of an easier/better way, then go with that. Also, the client is a WIP, so improve it if you like.

Test environment
Ideally we would use two physical machines to test this: the server running on one machine, with the other machine functioning as the client. But if physical machines are an issue, we can run both the server and the client on one machine.

Server
Install DS with the numlistener RFE included.

Client
You can find the ldapclient script here - https://github.com/jchapma/tools
Get script argument info - python ldapclient.py -h
	usage: ldapclient.py [-h] [-u URL] [-p PROCS] [-c NCONNS] [-s NSRCH] [-n NCONNSPERTHREAD]

	ds search load client, used to test multi listener. Default values are a 10k connection throughput test, just set your url. Expect a high cpu load when running this.

	optional arguments:
	  -h, --help          show this help message and exit
	  -u URL              Instance URL. (Default: ldap://localhost:389)
	  -p PROCS            Number of load processes to spawn. (Default: 50)
	  -c NCONNS           Number of connections per process. (Default: 200)
	  -s NSRCH            Number of searches per connection. (Default: 100)
	  -n NCONNSPERTHREAD  Number of connections per process thread. (Default: 10)

The defaults are for 10K connections; you will most likely need to define the server URI, as the default is localhost.
It takes a while to run, I think around 15 minutes on a high-powered machine.
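For reference, the defaults work out to 50 processes x 200 connections per process = 10,000 connections, each making 100 searches, i.e. 1,000,000 searches per run.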

Test methodology
Use the default value for nsslapd-numlisteners (1)
Run the client and record its reported throughput

Config the server for nsslapd-numlisteners 2
Run the client and record its reported throughput

Config the server for nsslapd-numlisteners 3
Run the client and record its reported throughput

Config the server for nsslapd-numlisteners 4
Run the client and record its reported throughput

I guess verify that there is no performance drop across the four test runs.

Test commands
Default value for numlisteners (1)
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-numlisteners=2
# dsctl instance_name restart
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-numlisteners=3
# dsctl instance_name restart
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

# dsconf -D "cn=Directory Manager" ldap://server.example.com config replace nsslapd-numlisteners=4
# dsctl instance_name restart
# dsconf -D "cn=Directory Manager" ldap://server.example.com config get nsslapd-numlisteners

Client
python ldapclient.py -u ldap://10.19.41.17:389
The client will report its measured throughput rates.
I verified this using logconv - logconv.pl /var/log/dirsrv/slapd-inst01/access* | grep -E "Connections|Searches|Average"
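
If it helps, the four runs could be scripted roughly as below. This is only a sketch: it assumes dsconf and dsctl are on PATH, uses a hypothetical instance name (inst01), and simply reruns ldapclient.py so you can note the throughput it prints for each setting.

import subprocess

URL = "ldap://server.example.com"        # adjust to your environment
INSTANCE = "inst01"                      # hypothetical instance name

for n in (1, 2, 3, 4):
    subprocess.run(["dsconf", "-D", "cn=Directory Manager", URL,
                    "config", "replace", f"nsslapd-numlisteners={n}"], check=True)
    subprocess.run(["dsctl", INSTANCE, "restart"], check=True)
    subprocess.run(["dsconf", "-D", "cn=Directory Manager", URL,
                    "config", "get", "nsslapd-numlisteners"], check=True)
    # Note the throughput the client reports for this nsslapd-numlisteners value.
    subprocess.run(["python", "ldapclient.py", "-u", URL], check=True)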

Comment 14 RHEL Program Management 2023-07-02 07:27:57 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 18 Viktor Ashirov 2023-08-08 14:07:35 UTC
Build tested: 389-ds-base-2.3.4-3.el9.x86_64

I used a VM with 8 CPUs and 32 GB of RAM.

With each additional listener thread I see about a 3% decrease in the number of connections per second, as well as in the number of searches per second:

# grep "LDAP Connections:" *log
1.log: - LDAP Connections:           10000         (124.24/sec)  (7454.50/min)
2.log: - LDAP Connections:           10000         (120.48/sec)  (7229.08/min)
3.log: - LDAP Connections:           10000         (115.95/sec)  (6956.90/min)
4.log: - LDAP Connections:           10000         (111.59/sec)  (6695.33/min)
5.log: - LDAP Connections:           10000         (110.61/sec)  (6636.54/min)
6.log: - LDAP Connections:           10000         (108.31/sec)  (6498.86/min)
7.log: - LDAP Connections:           10000         (109.14/sec)  (6548.36/min)
8.log: - LDAP Connections:           10000         (112.45/sec)  (6747.17/min)

# grep ^Searches: *log
1.log:Searches:                      1000000       (12424.17/sec)  (745450.49/min)
2.log:Searches:                      1000000       (12048.46/sec)  (722907.69/min)
3.log:Searches:                      1000000       (11594.84/sec)  (695690.40/min)
4.log:Searches:                      1000000       (11158.89/sec)  (669533.47/min)
5.log:Searches:                      1000000       (11060.90/sec)  (663654.16/min)
6.log:Searches:                      1000000       (10831.43/sec)  (649885.56/min)
7.log:Searches:                      1000000       (10913.93/sec)  (654835.70/min)
8.log:Searches:                      1000000       (11245.28/sec)  (674716.66/min)

Is this expected/acceptable? Is it a result of work queue contention?

Thanks.

Comment 21 Viktor Ashirov 2023-08-11 21:37:24 UTC
I tested with https://github.com/389ds/389-ds-base/blob/main/dirsrvtests/tests/perf/ltest.py to measure latency instead of throughput.

With 20000 connections:
1 listener thread: 9.22 ms
2 listener threads: 7.88 ms
3 listener threads: 6.33 ms
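(That is roughly a 31% reduction in latency going from one listener thread to three: 9.22 ms down to 6.33 ms.)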

With 4 listener threads I hit another issue where the test was not proceeding: https://bugzilla.redhat.com/show_bug.cgi?id=2231559, but with a smaller number of connections it worked and there was an improvement there too.

Another minor issue is https://bugzilla.redhat.com/show_bug.cgi?id=2231269

Marking as Verified:Tested

Comment 24 bsmejkal 2023-08-14 07:42:46 UTC
As per comment #c21 marking as VERIFIED.

Comment 29 errata-xmlrpc 2023-11-07 08:25:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds-base bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6350

