Bug 1889562 - client psearch with multiple threads hangs if nsslapd-maxthreadsperconn is under sized
Summary: client psearch with multiple threads hangs if nsslapd-maxthreadsperconn is un...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: 389-ds-base
Version: 8.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.0
Assignee: mreynolds
QA Contact: RHDS QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-20 01:54 UTC by Marc Sauton
Modified: 2023-12-15 19:50 UTC
CC List: 6 users

Fixed In Version: 389-ds-1.4-8050020210514191740-d5c171fc
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-09 18:10:45 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2021:4203
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2021-11-09 18:11:08 UTC

Description Marc Sauton 2020-10-20 01:54:42 UTC
Description of problem:

A client persistent search (psearch) with multiple threads hangs if nsslapd-maxthreadsperconn is undersized.

the situation does NOT happen on RHEL-5 with RHDS-8.2, and seems related to this change:

https://github.com/389ds/389-ds-base/commit/bbe198d8406b64b06bc52e21f378eb0d96e140f8
https://github.com/389ds/389-ds-base/issues/719
"Add a warning message when a connection hits the max number of threads"

there was a decision at that time to only provide the statistic and status information and not do anything to the connection, as operations were only delayed ("harmless") and it was considered a sizing configuration issue on the server side.

But this is different with persistent search connections.
With this feature, the server gets stuck and hangs the client, so it becomes a problem.


Version-Release number of selected component (if applicable):
RHEL-8.2 with 389-ds-base-1.4.2.12-3.module+el8dsrv+6923+6ab1d5c5.x86_64
RHEL-7.8 with 389-ds-base-1.3.10.2-6.el7.x86_64


How reproducible:
on demand


Steps to Reproduce:

1. Have an LDAP server up and running
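
For example, on a 389-ds 1.4 instance (a sketch; the instance name m1 is taken from the log paths later in this report):

systemctl status dirsrv@m1
dsctl m1 status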

2. tune nsslapd-maxthreadsperconn to a very small value, e.g. 2
dn: cn=config
changetype: modify
replace: nsslapd-maxthreadsperconn
nsslapd-maxthreadsperconn: 2
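
The change can be applied with ldapmodify, for example (a minimal sketch; the host, bind DN, and file name are assumptions for a local test instance):

# maxthreads.ldif contains the cn=config modify shown above
ldapmodify -x -H ldap://localhost:389 -D "cn=Directory Manager" -W -f maxthreads.ldif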


3. set some data

dn: ou=sf02712629,dc=example,dc=test
objectClass: organizationalUnit
objectClass: top
ou: people
ou: sf02712629

and

dn: ou=sf02712629,dc=example,dc=test
changetype: modify
replace: aci
aci: (targetattr="*")(version 3.0; acl "all access"; allow (all) userdn="ldap:///anyone";)
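
Assuming the two snippets above are saved as entry.ldif and aci.ldif (file names are assumptions), they can be loaded with the standard tools:

ldapadd -x -H ldap://localhost:389 -D "cn=Directory Manager" -W -f entry.ldif
ldapmodify -x -H ldap://localhost:389 -D "cn=Directory Manager" -W -f aci.ldif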


4. run an LDAP client that performs persistent searches in a loop, with more threads on the connection than nsslapd-maxthreadsperconn, and with a persistent search control for ADD and DEL change types

or run the test Java client provided:
./build.sh
time ./run.sh 2>&1 | tee ./out.txt



Actual results:

the Java client hangs quickly, and the connection's thread is shown in a blocked state in cn=monitor


Expected results:
the persistent searches keep being processed and the client does not hang, even when the connection exceeds nsslapd-maxthreadsperconn


Additional info:


ldapsearch -o ldif-wrap=no -xLLLD "cn=directory manager" -W -b cn=monitor -s base connection threads currentconnections currentconnectionsatmaxthreads maxthreadsperconnhits opsinitiated opscompleted dtablesize

I had connections hitting the maximum thread number per connection and getting blocked, for example:

connection: 64:20201007184004Z:20806:20805:r:NULLDN:1:15:119191:150:ip=192.168.122.51

which means conn id 150 is in the "max thread state", hit the max thread limit 15 times, and had operation attempts blocked 119191 times.
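
For reference, a rough field-by-field reading of that connection line (my interpretation of the cn=monitor connection format; field names are approximate):

64                : file descriptor
20201007184004Z   : time the connection was opened
20806             : operations initiated
20805             : operations completed
r                 : read/write wait flag
NULLDN            : bind DN (anonymous)
1                 : currently at max threads (1 = yes)
15                : times the connection hit the max threads limit
119191            : operations blocked by the max threads limit
150               : connection id
ip=192.168.122.51 : client IP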


then if I tune nsslapd-maxthreadsperconn up to 100 on the fly, the Java client "resumes" (I do not remember whether the server or the client started to speak first) and progresses again, and the corresponding connection is no longer in a "max thread state"
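
The on-the-fly change is the same ldapmodify pattern as in step 2, for example (a sketch; host and bind DN are assumptions):

ldapmodify -x -H ldap://localhost:389 -D "cn=Directory Manager" -W <<EOF
dn: cn=config
changetype: modify
replace: nsslapd-maxthreadsperconn
nsslapd-maxthreadsperconn: 100
EOF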



in a TCP dump, the last LDAP activity is from the LDAP client:

 47          1 192.168.122.51 -> 192.168.122.51 LDAP 212 searchRequest(10) "ou=sf02712629,dc=example,dc=test" singleLevel
Frame 47: 212 bytes on wire (1696 bits), 212 bytes captured (1696 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 192.168.122.51 (192.168.122.51), Dst: 192.168.122.51 (192.168.122.51)
Transmission Control Protocol, Src Port: 44126 (44126), Dst Port: ldap (389), Seq: 947, Ack: 523, Len: 144
Lightweight Directory Access Protocol
    LDAPMessage searchRequest(10) "ou=sf02712629,dc=example,dc=test" singleLevel
        messageID: 10
        protocolOp: searchRequest (3)
            searchRequest
                baseObject: ou=sf02712629,dc=example,dc=test
                scope: singleLevel (1)
                derefAliases: derefAlways (3)
                sizeLimit: 0
                timeLimit: 0
                typesOnly: False
                Filter: (objectclass=*)
                    filter: present (7)
                        present: objectclass
                attributes: 0 items
        controls: 2 items
            Control
                controlType: 2.16.840.1.113730.3.4.3 (joint-iso-itu-t.16.840.1.113730.3.4.3)
                criticality: True
                controlValue: 300902010f0101ff0101ff
            Control
                controlType: 2.16.840.1.113730.3.4.2 (joint-iso-itu-t.16.840.1.113730.3.4.2)

 48          1 192.168.122.51 -> 192.168.122.51 TCP 68 ldap > 44126 [ACK] Seq=523 Ack=1091 Win=46976 Len=0 TSval=3694135261 TSecr=3694135261
Frame 48: 68 bytes on wire (544 bits), 68 bytes captured (544 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 192.168.122.51 (192.168.122.51), Dst: 192.168.122.51 (192.168.122.51)
Transmission Control Protocol, Src Port: ldap (389), Dst Port: 44126 (44126), Seq: 523, Ack: 1091, Len: 0

 49          7 192.168.122.51 -> 192.168.122.51 TCP 68 44126 > ldap [FIN, ACK] Seq=1091 Ack=523 Win=43776 Len=0 TSval=3694140820 TSecr=3694135261
Frame 49: 68 bytes on wire (544 bits), 68 bytes captured (544 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 192.168.122.51 (192.168.122.51), Dst: 192.168.122.51 (192.168.122.51)
Transmission Control Protocol, Src Port: 44126 (44126), Dst Port: ldap (389), Seq: 1091, Ack: 523, Len: 0

 50          7 192.168.122.51 -> 192.168.122.51 TCP 68 ldap > 44126 [ACK] Seq=523 Ack=1092 Win=46976 Len=0 TSval=3694140860 TSecr=3694140820
Frame 50: 68 bytes on wire (544 bits), 68 bytes captured (544 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 192.168.122.51 (192.168.122.51), Dst: 192.168.122.51 (192.168.122.51)
Transmission Control Protocol, Src Port: ldap (389), Dst Port: 44126 (44126), Seq: 523, Ack: 1092, Len: 0

(END)



less /var/log/dirsrv/slapd-m1/access
...
[13/Oct/2020:22:10:58.059364477 +0000] conn=274 fd=66 slot=66 connection from 192.168.122.51 to 192.168.122.51
[13/Oct/2020:22:10:58.062426524 +0000] conn=274 op=0 BIND dn="" method=128 version=3
[13/Oct/2020:22:10:58.062488982 +0000] conn=274 op=0 RESULT err=0 tag=97 nentries=0 etime=0.000132738 dn=""
[13/Oct/2020:22:10:58.066787227 +0000] conn=274 op=1 SRCH base="ou=sf02712629,dc=example,dc=test" scope=0 filter="(objectClass=*)" attrs=ALL
[13/Oct/2020:22:10:58.068677808 +0000] conn=274 op=1 RESULT err=0 tag=101 nentries=1 etime=0.002671829
[13/Oct/2020:22:10:58.082943767 +0000] conn=274 op=2 SRCH base="ou=sf02712629,dc=example,dc=test" scope=0 filter="(objectClass=*)" attrs=ALL
[13/Oct/2020:22:10:58.083571509 +0000] conn=274 op=2 RESULT err=0 tag=101 nentries=1 etime=0.001513465
[13/Oct/2020:22:10:58.087329407 +0000] conn=274 op=3 SRCH base="ou=sf02712629,dc=example,dc=test" scope=0 filter="(objectClass=*)" attrs=ALL
[13/Oct/2020:22:10:58.088454889 +0000] conn=274 op=3 RESULT err=0 tag=101 nentries=1 etime=0.002549361
[13/Oct/2020:22:10:58.089116514 +0000] conn=274 op=4 SRCH base="ou=sf02712629,dc=example,dc=test" scope=1 filter="(objectClass=*)" attrs=ALL options=persistent
[13/Oct/2020:22:10:58.128375675 +0000] conn=274 op=5 SRCH base="ou=sf02712629,dc=example,dc=test" scope=0 filter="(objectClass=*)" attrs=ALL
[13/Oct/2020:22:10:58.128909872 +0000] conn=274 op=5 RESULT err=0 tag=101 nentries=1 etime=0.001004205
[13/Oct/2020:22:10:58.134667523 +0000] conn=274 op=6 SRCH base="ou=sf02712629,dc=example,dc=test" scope=1 filter="(objectClass=*)" attrs=ALL options=persistent
[13/Oct/2020:22:10:58.136104758 +0000] conn=274 op=7 SRCH base="ou=sf02712629,dc=example,dc=test" scope=1 filter="(objectClass=*)" attrs=ALL options=persistent
(END)

Comment 13 mreynolds 2021-02-12 21:12:39 UTC
Added a new logging message to warn about psearches triggering "max threads per connection"

https://github.com/389ds/389-ds-base/pull/4617#pullrequestreview-589796480

Comment 14 mreynolds 2021-02-12 21:13:45 UTC
Upstream ticket:

https://github.com/389ds/389-ds-base/issues/4615

Comment 21 sgouvern 2021-05-27 14:47:37 UTC
As per comment 20, marking as VERIFIED

Comment 24 errata-xmlrpc 2021-11-09 18:10:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds-base bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4203

