Bug 1889562 - client psearch with multiple threads hangs if nsslapd-maxthreadsperconn is under sized
Summary: client psearch with multiple threads hangs if nsslapd-maxthreadsperconn is un...
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: 389-ds-base
Version: 8.2
Hardware: All
OS: Linux
medium
medium
Target Milestone: rc
: 8.0
Assignee: mreynolds
QA Contact: RHDS QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-10-20 01:54 UTC by Marc Sauton
Modified: 2020-11-25 15:39 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Attachments (Terms of Use)

Description Marc Sauton 2020-10-20 01:54:42 UTC
Description of problem:

client psearch with multiple threads hangs if nsslapd-maxthreadsperconn is under sized.

the situation does NOT happen on RHEL-5 with RHDS-8.2 , and seem related to the change:

https://github.com/389ds/389-ds-base/commit/bbe198d8406b64b06bc52e21f378eb0d96e140f8
https://github.com/389ds/389-ds-base/issues/719
"Add a warning message when a connection hits the max number of threads"

there was a decision at that time, to only have the statistic and status information, not to do anything to a connection, as it was only delayed, "harmless", along with a sizing configuration issue on the server side.

But this is different with the persistent search connections.
With this feature, the server is stuck, and hangs the client, so it becomes a problem.


Version-Release number of selected component (if applicable):
RHEL-8.2 with 389-ds-base-1.4.2.12-3.module+el8dsrv+6923+6ab1d5c5.x86_64
RHEL-7.8 with 389-ds-base-1.3.10.2-6.el7.x86_64


How reproducible:
on demand


Steps to Reproduce:

1. Have a LDAP server up and runing

2. tune nsslapd-maxthreadsperconn to a very small value, e.g. 2
dn: cn=config
changetype: modify
replace: nsslapd-maxthreadsperconn
nsslapd-maxthreadsperconn: 2


3. set some data

dn: ou=sf02712629,dc=example,dc=test
objectClass: organizationalUnit
objectClass: top
ou: people
ou: sf02712629

and

dn: ou=sf02712629,dc=example,dc=test
changetype: modify
replace: aci
aci: (targetattr="*")(version 3.0; acl "all access"; allow (all) userdn="ldap:///anyone";)


4. run a LDAP client to run persistent searches in a loop, set with more threads in the connection than the and nsslapd-maxthreadsperconn, and with a control for ADD, DEL

or run the test Java client provided:
./build.sh
time ./run.sh 2>&1 | tee ./out.txt



Actual results:

the Java client hangs quickly, the conn thread is in blocked state from cn=cmonitor


Expected results:
yes


Additional info:


ldapsearch -o ldif-wrap=no -xLLLD "cn=directory manager" -W -b cn=monitor -s base connection threads currentconnections currentconnectionsatmaxthreads maxthreadsperconnhits opsinitiated opscompleted dtablesize

I had connections hitting the max thread number per connection, and blocked, for example:

connection: 64:20201007184004Z:20806:20805:r:NULLDN:1:15:119191:150:ip=192.168.122.51

which means the conn id 150 is in the "max thread state", hit the max thread value 15 times, and had operation attempts blocked 119191 times.


then if I tune up nsslapd-maxthreadsperconn on the fly to 100 , the Java client "resumes" ( do not remember if server or client started to speak first), and progresses again, the corresponding connection is no longer in a "max thread state"



in a TCP dump, the last LDAP activity is from the LDAP client:

 47          1 192.168.122.51 -> 192.168.122.51 LDAP 212 searchRequest(10) "ou=sf02712629,dc=example,dc=test" singleLevel
Frame 47: 212 bytes on wire (1696 bits), 212 bytes captured (1696 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 192.168.122.51 (192.168.122.51), Dst: 192.168.122.51 (192.168.122.51)
Transmission Control Protocol, Src Port: 44126 (44126), Dst Port: ldap (389), Seq: 947, Ack: 523, Len: 144
Lightweight Directory Access Protocol
    LDAPMessage searchRequest(10) "ou=sf02712629,dc=example,dc=test" singleLevel
        messageID: 10
        protocolOp: searchRequest (3)
            searchRequest
                baseObject: ou=sf02712629,dc=example,dc=test
                scope: singleLevel (1)
                derefAliases: derefAlways (3)
                sizeLimit: 0
                timeLimit: 0
                typesOnly: False
                Filter: (objectclass=*)
                    filter: present (7)
                        present: objectclass
                attributes: 0 items
        controls: 2 items
            Control
                controlType: 2.16.840.1.113730.3.4.3 (joint-iso-itu-t.16.840.1.113730.3.4.3)
                criticality: True
                controlValue: 300902010f0101ff0101ff
            Control
                controlType: 2.16.840.1.113730.3.4.2 (joint-iso-itu-t.16.840.1.113730.3.4.2)

 48          1 192.168.122.51 -> 192.168.122.51 TCP 68 ldap > 44126 [ACK] Seq=523 Ack=1091 Win=46976 Len=0 TSval=3694135261 TSecr=3694135261
Frame 48: 68 bytes on wire (544 bits), 68 bytes captured (544 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 192.168.122.51 (192.168.122.51), Dst: 192.168.122.51 (192.168.122.51)
Transmission Control Protocol, Src Port: ldap (389), Dst Port: 44126 (44126), Seq: 523, Ack: 1091, Len: 0

 49          7 192.168.122.51 -> 192.168.122.51 TCP 68 44126 > ldap [FIN, ACK] Seq=1091 Ack=523 Win=43776 Len=0 TSval=3694140820 TSecr=3694135261
Frame 49: 68 bytes on wire (544 bits), 68 bytes captured (544 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 192.168.122.51 (192.168.122.51), Dst: 192.168.122.51 (192.168.122.51)
Transmission Control Protocol, Src Port: 44126 (44126), Dst Port: ldap (389), Seq: 1091, Ack: 523, Len: 0

 50          7 192.168.122.51 -> 192.168.122.51 TCP 68 ldap > 44126 [ACK] Seq=523 Ack=1092 Win=46976 Len=0 TSval=3694140860 TSecr=3694140820
Frame 50: 68 bytes on wire (544 bits), 68 bytes captured (544 bits)
Linux cooked capture
Internet Protocol Version 4, Src: 192.168.122.51 (192.168.122.51), Dst: 192.168.122.51 (192.168.122.51)
Transmission Control Protocol, Src Port: ldap (389), Dst Port: 44126 (44126), Seq: 523, Ack: 1092, Len: 0

(END)



less /var/log/dirsrv/slapd-m1/access
...
[13/Oct/2020:22:10:58.059364477 +0000] conn=274 fd=66 slot=66 connection from 192.168.122.51 to 192.168.122.51
[13/Oct/2020:22:10:58.062426524 +0000] conn=274 op=0 BIND dn="" method=128 version=3
[13/Oct/2020:22:10:58.062488982 +0000] conn=274 op=0 RESULT err=0 tag=97 nentries=0 etime=0.000132738 dn=""
[13/Oct/2020:22:10:58.066787227 +0000] conn=274 op=1 SRCH base="ou=sf02712629,dc=example,dc=test" scope=0 filter="(objectClass=*)" attrs=ALL
[13/Oct/2020:22:10:58.068677808 +0000] conn=274 op=1 RESULT err=0 tag=101 nentries=1 etime=0.002671829
[13/Oct/2020:22:10:58.082943767 +0000] conn=274 op=2 SRCH base="ou=sf02712629,dc=example,dc=test" scope=0 filter="(objectClass=*)" attrs=ALL
[13/Oct/2020:22:10:58.083571509 +0000] conn=274 op=2 RESULT err=0 tag=101 nentries=1 etime=0.001513465
[13/Oct/2020:22:10:58.087329407 +0000] conn=274 op=3 SRCH base="ou=sf02712629,dc=example,dc=test" scope=0 filter="(objectClass=*)" attrs=ALL
[13/Oct/2020:22:10:58.088454889 +0000] conn=274 op=3 RESULT err=0 tag=101 nentries=1 etime=0.002549361
[13/Oct/2020:22:10:58.089116514 +0000] conn=274 op=4 SRCH base="ou=sf02712629,dc=example,dc=test" scope=1 filter="(objectClass=*)" attrs=ALL options=persistent
[13/Oct/2020:22:10:58.128375675 +0000] conn=274 op=5 SRCH base="ou=sf02712629,dc=example,dc=test" scope=0 filter="(objectClass=*)" attrs=ALL
[13/Oct/2020:22:10:58.128909872 +0000] conn=274 op=5 RESULT err=0 tag=101 nentries=1 etime=0.001004205
[13/Oct/2020:22:10:58.134667523 +0000] conn=274 op=6 SRCH base="ou=sf02712629,dc=example,dc=test" scope=1 filter="(objectClass=*)" attrs=ALL options=persistent
[13/Oct/2020:22:10:58.136104758 +0000] conn=274 op=7 SRCH base="ou=sf02712629,dc=example,dc=test" scope=1 filter="(objectClass=*)" attrs=ALL options=persistent
(END)


Note You need to log in before you can comment on or make changes to this bug.