Bug 497556
Summary: | LDAPI: activation of LDAPI UNIX socket causes serious performance issues in TCP/IP searches | ||||||
---|---|---|---|---|---|---|---|
Product: | [Retired] 389 | Reporter: | Andrey Ivanov <andrey.ivanov> | ||||
Component: | Directory Server | Assignee: | Nathan Kinder <nkinder> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Viktor Ashirov <vashirov> | ||||
Severity: | medium | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 1.2.0 | CC: | jgalipea, nhosoi, nkinder, rmeggins | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2015-12-07 16:40:49 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 434914, 533025 | ||||||
Attachments: |
|
Description
Andrey Ivanov
2009-04-24 17:47:19 UTC
The system version is CentOS 5.3 x86-64 I am unable to reproduce this issue using the current code. Here are the steps I performed: - Turn on microsecond logging by setting nsslapd-accesslog-level to 131328. - Add 100 inetorgperson entries (user1 through user 100). - Run the following ldclt command: ldclt -V -D "cn=directory manager" -w Secret12 -h localhost.localdomain -n 1 -T 100 -e esearch -e random -r 1 -R 100 -b "dc=example,dc=com" -f "uid=userXX" - Enable LDAPI (nsslapd-ldapilisten: on) and restart DS - Run the same ldclt command again. In both cases (LDAPI on or off), the largest reported e-time I see is 0.001000. I even tried upping the number of operations to 1000 and the results were the same. Can you try this with the current code to see if you still have this issue? I don't know whether it is important but in my case it was a kerberos GSSAPI SASL connection. Anyway i'll try the latest code (next week if i have enough time) and tell you if the problem is corrected (there were a lot of changes in sasl code during the last 8 months :))) I've tried to reproduce the bug with the old code (april 2009) following your instructions and i didn't succeed. Everything was perfectly normal. I've found out that in order to see the performance slowdown you need to make at least one search with ldapi. So here's how i've proceeded (still with the old code): - Enable LDAPI (nsslapd-ldapilisten: on) and restart DS ldclt -D "cn=directory manager" -w Secret12 -h localhost.localdomain -n 1 -T 1000000 -e esearch -e random -r 1 -R 100 -b "dc=example,dc=com" -f "uid=userXX" ... ldclt[10002]: Average rate: 51706.00/thr (5170.60/sec), total: 51706 ldclt[10002]: Average rate: 51316.00/thr (5131.60/sec), total: 51316 ldclt[10002]: Average rate: 51334.00/thr (5133.40/sec), total: 51334 ldclt[10002]: T000: thread is dead. ldclt[10002]: Average rate: 25166.00/thr (2516.60/sec), total: 25166 ldclt[10002]: Average rate: 0.00/thr ( 0.00/sec), total: 0 ldclt[10002]: All threads are dead - exit. ldclt[10002]: Global average rate: 1000000.00/thr (4761.90/sec), total: 1000000 ldclt[10002]: Global number times "no activity" reports: never ldclt[10002]: Global number of dead threads: 1 ldclt[10002]: Global no error occurs during this session. So far so good (etime is always < 0.001000). Now make AT LEAST ONE successful ldapi search (with autobind mapping activated and bound to directory manager): ldapsearch -Y EXTERNAL -H ldapi://%2fLocal%2fdirsrv%2fvar%2frun%2fslapd-dmz.socket -b 'dc=example,dc=com' uid=user01 And now, restart the same ldclt load : ldclt -V -D "cn=directory manager" -w Secret12 -h localhost.localdomain -n 1 -T 5000 -e esearch -e random -r 1 -R 100 -b "dc=example,dc=com" -f "uid=userXX" For me, the problem starts at the operation number 144 : [05/Dec/2009:23:47:41 +0100] conn=10 op=141 RESULT err=0 tag=101 nentries=1 etime=0.000000 [05/Dec/2009:23:47:41 +0100] conn=10 op=142 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(uid=pj)" attrs=ALL [05/Dec/2009:23:47:41 +0100] conn=10 op=142 RESULT err=0 tag=101 nentries=1 etime=0.000000 [05/Dec/2009:23:47:42 +0100] conn=10 op=143 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(uid=pj)" attrs=ALL [05/Dec/2009:23:47:42 +0100] conn=10 op=143 RESULT err=0 tag=101 nentries=1 etime=0.000000 [05/Dec/2009:23:47:42 +0100] conn=10 op=144 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(uid=pj)" attrs=ALL [05/Dec/2009:23:47:42 +0100] conn=10 op=144 RESULT err=0 tag=101 nentries=1 etime=0.041000 [05/Dec/2009:23:47:42 +0100] conn=10 op=145 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(uid=pj)" attrs=ALL [05/Dec/2009:23:47:42 +0100] conn=10 op=145 RESULT err=0 tag=101 nentries=1 etime=0.031000 [05/Dec/2009:23:47:42 +0100] conn=10 op=146 SRCH base="dc=id,dc=polytechnique,dc=edu" scope=2 filter="(uid=pj)" attrs=ALL [05/Dec/2009:23:47:42 +0100] conn=10 op=146 RESULT err=0 tag=101 nentries=1 etime=0.040000 And the ldclt performance results speak for themselves (the performance goes down by the factor of ~200) : ldclt[10036]: Average rate: 249.00/thr ( 24.90/sec), total: 249 ldclt[10036]: Average rate: 250.00/thr ( 25.00/sec), total: 250 ldclt[10036]: Average rate: 248.00/thr ( 24.80/sec), total: 248 ldclt[10036]: T000: thread is dead. ldclt[10036]: Average rate: 23.00/thr ( 2.30/sec), total: 23 ldclt[10036]: Average rate: 0.00/thr ( 0.00/sec), total: 0 ldclt[10036]: All threads are dead - exit. ldclt[10036]: Global average rate: 5000.00/thr ( 22.73/sec), total: 5000 ldclt[10036]: Global number times "no activity" reports: never ldclt[10036]: Global number of dead threads: 1 ldclt[10036]: Global no error occurs during this session. I think you can reproduce the same behavior with the latest code, but as i said i'll try to test the latest code next week anyway. I have downloaded and tested the latest rc1 (CentOS 5.4, x86_64) : [root@ldap-model tmp]# rpm -qa |grep 389 389-admin-1.1.9-1.el5 389-ds-base-1.2.5-0.2.rc1.el5 389-adminutil-1.1.8-3.el5 setup-ds-admin.pl with "typical" settings and the suffix "dc=example,dc=com". I wanted to generate a sample database with dbgen.pl but the server crashes when i try to import it by ldif2db (i've sent a detailed description of this crash to 389-users mailing list). So finally i just installed the 389 with the default config and DIT (groups like cn=Accounting Managers etc). 1. Activate "nsslapd-ldapilisten: on" and /etc/init.d/dirsrv restart 2. Start ldclt [root@ldap-model tmp]# ldclt -D "cn=Directory Manager" -w secret12 -h localhost.localdomain -n 1 -T 500000 -e esearch -e random -r 1 -R 100 -b "dc=example,dc=com" -f "cn=PD Managers" ldclt version 4.23 ldclt[5831]: Starting at Sun Dec 6 16:51:45 2009 ldclt[5831]: Average rate: 62207.00/thr (6220.70/sec), total: 62207 ldclt[5831]: Average rate: 67382.00/thr (6738.20/sec), total: 67382 ldclt[5831]: Average rate: 66988.00/thr (6698.80/sec), total: 66988 ldclt[5831]: Average rate: 67514.00/thr (6751.40/sec), total: 67514 ldclt[5831]: Average rate: 67304.00/thr (6730.40/sec), total: 67304 ldclt[5831]: Average rate: 67031.00/thr (6703.10/sec), total: 67031 ldclt[5831]: Average rate: 67854.00/thr (6785.40/sec), total: 67854 ldclt[5831]: T000: thread is dead. ldclt[5831]: Average rate: 33720.00/thr (3372.00/sec), total: 33720 ldclt[5831]: Average rate: 0.00/thr ( 0.00/sec), total: 0 ldclt[5831]: All threads are dead - exit. ldclt[5831]: Global average rate: 500000.00/thr (5555.56/sec), total: 500000 ldclt[5831]: Global number times "no activity" reports: never ldclt[5831]: Global number of dead threads: 1 ldclt[5831]: Global no error occurs during this session. ldclt[5831]: Ending at Sun Dec 6 16:53:15 2009 ldclt[5831]: Exit status 0 - No problem during execution. 3. A single search over ldapi : ldapsearch -Y EXTERNAL -H ldapi://%2fvar%2frun%2fslapd-dmz.socket -b 'dc=example,dc=com' cn=PD* 4. Start ldclt and compare the performance : [root@ldap-model tmp]# ldclt -D "cn=Directory Manager" -w secret12 -h localhost.localdomain -n 1 -T 5000 -e esearch -e random -r 1 -R 100 -b "dc=example,dc=com" -f "cn=PD Managers" ldclt version 4.23 ldclt[5852]: Starting at Sun Dec 6 16:55:59 2009 ldclt[5852]: Average rate: 249.00/thr ( 24.90/sec), total: 249 ldclt[5852]: Average rate: 250.00/thr ( 25.00/sec), total: 250 ldclt[5852]: Average rate: 247.00/thr ( 24.70/sec), total: 247 ldclt[5852]: Average rate: 250.00/thr ( 25.00/sec), total: 250 ... ldclt[5852]: Average rate: 249.00/thr ( 24.90/sec), total: 249 ldclt[5852]: Average rate: 249.00/thr ( 24.90/sec), total: 249 ldclt[5852]: Average rate: 249.00/thr ( 24.90/sec), total: 249 ldclt[5852]: Average rate: 249.00/thr ( 24.90/sec), total: 249 ldclt[5852]: T000: thread is dead. ldclt[5852]: Average rate: 21.00/thr ( 2.10/sec), total: 21 ldclt[5852]: Average rate: 0.00/thr ( 0.00/sec), total: 0 ldclt[5852]: All threads are dead - exit. ldclt[5852]: Global average rate: 5000.00/thr ( 22.73/sec), total: 5000 ldclt[5852]: Global number times "no activity" reports: never ldclt[5852]: Global number of dead threads: 1 ldclt[5852]: Global no error occurs during this session. ldclt[5852]: Ending at Sun Dec 6 16:59:39 2009 ldclt[5852]: Exit status 0 - No problem during execution. Thanks for the detail Andrey. I am able to reproduce this issue now. It turns out that the cause for this performance degradation is pretty simple. If a slot from the connection table had previously been used for an LDAPI connection, it would cause serious performance degradation (20 times worse on my system) to a TCP connection that reused the same slot. The problem was that we were not clearing the flag that indicates that a connection is a UNIX local socket in the connection cleanup function. Clearing this flag makes the reused slots perform as expected. Created attachment 377332 [details]
Patch
Patch from comment #7 pushed to master. Thanks to Rich for his review! Counting objects: 11, done. Delta compression using 2 threads. Compressing objects: 100% (6/6), done. Writing objects: 100% (6/6), 805 bytes, done. Total 6 (delta 4), reused 0 (delta 0) To ssh://git.fedorahosted.org/git/389/ds.git c3120fc..99ac483 master -> master verified - RHEL 4 version: redhat-ds-base-8.2.0-2010052104.el4dsrv Before LDAPI search: ldclt[30193]: Average rate: 5000.00/thr ( 500.00/sec), total: 5000 ldclt[30193]: T000: nbOpers = 0 ldclt[30193]: Average rate: 0.00/thr ( 0.00/sec), total: 0 ldclt[30193]: All threads are dead - exit. ldclt[30193]: Global average rate: 5000.00/thr (250.00/sec), total: 5000 ldclt[30193]: Global number times "no activity" reports: never ldclt[30193]: Global number of dead threads: 1 ldclt[30193]: Global no error occurs during this session. ldclt[30193]: Ending at Fri May 21 09:02:21 2010 ldclt[30193]: Exit status 0 - No problem during execution. LDAPI search: ldapsearch -x -H ldapi://%2fvar%2frun%2fslapd-jgalipea-rhel4.socket -b "dc=example,dc=com" After LDAPI search: ldclt[30199]: Average rate: 5000.00/thr ( 500.00/sec), total: 5000 ldclt[30199]: T000: nbOpers = 0 ldclt[30199]: Average rate: 0.00/thr ( 0.00/sec), total: 0 ldclt[30199]: All threads are dead - exit. ldclt[30199]: Global average rate: 5000.00/thr (250.00/sec), total: 5000 ldclt[30199]: Global number times "no activity" reports: never ldclt[30199]: Global number of dead threads: 1 ldclt[30199]: Global no error occurs during this session. ldclt[30199]: Ending at Fri May 21 09:05:22 2010 ldclt[30199]: Exit status 0 - No problem during execution. |