Bug 1913199
Summary: | IPA server (389ds) is very slow in execution of some searches (`&(memberOf=...)(objectClass=ipaHost)` in particular) | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Akshay Sakure <asakure> | |
Component: | 389-ds-base | Assignee: | LDAP Maintainers <ldap-maint> | |
Status: | CLOSED ERRATA | QA Contact: | RHDS QE <ds-qe-bugs> | |
Severity: | urgent | Docs Contact: | ||
Priority: | high | |||
Version: | 8.4 | CC: | adumitru, atikhono, dminnich, dwysocha, grajaiya, jhrozek, ldap-maint, lslebodn, mreynolds, msauton, mzidek, pbrezina, pkulkarn, ppolawsk, rcritten, sbose, sgouvern, spichugi, tbordaz, tmihinto, toneata, tscherf, vvasilev | |
Target Milestone: | rc | Keywords: | Performance, ZStream | |
Target Release: | --- | Flags: | toneata:
needinfo+
|
|
Hardware: | x86_64 | |||
OS: | Linux | |||
Whiteboard: | sync-to-jira | |||
Fixed In Version: | 389-ds-1.4-8060020211118202403.ce3e8c9c | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 2010976 2021998 2066801 (view as bug list) | Environment: | ||
Last Closed: | 2022-05-10 13:43:00 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 2010976, 2021998, 2066801 |
Description
Akshay Sakure
2021-01-06 09:00:17 UTC
Quotes from the thread: > Alexey Tikhonov: -- step: (11:56:07:952043): [sdap_process_result] (0x2000): Trace: sh[0x561d901854e0], connected[1], ops[0x561d90179850], ldap[0x561d90184990] (11:56:07:952047): [sdap_process_result] (0x2000): going to execute ldap_result()... (11:56:07:986317): [sdap_process_result] (0x2000): ldap_result() was executed (11:56:07:986334): [sdap_process_result] (0x2000): memset() was executed ... (11:56:07:986465): [sdap_process_result] (0x2000): sdap_process_message() was executed, returning -- next step: (11:56:07:986483): [sdap_process_result] (0x2000): Trace: sh[0x561d901854e0], connected[1], ops[0x561d90179850], ldap[0x561d90184990] ``` -- most of the time is spent in `ldap_result()` ~ almost all 34 msecs This function is called with ` timeout == {0, 0}`. According to the man page this means polling. Moreover, other invocations of `ldap_result()` in the same log are fast (within 0.1 ms or less). So it seems it depends on the content in the socket (or other internal openldap structures). Frankly this seems like an issue in openldap lib. I think chances this could be addressed there are low, but... Simon, what would you think? Probably another option could be to rework sssd code to use non-zero `all` parameter of `ldap_result()`... > Sumit Bose: If I understand the case correctly most of the time is spend in processing the results in SSSD code. Given the number of results 34 msecs sounds reasonable to me for decoding the LDAP reply and preparing the libldap structs holding the results. > Alexey Tikhonov: This is 34 msec for every entry in 450 total. This results in 12..14 secs for processing BE_REQ_NETGROUP results. That's why I say reading everything at once (`all` param of ldap_result()) might help (but not sure) > Sumit Bose: thanks for the explanation. I didn't realize that the ldap_result() call was in a loop. Yes, it might be more efficient to use LDAP_MSG_ALL at least for the case where expect many results. But this requires an additional ldap_first_message(), ldap_next_message() processing which we currently can avoid because of LDAP_MSG_ONE. Maybe this can be integrated into sdap_process_result() by adding a LDAPMessage member to struct sdap_handle and iterate over the result return by ldap_next_message() before calling ldap_result() again. I guess it has to be tested if LDAP_MSG_ALL or LDAP_MSG_RECEIVED is better suited for out use case. If I understand the latter correctly it would return as many results as there are received at the time ldap_result() is called. So I guess it might not block as long as LDAP_MSG_ALL which should wait until everything is received. But it has to be tested. 1. I checked the LDAP searches done when a sudo command is executed, and none is showing unindexed attributes (they would appear with notes=U or notes=A in the access log). This means we can eliminate unindexed searches as the root cause of the performance issue. 2. the LDAP searches are bound with fqdn=client.ipa.test,cn=computers,cn=accounts,dc=ipa,dc=test Indeed we can see that the group membership takes significantly more time with this bind dn, compared to similar search bound with cn=directory manager: (for a group with ~700 hosts): Search authenticated as fqdn=host: RESULT err=0 tag=101 nentries=708 etime=0.184823245 ---------------------------------- echo $PASSWORD | kinit -kt /etc/krb5.keytab; time ldapsearch -Y GSSAPI -b "cn=accounts,dc=ipa,dc=test" "(&(|(memberOf=cn=myhostgroup,cn=ng,cn=alt,dc=ipa,dc=test))(objectClass=ipaHost))" objectClass cn fqdn serverHostName memberOf ipaSshPubKey ipaUniqueID > /dev/null real 0m0.251s user 0m0.045s sys 0m0.012s Search authenticated as Dir Mgr: RESULT err=0 tag=101 nentries=708 etime=0.021605313 -------------------------------- time ldapsearch -D cn=directory\ manager -w Secret123 -b "cn=accounts,dc=ipa,dc=test" "(&(|(memberOf=cn=myhostgroup,cn=ng,cn=alt,dc=ipa,dc=test))(objectClass=ipaHost))" objectClass cn fqdn serverHostName memberOf ipaSshPubKey ipaUniqueID > /dev/null real 0m0.073s user 0m0.042s sys 0m0.008s The results are really far from the customer's observations and IMO it may be explained by the permissions added by the customer. Are the custom ACIs/permissions available somewhere in the case? With 389-ds-base-1.4.3.28-2.module+el8.6.0+13331+9eba6551.x86_64 Tested with the steps from https://bugzilla.redhat.com/show_bug.cgi?id=1913199#c54 : With targetfilter cache enabled : # time ldapsearch -LLL -h localhost -p 389 -D "uid=demo_user,ou=people,dc=example,dc=com" -w password -b "ou=aci,dc=example,dc=com" ... real 0m0.436s user 0m0.012s sys 0m0.013s In the access log, etime for the operation :0.395336762 [15/Dec/2021:05:51:34.156167709 -0500] conn=505 op=1 SRCH base="ou=aci,dc=example,dc=com" scope=2 filter="(objectClass=*)" attrs=ALL [15/Dec/2021:05:51:34.551444685 -0500] conn=505 op=1 RESULT err=0 tag=101 nentries=501 wtime=0.000062420 optime=0.395276718 etime=0.395336762 notes=U details="Partially Unindexed Filter" With targetfilter cache disabled : # time ldapsearch -LLL -h localhost -p 389 -D "uid=demo_user,ou=people,dc=example,dc=com" -w password -b "ou=aci,dc=example,dc=com" ... real 0m3.922s user 0m0.032s sys 0m0.006s In the access log, etime for the operation : 3.881943696 [15/Dec/2021:05:53:52.076649623 -0500] conn=508 op=1 SRCH base="ou=aci,dc=example,dc=com" scope=2 filter="(objectClass=*)" attrs=ALL [15/Dec/2021:05:53:55.958511049 -0500] conn=508 op=1 RESULT err=0 tag=101 nentries=501 wtime=0.000079718 optime=3.881866800 etime=3.881943696 notes=U details="Partially Unindexed Filter" The operation is much faster when targetfilter cache is enabled. Marking as verified:tested / VERIFIED Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (389-ds:1.4 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:1815 |