Bug 1913199 - IPA server (389ds) is very slow in execution of some searches (`&(memberOf=...)(objectClass=ipaHost)` in particular)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: 389-ds-base
Version: 8.4
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: LDAP Maintainers
QA Contact: RHDS QE
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks: 2010976 2021998 2066801
 
Reported: 2021-01-06 09:00 UTC by Akshay Sakure
Modified: 2022-06-21 06:53 UTC
CC: 23 users

Fixed In Version: 389-ds-1.4-8060020211118202403.ce3e8c9c
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2010976 2021998 2066801 (view as bug list)
Environment:
Last Closed: 2022-05-10 13:43:00 UTC
Type: Bug
Target Upstream Version:
Embargoed:
toneata: needinfo+


Attachments


Links:
Github 389ds 389-ds-base issue 4925 (open): "Performance ACI: targetfilter evaluation result can be reused" (last updated 2021-09-17 08:40:41 UTC)
Red Hat Issue Tracker IDMDS-1561 (last updated 2021-08-23 15:39:10 UTC)
Red Hat Issue Tracker IDMDS-1776 (last updated 2021-11-10 14:47:00 UTC)
Red Hat Product Errata RHBA-2022:1815 (last updated 2022-05-10 13:43:29 UTC)

Description Akshay Sakure 2021-01-06 09:00:17 UTC
Description of problem:
Improve performance in sssd to avoid slowness in sudo on IPA client

Version-Release number of selected component (if applicable):
RHEL 7.9
sssd-1.16.5-10.el7_9.5.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Create sudo rules in IPA 
2. Create hostgroups in IPA with a large number of IPA clients (~500)
3. Run sudo on an IPA client; execution takes close to 10-15 seconds.

Actual results:
Sudo takes a long time to execute on IPA clients.
Ansible Tower jobs are failing because the sudo prompt takes a long time to appear.
This mostly happens on the first run, when the information is not yet cached.

Expected results:
Sudo should not take a long time to execute on IPA clients.

Additional info:
This issue has already been discussed with the IPA/SSSD developers, and this BZ is filed to keep track of it for the future.

Comment 3 Alexey Tikhonov 2021-01-06 11:18:10 UTC
Quotes from the thread:

> Alexey Tikhonov:
  -- step:
```
(11:56:07:952043): [sdap_process_result] (0x2000): Trace: sh[0x561d901854e0], connected[1], ops[0x561d90179850], ldap[0x561d90184990]
(11:56:07:952047): [sdap_process_result] (0x2000): going to execute ldap_result()...
(11:56:07:986317): [sdap_process_result] (0x2000): ldap_result() was executed
(11:56:07:986334): [sdap_process_result] (0x2000): memset() was executed
...
(11:56:07:986465): [sdap_process_result] (0x2000): sdap_process_message() was executed, returning
  -- next step:
(11:56:07:986483): [sdap_process_result] (0x2000): Trace: sh[0x561d901854e0], connected[1], ops[0x561d90179850], ldap[0x561d90184990]
```
  --  most of the time is spent in `ldap_result()`  ~  almost all 34 msecs
This function is called with `timeout == {0, 0}`. According to the man page, this means polling.

Moreover, other invocations of `ldap_result()` in the same log are fast (within 0.1 ms or less). So it seems it depends on the content in the socket (or other internal openldap structures).

Frankly this seems like an issue in openldap lib. I think chances this could be addressed there are low, but... Simon, what would you think?

Probably another option could be to rework the sssd code to use a non-zero `all` parameter of `ldap_result()`...


> Sumit Bose:
If I understand the case correctly, most of the time is spent processing the results
in SSSD code. Given the number of results, 34 msecs sounds reasonable to
me for decoding the LDAP reply and preparing the libldap structs holding
the results.


> Alexey Tikhonov:
This is 34 msec for every one of the 450 entries, which results in 12..14 secs for processing the BE_REQ_NETGROUP results.
That's why I say reading everything at once (the `all` param of ldap_result()) might help (but I am not sure).
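A quick check of the arithmetic above, modelled in a few lines (only the figures already quoted in this thread; nothing here comes from sssd itself):

```python
# Rough cost model: one slow-path ldap_result() call per entry, at the
# ~27-34 ms per-call latency visible in the trace excerpt above.
entries = 450
per_call_seconds = 0.034          # worst observed latency per call

total = entries * per_call_seconds
print(f"{total:.1f} s")           # 15.3 s; at ~27-31 ms/call this lands in the 12-14 s range
```

So a fixed per-call cost on the order of tens of milliseconds, multiplied by 450 entries, is enough on its own to explain the observed 12-14 second netgroup processing time.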


> Sumit Bose:
thanks for the explanation. I didn't realize that the ldap_result() call
was in a loop.

Yes, it might be more efficient to use LDAP_MSG_ALL at least for the
case where we expect many results. But this requires additional
ldap_first_message()/ldap_next_message() processing which we currently
can avoid because of LDAP_MSG_ONE. Maybe this can be integrated into
sdap_process_result() by adding an LDAPMessage member to struct
sdap_handle and iterating over the results returned by ldap_next_message()
before calling ldap_result() again.

I guess it has to be tested whether LDAP_MSG_ALL or LDAP_MSG_RECEIVED is
better suited for our use case. If I understand the latter correctly, it
would return as many results as have been received at the time
ldap_result() is called. So I guess it might not block as long as
LDAP_MSG_ALL, which should wait until everything is received. But it has
to be tested.
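The trade-off discussed above can be sketched with stand-in code (a toy model; `fetch`, `drain`, and the queue are hypothetical, not the sdap or libldap API): LDAP_MSG_ONE pays one ldap_result() round trip per message, while LDAP_MSG_ALL returns a chain of messages that the caller then walks with ldap_first_message()/ldap_next_message().

```python
# Toy model of the two read strategies; no names here come from sssd or libldap.
MESSAGES = [f"entry-{i}" for i in range(450)]

def fetch(queue, batch):
    """Stand-in for ldap_result(): return one message (LDAP_MSG_ONE-style)
    or everything already received (LDAP_MSG_ALL/LDAP_MSG_RECEIVED-style)."""
    taken = queue[:] if batch else queue[:1]
    del queue[:len(taken)]
    return taken, 1 if taken else 0     # second value: ldap_result() round trips

def drain(batch):
    queue, processed, calls = list(MESSAGES), [], 0
    while queue:
        msgs, cost = fetch(queue, batch)
        calls += cost
        # With batch=True this inner loop plays the role of the
        # ldap_first_message()/ldap_next_message() walk over the chain.
        for msg in msgs:
            processed.append(msg)
    return processed, calls

one_by_one, calls_one = drain(batch=False)   # 450 round trips
all_at_once, calls_all = drain(batch=True)   # 1 round trip
```

Both strategies process the same messages; batching only changes how many ldap_result() round trips (each carrying the fixed per-call cost measured above) are needed.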

Comment 32 Florence Blanc-Renaud 2021-08-23 15:33:23 UTC
1. I checked the LDAP searches done when a sudo command is executed, and none is showing unindexed attributes (they would appear with notes=U or notes=A in the access log). This means we can eliminate unindexed searches as the root cause of the performance issue.

2. The LDAP searches are bound as fqdn=client.ipa.test,cn=computers,cn=accounts,dc=ipa,dc=test.
Indeed, we can see that the group-membership search takes significantly more time with this bind DN than a similar search bound as cn=directory manager:

(for a group with ~700 hosts):
Search authenticated as fqdn=host: RESULT err=0 tag=101 nentries=708 etime=0.184823245
----------------------------------
echo $PASSWORD | kinit -kt /etc/krb5.keytab; time ldapsearch -Y GSSAPI -b "cn=accounts,dc=ipa,dc=test" "(&(|(memberOf=cn=myhostgroup,cn=ng,cn=alt,dc=ipa,dc=test))(objectClass=ipaHost))" objectClass cn fqdn serverHostName memberOf ipaSshPubKey ipaUniqueID > /dev/null

real	0m0.251s
user	0m0.045s
sys	0m0.012s


Search authenticated as Dir Mgr: RESULT err=0 tag=101 nentries=708 etime=0.021605313
--------------------------------
time ldapsearch -D cn=directory\ manager -w Secret123 -b "cn=accounts,dc=ipa,dc=test" "(&(|(memberOf=cn=myhostgroup,cn=ng,cn=alt,dc=ipa,dc=test))(objectClass=ipaHost))" objectClass cn fqdn serverHostName memberOf ipaSshPubKey ipaUniqueID > /dev/null

real	0m0.073s
user	0m0.042s
sys	0m0.008s

The results are really far from the customer's observations, and IMO this may be explained by the permissions added by the customer. Are the custom ACIs/permissions available somewhere in the case?
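Plain arithmetic on the two etimes quoted above puts a rough number on the per-entry ACI evaluation overhead (just the figures from the access-log excerpts, nothing more):

```python
# etimes from the access-log excerpts above; both searches return 708 entries
etime_host_bind = 0.184823245   # bound as fqdn=host, so ACIs are evaluated
etime_dirmgr = 0.021605313      # bound as cn=directory manager, ACIs bypassed
entries = 708

slowdown = etime_host_bind / etime_dirmgr                           # ~8.6x
per_entry_ms = (etime_host_bind - etime_dirmgr) / entries * 1000    # ~0.23 ms per entry
```

A fraction of a millisecond of ACI work per entry is harmless at this scale, but it multiplies with the number of entries and ACIs, which is consistent with the much larger slowdowns seen in deployments with many custom permissions.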

Comment 74 sgouvern 2021-12-15 11:05:12 UTC
With 389-ds-base-1.4.3.28-2.module+el8.6.0+13331+9eba6551.x86_64

Tested with the steps from https://bugzilla.redhat.com/show_bug.cgi?id=1913199#c54 :

With targetfilter cache enabled:

# time ldapsearch -LLL -h localhost -p 389 -D "uid=demo_user,ou=people,dc=example,dc=com" -w password -b "ou=aci,dc=example,dc=com"
...
real	0m0.436s
user	0m0.012s
sys	0m0.013s

In the access log, etime for the operation: 0.395336762

[15/Dec/2021:05:51:34.156167709 -0500] conn=505 op=1 SRCH base="ou=aci,dc=example,dc=com" scope=2 filter="(objectClass=*)" attrs=ALL
[15/Dec/2021:05:51:34.551444685 -0500] conn=505 op=1 RESULT err=0 tag=101 nentries=501 wtime=0.000062420 optime=0.395276718 etime=0.395336762 notes=U details="Partially Unindexed Filter"


With targetfilter cache disabled:

# time ldapsearch -LLL -h localhost -p 389 -D "uid=demo_user,ou=people,dc=example,dc=com" -w password -b "ou=aci,dc=example,dc=com"
...
real	0m3.922s
user	0m0.032s
sys	0m0.006s


In the access log, etime for the operation: 3.881943696

[15/Dec/2021:05:53:52.076649623 -0500] conn=508 op=1 SRCH base="ou=aci,dc=example,dc=com" scope=2 filter="(objectClass=*)" attrs=ALL
[15/Dec/2021:05:53:55.958511049 -0500] conn=508 op=1 RESULT err=0 tag=101 nentries=501 wtime=0.000079718 optime=3.881866800 etime=3.881943696 notes=U details="Partially Unindexed Filter"

The operation is much faster when targetfilter cache is enabled.
Marking as verified:tested / VERIFIED
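The verified fix reuses targetfilter evaluation results, per the linked upstream issue 4925 ("targetfilter evaluation result can be reused"). A minimal sketch of the idea, with made-up names, none of them taken from the 389-ds-base source:

```python
# Toy model of a per-operation targetfilter cache: each (ACI, entry) pair is
# matched once, and the boolean result is reused for the remaining attributes.
def process(entries, attrs, acis, cached):
    """Count how many times a targetfilter would be evaluated for one search."""
    evals = 0
    cache = {}
    for entry in entries:
        for attr in attrs:          # the filter is re-checked per returned attribute
            for aci in acis:
                key = (aci, entry)
                if cached and key in cache:
                    continue        # reuse the result computed for this ACI/entry pair
                cache[key] = True   # stand-in for the real LDAP filter match
                evals += 1
    return evals

entries = [f"entry-{i}" for i in range(500)]
attrs = ["objectClass", "cn", "fqdn", "memberOf"]
acis = [f"aci-{i}" for i in range(10)]

without = process(entries, attrs, acis, cached=False)    # 500 * 4 * 10 = 20000 evaluations
with_cache = process(entries, attrs, acis, cached=True)  # 500 * 10 = 5000 evaluations
```

With caching, the evaluation count no longer scales with the number of returned attributes, which matches the order-of-magnitude etime improvement measured above.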

Comment 79 errata-xmlrpc 2022-05-10 13:43:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds:1.4 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1815

