Bug 1886492
| Summary: | Lookups in SSSD cache of AD accounts entries took long time if cache size is above 100MB | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | PALLAVI <palsoni> |
| Component: | sssd | Assignee: | Alexey Tikhonov <atikhono> |
| Status: | CLOSED WONTFIX | QA Contact: | sssd-qe |
| Severity: | medium | Docs Contact: | |
| Priority: | low | ||
| Version: | 8.3 | CC: | aboscatt, atikhono, grajaiya, jhrozek, lslebodn, mzidek, pbrezina, sbose, tscherf |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | sync-to-jira | ||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-05-31 16:48:58 UTC | Type: | Bug |
|
Comment 6
Alexey Tikhonov
2020-10-08 18:24:18 UTC
Hi, regarding your query below:
---------------
Created By: (10/9/2020 12:03 AM) Last Modified By: Alexey Tikhonov (10/9/2020 12:03 AM)
Hi,
> Whatever SSSD configuration was used the results after 500 user lookups are always same, taking more than minute
Does lookup time grow gradually (i.e. a little bit slower with every new user) or does it happen abruptly (i.e. it is fast until ~500 users and then suddenly slow)?
(I suspect it grows gradually.)
---------------
The customer has replied:
-----------------------------------------------------------
Yes, the bigger the cache, the slower the lookups are.
The cache size grows as more distinct users log on, so we need to clear the cache from time to time. Systems with many customer contacts require this cleanup every week or even earlier.
When we used 'ldap_purge_cache_timeout = 10800', SSSD stopped working properly after 3-6 hours, because the cache purge operation drove sssd_be to 100% CPU consumption and lookups became very slow, even slower than when no cache purge was done.
Technically, on idle systems this worked fine. I ran a test with 500 users, and after 3 hours the cache was purged and lookups were fine. We had implemented this on 300 servers and it looked promising, but on the last 20 most heavily utilized systems SSSD authentication was almost not working. So we rolled back this change, and the situation has stabilized but is still not good.
-------------------------------------------------------
Thanks & Regards,
Pallavi Soni
Hi,
Due to the correlation between cache size and delay, the cause might be a missing index. I would suggest adding
LDB_WARN_UNINDEXED=1
LDB_WARN_REINDEX=1
which will add log messages like:
... ldb FULL SEARCH: (|(objectClass=*)(distinguishedName=*)) SCOPE: sub DN: cn=config ...
or
... [sssd] [ldb] (0x0020): Reindexing /var/lib/sss/db/config.ldb due to modification on ...
respectively.
Some of these messages are expected, but it would be helpful to have debug logs with those messages enabled while the system is slow, to understand whether adding an index might speed things up.
bye,
Sumit
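Once such logs are collected, the two kinds of messages described above can be tallied with a short script. The following is only a minimal sketch: it matches the substrings visible in the sample messages quoted in this comment, and the function name `count_ldb_warnings` is hypothetical, introduced here for illustration. Real log lines may differ in detail.

```python
from collections import Counter
from typing import Iterable

# Substrings taken from the sample messages quoted in this thread.
FULL_SEARCH_MARKER = "ldb FULL SEARCH:"
REINDEX_MARKER = "Reindexing"


def count_ldb_warnings(lines: Iterable[str]) -> Counter:
    """Count unindexed-search and reindex messages in debug log lines."""
    counts: Counter = Counter()
    for line in lines:
        if FULL_SEARCH_MARKER in line:
            # Emitted when a search could not use an index.
            counts["full_search"] += 1
        elif "[ldb]" in line and REINDEX_MARKER in line:
            # Emitted when a database is being reindexed.
            counts["reindex"] += 1
    return counts
```

The function accepts any iterable of strings, so an open log file can be passed to it directly. A high number of full searches during a slow period would support the missing-index theory.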
Hi, as a workaround, please try setting lower values for `entry_cache_timeout` and `ldap_purge_cache_timeout`. The idea behind this workaround is that, while you can't disable the cache completely, you can set up its expiration and purging to prevent group cache growth (which results in poor performance). Specific values for those options should be tuned depending on the machine's workload. I would start by setting a fairly low `entry_cache_timeout` value (depending on how often users log in) and see if this already helps. If that's not enough, then I would add `ldap_purge_cache_timeout` with a value comparable to the value of `entry_cache_timeout` to actually remove expired entries from the cache. Please note that this is not a proper fix (described in comment 6) but merely a workaround. |
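For illustration, the two options from this workaround might be set in the domain section of `sssd.conf` roughly as follows. This is only a sketch: the domain name `example.com` and both values are placeholders assumed here, not recommendations from this bug, and they need to be tuned per machine as described above.

```ini
# /etc/sssd/sssd.conf (fragment); "example.com" is a placeholder domain name
[domain/example.com]
# Seconds for which cached entries are considered valid (illustrative value only)
entry_cache_timeout = 3600
# How often, in seconds, the cache is checked for entries to purge (illustrative value only)
ldap_purge_cache_timeout = 3600
```

Keeping the two values comparable follows the suggestion above: entries expire first, and the purge run then removes them from the cache.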