Bug 1693379
Summary: | sssd_be and sss_cache too heavy on CPU | |
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | Dalibor Pospíšil <dapospis>
Component: | sssd | Assignee: | Sumit Bose <sbose>
Status: | CLOSED ERRATA | QA Contact: | Anuj Borah <aborah>
Severity: | high | Priority: | high
Version: | 8.0 | CC: | aborah, aboscatt, abroy, asakure, atikhono, dlavu, grajaiya, honli, jhrozek, lslebodn, mzidek, pbrezina, pdwyer, pkettman, ppolawsk, sbose, sgoveas, striker, swachira, tmihinto, tscherf, xhejtman
Target Milestone: | rc | Keywords: | Performance, Triaged
Hardware: | Unspecified | OS: | Unspecified
Whiteboard: | sync-to-jira | Fixed In Version: | sssd-2.5.2-1.el8
Type: | Bug | Last Closed: | 2021-11-09 19:46:33 UTC
Deadline: | 2021-07-19 | |
Description
Dalibor Pospíšil
2019-03-27 17:03:12 UTC
After stopping the sssd service, the operations take approximately 3x less time.

---

Hi Dalibor,

Would you please provide the content of the sssd.conf used in those test cases?

---

Test (run on a single-core VM):

```
time for N in `seq 100`; do useradd tmpuser$N; done
```

1) nsswitch: files sss; sss_cache -> /bin/true: real 0m8.915s
2) nsswitch: files sss (with original sss_cache): real 0m13.817s
3) nsswitch: sss files; sssd.conf: enable_files_domain=false: real 0m14.420s
4) nsswitch: sss files (with files domain): real 0m57.294s

So, there are two components contributing:

I) directly: 'sss_cache' being invoked from shadow-utils to invalidate the sysdb cache
II) indirectly: an update of the corresponding db file triggers the 'files domain' provider to re-read the db content, eating CPU and greatly slowing everything down (including useradd)

The influence of these two factors may differ between environments (it depends on the size of the sysdb cache and the size of the db files).

---

Regarding (I) 'sss_cache':

1) (Not directly related to this bug, but listed here so it is not forgotten:) Currently the NSS responder does not clear the negative cache in the handler of the SIGHUP signal sent from 'sss_cache'. This renders the shadow-utils patch [1] not fully functional. E.g. the example given in the PR description is *NOT* fixed: "combination of commands like this: 'getent passwd user || useradd user; getent passwd user' can result in the second 'getent passwd' not finding the newly added user as the racy behaviour might still return the cached negative hit from the first getent passwd."

2) The man page for 'sss_cache' lists -u and -g options to invalidate a specific user/group. But (at first glance) according to the source code those options do exactly the same as -U/-G, ignoring the user/group supplied. This is definitely a bug. The easy fix is to drop -u/-g from the options list and the man page. The good solution would be to implement them, but as Jakub pointed out, that is tricky (e.g. if you delete a user, you must also refresh the groups so that getent group doesn't return the member). A correct implementation of those options would allow using them from the shadow-utils invocation, greatly reducing overhead (and also avoiding unnecessary sysdb cache destruction).

3) At the very least (from my point of view) we could use the -d option to invalidate only the 'files' domain.

4) 'sss_cache' not only sends SIGHUP: sss_cache -> sss_memcache_clear_all() -> sss_signal(SIGHUP) && wait_till_nss_responder_invalidate_cache (up to 1 sec!). Is that really necessary? Is a race possible if we remove `wait_till_nss_responder_invalidate_cache()`?

5) sss_cache::invalidate_entry(): so far I do not understand why both `sysdb_set_user_attr(SYSDB_CACHE_EXPIRE)` and `sysdb_invalidate_user_cache_entry() -> sysdb_set_cache_entry_attr(SYSDB_CACHE_EXPIRE)` are required. At a quick glance they (partially?) duplicate each other.

[1] https://github.com/shadow-maint/shadow/pull/128

---

(In reply to Alexey Tikhonov from comment #4)
> 1) (Not directly related to this bug, but listed here so it is not forgotten:) Currently the NSS responder does not clear the negative cache in the handler of the SIGHUP signal sent from 'sss_cache'. This renders the shadow-utils patch [1] not fully functional. E.g. the example given in the PR description is *NOT* fixed: "combination of commands like this: 'getent passwd user || useradd user; getent passwd user' can result in the second 'getent passwd' not finding the newly added user as the racy behaviour might still return the cached negative hit from the first getent passwd."

Yes, I don't remember if I wanted to add dropping of the negcache to sssd when I mentioned the test case, but I think it would not hurt.

> 2) The man page for 'sss_cache' lists -u and -g options to invalidate a specific user/group. But (at first glance) according to the source code those options do exactly the same as -U/-G, ignoring the user/group supplied. This is definitely a bug. The easy fix is to drop -u/-g from the options list and the man page.

Are you sure? I just ran a quick test and -u invalidates only one user for me.

> 3) At the very least (from my point of view) we could use the -d option to invalidate only the 'files' domain.

...as long as you know what the domain name is. But it could be possible to add a new switch to only find and invalidate domains with a specific type.

> 4) 'sss_cache' not only sends SIGHUP: sss_cache -> sss_memcache_clear_all() -> sss_signal(SIGHUP) && wait_till_nss_responder_invalidate_cache (up to 1 sec!). Is that really necessary? Is a race possible if we remove `wait_till_nss_responder_invalidate_cache()`?

Git blame says it is, but sadly I no longer remember the details; git blame just points to https://pagure.io/SSSD/sssd/issue/2748

> 5) sss_cache::invalidate_entry(): so far I do not understand why both `sysdb_set_user_attr(SYSDB_CACHE_EXPIRE)` and `sysdb_invalidate_user_cache_entry() -> sysdb_set_cache_entry_attr(SYSDB_CACHE_EXPIRE)` are required. At a quick glance they (partially?) duplicate each other.

There is a comment above sysdb_invalidate_user_cache_entry() that says it writes into the persistent db. This can also partially explain the slowdown. But I admit I also no longer remember why the function writes to both databases; the git blame points to https://pagure.io/SSSD/sssd/issue/3164, which just seems like a lazy test to me :-)

> [1] https://github.com/shadow-maint/shadow/pull/128

---

Regarding (I) 'sss_cache' (again):

The "normal" mode of operation is: (1) shadow-utils modifies the local files -> (2) sssd_be[files] notices the change -> (3) sssd_be[files] notifies sssd_nss to stop serving the "files" domain while it is being re-indexed.

There is a chance that an NSS request issued after (1) and before (2) will return stale data from the (memory) cache. To mitigate this problem, shadow-utils was patched (https://github.com/shadow-maint/shadow/pull/128) to call sss_cache before any update, forcing sssd to drop the memcache and invalidate the sysdb cache.

If I understand things correctly, this solution has a number of issues:

1) It doesn't drop the negative cache, but it should (https://pagure.io/SSSD/sssd/issue/4002).

2) The solution is based on the idea that, with the memcache dropped, an NSS request will take long enough to serve that sssd_be can "notice" the change and "prevent" the return of stale data. But this is an unreliable/unsafe approach; a race is still possible.

3) shadow-utils calling sss_cache invalidates the cache for *all* users and groups (not only the "files" domain) for no reason. This may lead to performance degradation if, for example, a large sysdb cache of IPA/AD data is invalidated while a local user/group is being updated. This is especially unreasonable taking into account that the expiration time has no effect for the "files" provider, so invalidating the sysdb cache:
- doesn't do anything useful for the "files" provider
- destroys the cache of other domains
- eats CPU

4) While invalidating the sysdb cache, sss_cache not only updates the expiration time in the fast "timestamp" db, but also does the same in the "persistent" db. The reason for this is https://bugzilla.redhat.com/show_bug.cgi?id=1371538, but from my point of view that is *NOT* enough justification for doing so. And this update of the persistent db is exactly the reason why an invocation of sss_cache consumes considerable time.

Moreover, I do not know the reason for duplicating the "expiration time" (SYSDB_CACHE_EXPIRE) in two DBs at all. From my point of view this is a design flaw.

So, what I propose is:

1) to extend sss_cache to support new options:
- to drop the memcache only (not touching sysdb at all)
- to notify sssd_nss to stop serving (switch off) the "files" domain; this might be done using the same mechanism as used by sssd_be[files]

2) to patch shadow-utils to use those new options

This alone should resolve all the issues:
- do not invalidate the cache of other domains for no reason
- do not perform a CPU-heavy operation
- reliably resolve the race condition

But in addition I would consider doing one of the following: either
- get rid of the duplicate attributes in the "persistent" and "timestamp" DBs, or
- at least stop updating the "persistent" DB in sss_cache (this step may have little value if steps 1 and 2 are completed)

An alternative solution to 1) could be to fix and use the existing sss_cache options --domain or -u/-g (instead of -U/-G). But from my point of view both require more work for less result.

---

(In reply to Alexey Tikhonov from comment #6)
> Moreover, I do not know the reason for duplicating the "expiration time" (SYSDB_CACHE_EXPIRE) in two DBs at all. From my point of view this is a design flaw.

From my point of view this - https://github.com/SSSD/sssd/pull/812#issuecomment-494811145 - is an example of why having duplicate timestamps is not the best idea.

> But in addition I would consider doing one of the following: either
> - get rid of the duplicate attributes in the "persistent" and "timestamp" DBs

---

(In reply to Alexey Tikhonov from comment #7)
> (In reply to Alexey Tikhonov from comment #6)
> > Moreover, I do not know the reason for duplicating the "expiration time" (SYSDB_CACHE_EXPIRE) in two DBs at all. From my point of view this is a design flaw.
>
> From my point of view this - https://github.com/SSSD/sssd/pull/812#issuecomment-494811145 - is an example of why having duplicate timestamps is not the best idea.

I was thinking about this, and the only reason I could think of was that the code might not behave well if some of these attributes are missing. But this is probably not a very strong point, since the attributes are typically numerical and we would just end up with a 0, instead of e.g. a NULL for string values, which might cause a hard error.

---

The reason why it takes so long to add multiple users is that the inotify callback is called for each change of the file. So if we add 1000 users, /etc/passwd is changed 1000 times, the inotify callback is called 1000 times, and SSSD refreshes the cache 1000 times. This obviously takes some time.

To solve this, we must reduce the number of cache refreshes in this scenario. For example:
- SSSD must publish an interface that will mark the files domain as inconsistent (this also needs to propagate to the in-memory cache, to avoid serving old data)
- shadow-utils will mark the files domain as inconsistent prior to adding a user/group, to avoid the race condition where /etc/passwd has already changed but the inotify callback has not yet been called
- if the files domain is inconsistent, sssd must resolve the user/group directly from the files instead of looking it up in its cache, to provide a correct result
- the inotify callback will schedule a timed event to run the cache refresh with some delay (say 15 seconds, for example), so that multiple inotify callbacks are aggregated into one cache refresh

---

(In reply to Pavel Březina from comment #9)
> The reason why it takes so long to add multiple users is that the inotify callback is called for each change of the file.

Right, this is one of the reasons. I wrote about this in comment 3.

---

*** Bug 1870301 has been marked as a duplicate of this bug. ***

*** Bug 1870178 has been marked as a duplicate of this bug. ***

*** Bug 1875981 has been marked as a duplicate of this bug. ***

---

Another incarnation of this problem can be seen in bz 1875981 and in a number of other BZs: even without a re-scan being triggered, merely the initial scan can time out, and sssd_be[files] can be terminated by the internal watchdog if /etc/passwd is large enough (think 100k+ entries).

---

*** Bug 1913086 has been marked as a duplicate of this bug. ***

---

(In reply to Pavel Březina from comment #9)
> - the inotify callback will schedule a timed event to run the cache refresh with some delay (say 15 seconds, for example), so that multiple inotify callbacks are aggregated into one cache refresh

This might be a solution for part II from comment 3, but IMO it requires https://github.com/SSSD/sssd/issues/5091 to be fixed.

---

https://github.com/SSSD/sssd/pull/5552 should address most of the issues described in this ticket.
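The delayed-refresh idea discussed above (aggregating many inotify callbacks into a single cache refresh) can be sketched roughly as follows. This is an illustrative simulation only; `RefreshScheduler`, its method names, and the delay value are invented for the sketch and are not SSSD's actual interface (the real code is C and tevent-based):

```python
import threading
import time

class RefreshScheduler:
    """Coalesce many change notifications into one refresh.

    Illustrative sketch of the delayed-refresh idea from this ticket;
    not SSSD's actual implementation.
    """

    def __init__(self, refresh_cb, delay=15.0):
        self.refresh_cb = refresh_cb   # runs the expensive cache refresh
        self.delay = delay             # aggregation window in seconds
        self.refresh_count = 0
        self._timer = None
        self._lock = threading.Lock()

    def on_file_changed(self):
        # Called from the inotify callback for every /etc/passwd change.
        # Only the first notification in a quiet period arms the timer;
        # later ones are absorbed by the already-pending refresh.
        with self._lock:
            if self._timer is None:
                self._timer = threading.Timer(self.delay, self._run)
                self._timer.start()

    def _run(self):
        with self._lock:
            self._timer = None
        self.refresh_count += 1
        self.refresh_cb()

# 1000 rapid changes collapse into a single refresh after the delay window.
refreshes = []
sched = RefreshScheduler(lambda: refreshes.append(time.time()), delay=0.2)
for _ in range(1000):
    sched.on_file_changed()
time.sleep(0.5)
print(len(refreshes))  # 1
```

With this scheme, adding 1000 users triggers 1000 notifications but only one cache refresh once the file has been quiet for the delay window, which is exactly the reduction in refreshes the comment above calls for.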
Pushed PR: https://github.com/SSSD/sssd/pull/5552

* `master`
  * b4ee698ac078e74df51197b5f92432b4ed712d99 - cache_req: do not return cached data if domain is inconsistent
  * 19b850636399fdf5f1018671ba5e2ff7c9deaa2f - files: queue certmap requests if a refresh is running
  * dd1aa57950294e0b821a0be2893a159c5e5488a6 - files: delay refresh and not run in parallel
  * 0fbd67404a4b48b76e8750f0cdc727ed0f8de424 - files: add new option fallback_to_nss
  * 5288ddaa283bb5e710a2864ff3866bf87f56d03f - files: split update into batches

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4435
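One of the pushed commits splits the files-domain update into batches, so the large initial scan no longer monopolizes the process long enough to trip the internal watchdog. A minimal sketch of that batching strategy is below; all names are hypothetical and this is not the SSSD code itself (the real implementation is C and yields to the tevent loop between batches):

```python
# Illustrative sketch of "files: split update into batches": instead of
# indexing every entry in one long uninterrupted pass, process a fixed
# number of entries per step and run a hook between steps (in a real
# event loop, control would be yielded there so the watchdog stays fed).
BATCH_SIZE = 100

def batched(entries, size=BATCH_SIZE):
    """Yield successive fixed-size chunks of the entry list."""
    for i in range(0, len(entries), size):
        yield entries[i:i + size]

def refresh_in_batches(entries, process_batch, between_batches=None):
    """Process entries batch by batch, calling a hook between batches."""
    done = 0
    for chunk in batched(entries):
        process_batch(chunk)
        done += len(chunk)
        if between_batches is not None:
            between_batches()   # yield point for the event loop
    return done

# A large passwd-like entry list is processed in 11 small steps.
entries = [f"user{i}" for i in range(1050)]
seen = []
steps = []
total = refresh_in_batches(entries, seen.extend, lambda: steps.append(1))
print(total, len(steps))  # 1050 11
```

The observable result is unchanged (all 1050 entries are indexed), but the work is interleaved with yield points, which is what keeps a 100k+ entry /etc/passwd from starving the rest of the daemon.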