Bug 1693379
Summary: | sssd_be and sss_cache too heavy on CPU | |
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | Dalibor Pospíšil <dapospis>
Component: | sssd | Assignee: | Sumit Bose <sbose>
Status: | CLOSED ERRATA | QA Contact: | Anuj Borah <aborah>
Severity: | high | Priority: | high
Version: | 8.0 | CC: | aborah, aboscatt, abroy, asakure, atikhono, dlavu, grajaiya, honli, jhrozek, lslebodn, mzidek, pbrezina, pdwyer, pkettman, ppolawsk, sbose, sgoveas, striker, swachira, tmihinto, tscherf, xhejtman
Target Milestone: | rc | Keywords: | Performance, Triaged
Hardware: | Unspecified | OS: | Unspecified
Whiteboard: | sync-to-jira | Fixed In Version: | sssd-2.5.2-1.el8
Type: | Bug | Last Closed: | 2021-11-09 19:46:33 UTC
Deadline: | 2021-07-19 | |
Description
Dalibor Pospíšil
2019-03-27 17:03:12 UTC
After stopping the sssd service, the operations take approximately 3x less time.

---

Hi Dalibor,

Would you please provide the content of the sssd.conf used in those test cases?

---

Test (run on a single-core VM):

```
time for N in `seq 100`; do useradd tmpuser$N; done
```

1) nsswitch: files sss; sss_cache -> /bin/true: real 0m8.915s
2) nsswitch: files sss (with original sss_cache): real 0m13.817s
3) nsswitch: sss files; sssd.conf: enable_files_domain=false: real 0m14.420s
4) nsswitch: sss files (with files domain): real 0m57.294s

So, there are two components contributing:

I) directly: 'sss_cache' being invoked from shadow-utils to invalidate the sysdb cache
II) indirectly: an update of the corresponding db file triggers the 'files domain' provider to re-read the db content, eating CPU and greatly slowing everything down (including useradd)

The influence of these two factors may differ between environments (it depends on the size of the sysdb cache and the size of the db files).

---

Regarding (I) 'sss_cache':

1) (Not directly related to this bug, but listed here so it is not forgotten:) Currently the NSS responder does not clear the negative cache in the handler of the SIGHUP signal sent from 'sss_cache'. This renders the shadow-utils patch [1] not fully functional. E.g. the example given in the PR description is *NOT* fixed: "combination of commands like this: 'getent passwd user || useradd user; getent passwd user' can result in the second 'getent passwd' not finding the newly added user as the racy behaviour might still return the cached negative hit from the first getent passwd."

2) The man page for 'sss_cache' lists -u and -g options to invalidate a specific user/group. But (at first glance) according to the source code those options do exactly the same as -U/-G, ignoring the user/group supplied. This is definitely a bug. The easy fix is to drop -u/-g from the options list and the man page. The good solution would be to implement them, but as Jakub pointed out, that is tricky (e.g. if you delete a user, you must also refresh the groups so that getent group doesn't return the member). A correct implementation of those options would allow using them from the shadow-utils invocation, greatly reducing overhead (and also avoiding unnecessary sysdb cache destruction).

3) At the very least (from my point of view) we could use the -d option to invalidate only the 'files' domain.

4) 'sss_cache' not only sends SIGHUP: sss_cache -> sss_memcache_clear_all() -> sss_signal(SIGHUP) && wait_till_nss_responder_invalidate_cache (up to 1 sec!). Is that really necessary? Is a race possible if we remove `wait_till_nss_responder_invalidate_cache()`?

5) sss_cache::invalidate_entry(): so far I do not understand why both `sysdb_set_user_attr(SYSDB_CACHE_EXPIRE)` and `sysdb_invalidate_user_cache_entry() -> sysdb_set_cache_entry_attr(SYSDB_CACHE_EXPIRE)` are required. At a quick glance they (partially?) duplicate each other.

[1] https://github.com/shadow-maint/shadow/pull/128

---

(In reply to Alexey Tikhonov from comment #4)
> 1) (Not directly related to this bug, but listed here so it is not forgotten:) Currently the NSS responder does not clear the negative cache in the handler of the SIGHUP signal sent from 'sss_cache'. This renders the shadow-utils patch [1] not fully functional. E.g. the example given in the PR description is *NOT* fixed: "combination of commands like this: 'getent passwd user || useradd user; getent passwd user' can result in the second 'getent passwd' not finding the newly added user as the racy behaviour might still return the cached negative hit from the first getent passwd."

Yes, I don't remember if I wanted to add dropping of the negcache to sssd when I mentioned the test case, but I think it would not hurt.

> 2) The man page for 'sss_cache' lists -u and -g options to invalidate a specific user/group. But (at first glance) according to the source code those options do exactly the same as -U/-G, ignoring the user/group supplied. This is definitely a bug. The easy fix is to drop -u/-g from the options list and the man page.

Are you sure? I just ran a quick test and -u invalidates only one user for me.

> 3) At the very least (from my point of view) we could use the -d option to invalidate only the 'files' domain.

...as long as you know what the domain name is. But it could be possible to add a new switch to only find and invalidate domains with a specific type.

> 4) 'sss_cache' not only sends SIGHUP: sss_cache -> sss_memcache_clear_all() -> sss_signal(SIGHUP) && wait_till_nss_responder_invalidate_cache (up to 1 sec!). Is that really necessary? Is a race possible if we remove `wait_till_nss_responder_invalidate_cache()`?

Git blame says it is, but sadly I no longer remember the details; git blame just points to https://pagure.io/SSSD/sssd/issue/2748

> 5) sss_cache::invalidate_entry(): so far I do not understand why both `sysdb_set_user_attr(SYSDB_CACHE_EXPIRE)` and `sysdb_invalidate_user_cache_entry() -> sysdb_set_cache_entry_attr(SYSDB_CACHE_EXPIRE)` are required. At a quick glance they (partially?) duplicate each other.

There is a comment above sysdb_invalidate_user_cache_entry() that says it writes into the persistent db. This can also partially explain the slowdown. But I admit I also no longer remember why the function writes to both databases; the git blame points to https://pagure.io/SSSD/sssd/issue/3164, which just seems like a lazy test to me :-)

> [1] https://github.com/shadow-maint/shadow/pull/128

---

Regarding (I) 'sss_cache' (again):

The "normal" mode of operation is: (1) shadow-utils modifies the local files -> (2) sssd_be[files] notices the change -> (3) sssd_be[files] notifies sssd_nss to stop serving the "files" domain while it is being re-indexed.

There is a chance that an NSS request issued after (1) and before (2) will return stale data from the (memory) cache. To mitigate this problem, shadow-utils was patched (https://github.com/shadow-maint/shadow/pull/128) to call sss_cache before any update, forcing sssd to drop the memcache and invalidate the sysdb cache.

If I understand things correctly, this solution has a number of issues:

1) It doesn't drop the negative cache, but it should (https://pagure.io/SSSD/sssd/issue/4002).

2) The solution is based on the idea that, with the memcache dropped, an NSS request will take long enough to serve that sssd_be can "notice" the change and "prevent" the return of stale data. But this is an unreliable/unsafe approach; a race is still possible.

3) shadow-utils calling sss_cache invalidates the cache for *all* users and groups (not only the "files" domain) for no reason. This may lead to performance degradation if, for example, a large sysdb cache of IPA/AD data is invalidated while a local user/group is being updated. This is especially unreasonable taking into account that the expiration time has no effect for the "files" provider, so invalidating the sysdb cache:
- doesn't do anything useful for the "files" provider
- destroys the cache of other domains
- eats CPU

4) While invalidating the sysdb cache, sss_cache not only updates the expiration time in the fast "timestamp" db, but also does the same in the "persistent" db. The reason for this is https://bugzilla.redhat.com/show_bug.cgi?id=1371538, but from my point of view that is *NOT* enough justification for doing so. And this update of the persistent db is exactly the reason why an invocation of sss_cache consumes considerable time.

Moreover, I do not know the reason for duplicating the "expiration time" (SYSDB_CACHE_EXPIRE) in two DBs at all. From my point of view this is a design flaw.

So, what I propose is:

1) to extend sss_cache to support new options:
- to drop the memcache only (not touching sysdb at all)
- to notify sssd_nss to stop serving (switch off) the "files" domain; this might be done using the same mechanism as used by sssd_be[files]

2) to patch shadow-utils to use those new options

This alone should resolve all the issues:
- do not invalidate the cache of other domains for no reason
- do not perform a CPU-heavy operation
- reliably resolve the race condition

But in addition I would consider doing one of the following: either
- get rid of the duplicate attributes in the "persistent" and "timestamp" DBs, or
- at least stop updating the "persistent" DB in sss_cache (this step may have little value if steps 1 and 2 are completed)

An alternative solution to 1) could be to fix and use the existing sss_cache options --domain or -u/-g (instead of -U/-G). But from my point of view both require more work for less result.

---

(In reply to Alexey Tikhonov from comment #6)
> Moreover, I do not know the reason for duplicating the "expiration time" (SYSDB_CACHE_EXPIRE) in two DBs at all. From my point of view this is a design flaw.

From my point of view this - https://github.com/SSSD/sssd/pull/812#issuecomment-494811145 - is an example of why having duplicate timestamps is not the best idea.

> But in addition I would consider doing one of the following: either
> - get rid of the duplicate attributes in the "persistent" and "timestamp" DBs

---

(In reply to Alexey Tikhonov from comment #7)
> (In reply to Alexey Tikhonov from comment #6)
> > Moreover, I do not know the reason for duplicating the "expiration time" (SYSDB_CACHE_EXPIRE) in two DBs at all. From my point of view this is a design flaw.
>
> From my point of view this - https://github.com/SSSD/sssd/pull/812#issuecomment-494811145 - is an example of why having duplicate timestamps is not the best idea.

I was thinking about this, and the only reason I could think of was that the code might not behave well if some of these attributes are missing. But this is probably not a very strong point, since the attributes are typically numerical and we would just end up with a 0, instead of e.g. a NULL for string values, which might cause a hard error.

---

The reason why it takes so long to add multiple users is that the inotify callback is called for each change of the file. So if we add 1000 users, /etc/passwd is changed 1000 times, the inotify callback is called 1000 times, and SSSD refreshes the cache 1000 times. This obviously takes some time.

To solve this, we must reduce the number of cache refreshes in this scenario. For example:
- SSSD must publish an interface that will mark the files domain as inconsistent (this also needs to propagate to the in-memory cache, to avoid serving old data)
- shadow-utils will mark the files domain as inconsistent prior to adding a user/group, to avoid the race condition where /etc/passwd has already changed but the inotify callback has not yet been called
- if the files domain is inconsistent, sssd must resolve the user/group directly from the files instead of looking it up in its cache, to provide a correct result
- the inotify callback will schedule a timed event to run the cache refresh with some delay (say 15 seconds, for example), so that multiple inotify callbacks are aggregated into one cache refresh

---

(In reply to Pavel Březina from comment #9)
> The reason why it takes so long to add multiple users is that the inotify callback is called for each change of the file.

Right, this is one of the reasons. I wrote about this in comment 3.

---

*** Bug 1870301 has been marked as a duplicate of this bug. ***

*** Bug 1870178 has been marked as a duplicate of this bug. ***

*** Bug 1875981 has been marked as a duplicate of this bug. ***

---

Another incarnation of this problem can be seen in bz 1875981 and in a number of other BZs: even without a re-scan being triggered, merely the initial scan can time out, and sssd_be[files] can be terminated by the internal watchdog if /etc/passwd is large enough (think 100k+ entries).

---

*** Bug 1913086 has been marked as a duplicate of this bug. ***

---

(In reply to Pavel Březina from comment #9)
> - the inotify callback will schedule a timed event to run the cache refresh with some delay (say 15 seconds, for example), so that multiple inotify callbacks are aggregated into one cache refresh

This might be a solution for part II from comment 3, but IMO it requires https://github.com/SSSD/sssd/issues/5091 to be fixed.

---

https://github.com/SSSD/sssd/pull/5552 should address most of the issues described in this ticket.
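The delayed-refresh idea discussed above (aggregating many inotify callbacks into a single cache refresh) can be sketched roughly as follows. This is an illustrative simulation only; `RefreshScheduler`, its method names, and the delay value are invented for the sketch and are not SSSD's actual interface (the real code is C and tevent-based):

```python
import threading
import time

class RefreshScheduler:
    """Coalesce many change notifications into one refresh.

    Illustrative sketch of the delayed-refresh idea from this ticket;
    not SSSD's actual implementation.
    """

    def __init__(self, refresh_cb, delay=15.0):
        self.refresh_cb = refresh_cb   # runs the expensive cache refresh
        self.delay = delay             # aggregation window in seconds
        self.refresh_count = 0
        self._timer = None
        self._lock = threading.Lock()

    def on_file_changed(self):
        # Called from the inotify callback for every /etc/passwd change.
        # Only the first notification in a quiet period arms the timer;
        # later ones are absorbed by the already-pending refresh.
        with self._lock:
            if self._timer is None:
                self._timer = threading.Timer(self.delay, self._run)
                self._timer.start()

    def _run(self):
        with self._lock:
            self._timer = None
        self.refresh_count += 1
        self.refresh_cb()

# 1000 rapid changes collapse into a single refresh after the delay window.
refreshes = []
sched = RefreshScheduler(lambda: refreshes.append(time.time()), delay=0.2)
for _ in range(1000):
    sched.on_file_changed()
time.sleep(0.5)
print(len(refreshes))  # 1
```

With this scheme, adding 1000 users triggers 1000 notifications but only one cache refresh once the file has been quiet for the delay window, which is exactly the reduction in refreshes the comment above calls for.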
Pushed PR: https://github.com/SSSD/sssd/pull/5552

* `master`
  * b4ee698ac078e74df51197b5f92432b4ed712d99 - cache_req: do not return cached data if domain is inconsistent
  * 19b850636399fdf5f1018671ba5e2ff7c9deaa2f - files: queue certmap requests if a refresh is running
  * dd1aa57950294e0b821a0be2893a159c5e5488a6 - files: delay refresh and not run in parallel
  * 0fbd67404a4b48b76e8750f0cdc727ed0f8de424 - files: add new option fallback_to_nss
  * 5288ddaa283bb5e710a2864ff3866bf87f56d03f - files: split update into batches

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4435
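One of the pushed commits splits the files-domain update into batches, so the large initial scan no longer monopolizes the process long enough to trip the internal watchdog. A minimal sketch of that batching strategy is below; all names are hypothetical and this is not the SSSD code itself (the real implementation is C and yields to the tevent loop between batches):

```python
# Illustrative sketch of "files: split update into batches": instead of
# indexing every entry in one long uninterrupted pass, process a fixed
# number of entries per step and run a hook between steps (in a real
# event loop, control would be yielded there so the watchdog stays fed).
BATCH_SIZE = 100

def batched(entries, size=BATCH_SIZE):
    """Yield successive fixed-size chunks of the entry list."""
    for i in range(0, len(entries), size):
        yield entries[i:i + size]

def refresh_in_batches(entries, process_batch, between_batches=None):
    """Process entries batch by batch, calling a hook between batches."""
    done = 0
    for chunk in batched(entries):
        process_batch(chunk)
        done += len(chunk)
        if between_batches is not None:
            between_batches()   # yield point for the event loop
    return done

# A large passwd-like entry list is processed in 11 small steps.
entries = [f"user{i}" for i in range(1050)]
seen = []
steps = []
total = refresh_in_batches(entries, seen.extend, lambda: steps.append(1))
print(total, len(steps))  # 1050 11
```

The observable result is unchanged (all 1050 entries are indexed), but the work is interleaved with yield points, which is what keeps a 100k+ entry /etc/passwd from starving the rest of the daemon.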