Bug 1463256

Summary: Files in mc folder are regenerated on every new ID lookup while the old files are not deleted
Product: Red Hat Enterprise Linux 7
Reporter: Andrey Bondarenko <abondare>
Component: sssd
Assignee: SSSD Maintainers <sssd-maint>
Status: CLOSED NOTABUG
QA Contact: sssd-qe <sssd-qe>
Severity: medium
Priority: unspecified
Version: 7.3
CC: atikhono, a.v.miroshnichenko, grajaiya, jhrozek, joniknsk, lslebodn, mkosek, mzidek, pbrezina, s.egbert, tscherf
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Last Closed: 2017-06-20 13:54:23 UTC
Type: Bug

Description Andrey Bondarenko 2017-06-20 12:55:38 UTC
Description of problem:

Files in the mc folder are regenerated on every new ID lookup, while the old files are not deleted.


$ ls -li /var/lib/sss/mc/passwd
2612392 -rw-r--r--. 1 root root 8406312 Jun 14 09:53 /var/lib/sss/mc/passwd

Inode is 2612392.

$ du -s /var/lib/sss/mc
24636	/var/lib/sss/mc

$ sss_cache -u tuser200

$ ls -li /var/lib/sss/mc/passwd
2612399 -rw-r--r--. 1 root root 8406312 Jun 14 09:54 /var/lib/sss/mc/passwd

Inode is 2612399.

Other processes still use the old file:

$ lsof|grep sss|grep passwd
systemd-l   665                 root  DEL       REG              253,0              2612392 /var/lib/sss/mc/passwd
systemd-l   665                 root   19r      REG              253,0   8406312    2612392 /var/lib/sss/mc/passwd (deleted)
sssd_nss  16103                 root  mem-w     REG              253,0   8406312    2612399 /var/lib/sss/mc/passwd
sssd_nss  16103                 root   22uw     REG              253,0   8406312    2612399 /var/lib/sss/mc/passwd

Filesystem usage grows:

$ du -s /var/lib/sss/mc
24692	/var/lib/sss/mc
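
The effect shown above can be reproduced without SSSD. This is a minimal sketch, assuming Linux /proc and GNU stat; the temporary directory and the function name demo_replaced_file are hypothetical stand-ins, not anything SSSD provides. It replaces a file while a descriptor to the old copy is still open, then reports whether the inode changed and the link count of the copy that is still held open.

```shell
# Sketch only: temporary files stand in for /var/lib/sss/mc/passwd.
# Assumes Linux /proc and GNU stat; nothing here is SSSD code.
demo_replaced_file() (
    dir=$(mktemp -d) || exit 1
    echo "old cache" > "$dir/passwd"
    old_inode=$(stat -c %i "$dir/passwd")

    # Hold the file open, as a long-running client process would.
    exec 9< "$dir/passwd"

    # Replace the file: remove it, then create a new one with the same name.
    rm "$dir/passwd"
    echo "new cache" > "$dir/passwd"
    new_inode=$(stat -c %i "$dir/passwd")

    # Link count of the file still open on fd 9; 0 means it is unlinked.
    held_links=$(stat -L -c %h /proc/self/fd/9)
    read -r held_content <&9

    if [ "$old_inode" != "$new_inode" ]; then changed=yes; else changed=no; fi
    echo "inode_changed=$changed held_links=$held_links held_content=$held_content"

    exec 9<&-          # closing the last descriptor lets the old inode go
    rm -rf "$dir"
)

demo_replaced_file
```

The old content stays readable through the open descriptor, which is why the space is only released once every holder closes the file or exits.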

Comment 2 Lukas Slebodnik 2017-06-20 13:54:23 UTC
The memory cache is removed immediately after sssd starts and a new one is 
created. In theory, more than two versions of the removed memory cache can 
exist at once, e.g. if SSSD was restarted every 5 minutes and each process made 
an NSS request (getpwnam, ...) just once and never again (so the client never 
gets a chance to notice the removal and reopen the files). 
 
But that is really a corner case and we cannot do anything about it. 
SSSD usually runs for a very long time, so it normally has no reason to 
recreate the memory cache. 
 
If there are more versions of the removed memory cache (different inodes for 
deleted files), then we should find out why that happened, because I would not 
recommend restarting SSSD periodically.

Comment 3 Alexander Miroshnichenko 2017-06-21 06:45:47 UTC
Hello,

> SSSD usually runs for a very long time, so it normally has no reason to
> recreate the memory cache.
You are wrong. SSSD cache bugs appear frequently (https://pagure.io/SSSD/sssd/issue/3382, for example) and require restarting sssd after removing /var/lib/sss/db/*
Updating the sssd package also requires an sssd restart.

Having running processes and open user sessions hold file descriptors in /var/lib/sss/mc/ is bad architecture design.

Comment 4 Lukas Slebodnik 2017-06-21 10:58:20 UTC
(In reply to Alexander Miroshnichenko from comment #3)
> Hello,
> 
> > SSSD usually runs for a very long time, so it normally has no reason to
> > recreate the memory cache.
> You are wrong.

I am not

> (https://pagure.io/SSSD/sssd/issue/3382, for example) and require restarting
> sssd after removing /var/lib/sss/db/*

It does not require restarting sssd. It requires backporting the fix from upstream.
Restarting sssd.service is just a partial workaround for upstream bug 3382.
And it is a problem of the downstream distribution that backporting the fix takes ages; e.g. it was fixed in Fedora 3 weeks ago.

> Updating the sssd package also requires an sssd restart.
> 

A) SSSD is not updated every day.
B) After an upgrade of sssd-client, it is recommended to restart all services which use it. Otherwise they may keep using the old sssd-client instead of the fixed version, because there is no other way to force glibc to reload the dynamic modules from sssd-client.

e.g.
sh# lsof +c0 -d DEL | grep libnss_sss.so.2
polkitd          1137 polkitd DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
dbus-daemon      1138    dbus DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
accounts-daemon  1180    root DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
lightdm          1398    root DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
(sd-pam          1419    user DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
dbus-daemon      1436    user DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
screen           2527    user DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
screen           2532    user DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
su               2676    root DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
su               3232    root DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
libvirtd         7375    root DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
sudo             8474    root DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
su               8475    root DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
su              10480    root DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
su              20102    root DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2
bash            22375    user DEL    REG   0,38            6189769 /usr/lib64/libnss_sss.so.2

Comment 5 Lukas Slebodnik 2017-06-26 12:50:49 UTC
I think we need to explain a little how the fast memory cache works in SSSD. Sometimes we also call it the mmap cache, due to implementation details :-)

At the moment we implement the fast memory cache for passwd (getpwnam, getpwuid), groups (getgrnam, getgrgid) and initgroups (id -G, getent initgroups). All of them are stored in the directory /var/lib/sss/mc/. There is a read-write lock on these files and therefore only sssd_nss can update entries there. And as I wrote in previous comments, these files are removed when sssd_nss is started and new empty ones are created.

SSSD provides a glibc module (libnss_sss.so.2), which glibc loads for any NSS (name service switch) request according to the file /etc/nsswitch.conf. SSSD is enabled by default in nsswitch.conf for other reasons.

So if glibc gets a request to resolve a name/ID, it first checks the entries in files (/etc/passwd, /etc/group) and then tries to find them using sssd.
libnss_sss.so.2 opens (with mmap) the fast cache in /var/lib/sss/mc in read-only mode and then searches for the requested data there. By default, data are cached there for 5 minutes. Returning data from the fast cache is really efficient because it does not require communication with any daemon, and the data are returned from memory thanks to mmap. The fast memory cache is opened just once; otherwise the overhead of opening it with every request would be quite high, especially with long-living clients and many requests in a short time (e.g. 10,000 requests per second). Before each request, the SSSD glibc NSS module (libnss_sss.so.2) can detect that the file was removed; in that case it closes the cached file descriptor and opens the newly created files.
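
The "detect before each request, then reopen" behaviour described above can be sketched in shell. This is an illustration only, assuming Linux /proc and GNU stat: the helper names fd_is_stale and lookup_demo are hypothetical, and using the link count of the open descriptor as the staleness check is an assumption about one plausible way to do it, not a statement of how libnss_sss.so.2 is implemented.

```shell
# Sketch only: a temporary file stands in for the fast cache.
# Returns 0 (true) when the file behind descriptor $1 has been unlinked.
fd_is_stale() {
    [ "$(stat -L -c %h "/proc/self/fd/$1")" -eq 0 ]
}

lookup_demo() (
    dir=$(mktemp -d) || exit 1
    echo "generation 1" > "$dir/passwd"
    exec 8< "$dir/passwd"        # the "client" opens the cache once

    # "Restart": the old cache is removed and a fresh one is created.
    rm "$dir/passwd"
    echo "generation 2" > "$dir/passwd"

    # Next request: check the cached descriptor before using it.
    if fd_is_stale 8; then
        exec 8<&-                # drop the deleted file
        exec 8< "$dir/passwd"    # reopen the current one
    fi

    read -r line <&8
    echo "$line"

    exec 8<&-
    rm -rf "$dir"
)

lookup_demo
```

A client that never makes a second request never reaches the check, so it keeps the deleted file open for as long as it runs.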

If the data are not in the fast memory cache, the SSSD glibc plugin contacts the sssd daemon via a UNIX socket (/var/lib/sss/pipes/nss). SSSD can return cached data or can even try to refresh the data from the directory server, which is less efficient than returning data from memory in the glibc plugin.


If you restart SSSD very often, different daemon/client processes can end up with different versions of the sssd fast cache loaded. That is not a problem if a daemon/client process makes glibc requests often, because the SSSD glibc NSS module can detect that the files were removed and reopen them. However, if a daemon/client process makes a name service request just once, there is no next request during which the SSSD glibc NSS plugin could detect that the fast memory cache files were removed, and the file system cannot free the files because of the open file descriptor.
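
To see which processes are still holding removed cache files, the lsof output earlier in this report is the usual tool. As a dependency-free alternative, here is a minimal sketch that walks /proc; the function name list_deleted_holders is hypothetical, Linux /proc is assumed, and it only reports open descriptors (a mapping whose descriptor was already closed shows up in /proc/PID/maps instead and is not covered here).

```shell
# Sketch only: print "PID target" for every open descriptor that points at a
# deleted file below the directory given as $1. Run as root to see all
# processes; descriptors of processes you may not inspect are skipped.
list_deleted_holders() (
    prefix=$1
    for link in /proc/[0-9]*/fd/*; do
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            "$prefix"/*" (deleted)")
                pid=${link#/proc/}   # strip the leading "/proc/"
                pid=${pid%%/*}       # keep only the PID component
                echo "$pid $target"
                ;;
        esac
    done
)
```

For the case in this report it would be called as `list_deleted_holders /var/lib/sss/mc`; several distinct deleted inodes held by different PIDs would match the "more than two versions" situation described in Comment 2.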
 
You can restart that daemon so that it loads the SSSD glibc module again and therefore opens the SSSD fast memory cache again. That is the recommended way in case of an upgrade (Comment 4), because glibc should also load the new version of the SSSD plugin libnss_sss.so.2. But daemons which do not make many NSS requests probably do not need caching, and you can disable the fast memory cache in libnss_sss.so.2 with an environment variable. In that case, the SSSD NSS plugin will always contact the sssd daemon.

man sssd.conf says:
           NOTE: If the environment variable SSS_NSS_USE_MEMCACHE is set to
           "NO", client applications will not use the fast in-memory cache.


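If someone wants to disable the sssd memory cache for accounts-daemon.service, it is possible to do so with a systemd drop-in file. The setting goes in the [Service] section, and the service has to be restarted to pick it up.

sh# mkdir -p /etc/systemd/system/accounts-daemon.service.d
sh# printf '[Service]\nEnvironment=SSS_NSS_USE_MEMCACHE=NO\n' > /etc/systemd/system/accounts-daemon.service.d/sssd_memcache.conf
sh# systemctl daemon-reload
sh# systemctl restart accounts-daemon.service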

Comment 6 Egbert S. 2021-12-12 19:22:56 UTC
It's happening again in CentOS 8.4

Comment 7 Alexey Tikhonov 2021-12-13 09:30:06 UTC
(In reply to Egbert S. from comment #6)
> It's happening again in CentOS 8.4

Please provide more details.