Bug 2069097
| Summary: | libnss_sss: threads stuck at sss_nss_lock from initgroups | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Roberto Bergantinos <rbergant> |
| Component: | autofs | Assignee: | Ian Kent <ikent> |
| Status: | CLOSED ERRATA | QA Contact: | Kun Wang <kunwan> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.5 | CC: | atikhono, fweimer, grajaiya, ikent, jhrozek, lslebodn, mzidek, pbrezina, sbose, tscherf, xzhou |
| Target Milestone: | rc | Keywords: | CustomerScenariosInitiative, Triaged |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | autofs-5.1.4-83.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| | 2087535 | Environment: | |
| Last Closed: | 2022-11-08 10:52:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2087535 | | |
Description

Roberto Bergantinos, 2022-03-28 09:29:00 UTC
---

Hi Roberto,

if you have a stuck process at hand, could you please trigger and attach a coredump?

---

(In reply to Alexey Tikhonov from comment #1)
> Hi Roberto,
> if you have a stuck process at hand, could you please trigger and attach a coredump?

Hi Alexey,

Unfortunately I had to reboot the reproducer. I'm running it again; as soon as I have it I'll attach it.

rgds
roberto

---

A bit of refinement for the reproducer: in order to stress sss_nss_lock, let's invalidate the cached initgroup/group db with:

    # while true; do sleep 90; groupdel dummygroup; sleep 90; groupadd -g 99999 dummygroup; done

Still waiting to reproduce the issue again.

---

@atikhono Just added the coredump to the case. It got reproduced faster after stressing sss_nss_lock with the change to the reproducer from comment #3.

---

    (gdb) frame 2
    #2  0x00007fc7ac17ec24 in sss_mt_lock (m=0x7fc7ac386100 <sss_nss_mtx>) at src/sss_client/common.c:1172
    1172        sss_mt_lock(&sss_nss_mtx);

    (gdb) p sss_nss_mtx.mtx.__data
    $10 = {__lock = 2, __count = 0, __owner = 95766, __nusers = 1, __kind = 0, __spins = 0,
           __elision = 0, __list = {__prev = 0x0, __next = 0x0}}

    (gdb) info threads
      Id   Target Id                          Frame
    * 1    Thread 0x7fc7ac5c7700 (LWP 95772)  0x00007fc7ac17ec24 in sss_mt_lock (m=0x7fc7ac386100 <sss_nss_mtx>) at src/sss_client/common.c:1172

-- there is only one thread, 95772, and that doesn't match __owner = 95766.

It looks ("__nusers = 1" && "__lock = 2") like the mutex was locked by a thread that was terminated somehow during one of the 'libnss_sss.so' calls...

---

If the process forks while the mutex is acquired, it will remain acquired in the subprocess. But the thread that will eventually unlock the mutex in the original process does not exist in the forked subprocess, so the mutex is never unlocked there.

This is the main reason why POSIX says that after forking a multi-threaded process, only async-signal-safe functions may be used in the subprocess. Mutex locking and unlocking is not among those functions.

glibc makes reasonable efforts to protect its own locks during fork, so that more library functionality is available in forked processes. libnss_sss could do the same using fork handlers (see pthread_atfork). But I think we have fork protection gaps in the NSS framework itself (some of them will be addressed in Red Hat Enterprise Linux 9), and of course there are other NSS modules which may also lack full fork protection, so even if you fix libnss_sss, issues will remain.

I think the fundamental issue is that the autofs daemon does not follow the rules related to multi-threading and fork. Maybe it is possible to do the NSS queries in the original process before forking, and only call async-signal-safe functions in the subprocess?

Florian
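To illustrate the fork-handler approach Florian mentions, here is a minimal sketch of the pthread_atfork pattern applied to a hypothetical library-private mutex. The names (lib_mtx, lib_prepare and so on) are made up for the example; this is not the actual libnss_sss code.

```c
#include <pthread.h>

/* Hypothetical library-private lock, analogous in role to sss_nss_mtx. */
static pthread_mutex_t lib_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock before fork() so no other thread can hold it at fork time. */
static void lib_prepare(void) { pthread_mutex_lock(&lib_mtx); }

/* Release it again in the parent after the fork... */
static void lib_parent(void)  { pthread_mutex_unlock(&lib_mtx); }

/* ...and in the child, which therefore starts with the lock free. */
static void lib_child(void)   { pthread_mutex_unlock(&lib_mtx); }

__attribute__((constructor))
static void lib_init(void)
{
    /* Registered once when the library is loaded; every fork() then runs
     * lib_prepare() first and the parent/child handlers afterwards. */
    pthread_atfork(lib_prepare, lib_parent, lib_child);
}
```

With handlers like these registered, fork() cannot happen while another thread is inside the library's critical section, so the child never inherits the mutex in a locked state owned by a thread it does not have.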
---

Many thanks. I see some locking in the autofs code before forking, based on a compile option ENABLE_MOUNT_LOCKING which is not enabled as far as I can see:

    [root@rhel8 ~]# automount -V
    Linux automount version 5.1.4-74.el8

    Directories:
            config dir:     /etc/sysconfig
            maps dir:       /etc
            modules dir:    /usr/lib64/autofs

    Compile options:
      DISABLE_MOUNT_LOCKING ENABLE_IGNORE_BUSY_MOUNTS WITH_SYSTEMD WITH_LDAP WITH_SASL LIBXML2_WORKAROUND WITH_LIBTIRPC

Let me check with Ian about this.

---

(In reply to Florian Weimer from comment #16)
> If the process forks while the mutex is acquired, it will remain acquired in
> the subprocess. But the thread that will eventually unlock the mutex in the
> original process does not exist in the forked subprocess, so the mutex is
> never unlocked there.
>
> This is the main reason why POSIX says that after forking a multi-threaded
> process, only async-signal-safe functions may be used in the subprocess.
> Mutex locking and unlocking is not among those functions.
>
> glibc makes reasonable efforts to protect its own locks during fork, so that
> more library functionality is available in forked processes. libnss_sss
> could do the same using fork handlers (see pthread_atfork). But I think we
> have fork protection gaps in the NSS framework itself (some of them will be
> addressed in Red Hat Enterprise Linux 9), and of course there are other NSS
> modules which may also lack full fork protection, so even if you fix
> libnss_sss, issues will remain.
>
> I think the fundamental issue is that the autofs daemon does not follow the
> rules related to multi-threading and fork. Maybe it is possible to do the
> NSS queries in the original process before forking, and only call
> async-signal-safe functions in the subprocess?

That may be so, but there are several functions here, as well as the mutex itself, which do not belong to autofs; I have to assume they belong to libnss_sss, which is part of sssd and is installed as a separate package IIRC.

Nevertheless the debug trace looks a bit odd too. I'm not sure what has been happening there: the logging appears to start part way through some request, and because the earlier activity isn't known it's hard to work out what is going on. Possibly the better idea would be to get a debug log in parallel and try to match it to the sss log (or debug backtrace) so the actual actions can be identified.

Also, I have to say we really should be using 5.1.4-82.el8, the RHEL-8.6 release; RHEL-8.5 had a number of regressions that got by me!

Ian
---

(In reply to Roberto Bergantinos from comment #21)
> // not sure what is exactly requested, but I can reproduce the issue with
> autofs-5.1.4-82.el8.x86_64
>
> // I'm providing both the autofs debug log and sssd_nss.log from the time of
> the issue, attached to the bug; the files are:
>
> 1177097.autofs   ---> autofs debug log
> sssd.log.problem ---> sssd log from the date/time the problem is reproduced,
> 2022-04-18 17:48:39
>
> // autofs is in debug mode and the sssd logs were obtained with:
>
> [root@rhel8 log]# sssctl debug-level 7
>
> // I used the reproducer detailed in the description and comment #3
>
> // details:
>
> [root@rhel8 /]# rpm -q autofs libsss_autofs sssd
> autofs-5.1.4-82.el8.x86_64
> libsss_autofs-2.0.0-43.el8.x86_64
> sssd-2.5.2-2.el8_5.4.x86_64
>
> // the issue reproduces on this autofs incarnation:
>
> Apr 18 17:48:39 rhel8 automount[1177097]: Starting automounter version 5.1.4-82.el8, master map auto.master
>
> // gathering autofs logs:
>
> [root@rhel8 /]# journalctl | grep automount > out
> [root@rhel8 /]# grep 1177097 out > 1177097.autofs
> [root@rhel8 /]# grep -iE "calling mount|mounted /test" 1177097.autofs
>
> // gathering sssd logs:
>
> # grep "2022-04-18 17:48:39" sssd_nss.log > sssd.log.problem
>
> // we can see how 3 x bind mounts for dir2, dir7 and dir8 got hung:
>
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir6 /test-03137975/dir6
> Apr 18 17:48:39 rhel8 automount[1177097]: mounted /test-03137975/dir6
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir8 /test-03137975/dir8
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir2 /test-03137975/dir2
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir7 /test-03137975/dir7
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir10 /test-03137975/dir10
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir3 /test-03137975/dir3
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir5 /test-03137975/dir5
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir9 /test-03137975/dir9
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir1 /test-03137975/dir1
> Apr 18 17:48:39 rhel8 automount[1177097]: mount(bind): calling mount --bind -o defaults /mnt/dir4 /test-03137975/dir4
> Apr 18 17:48:39 rhel8 automount[1177097]: mounted /test-03137975/dir5
> Apr 18 17:48:39 rhel8 automount[1177097]: mounted /test-03137975/dir10
> Apr 18 17:48:39 rhel8 automount[1177097]: mounted /test-03137975/dir3
> Apr 18 17:48:39 rhel8 automount[1177097]: mounted /test-03137975/dir1
> Apr 18 17:48:39 rhel8 automount[1177097]: mounted /test-03137975/dir9
> Apr 18 17:48:39 rhel8 automount[1177097]: mounted /test-03137975/dir4
>
> // checking the user stack:
>
> [root@rhel8 /]# for pid in `pidof automount`; do echo "== $pid"; gdb -p $pid -batch -ex "thread apply all bt" -ex "quit"; done &> userland.out
> [root@rhel8 /]# grep parse_mount userland.out
> #11 0x00007ffbcbbbf755 in parse_mount (ap=0x55970686b110, name=0x7ffbb2c13c90 "dir7", name_len=4, mapent=<optimized out>, context=<optimized out>) at parse_sun.c:1705
> #11 0x00007ffbcbbbf755 in parse_mount (ap=0x55970686b110, name=0x7ffbc8b15c90 "dir2", name_len=4, mapent=<optimized out>, context=<optimized out>) at parse_sun.c:1705
> #11 0x00007ffbcbbbf755 in parse_mount (ap=0x55970686b110, name=0x7ffbc8275c90 "dir8", name_len=4, mapent=<optimized out>, context=<optimized out>) at parse_sun.c:1705
>
> // again the stack on sss_nss_lock:
>
> #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
> #1  0x00007ffbd4d73a79 in __GI___pthread_mutex_lock (mutex=0x7ffbc88d7100) at ../nptl/pthread_mutex_lock.c:80
> #2  0x00007ffbc86cfc24 in sss_nss_lock () from /lib64/libnss_sss.so.2
> #3  0x00007ffbc86d0eab in _nss_sss_initgroups_dyn () from /lib64/libnss_sss.so.2
> #4  0x00007ffbd33a14ce in internal_getgrouplist (user=user@entry=0x7ffb94001280 "roberto", group=group@entry=1000, size=size@entry=0x7ffbb2c087e8, groupsp=groupsp@entry=0x7ffbb2c087f0, limit=limit@entry=65536) at initgroups.c:111
> #5  0x00007ffbd33a1841 in initgroups (user=0x7ffb94001280 "roberto", group=group@entry=1000) at initgroups.c:218
> #6  0x00005597060300c4 in do_spawn (logopt=logopt@entry=0, wait=wait@entry=4294967295, options=options@entry=2, prog=prog@entry=0x7ffbb2c092b2 "/usr/bin/mount", argv=argv@entry=0x7ffb94001d60) at spawn.c:401
> #7  0x00005597060310c0 in spawn_bind_mount (logopt=0) at spawn.c:792
> #8  0x00007ffbcb770eab in mount_mount (ap=0x55970686b110, root=0x55970686b1f0 "/test-03137975", name=<optimized out>, name_len=4, what=0x7ffb94001680 "/mnt/dir7", fstype=0x7ffbcb9a99cf "bind", options=0x7ffbcb782b5a "defaults", context=0x0) at mount_bind.c:167
> #9  0x00007ffbcb994488 in mount_mount (ap=0x55970686b110, root=<optimized out>, name=<optimized out>, name_len=<optimized out>, what=<optimized out>, fstype=<optimized out>, options=<optimized out>, context=<optimized out>) at mount_nfs.c:307
> #10 0x00007ffbcbbbce4e in sun_mount (ap=ap@entry=0x55970686b110, root=<optimized out>, name=name@entry=0x7ffbb2c13c90 "dir7", namelen=namelen@entry=4, loc=loc@entry=0x7ffb94001760 ":/mnt/dir7", loclen=loclen@entry=10, options=0x0, ctxt=<optimized out>) at parse_sun.c:698
> #11 0x00007ffbcbbbf755 in parse_mount (ap=0x55970686b110, name=0x7ffbb2c13c90 "dir7", name_len=4, mapent=<optimized out>, context=<optimized out>) at parse_sun.c:1705
> #12 0x00007ffbc8b22d63 in lookup_mount (ap=<optimized out>, name=<optimized out>, name_len=4, context=0x7ffbb80016a0) at lookup_program.c:694
> #13 0x0000559706039ee1 in do_lookup_mount (ap=ap@entry=0x55970686b110, map=map@entry=0x559706869c60, name=name@entry=0x7ffbb2c13c90 "dir7", name_len=name_len@entry=4) at lookup.c:831
> #14 0x000055970603abf1 in lookup_nss_mount (ap=ap@entry=0x55970686b110, source=source@entry=0x0, name=name@entry=0x7ffbb2c13c90 "dir7", name_len=4) at lookup.c:1216
> #15 0x000055970602adf5 in do_mount_indirect (arg=<optimized out>) at indirect.c:750
> #16 0x00007ffbd4d7117a in start_thread (arg=<optimized out>) at pthread_create.c:479
> #17 0x00007ffbd33d8dc3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> // owner not present (1177319):
>
> (gdb) f 1
> #1  0x00007ffbd4d73a79 in __GI___pthread_mutex_lock (mutex=0x7ffbc88d7100) at ../nptl/pthread_mutex_lock.c:80
> 80        LLL_MUTEX_LOCK (mutex);
>
> (gdb) p (pthread_mutex_t) *mutex
> $4 = {__data = {__lock = 2, __count = 0, __owner = 1177319, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
>   __size = "\002\000\000\000\000\000\000\000\347\366\021\000\001", '\000' <repeats 26 times>, __align = 2}
>
> (gdb) ! pidof automount
> 1178267 1177339 1177338 1177337
>
> // this looks related to the issue described in comment #16

That sss_nss_lock still does not belong to autofs; autofs knows nothing about it.

It seems to me that this lock is held while group information is obtained and held over the running of mount(8). That may be the problem.

But what are you asking me to do here? I can have a look at function usage once I decipher comment #16 ...
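The shape of these backtraces can be reproduced outside autofs and sssd. The following stand-alone program is an illustration only (it assumes nothing beyond pthreads): one thread holds a mutex while the main thread forks, and the child then blocks on that mutex forever, just like initgroups() blocking in sss_nss_lock() above.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

/* A worker thread that holds the mutex for a while, the way another
 * thread can be inside an NSS call at the moment the daemon forks. */
static void *holder(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    sleep(2);                      /* keep the lock held across the fork below */
    pthread_mutex_unlock(&mtx);
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, holder, NULL);
    sleep(1);                      /* give the worker time to take the lock */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: the mutex memory was copied in the locked state, but the
         * thread that owns it does not exist here, so this blocks forever. */
        pthread_mutex_lock(&mtx);
        _exit(0);
    }

    pthread_join(t, NULL);         /* the parent's copy is unlocked normally */
    waitpid(pid, NULL, 0);         /* never returns: the child is stuck */
    return 0;
}
```

Built with `gcc -pthread`, the holder thread finishes and unlocks the parent's copy of the mutex, but waitpid() never returns because the child is still blocked on its own copy; attaching gdb to the child should show a locked mutex whose __owner TID is not a live thread of that process, much like the output above.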
---

(In reply to Ian Kent from comment #25)
> It seems to me that this lock is held while group information is obtained
> and held over the running of mount(8). That may be the problem.
>
> But what are you asking me to do here? I can have a look at function usage
> once I decipher comment #16 ...

What kernel version is being used here?

---

(In reply to Ian Kent from comment #26)
> What kernel version is being used here?

I used 4.18.0-348.12.2.el8_5.x86_64.

I noticed in the automount userland code that the build option ENABLE_MOUNT_LOCKING appears aimed at avoiding this? Correct me if I'm wrong.

By the way, I cannot reproduce with regular NFS mounts instead of bind ones, or at least not that fast; I left the reproducer running for 3 days without issues. I'll see if that can be an option for the field.

---

(In reply to Roberto Bergantinos from comment #27)
> I used 4.18.0-348.12.2.el8_5.x86_64.

OK, I'll have a look at that one. There was a bug introduced upstream that can look a bit like this and could have been pulled in due to other changes, but I doubt it. I'll check anyway.

> I noticed in the automount userland code that the build option
> ENABLE_MOUNT_LOCKING appears aimed at avoiding this? Correct me if I'm wrong.

Not sure what you mean. That option was originally added because mount(8) and its helpers were racy wrt. /etc/mtab, and a way to serialise those calls was the only way to prevent it. It also doesn't violate the rules for use over fork() and exec().

> By the way, I cannot reproduce with regular NFS mounts instead of bind ones,
> or at least not that fast; I left the reproducer running for 3 days without
> issues. I'll see if that can be an option for the field.

Have a look at this commit, I'm pretty sure it introduced the problem because, as we are seeing, initgroups(3) does violate what may be called between fork() and exec().

The problematic part of this is what is done between the fork() and the exec(). The functions called between the invocation of those two functions can cause problems. If exec() is called immediately after the fork() there should not be any problems.

    commit 6343a32920204b1a8f6935b7f40254e230cde155
    Author: Jeff Mahoney <jeffm>
    Date:   Thu Oct 19 09:48:07 2017 +0800

        autofs-5.1.3 - fix ordering of seteuid/setegid in do_spawn()

        In do_spawn, we call seteuid() prior to calling setegid() which means
        that, when we're using an unprivileged uid, we won't have permissions
        to set the effective group anymore. We also don't touch the group
        memberships so the permissions used to open the directory will
        include all of root's supplementary groups and none of the user's.

        This patch reverses the ordering and uses initgroups() to reset the
        supplementary groups to the unprivileged user's groups.

        Signed-off-by: Jeff Mahoney <jeffm>
        Signed-off-by: Ian Kent <raven>

I'll need to put together a patch that moves the initgroups() call outside of the fork() and exec() code without changing the environment for the subsequent mount. Or perhaps this might actually be a case that needs a fork handler.
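As a rough illustration of that direction, the NSS work can be done before fork() so that only plain system calls remain between fork() and exec(). This is a hypothetical sketch, not the actual autofs patch: spawn_as_user() and the fixed-size group buffer are inventions for the example, and the real do_spawn() also handles the effective-ID ordering from the commit above plus logging and I/O plumbing that are omitted here.

```c
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve the group list via NSS in the parent,
 * and keep the child down to thin syscall wrappers until exec(). */
static pid_t spawn_as_user(const char *prog, char *const argv[],
                           uid_t uid, gid_t gid)
{
    gid_t groups[64];
    int ngroups = 64;

    /* NSS work (passwd and group lookups) happens before fork(), so the
     * NSS client mutex is only ever taken by this fully-running process. */
    struct passwd *pw = getpwuid(uid);
    if (pw == NULL || getgrouplist(pw->pw_name, gid, groups, &ngroups) < 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only direct system calls between fork() and exec(),
         * unlike initgroups(3), which walks the NSS modules. */
        if (setgroups((size_t)ngroups, groups) < 0 ||
            setgid(gid) < 0 || setuid(uid) < 0)
            _exit(127);
        execv(prog, argv);
        _exit(127);
    }
    return pid;  /* parent: the caller waits for the mount as before */
}
```

The trade-off is that the group list is resolved in the parent's context; the alternative mentioned above is a fork handler in the NSS client library (comment #16), which would make the existing initgroups() call in the child safe instead.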
---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (autofs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7743