RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1703436 - sssd not thread-safe in innetgr()
Summary: sssd not thread-safe in innetgr()
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.1
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: rc
: 8.0
Assignee: Sumit Bose
QA Contact: Jakub Vavra
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-04-26 12:15 UTC by Ulrich Sibiller
Modified: 2024-12-20 18:50 UTC (History)
20 users (show)

Fixed In Version: sssd-2.5.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-11-09 19:46:33 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments (Terms of Use)
example source code (757 bytes, application/gzip)
2019-04-26 12:15 UTC, Ulrich Sibiller
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 5540 0 None open sssd not thread-safe in innetgr() 2021-03-29 14:21:58 UTC
Red Hat Product Errata RHBA-2021:4435 0 None None None 2021-11-09 19:46:53 UTC
Sourceware 16181 0 P2 NEW Threade-safe implementation of innetgr and definition of struct __netgrent 2020-12-07 14:19:33 UTC

Description Ulrich Sibiller 2019-04-26 12:15:00 UTC
Created attachment 1559039 [details]
example source code

Description of problem:

"man 3 innetgr" states "The function is thread-safe." On our system it normally is but fails to be thread-safe if sssd is activated.

We hit this bug while debugging a GPFS (Spectrum Scale) issue. IBM support came up with the attached code example that triggers the problem without gpfs being involved.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Adapt "groups" and "hosts" arrays in the two attached source code files. The code will check if the machine in the hosts array is member of the netgroup mentioned in the  groups array. The host must be a member in the respective netgroup for this test.
2. compile the first example: gcc -lpthread -o netg-lock netg-lock.c
3. compile the second example: gcc -lpthread -o netg-lock2 netg-lock2.c
4. run netg-lock. It will call innetgr() with 256 threads, protected by pthread_mutex_lock/unlock(). This run will print out statistics that report every single check having been successful.
5. run netg-lock2. It will call innetgr() with 256 threads, _not_ protected by pthread_mutex_lock/unlock(). This run will print out statistics that report many failing requests.
 


Actual results:

netg-lock is successful, netg-lock2 is not

Expected results:

both examples should be successful 

Additional info:
Our netgroups configuration is quite big and nested. We have 66 netgroups altogether that have 19037 host entries.

The bug is triggered for a host that is member of a nested group.

$ ldapsearch -x -b <searchbase> ObjectClass=nisNetgroup dn | grep dn: | wc -l
66

$ ldapsearch -x -b dc=rd,dc=corpintra,dc=net ObjectClass=nisNetgroup | grep nisNetgroupTriple: | wc -l
19037

Comment 2 Jakub Hrozek 2019-04-26 13:28:10 UTC
Thank you for the bug report. 

We're going to look at the bug, but because there are quite a few bug reports and cases against SSSD, I wanted to ask if you have a RH subscription? If yes, it would help us to prioritize the work better, if you could also open a RH support case.. 

Coming to the technical aspect of the problem -- after a quick discussion with a colleague, currently I'm not sure what could be the issue. innetgr() is implemented in libc completely as a wrapper around setnetgrent/getnetgrent/endnetgrent (https://sourceware.org/git/?p=glibc.git;a=blob;f=inet/getnetgrent_r.c;hb=1c81d55fc4b07b51adf68558ba74ce975153e580#l451) so nss_sss does not implement innetgr on its own. So either one of setnetgrent/getnetgrent/endnetgrent  is thread-unsafe or libc itself (unlikely?). And sssd deamon is single threaded, using an event loop for pseudo-concurrency. So on the technical side, this needs more investigation, but as I said, having a support case would help prioritize the investigation better.

Comment 3 Sumit Bose 2019-04-29 10:08:45 UTC
(In reply to Ulrich Sibiller from comment #0)
> Created attachment 1559039 [details]
> example source code
> 
> Description of problem:
> 
> "man 3 innetgr" states "The function is thread-safe." On our system it
> normally is but fails to be thread-safe if sssd is activated.
> 

I agree this sentence is in the man page nut there is the table with lists innetgr() as "MT-Unsafe race:netgrent locale" and the sentence "In the above table, netgrent in race:netgrent signifies that if any of the functions setnetgrent(), getnetgrent_r(), innetgr(), getnetgrent(), or endnetgrent() are used in parallel in different threads of a program, then data races could occur."

Additionally the glibc documentation (info libc) only says "MT-Unsafe race:netgrent locale".

I switch the component for this ticket to 'glibc' to ask the maintainers for clarification if innetgr() is expected to the thread-safe for the described use case.

bye,
Sumit

Comment 4 Florian Weimer 2019-04-29 11:17:47 UTC
I think the code backing innetgr in glibc and sssd should be thread-safe as-is.  Unlike many other NSS functions, local state is used for it, in the result buffer.  nss_sss also uses it to keep local state across _nss_sss_setnetgrent and internal_getnetgrent_r.  See the handling of “the struct __netgrent *result” argument.

I believe the race:netgrent annotation for innetgr is a bug.  The glibc manual even lists thread safety as a benefit of innetgr:

     This function tests whether the triple specified by the parameters
     HOST, USER, and DOMAIN is part of the netgroup NETGROUP.  Using
     this function has the advantage that

       1. no other netgroup function can use the global netgroup state
          since internal locking is used and
       2. the function is implemented more efficiently than successive
          calls to the other ‘set’/‘get’/‘endnetgrent’ functions.

I think we need to find out why innetgr is not thread-safe with sssd and then re-evaluate whether we should fix this issue, and update the glibc manual accordingly.  I would appreciate if you could help with that since I'm not familiar with sssd internals.  Thanks.

Comment 5 Sumit Bose 2019-04-29 12:34:21 UTC
Thank you Florian for the details.

I will move the ticket back to the SSSD component.

I guess the issue might be caused by the way concurrent netgroup requests are handled by the SSSD NSS responder process. Can you explain 

"""

       1. no other netgroup function can use the global netgroup state
          since internal locking is used and
       2. the function is implemented more efficiently than successive
          calls to the other ‘set’/‘get’/‘endnetgrent’ functions.
""""
with some more details? I wonder if nss_sss or SSSD's NSS responder can detect somehow that the original call is innetgr() and not a sequence of ‘set’/‘get’/‘endnetgrent’ functions. But after a short look at the code in inet/getnetgrent_r.c it looks to me that innetgr() is still implemented as a sequence of set/multiple gets/end calls and I also didn't see any special locking for innetgr() only for the individual ‘set’/‘get’/‘endnetgrent’ functions.

bye,
Sumit

Comment 6 Florian Weimer 2019-04-29 12:41:49 UTC
I think you can detect the call sequence using the struct __netgrent * argument.  See the nss/nss_files/files-netgrp.c code in glibc.

Comment 7 Sumit Bose 2019-04-29 16:09:47 UTC
(In reply to Florian Weimer from comment #6)
> I think you can detect the call sequence using the struct __netgrent *
> argument.  See the nss/nss_files/files-netgrp.c code in glibc.

Ok, I see. How should NSS modules outside of the glibc source tree use struct __netgrent, since it is not defined in a public header. Currently SSSD and e.g. nss-pam-ldapd are using a copy of the struct definition in their own source tree. But this might be risky, since glibc is free to change in internals of the struct at will (see e.g. https://pagure.io/SSSD/sssd/issue/3993 for a similar case).

Comment 8 Florian Weimer 2019-05-06 11:07:16 UTC
I will try to document this better on the glibc side. I don't think there is a way around defining struct __netgrent in a public header file.

The sssd sources already contain a copy of the definition of struct __netgrent because it is impossible to implement netgroup support without it. (The first few members are used for passing data to the glibc NSS framework.) I think you can reuse the data member to store a pointer to your own internal data.  You must set the data member to NULL in the endnetgrent function and free the resources there.

Comment 9 Sumit Bose 2019-05-06 11:26:48 UTC
(In reply to Florian Weimer from comment #8)
> I will try to document this better on the glibc side. I don't think there is
> a way around defining struct __netgrent in a public header file.
> 
> The sssd sources already contain a copy of the definition of struct
> __netgrent because it is impossible to implement netgroup support without
> it. (The first few members are used for passing data to the glibc NSS
> framework.) I think you can reuse the data member to store a pointer to your
> own internal data.  You must set the data member to NULL in the endnetgrent
> function and free the resources there.

Hi Florian,

thank you for taking care of this in glibc.

I will check from the SSSD side how we can use the data member from __netgrent to make netgroup calls thread safe.

bye,
Sumit

Comment 17 Sumit Bose 2021-03-18 07:52:21 UTC
Upstream ticket:
https://github.com/SSSD/sssd/issues/5540

Comment 18 Pavel Březina 2021-04-21 10:02:23 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5541

* `master`
    * 29abf94e3aec2c54c76adf61318099097c41ea77 - intg test: test is innetgr() is thread-safe
    * 88eec1c22f188adc49f41b81fe5af03c995c690d - nss client: make innetgr() thread safe

Comment 25 errata-xmlrpc 2021-11-09 19:46:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4435


Note You need to log in before you can comment on or make changes to this bug.