Bug 1415162 - ipa-extdom-extop plugin can exhaust DS worker threads
Summary: ipa-extdom-extop plugin can exhaust DS worker threads
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: ipa
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: IPA Maintainers
QA Contact: ipa-qe
Docs Contact: Aneta Šteflová Petrová
URL:
Whiteboard:
Depends On: 1473571
Blocks: 1420851 1467835 1472344
Reported: 2017-01-20 12:40 UTC by Thorsten Scherf
Modified: 2021-06-10 11:51 UTC (History)

Fixed In Version: ipa-server-4.5.4-5.el7
Doc Type: Bug Fix
Doc Text:
The IdM LDAP server no longer becomes unresponsive when resolving an AD user takes a long time
When the System Security Services Daemon (SSSD) took a long time to resolve a user from a trusted Active Directory (AD) domain on the Identity Management (IdM) server, the IdM LDAP server sometimes exhausted its own worker threads. Consequently, the IdM LDAP server was unable to respond to further requests from SSSD clients or other LDAP clients. This update adds a new API to SSSD on the IdM server, which enables identity requests to time out. Also, the IdM LDAP extended identity operations plug-in and the Schema Compatibility plug-in now support this API to enable canceling requests that take too long. As a result, the IdM LDAP server can recover from the described situation and keep responding to further requests.
Clone Of:
Clones: 1473571 1473577
Environment:
Last Closed: 2018-04-10 16:40:25 UTC
Target Upstream Version:


Attachments
tar ball with test build with a reduced client timeout (7.81 MB, application/x-gzip)
2017-02-13 15:32 UTC, Sumit Bose
tar-ball with test build (7.85 MB, application/x-gzip)
2017-07-20 12:46 UTC, Sumit Bose
valgrind output (2.69 MB, text/plain)
2017-10-12 15:43 UTC, German Parente
tar-ball with test build rebased to sssd-1.15.2-50.el7_4.6 (8.96 MB, application/x-gzip)
2017-12-07 19:57 UTC, Sumit Bose
tar-ball with test build 5 rebased to sssd-1.15.2-50.el7_4.6 (8.97 MB, application/x-gzip)
2018-01-12 12:18 UTC, Sumit Bose
tar-ball with test build 5 rebased to sssd-1.15.2-50.el7_4.8 (8.98 MB, application/x-gzip)
2018-01-12 12:20 UTC, Sumit Bose


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3930671 | 0 | Troubleshoot | None | Directory Server on IdM master gets stuck when the server receives many requests to resolve users from a trusted AD doma... | 2019-02-21 10:10:39 UTC
Red Hat Product Errata | RHBA-2018:0918 | 0 | None | None | None | 2018-04-10 16:41:35 UTC

Description Thorsten Scherf 2017-01-20 12:40:46 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/freeipa/ticket/5464

ipa-extdom-extop is used to resolve users and groups from trusted AD domains. It does this through NSS calls such as getpwnam(), getgrnam(), etc.

libnss calls are serialized by a simple lock and each call can last a long time because it has to get info from SSSD/AD.

If a DS server is flooded with "IPA trusted domain ID mapper" extended operations (extops), many worker threads stay busy for a long time. The worst case is when all the workers are busy with such extops.
DS can then no longer process other requests and appears to hang temporarily.

ipa-extdom-extop should handle those extops in its own threads (possibly like persistent searches) so that they do not impact DS.
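
As a rough illustration of why serialized lookups tie up workers, the following C sketch models the wait as simple arithmetic. It is not taken from the plugin: the function names are made up for this example, and it assumes that every lookup takes the same time and that lookups run strictly one at a time.

```c
#include <assert.h>

/* Seconds until the request at queue position `pos` (1-based) completes,
 * assuming lookups are serialized and each takes `per_call_s` seconds.
 * Hypothetical model, not plugin code. */
unsigned long serialized_wait_s(unsigned long pos, unsigned long per_call_s)
{
    assert(pos > 0);
    return pos * per_call_s;
}

/* Same model, but the caller gives up after `timeout_s` seconds, so a
 * worker thread is never held longer than the timeout. */
unsigned long bounded_wait_s(unsigned long pos, unsigned long per_call_s,
                             unsigned long timeout_s)
{
    unsigned long wait = serialized_wait_s(pos, per_call_s);

    return wait < timeout_s ? wait : timeout_s;
}
```

With made-up numbers, 30 queued requests at 10 seconds per lookup give serialized_wait_s(30, 10) = 300 seconds for the last worker, while bounded_wait_s(30, 10, 1) caps the hold time at 1 second if each request may time out after one second.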

Comment 6 Sumit Bose 2017-02-13 15:32:30 UTC
Created attachment 1249898 [details]
tar ball with test build with a reduced client timeout

Comment 27 Sumit Bose 2017-07-20 12:46:10 UTC
Created attachment 1301685 [details]
tar-ball with test build

Comment 29 Martin Kosek 2017-07-21 08:44:43 UTC
Note that this solution is composed of changes in ipa, sssd, and slapi-nis:

* ipa: Bug 1415162 - this bug
* sssd: Bug 1473571
* slapi-nis: Bug 1473577

Comment 56 German Parente 2017-10-12 15:43:38 UTC
Created attachment 1337857 [details]
valgrind output

Comment 81 Alexander Bokovoy 2017-11-30 14:35:48 UTC
Added a doc. Let me know, Aneta, if this is enough.

Comment 82 Alexander Bokovoy 2017-11-30 14:37:44 UTC
Fixed upstream. 

master:
    78ad1cf ipa-extdom-extop: refactor nsswitch operations

ipa-4-6:
    d1dd794 ipa-extdom-extop: refactor nsswitch operations

ipa-4-5:
    a2da9f9 ipa-extdom-extop: refactor nsswitch operations

Comment 98 Sumit Bose 2017-12-07 19:57:23 UTC
Created attachment 1364463 [details]
tar-ball with test build rebased to sssd-1.15.2-50.el7_4.6

Comment 100 Mohammad Rizwan 2018-01-03 09:13:37 UTC
version:
ipa-server-4.5.4-7.el7.x86_64
sssd-1.16.0-14.el7.x86_64

sss_nss_getpwnam_timeout_test.c 
--------------------------------------------------------------
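#include <stdio.h>
#include <pwd.h>    /* struct passwd */
/* Must be defined before the include to expose the *_timeout() calls. */
#define IPA_389DS_PLUGIN_HELPER_CALLS 1
#include <sss_nss_idmap.h>

int main(int argc, char* argv[])
{
    int ret;
    struct passwd pwd;
    struct passwd *pwd_result;
    char buffer[1024];
    size_t buflen = sizeof(buffer);

    if (argc != 2) {
        fprintf(stderr, "Missing argument.\n");
        return 1;
    }

    /* flags = 0, timeout = 1000 ms: give up after about one second. */
    ret = sss_nss_getpwnam_timeout(argv[1], &pwd, buffer, buflen, &pwd_result,
                                   0, 1000);

    /* Print the raw return code; 0 means success. */
    fprintf(stderr, "Done [%d].\n", ret);

    return ret;
}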
--------------------------------------------------------------

steps:

Make sure the 'libsss_nss_idmap-devel' package is installed (yum install libsss_nss_idmap-devel) and then run:

1. gcc -Wall -Wextra -Werror sss_nss_getpwnam_timeout_test.c -o sss_nss_getpwnam_timeout_test -lsss_nss_idmap

2. set 'timeout = 999999' in the [domain/...] section of sssd.conf

3. restart the sssd service.

4. call $ kill -STOP $(pidof sssd_be)

5. call './sss_nss_getpwnam_timeout_test non_existing_user_name'
   this call should return after about 1s with "Done [5]."

6. as a reference you can call 'getent passwd non_existing_user_name'
   this call will return after 5 minutes, or earlier if step 7 is run.

7. call 'kill -CONT $(pidof sssd_be)' to resume sssd_be.
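
To check the "about 1s" expectation from step 5 without a stopwatch, a call can be wrapped in a small timing helper. The sketch below is not part of the original test: lookup_fn, timed_lookup, within_window and fake_lookup are hypothetical names, and fake_lookup only stands in for a real call such as the test program's sss_nss_getpwnam_timeout() invocation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical signature for the call being timed. */
typedef int (*lookup_fn)(const char *name);

/* Milliseconds between two CLOCK_MONOTONIC readings. */
long elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                 + (end->tv_nsec - start->tv_nsec);

    return (long)(ns / 1000000LL);
}

/* Run fn(name), store its return code in *ret, return elapsed time in ms. */
long timed_lookup(lookup_fn fn, const char *name, int *ret)
{
    struct timespec start, end;

    assert(fn != NULL && ret != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    *ret = fn(name);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ms(&start, &end);
}

/* Return 1 if ms lies in [lo_ms, hi_ms], else 0. */
int within_window(long ms, long lo_ms, long hi_ms)
{
    return ms >= lo_ms && ms <= hi_ms;
}

/* Stand-in for a real lookup: sleeps 20 ms and returns 5. Hypothetical. */
int fake_lookup(const char *name)
{
    struct timespec delay = { 0, 20 * 1000 * 1000 };

    (void)name;
    nanosleep(&delay, NULL);
    return 5;
}
```

With the real call in place of fake_lookup, step 5 would be expected to satisfy something like within_window(ms, 500, 1500); the exact tolerance is a guess.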

Actual result:
[root@client ~]# vi /etc/sssd/sssd.conf 
[root@client ~]# 
[root@client ~]# systemctl restart sssd
[root@client ~]# kill -STOP $(pidof sssd_be)
[root@client ~]# ./sss_nss_getpwnam_timeout_test test101
Done [5].
[root@client ~]#

Based on the above observations, marking the bug as VERIFIED.

Comment 101 Sumit Bose 2018-01-12 12:18:44 UTC
Created attachment 1380407 [details]
tar-ball with test build 5 rebased to sssd-1.15.2-50.el7_4.6

Comment 102 Sumit Bose 2018-01-12 12:20:09 UTC
Created attachment 1380408 [details]
tar-ball with test build 5 rebased to sssd-1.15.2-50.el7_4.8

Comment 108 errata-xmlrpc 2018-04-10 16:40:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0918

