Bug 1845965
|  |  |  |  |
|---|---|---|---|
| Summary: | Users must be informed better when internal WATCHDOG terminates process. | | |
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Alexey Tikhonov <atikhono> |
| Component: | sssd | Assignee: | Alexey Tikhonov <atikhono> |
| Status: | CLOSED ERRATA | QA Contact: | sssd-qe <sssd-qe> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.2 | CC: | grajaiya, jhrozek, lslebodn, mniranja, mzidek, pbrezina, sgoveas, thalman, tscherf |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | sssd-2.3.0-1.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-04 02:05:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description

Alexey Tikhonov, 2020-06-10 13:45:36 UTC
Commits pushed to `master`:

* 65369f293b06ce0fe5622502bb32596bb50c523a - WATCHDOG: log process termination to the journal
* 00e7b1ada3d1c1071eac79b65c17cd2701c2ae6a - DEBUG: changed "debug_prg_name" format
* b5604d072e93bca7fc0c408fcfbb88f41c4d50ca - DEBUG: introduce new SSSDBG_TRACE_LDB level
* 7b25375155e7b7f640d30dcfdf5b55482a6102a7 - DEBUG: changed timestamp output format

Packages installed during verification:

```
python3-sssdconfig-2.3.0-4.el8.noarch
sssd-common-2.3.0-4.el8.x86_64
sssd-winbind-idmap-2.3.0-4.el8.x86_64
sssd-2.3.0-4.el8.x86_64
sssd-client-2.3.0-4.el8.x86_64
sssd-nfs-idmap-2.3.0-4.el8.x86_64
sssd-kcm-2.3.0-4.el8.x86_64
sssd-dbus-2.3.0-4.el8.x86_64
sssd-krb5-common-2.3.0-4.el8.x86_64
sssd-krb5-2.3.0-4.el8.x86_64
sssd-ad-2.3.0-4.el8.x86_64
sssd-ipa-2.3.0-4.el8.x86_64
sssd-tools-2.3.0-4.el8.x86_64
sssd-common-pac-2.3.0-4.el8.x86_64
sssd-ldap-2.3.0-4.el8.x86_64
sssd-proxy-2.3.0-4.el8.x86_64
```

Reproducing the issue:

1. Create an LDAP server (389-ds) containing 1500000 users (student1 to student1500000).
2. Create the following large groups:
   * allusers, which contains all 1500000 users as members
   * lgroup1, which contains 15000 users as members
   * lgroup2, which contains 50000 users as members
3. Configure sssd:

   ```
   [sssd]
   config_file_version = 2
   services = nss, pam
   domains = example1

   [domain/example1]
   ldap_search_base = dc=example,dc=test
   id_provider = ldap
   auth_provider = ldap
   ldap_user_home_directory = /home/%u
   ldap_uri = ldaps://auto-hv-02-guest08.idmqe.lab.eng.bos.redhat.com
   ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem
   use_fully_qualified_names = True
   debug_level = 9

   [nss]
   debug_level = 9
   ```

4. Start sssd and note the process PIDs:

   ```
   [root@ibm-x3650m4-01-vm-11 mc]# ps -ef | grep sssd
   root 34653     1 1 06:58 ? 00:00:00 /usr/sbin/sssd -i --logger=files
   root 34656 34653 3 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
   root 34657 34653 1 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain example1 --uid 0 --gid 0 --logger=files
   root 34658 34653 1 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
   root 34659 34653 0 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
   ```

5. First look up group lgroup1, which contains 15000 users:

   ```
   ....
   ....
   student14982@example1,student14983@example1,student14984@example1,student14985@example1,student14986@example1,student14987@example1,student14988@example1,student14989@example1,student14990@example1,student14991@example1,student14992@example1,student14993@example1,student14994@example1

   real    0m4.380s
   user    0m0.006s
   sys     0m0.008s
   ```

6. Check that the sssd_be process ID is the same as in step 4:

   ```
   root 34653     1 0 06:58 ? 00:00:00 /usr/sbin/sssd -i --logger=files
   root 34656 34653 0 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
   root 34657 34653 5 06:58 ? 00:00:03 /usr/libexec/sssd/sssd_be --domain example1 --uid 0 --gid 0 --logger=files
   root 34658 34653 1 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
   root 34659 34653 0 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
   ```

7. Now look up group lgroup2, which contains 50000 users:

   ```
   [root@qe-blade-01 sssd]# time getent group lgroup2@example1

   real    0m19.577s
   user    0m0.003s
   sys     0m0.004s
   ```

8. Check the sssd processes again:

   ```
   [root@ibm-x3650m4-01-vm-11 ~]# ps -ef | grep sssd
   root 34653     1 0 06:58 ? 00:00:00 /usr/sbin/sssd -i --logger=files
   root 34656 34653 0 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
   root 34658 34653 0 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
   root 34659 34653 0 06:58 ? 00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
   root 34712 34653 0 07:00 ? 00:00:00 /usr/libexec/sssd/sssd_be --domain example1 --uid 0 --gid 0 --logger=files
   ```

9. As we can see, the sssd_be process for example1 was shut down and restarted by the watchdog: its process ID changed from 34657 to 34712.
10. Examine the logs in sssd.log.
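Populating the directory for step 1 by hand is impractical at this scale, so a generator sketch may help. The `dc=example,dc=test` suffix matches the sssd.conf above, but the `ou=People`/`ou=Groups` layout, the attribute choices, and the `build_ldif` helper itself are assumptions for illustration, not the exact tooling used in this report:

```python
# Sketch: emit an LDIF with studentN users plus large groupOfNames groups,
# suitable for import into 389-ds. Layout and attributes are assumptions.

def user_dn(n):
    return f"uid=student{n},ou=People,dc=example,dc=test"

def user_entry(n):
    """One posixAccount entry for studentN."""
    return "\n".join([
        f"dn: {user_dn(n)}",
        "objectClass: inetOrgPerson",
        "objectClass: posixAccount",
        f"uid: student{n}",
        f"cn: student{n}",
        f"sn: student{n}",
        f"uidNumber: {100000 + n}",
        f"gidNumber: {100000 + n}",
        f"homeDirectory: /home/student{n}",
    ]) + "\n"

def group_entry(name, member_count):
    """A groupOfNames whose members are student1..studentN."""
    lines = [
        f"dn: cn={name},ou=Groups,dc=example,dc=test",
        "objectClass: groupOfNames",
        f"cn: {name}",
    ]
    lines += [f"member: {user_dn(i)}" for i in range(1, member_count + 1)]
    return "\n".join(lines) + "\n"

def build_ldif(user_count, groups):
    """groups is a list of (name, member_count) pairs."""
    parts = [user_entry(n) for n in range(1, user_count + 1)]
    parts += [group_entry(name, size) for name, size in groups]
    return "\n".join(parts)
```

For the full reproduction this would be called with `user_count=1500000` and `groups=[("allusers", 1500000), ("lgroup1", 15000), ("lgroup2", 50000)]` and the result imported with a standard LDIF import tool; generating and importing data of that size takes considerable time.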
Below are the relevant entries in sssd.log:

```
(2020-07-13 7:00:22): [sssd] [sbus_dispatch_reconnect] (0x0400): Connection lost. Terminating active requests.
(2020-07-13 7:00:22): [sssd] [sbus_dispatch_reconnect] (0x4000): Remote client terminated the connection. Releasing data...
(2020-07-13 7:00:22): [sssd] [sbus_connection_free] (0x4000): Connection 0x55b2548db630 will be freed during next loop!
(2020-07-13 7:00:22): [sssd] [mt_svc_exit_handler] (0x1000): SIGCHLD handler of service example1 called
(2020-07-13 7:00:22): [sssd] [svc_child_info] (0x0020): Child [34657] ('example1':'%BE_example1') was terminated by own WATCHDOG
(2020-07-13 7:00:22): [sssd] [mt_svc_restart] (0x0400): Scheduling service example1 for restart 1
(2020-07-13 7:00:22): [sssd] [start_service] (0x0100): Queueing service example1 for startup
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (sssd bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4569
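The `svc_child_info` message is the line that distinguishes a watchdog termination from an ordinary crash. A small sketch for scanning an sssd.log for such events; note the regex is inferred from the single sample line in this report, not from a documented, stable log format:

```python
import re

# Pattern inferred from the sample svc_child_info log line above;
# treat it as an assumption, not a guaranteed format.
WATCHDOG_RE = re.compile(
    r"\[svc_child_info\].*Child \[(\d+)\] \('([^']+)'.*terminated by own WATCHDOG"
)

def watchdog_kills(log_lines):
    """Return (pid, service) pairs for children killed by their own watchdog."""
    hits = []
    for line in log_lines:
        m = WATCHDOG_RE.search(line)
        if m:
            hits.append((int(m.group(1)), m.group(2)))
    return hits
```

With the first commit listed in this bug (WATCHDOG: log process termination to the journal), the same event should also be visible in the system journal, making this kind of log scraping unnecessary for users.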