RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1845965 - Users must be informed better when internal WATCHDOG terminates process.
Summary: Users must be informed better when internal WATCHDOG terminates process.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.2
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: 8.0
Assignee: Alexey Tikhonov
QA Contact: sssd-qe
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-06-10 13:45 UTC by Alexey Tikhonov
Modified: 2021-06-20 04:18 UTC (History)
9 users (show)

Fixed In Version: sssd-2.3.0-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-04 02:05:17 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 5146 0 None closed Users must be informed better when internal WATCHDOG terminates process. 2021-02-11 17:31:17 UTC
Red Hat Product Errata RHBA-2020:4569 0 None None None 2020-11-04 02:05:36 UTC

Description Alexey Tikhonov 2020-06-10 13:45:36 UTC
This bug was initially created as a copy of Bug #1837545

I am copying this bug because: to track fix for RHEL8



Currently it is very difficult to understand that one of SSSD processes was terminated by internal watchdog.

This often makes troubleshooting complicated.

It would be beneficial to have this information easily available in system journal and sssd logs.

Comment 1 Alexey Tikhonov 2020-06-10 13:51:43 UTC
* `master`
    * 65369f293b06ce0fe5622502bb32596bb50c523a - WATCHDOG: log process termination to the journal
    * 00e7b1ada3d1c1071eac79b65c17cd2701c2ae6a - DEBUG: changed "debug_prg_name" format
    * b5604d072e93bca7fc0c408fcfbb88f41c4d50ca - DEBUG: introduce new SSSDBG_TRACE_LDB level
    * 7b25375155e7b7f640d30dcfdf5b55482a6102a7 - DEBUG: changed timestamp output format

Comment 4 Niranjan Mallapadi Raghavender 2020-07-13 11:04:26 UTC
python3-sssdconfig-2.3.0-4.el8.noarch
sssd-common-2.3.0-4.el8.x86_64
sssd-winbind-idmap-2.3.0-4.el8.x86_64
sssd-2.3.0-4.el8.x86_64
sssd-client-2.3.0-4.el8.x86_64
sssd-nfs-idmap-2.3.0-4.el8.x86_64
sssd-kcm-2.3.0-4.el8.x86_64
sssd-dbus-2.3.0-4.el8.x86_64
sssd-krb5-common-2.3.0-4.el8.x86_64
sssd-krb5-2.3.0-4.el8.x86_64
sssd-ad-2.3.0-4.el8.x86_64
sssd-ipa-2.3.0-4.el8.x86_64
sssd-tools-2.3.0-4.el8.x86_64
sssd-common-pac-2.3.0-4.el8.x86_64
sssd-ldap-2.3.0-4.el8.x86_64
sssd-proxy-2.3.0-4.el8.x86_64


Reproducing the issue:
Create a ldap server (389-ds) where there 1500000 users (student1 to student1500000) and following largegroups created
1. allusers which contains 1500000 users as members 
2. lgroup1 which contains 15000 users as members
3. lgroup2 which contains 50000 users as members 


[sssd]
config_file_version = 2
services = nss, pam
domains = example1

[domain/example1]
ldap_search_base = dc=example,dc=test
id_provider = ldap
auth_provider = ldap
ldap_user_home_directory = /home/%u
ldap_uri = ldaps://auto-hv-02-guest08.idmqe.lab.eng.bos.redhat.com
ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem
use_fully_qualified_names = True
debug_level = 9

[nss]
debug_level = 9

4. Start sssd and note the process pid

[root@ibm-x3650m4-01-vm-11 mc]# ps -ef | grep sssd
root       34653       1  1 06:58 ?        00:00:00 /usr/sbin/sssd -i --logger=files
root       34656   34653  3 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
root       34657   34653  1 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain example1 --uid 0 --gid 0 --logger=files
root       34658   34653  1 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
root       34659   34653  0 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files


5. First lookup group lgroup1 which contains 15000 users 

....
....
student14982@example1,student14983@example1,student14984@example1,student14985@example1,student14986@example1,
student14987@example1,student14988@example1,student14989@example1,student14990@example1,student14991@example1,student14992@example1,
student14993@example1,student14994@example1

real    0m4.380s
user    0m0.006s
sys     0m0.008s


6. Check if the sssd_be process id same as in step 4:

root       34653       1  0 06:58 ?        00:00:00 /usr/sbin/sssd -i --logger=files
root       34656   34653  0 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
root       34657   34653  5 06:58 ?        00:00:03 /usr/libexec/sssd/sssd_be --domain example1 --uid 0 --gid 0 --logger=files
root       34658   34653  1 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
root       34659   34653  0 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files

7. now lookup group lgroup2 which contains 50000 users 
[root@qe-blade-01 sssd]# time getent group lgroup2@example1


real    0m19.577s
user    0m0.003s
sys     0m0.004s


8. check sssd process 

[root@ibm-x3650m4-01-vm-11 ~]# ps -ef | grep sssd

9. As we can see sssd_be process was shutdown and restarted by watchdog. 

root       34653       1  0 06:58 ?        00:00:00 /usr/sbin/sssd -i --logger=files
root       34656   34653  0 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
root       34658   34653  0 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
root       34659   34653  0 06:58 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
root       34712   34653  0 07:00 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain example1 --uid 0 --gid 0 --logger=files



As we can see sssd_be process was shutdown and started as the process id changed. 


10. Below is the logs in sssd.log

(2020-07-13  7:00:22): [sssd] [sbus_dispatch_reconnect] (0x0400): Connection lost. Terminating active requests.
(2020-07-13  7:00:22): [sssd] [sbus_dispatch_reconnect] (0x4000): Remote client terminated the connection. Releasing data...
(2020-07-13  7:00:22): [sssd] [sbus_connection_free] (0x4000): Connection 0x55b2548db630 will be freed during next loop!
(2020-07-13  7:00:22): [sssd] [mt_svc_exit_handler] (0x1000): SIGCHLD handler of service example1 called
(2020-07-13  7:00:22): [sssd] [svc_child_info] (0x0020): Child [34657] ('example1':'%BE_example1') was terminated by own WATCHDOG
(2020-07-13  7:00:22): [sssd] [mt_svc_restart] (0x0400): Scheduling service example1 for restart 1
(2020-07-13  7:00:22): [sssd] [start_service] (0x0100): Queueing service example1 for startup

Comment 7 errata-xmlrpc 2020-11-04 02:05:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4569


Note You need to log in before you can comment on or make changes to this bug.