Bug 1377127

Summary: SSSD subprocesses are no longer monitored by the main sssd process, but self-monitor
Product: Red Hat Enterprise Linux 7 Reporter: Amith <apeetham>
Component: sssd Assignee: SSSD Maintainers <sssd-maint>
Status: CLOSED NOTABUG QA Contact: Steeve Goveas <sgoveas>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.3 CC: apeetham, grajaiya, jhrozek, lslebodn, mkosek, mzidek, pbrezina
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2016-09-20 19:37:25 UTC Type: Bug

Description Amith 2016-09-18 20:20:07 UTC
Description of problem:
This issue was observed when SSSD became unresponsive after a SIGSTOP signal was sent to the sssd_be process. By default, the stopped process should be restarted after approximately 91 seconds, and the service status output should show the relevant messages.

See the example from a RHEL-7.2 test machine:
 
# service sssd status
Redirecting to /bin/systemctl status  sssd.service
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sssd.service.d
           └─journal.conf
   Active: active (running) since Mon 2016-09-19 00:46:54 IST; 2min 34s ago
  Process: 27641 ExecStart=/usr/sbin/sssd -D -f (code=exited, status=0/SUCCESS)
 Main PID: 27642 (sssd)
   CGroup: /system.slice/sssd.service
           ├─27642 /usr/sbin/sssd -D -f
           ├─27644 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
           ├─27645 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
           └─27682 /usr/libexec/sssd/sssd_be --domain LDAP --uid 0 --gid 0 --debug-to-files

Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com systemd[1]: Starting System Security Services Daemon...
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com sssd[27642]: Starting up
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com sssd[be[LDAP]][27643]: Starting up
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com sssd[nss][27644]: Starting up
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com sssd[pam][27645]: Starting up
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com systemd[1]: Started System Security Services Daemon.
Sep 19 00:48:14 vm-idm-012.lab.eng.pnq.redhat.com sssd[27642]: Killing service [LDAP], not responding to pings!
Sep 19 00:49:14 vm-idm-012.lab.eng.pnq.redhat.com sssd[27642]: [LDAP][27643] is not responding to SIGTERM. Sending SIGKILL.
Sep 19 00:49:14 vm-idm-012.lab.eng.pnq.redhat.com sssd[be[LDAP]][27682]: Starting up


On RHEL-7.3, the process never restarts and, strangely, no error message is logged to /var/log/sssd/sssd.log. The service has to be restarted manually for sssd to function properly.

Version-Release number of selected component (if applicable):
sssd-1.14.0-42.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up sssd.conf with debug_level = 0x0270 in the [sssd] section, as shown below:

[sssd]
config_file_version = 2
services = nss, pam
domains = LDAP
debug_level = 0x0270

[domain/LDAP]
debug_level = 0xFFF0
id_provider = ldap
auth_provider = ldap
ldap_uri = ldap://<LDAP_SERVER>
ldap_tls_cacert = /etc/openldap/certs/cacert.asc

2. Send a SIGSTOP signal to the sssd_be process.
# kill -s SIGSTOP `pidof sssd_be`

3. Wait for some time and monitor the log messages to see whether the process restarts itself. Also monitor the status of the sssd service for the following messages:

sssd[27642]: Killing service [LDAP], not responding to pings!
sssd[27642]: [LDAP][27643] is not responding to SIGTERM. Sending SIGKILL.
sssd[be[LDAP]][27682]: Starting up

4. By default, the sssd_be process should restart.

Actual results:
SSSD is unresponsive and the sssd_be process never restarts. No error message is logged.

Expected results:
The signalled process should restart, and SSSD should function properly with the relevant messages logged.

Additional info:

Comment 1 Jakub Hrozek 2016-09-18 20:32:27 UTC
I'm sorry, but this is expected with 7.3. The services are no longer monitored by the sssd process, but instead self-monitor. This is a first step towards making it possible to socket-activate services and remove the monitor if possible.

I suspect SIGSTOP is used as a test case to make SSSD restart a service. But since SIGSTOP cannot be caught or ignored, it really stops the process and, as a result, the self-monitoring as well.
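
The effect described here is not specific to sssd_be and can be seen with any process. The following is a minimal sketch that uses a stand-in worker (a plain sh process, not SSSD code) whose built-in 2-second exit timer plays the role of the self-monitoring. The variable names and timings are illustrative assumptions.

```shell
#!/bin/sh
# Stand-in worker (NOT sssd_be): it is supposed to exit by itself
# after 2 seconds, which plays the role of a self-monitoring timer.
sh -c 'sleep 2; exit 0' &
pid=$!

# Freeze the worker. SIGSTOP cannot be caught or ignored.
kill -s STOP "$pid"

# Wait well past the worker's own 2-second deadline.
sleep 4

# The frozen worker could not act on its deadline, so it still exists.
if kill -0 "$pid" 2>/dev/null; then
    alive_while_stopped=yes
else
    alive_while_stopped=no
fi

# Resume the worker; only now can it finish and exit.
kill -s CONT "$pid"
wait "$pid"
exit_status=$?

echo "alive while stopped: $alive_while_stopped"
echo "exit status after SIGCONT: $exit_status"
```

While the worker is stopped it cannot run any of its own code, so a timer that lives inside the process cannot fire until the process is continued.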

I think it would be better to come up with a different test case than one involving SIGSTOP. In the meantime, I'm removing the Regression keyword, but I will leave the bug open until we come up with a test case; then we can close this bug.

Comment 2 Jakub Hrozek 2016-09-18 20:34:26 UTC
Hmm, I wonder if just sending SIGCONT after at least 30 seconds would make sssd_be restart (not tested, just an idea).
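
A rough sketch of how this idea could be scripted is shown below. The helper name pause_then_resume, its return codes, and the 1-second settle time are assumptions made for illustration and are not part of SSSD. The demonstration at the end runs against a harmless stand-in process rather than sssd_be.

```shell
#!/bin/sh
# pause_then_resume PID SECONDS   (hypothetical helper, not part of SSSD)
# Freezes PID with SIGSTOP, waits SECONDS, resumes it with SIGCONT, then
# returns:
#   0 if PID no longer exists (it exited and may have been replaced),
#   1 if the same PID is still running,
#   2 if a signal could not be delivered.
pause_then_resume() {
    target=$1
    pause=$2
    kill -s STOP "$target" || return 2
    sleep "$pause"
    kill -s CONT "$target" || return 2
    sleep 1   # short settle time; an assumption, tune as needed
    if kill -0 "$target" 2>/dev/null; then
        return 1
    fi
    return 0
}

# Harmless demonstration against a stand-in process, not sssd_be.
sleep 30 &
demo_pid=$!
if pause_then_resume "$demo_pid" 1; then
    demo_rc=0
else
    demo_rc=$?
fi
kill "$demo_pid" 2>/dev/null
echo "demo result: $demo_rc"
```

Against SSSD, the call would presumably look like pause_then_resume "$(pidof sssd_be)" 30, run as root. Whether sssd_be is actually replaced after SIGCONT is exactly what this comment leaves untested, so a result of 1 there would mean the idea does not work as hoped.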

Comment 10 Jakub Hrozek 2016-09-20 19:37:25 UTC
Please reopen if the processes do not restart cleanly or the watchdog doesn't work.