Bug 1377127 - SSSD subprocesses are no longer monitored by the main sssd process, but self-monitor
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: SSSD Maintainers
QA Contact: Steeve Goveas
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-18 20:20 UTC by Amith
Modified: 2016-09-20 19:37 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-20 19:37:25 UTC
Target Upstream Version:


Description Amith 2016-09-18 20:20:07 UTC
Description of problem:
This issue was observed when SSSD did not recover after a SIGSTOP signal was sent to the sssd_be process. By default, the stopped process should be restarted after approximately 91 seconds, and the service status output should show the relevant messages.

See the example from RHEL-7.2 test machine:
 
# service sssd status
Redirecting to /bin/systemctl status  sssd.service
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sssd.service.d
           └─journal.conf
   Active: active (running) since Mon 2016-09-19 00:46:54 IST; 2min 34s ago
  Process: 27641 ExecStart=/usr/sbin/sssd -D -f (code=exited, status=0/SUCCESS)
 Main PID: 27642 (sssd)
   CGroup: /system.slice/sssd.service
           ├─27642 /usr/sbin/sssd -D -f
           ├─27644 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
           ├─27645 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
           └─27682 /usr/libexec/sssd/sssd_be --domain LDAP --uid 0 --gid 0 --debug-to-files

Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com systemd[1]: Starting System Security Services Daemon...
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com sssd[27642]: Starting up
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com sssd[be[LDAP]][27643]: Starting up
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com sssd[nss][27644]: Starting up
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com sssd[pam][27645]: Starting up
Sep 19 00:46:54 vm-idm-012.lab.eng.pnq.redhat.com systemd[1]: Started System Security Services Daemon.
Sep 19 00:48:14 vm-idm-012.lab.eng.pnq.redhat.com sssd[27642]: Killing service [LDAP], not responding to pings!
Sep 19 00:49:14 vm-idm-012.lab.eng.pnq.redhat.com sssd[27642]: [LDAP][27643] is not responding to SIGTERM. Sending SIGKILL.
Sep 19 00:49:14 vm-idm-012.lab.eng.pnq.redhat.com sssd[be[LDAP]][27682]: Starting up


On RHEL-7.3, the sssd_be process never restarts and, strangely, no error message is logged in /var/log/sssd/sssd.log. The service has to be restarted manually for sssd to function properly.

Version-Release number of selected component (if applicable):
sssd-1.14.0-42.el7.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up sssd.conf with debug_level = 0x0270 in the [sssd] section, as given below:

[sssd]
config_file_version = 2
services = nss, pam
domains = LDAP
debug_level = 0x0270

[domain/LDAP]
debug_level = 0xFFF0
id_provider = ldap
auth_provider = ldap
ldap_uri = ldap://<LDAP_SERVER>
ldap_tls_cacert = /etc/openldap/certs/cacert.asc

2. Send a SIGSTOP signal to the sssd_be process.
# kill -s SIGSTOP `pidof sssd_be`

3. Wait for some time and monitor the log messages to see whether the process restarts itself. Also, monitor the status of the sssd service for the following messages:

sssd[27642]: Killing service [LDAP], not responding to pings!
sssd[27642]: [LDAP][27643] is not responding to SIGTERM. Sending SIGKILL.
sssd[be[LDAP]][27682]: Starting up

4. By default, the sssd_be process should restart.
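
Steps 2 to 4 can be scripted roughly as follows. This is a minimal sketch, not part of the original report: `wait_for_new_pid` is a hypothetical helper, the 120-second timeout in the usage comment is an arbitrary margin above the roughly 91 seconds mentioned in the description, and the usage assumes a single sssd_be process so that `pidof` prints exactly one PID.

```shell
#!/bin/sh
# wait_for_new_pid OLD_PID TIMEOUT_SECONDS LOOKUP_COMMAND...
# Polls once per second until LOOKUP_COMMAND prints a PID that differs
# from OLD_PID, or gives up after TIMEOUT_SECONDS.
wait_for_new_pid() {
    old_pid=$1
    timeout=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        new_pid=$("$@" 2>/dev/null)
        if [ -n "$new_pid" ] && [ "$new_pid" != "$old_pid" ]; then
            echo "restarted: $old_pid -> $new_pid"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "no restart within ${timeout}s (still $old_pid)"
    return 1
}

# Intended usage as root on the test machine (left commented out on purpose):
#   old=$(pidof sssd_be)
#   kill -s STOP "$old"
#   wait_for_new_pid "$old" 120 pidof sssd_be
```

The helper takes the lookup command as arguments so that the same loop can be tried against any process, not only sssd_be.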

Actual results:
SSSD is unresponsive and sssd_be process never restarts. No error message logged.

Expected results:
Signalled process should restart and SSSD should function properly with relevant messages logged.

Additional info:

Comment 1 Jakub Hrozek 2016-09-18 20:32:27 UTC
I'm sorry, but this is expected with 7.3. The services are no longer monitored by the sssd process, but instead self-monitor. This is a first step towards making it possible to socket-activate services and remove the monitor if possible.

I suspect SIGSTOP is used as a test case to make SSSD restart the process. But since SIGSTOP cannot be caught or ignored, it really stops the process and, in effect, the self-monitoring as well.

I think it would be better to come up with a different test case than one involving SIGSTOP. In the meantime, I'm removing the Regression keyword, but I will leave the bug open until we come up with some testcase, then we can close this bug.
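
The effect described above can be illustrated with a toy example. This is only a sketch under stated assumptions: the worker below is a plain shell loop that appends a line to a temporary file once per second as its own "heartbeat"; it is not SSSD's actual watchdog, and all names are made up for illustration. While the worker is stopped, nothing inside it can run, so the heartbeat halts and nothing in the process can notice.

```shell
#!/bin/sh
ticks=$(mktemp)

# Toy self-monitoring worker: one heartbeat line per second.
( while :; do echo tick >> "$ticks"; sleep 1; done ) &
worker=$!

# Number of heartbeat lines written so far.
count_ticks() { echo $(( $(wc -l < "$ticks") )); }

sleep 2
kill -s STOP "$worker"       # cannot be caught or ignored by the worker
sleep 1                      # give the stop a moment to take effect
stopped_first=$(count_ticks)
sleep 3
stopped_second=$(count_ticks)

kill -s CONT "$worker"       # the worker resumes where it left off
sleep 3
resumed=$(count_ticks)

kill "$worker"
wait "$worker" 2>/dev/null || true
rm -f "$ticks"

echo "while stopped: $stopped_first -> $stopped_second; after SIGCONT: $resumed"
```

The two counts taken while the worker is stopped should be equal, and the count after SIGCONT should be higher.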

Comment 2 Jakub Hrozek 2016-09-18 20:34:26 UTC
Hmm, I wonder if just sending SIGCONT after at least 30 seconds would make sssd_be restart (not tested, just an idea...)
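
A sketch of that idea, equally untested against SSSD: `pause_process` is a hypothetical helper, and the 40-second pause in the usage comment is an arbitrary value above the 30 seconds mentioned. Whether sssd_be actually restarts afterwards is exactly the open question.

```shell
#!/bin/sh
# pause_process PID SECONDS
# Suspends the process with SIGSTOP, waits, then resumes it with SIGCONT.
pause_process() {
    kill -s STOP "$1" || return 1
    sleep "$2"
    kill -s CONT "$1"
}

# Intended usage as root on the test machine (left commented out on purpose):
#   old=$(pidof sssd_be)
#   pause_process "$old" 40
#   sleep 5
#   new=$(pidof sssd_be)
#   [ "$old" != "$new" ] && echo "sssd_be restarted: $old -> $new"
```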

Comment 10 Jakub Hrozek 2016-09-20 19:37:25 UTC
Please reopen if the processes do not restart cleanly or the watchdog doesn't work.

