Bug 1796466 - SSSD service is being restarted in a loop after a fatal failure even if this doesn't make any sense
Summary: SSSD service is being restarted in a loop after a fatal failure even if this doesn't make any sense
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: sssd
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Alejandro López
QA Contact: sssd-qe
URL:
Whiteboard: sync-to-jira review
Depends On:
Blocks:
 
Reported: 2020-01-30 13:38 UTC by Amith
Modified: 2022-03-29 18:19 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-30 07:30:11 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
SSSD Domain log (1.97 MB, text/plain)
2020-01-30 13:38 UTC, Amith
GDB core dump gzip file (2.59 MB, application/x-core)
2020-01-30 17:49 UTC, Amith

Description Amith 2020-01-30 13:38:29 UTC
Created attachment 1656499 [details]
SSSD Domain log

Description of problem:
This issue was already addressed in bug 1051360 and is part of our automated sanity-misconfiguration test suite. We observed this failure during regression runs for the sanity tests. Assigning a random value to "ldap_search_base", i.e. "ldap_search_base = \$undefined_var", causes an SSSD crash.

It looks like the SSSD process enters a loop and tries to restart itself every six seconds. Here is the SSSD service status:

# systemctl restart sssd; systemctl status sssd
Job for sssd.service failed because a fatal signal was delivered causing the control process to dump core.
See "systemctl status sssd.service" and "journalctl -xe" for details.
● sssd.service - System Security Services Daemon
   Loaded: loaded (/usr/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: core-dump) since Thu 2020-01-30 18:48:25 IST; 42ms ago
  Process: 23111 ExecStart=/usr/sbin/sssd -i ${DEBUG_LOGGER} (code=dumped, signal=ABRT)
 Main PID: 23111 (code=dumped, signal=ABRT)

Jan 30 18:48:25 vm-idm-013.lab.eng.pnq.redhat.com systemd[1]: sssd.service: Main process exited, code=dumped, status=6/ABRT
Jan 30 18:48:25 vm-idm-013.lab.eng.pnq.redhat.com systemd[1]: sssd.service: Failed with result 'core-dump'.
Jan 30 18:48:25 vm-idm-013.lab.eng.pnq.redhat.com systemd[1]: Failed to start System Security Services Daemon.
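
The unit's restart policy driving this loop can be inspected with systemctl show. The property names below are standard systemd ones; the values are illustrative rather than captured output, except that the 100ms restart delay matches what the journal reports later for sssd.service:

# systemctl show sssd.service -p Restart -p RestartUSec -p StartLimitBurst -p StartLimitIntervalUSec
Restart=on-failure
RestartUSec=100ms
StartLimitBurst=5
StartLimitIntervalUSec=10s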

Version-Release number of selected component (if applicable):
sssd-2.2.3-11.el8.x86_64

How reproducible:
Always

Steps to Reproduce:

1. Configure sssd.conf as given below:

[sssd]
config_file_version = 2
domains = LDAP
services = nss, pam

[domain/LDAP]
debug_level = 0xFFF0
id_provider = ldap
ldap_uri = ldap://ldapserver.example.com
ldap_search_base = \$undefined_var

2. Restart the SSSD service and monitor the process. It crashes.

3. Run the following command to monitor the sssd_be process every 2 seconds.

# watch 'pgrep -l sssd_be >> res ; echo "-------" >> res; tail -n 20 res'

26414 sssd_be
-------
26414 sssd_be
-------
26454 sssd_be
-------
26454 sssd_be
-------
26454 sssd_be
-------
26493 sssd_be
-------
26493 sssd_be
-------
26493 sssd_be
-------
26529 sssd_be
-------
26529 sssd_be
-------
26529 sssd_be
-------
26565 sssd_be
-------
26565 sssd_be
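
The same restart cycle can also be followed live from the journal; each iteration logs the "Main process exited" / "Scheduled restart job" messages quoted later in this report:

# journalctl -u sssd.service -f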


Actual results:
SSSD processes get killed every 6 seconds.

Expected results:


Additional info:
The domain log is attached.

Comment 1 Alexey Tikhonov 2020-01-30 14:44:13 UTC
Please provide a backtrace/coredump.

Comment 3 Amith 2020-01-30 17:46:54 UTC
BACKTRACE FULL:
---------------------------------------------------------------------------------------------------------------

(gdb) bt full
#0  0x00007f0d9492aa0b in kill () at ../sysdeps/unix/syscall-template.S:78
No locals.
#1  0x0000561ac3305774 in monitor_quit (ret=1, mt_ctx=<optimized out>) at src/monitor/monitor.c:1444
        svc = <optimized out>
        pid = <optimized out>
        status = 134
        error = 0
        kret = <optimized out>
        killed = <optimized out>
        __FUNCTION__ = "monitor_quit"
#2  0x0000561ac3305ec5 in monitor_restart_service (svc=0x561ac4b17a70) at src/monitor/monitor.c:2309
        restart_delay = <optimized out>
        tv = <optimized out>
        mt_ctx = 0x561ac4ad3c00
        now = <optimized out>
        te = <optimized out>
        mt_ctx = <optimized out>
        restart_delay = <optimized out>
        now = <optimized out>
        te = <optimized out>
        tv = <optimized out>
        __FUNCTION__ = "monitor_restart_service"
        __debug_macro_level = <optimized out>
        __debug_macro_level = <optimized out>
#3  mt_svc_exit_handler (pid=<optimized out>, wait_status=<optimized out>, pvt=<optimized out>) at src/monitor/monitor.c:2272
        svc = 0x561ac4b17a70
        __FUNCTION__ = "mt_svc_exit_handler"
#4  0x00007f0d968edd34 in sss_child_invoke_cb (ev=ev@entry=0x561ac4ae0eb0, imm=imm@entry=0x561ac4b0e360, pvt=pvt@entry=0x561ac4b16470) at src/util/child_common.c:182
        cb_pvt = 0x561ac4b16470
        child_ctx = 0x561ac4b177b0
        key = {type = HASH_KEY_ULONG, {str = 0x2695 <error: Cannot access memory at address 0x2695>, c_str = 0x2695 <error: Cannot access memory at address 0x2695>, ul = 9877}}
        error = <optimized out>
        __FUNCTION__ = "sss_child_invoke_cb"
#5  0x00007f0d9543e879 in tevent_common_invoke_immediate_handler (im=0x561ac4b0e360, removed=removed@entry=0x0) at ../../tevent_immediate.c:166
        handler_ev = 0x561ac4ae0eb0
        ev = 0x561ac4ae0eb0
        cur = {prev = <optimized out>, next = <optimized out>, event_ctx = <optimized out>, wrapper = 0x0, busy = <optimized out>, destroyed = <optimized out>, handler = <optimized out>, 
          private_data = <optimized out>, handler_name = <optimized out>, create_location = <optimized out>, schedule_location = <optimized out>, cancel_fn = <optimized out>, additional_data = <optimized out>}
#6  0x00007f0d9543e8a7 in tevent_common_loop_immediate (ev=ev@entry=0x561ac4ae0eb0) at ../../tevent_immediate.c:203
        im = <optimized out>
        ret = <optimized out>
#7  0x00007f0d95440823 in poll_event_loop_once (ev=0x561ac4ae0eb0, location=<optimized out>) at ../../tevent_poll.c:616
        tval = <optimized out>
#8  0x00007f0d9543db15 in _tevent_loop_once (ev=ev@entry=0x561ac4ae0eb0, location=location@entry=0x7f0d9849b659 "src/util/server.c:719") at ../../tevent.c:772
        ret = <optimized out>
        nesting_stack_ptr = 0x0
#9  0x00007f0d9543ddbb in tevent_common_loop_wait (ev=0x561ac4ae0eb0, location=0x7f0d9849b659 "src/util/server.c:719") at ../../tevent.c:895
        ret = <optimized out>
#10 0x00007f0d98479927 in server_loop (main_ctx=0x561ac4ae7c10) at src/util/server.c:719
No locals.
#11 0x0000561ac330375e in main (argc=<optimized out>, argv=<optimized out>) at src/monitor/monitor.c:2612
--Type <RET> for more, q to quit, c to continue without paging--
        opt = <optimized out>
        pc = <optimized out>
        opt_daemon = 0
        opt_interactive = 1
        opt_genconf = 0
        opt_version = 0
        opt_netlinkoff = 0
        opt_config_file = 0x0
        opt_logger = 0x561ac4ab2790 "files"
        config_file = <optimized out>
        opt_genconf_section = 0x0
        flags = <optimized out>
        main_ctx = 0x561ac4ae7c10
        tmp_ctx = 0x561ac4ad3530
        monitor = 0x561ac4ad3c00
        ret = 0
        uid = <optimized out>
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x561ac35111a0 <poptHelpOptions>, val = 0, descrip = 0x561ac330ce47 "Help options:", argDescrip = 0x0}, {
            longName = 0x561ac330ce55 "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x561ac35112a8 <debug_level>, val = 0, descrip = 0x561ac330ce61 "Debug level", argDescrip = 0x0}, {
            longName = 0x561ac330ce6d "debug-to-files", shortName = 102 'f', argInfo = 1073741824, arg = 0x561ac3511184 <debug_to_file>, val = 0, 
            descrip = 0x561ac330bb98 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x561ac330ce7c "debug-to-stderr", shortName = 0 '\000', argInfo = 1073741824, 
            arg = 0x561ac3511180 <debug_to_stderr>, val = 0, descrip = 0x561ac330bbd0 "Send the debug output to stderr directly.", argDescrip = 0x0}, {longName = 0x561ac330ce8c "debug-timestamps", 
            shortName = 0 '\000', argInfo = 2, arg = 0x561ac3511260 <debug_timestamps>, val = 0, descrip = 0x561ac330ce9d "Add debug timestamps", argDescrip = 0x0}, {
            longName = 0x561ac330ceb2 "debug-microseconds", shortName = 0 '\000', argInfo = 2, arg = 0x561ac3511280 <debug_microseconds>, val = 0, descrip = 0x561ac330bc00 "Show timestamps with microseconds", 
            argDescrip = 0x0}, {longName = 0x561ac330cec9 "logger", shortName = 0 '\000', argInfo = 1, arg = 0x7ffd12277db0, val = 0, descrip = 0x561ac330cec5 "Set logger", 
            argDescrip = 0x561ac330ced0 "stderr|files|journald"}, {longName = 0x561ac330cee6 "daemon", shortName = 68 'D', argInfo = 0, arg = 0x7ffd12277d94, val = 0, 
            descrip = 0x561ac330ceed "Become a daemon (default)", argDescrip = 0x0}, {longName = 0x561ac330cf07 "interactive", shortName = 105 'i', argInfo = 0, arg = 0x7ffd12277d98, val = 0, 
            descrip = 0x561ac330bc28 "Run interactive (not a daemon)", argDescrip = 0x0}, {longName = 0x561ac330cf13 "disable-netlink", shortName = 0 '\000', argInfo = 1073741824, arg = 0x7ffd12277da4, 
            val = 0, descrip = 0x561ac330cf23 "Disable netlink interface", argDescrip = 0x0}, {longName = 0x561ac330e657 "config", shortName = 99 'c', argInfo = 1, arg = 0x7ffd12277da8, val = 0, 
            descrip = 0x561ac330bc48 "Specify a non-default config file", argDescrip = 0x0}, {longName = 0x561ac330cf3d "genconf", shortName = 103 'g', argInfo = 0, arg = 0x7ffd12277d9c, val = 0, 
            descrip = 0x561ac330bc70 "Refresh the configuration database, then exit", argDescrip = 0x0}, {longName = 0x561ac330cf45 "genconf-section", shortName = 115 's', argInfo = 1, arg = 0x7ffd12277db8, 
            val = 0, descrip = 0x561ac330bca0 "Similar to --genconf, but only refreshes the given section", argDescrip = 0x0}, {longName = 0x561ac330e62f "version", shortName = 0 '\000', argInfo = 0, 
            arg = 0x7ffd12277da0, val = 0, descrip = 0x561ac330cf55 "Print version number and exit", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, 
            argDescrip = 0x0}}
        __FUNCTION__ = "main"
(gdb) 
(gdb)

Comment 4 Amith 2020-01-30 17:49:39 UTC
Created attachment 1656549 [details]
GDB core dump gzip file

Comment 5 Alexey Tikhonov 2020-01-30 18:15:21 UTC
(In reply to Amith from comment #3)
> (gdb) bt full
> #0  0x00007f0d9492aa0b in kill () at ../sysdeps/unix/syscall-template.S:78
> No locals.
> #1  0x0000561ac3305774 in monitor_quit (ret=1, mt_ctx=<optimized out>) at src/monitor/monitor.c:1444
> #2  0x0000561ac3305ec5 in monitor_restart_service (svc=0x561ac4b17a70) at src/monitor/monitor.c:2309
> #3  mt_svc_exit_handler (pid=<optimized out>, wait_status=<optimized out>,  pvt=<optimized out>) at src/monitor/monitor.c:2272
> #4  0x00007f0d968edd34 in sss_child_invoke_cb (ev=ev@entry=0x561ac4ae0eb0, imm=imm@entry=0x561ac4b0e360, pvt=pvt@entry=0x561ac4b16470) at src/util/child_common.c:182
> #5  0x00007f0d9543e879 in tevent_common_invoke_immediate_handler (im=0x561ac4b0e360, removed=removed@entry=0x0) at ../../tevent_immediate.c:166


The ticket title says "sssd be crashes", but this is the backtrace (and core) of the monitor process quitting after exceeding the limit of failed attempts to start a service. The only question this backtrace raises is why monitor_quit() ends up with kill(-getpgrp(), SIGTERM) instead of doing monitor_cleanup()+exit(), but that is a very minor issue and I guess it has nothing to do with this ticket.


Could you please provide a backtrace of the "sssd be crash"?

Comment 6 Amith 2020-01-30 18:39:12 UTC
(In reply to Alexey Tikhonov from comment #5)
> (In reply to Amith from comment #3)
> > (gdb) bt full
> > #0  0x00007f0d9492aa0b in kill () at ../sysdeps/unix/syscall-template.S:78
> > No locals.
> > #1  0x0000561ac3305774 in monitor_quit (ret=1, mt_ctx=<optimized out>) at src/monitor/monitor.c:1444
> > #2  0x0000561ac3305ec5 in monitor_restart_service (svc=0x561ac4b17a70) at src/monitor/monitor.c:2309
> > #3  mt_svc_exit_handler (pid=<optimized out>, wait_status=<optimized out>,  pvt=<optimized out>) at src/monitor/monitor.c:2272
> > #4  0x00007f0d968edd34 in sss_child_invoke_cb (ev=ev@entry=0x561ac4ae0eb0, imm=imm@entry=0x561ac4b0e360, pvt=pvt@entry=0x561ac4b16470) at src/util/child_common.c:182
> > #5  0x00007f0d9543e879 in tevent_common_invoke_immediate_handler (im=0x561ac4b0e360, removed=removed@entry=0x0) at ../../tevent_immediate.c:166
> 
> 
> The ticket title says "sssd be crashes", but this is the backtrace (and
> core) of the monitor process quitting after exceeding the limit of failed
> attempts to start a service. The only question this backtrace raises is why
> monitor_quit() ends up with kill(-getpgrp(), SIGTERM) instead of doing
> monitor_cleanup()+exit(), but that is a very minor issue and I guess it has
> nothing to do with this ticket.
> 
> 
> Could you please provide a backtrace of the "sssd be crash"?

I copied the title from the automated test case, which represents the older bug 1051360. It was my mistake, as the intention was to highlight the fix from the older bug.
You are right, the main SSSD process gets killed every 6 seconds and attempts to start the other sssd processes. This is not the expected behavior, because on RHEL-8.1 sssd simply fails to start and throws an error.
But on 8.2, the sssd process enters a loop and gets killed every 6 seconds. That's why I captured the backtrace for the main SSSD PID and not for sssd_be.

See the change in process IDs every 6 seconds:
19299 sssd
19302 sssd_be
-------
19299 sssd
19302 sssd_be
-------
19334 sssd
19335 sssd_be
19336 sssd_be
-------
19334 sssd
19335 sssd_be
-------
19334 sssd
19335 sssd_be
-------
19372 sssd
19375 sssd_be
-------
19372 sssd
19375 sssd_be
-------
19372 sssd
19375 sssd_be

However, if you still need the sssd_be backtrace, let me know and I will provide it.

Comment 7 Amith 2020-01-30 18:55:07 UTC
Let's not leave anything out. Here is the SSSD_BE BACKTRACE:
-----------------------------------------------------------------------------------------------------------------------------------

(gdb) bt full
#0  0x00007fd14358317b in epoll_wait (epfd=0, events=events@entry=0x7ffcdf9d6e9c, maxevents=maxevents@entry=1, timeout=timeout@entry=9578) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
        resultvar = 18446744073709551612
        sc_ret = <optimized out>
#1  0x00007fd143fd8885 in epoll_event_loop (tvalp=0x7ffcdf9d6e80, epoll_ev=0x55c49680acf0) at ../../tevent_epoll.c:650
        ret = <optimized out>
        i = <optimized out>
        timeout = 9578
        wait_errno = <optimized out>
        events = {{events = 1, data = {ptr = 0x55c49681a540, fd = -1769888448, u32 = 2525078848, u64 = 94302827029824}}}
        ret = <optimized out>
        i = <optimized out>
        events = <optimized out>
        timeout = <optimized out>
        wait_errno = <optimized out>
        fde = <optimized out>
        flags = <optimized out>
        mpx_fde = <optimized out>
        handled_fde = <optimized out>
        handled_mpx = <optimized out>
#2  epoll_event_loop_once (ev=<optimized out>, location=<optimized out>) at ../../tevent_epoll.c:937
        epoll_ev = 0x55c49680acf0
        tval = {tv_sec = 9, tv_usec = 577100}
        panic_triggered = false
#3  0x00007fd143fd699b in std_event_loop_once (ev=0x55c49680aa60, location=0x7fd14702f659 "src/util/server.c:719") at ../../tevent_standard.c:110
        glue_ptr = <optimized out>
        glue = 0x55c49680aba0
        ret = <optimized out>
#4  0x00007fd143fd1b15 in _tevent_loop_once (ev=ev@entry=0x55c49680aa60, location=location@entry=0x7fd14702f659 "src/util/server.c:719") at ../../tevent.c:772
        ret = <optimized out>
        nesting_stack_ptr = 0x0
#5  0x00007fd143fd1dbb in tevent_common_loop_wait (ev=0x55c49680aa60, location=0x7fd14702f659 "src/util/server.c:719") at ../../tevent.c:895
        ret = <optimized out>
#6  0x00007fd143fd692b in std_event_loop_wait (ev=0x55c49680aa60, location=0x7fd14702f659 "src/util/server.c:719") at ../../tevent_standard.c:141
        glue_ptr = <optimized out>
        glue = 0x55c49680aba0
        ret = <optimized out>
#7  0x00007fd14700d927 in server_loop (main_ctx=0x55c49680ad80) at src/util/server.c:719
No locals.
#8  0x000055c495bfc62b in main (argc=8, argv=<optimized out>) at src/providers/data_provider_be.c:743
        opt = <optimized out>
        pc = <optimized out>
        opt_logger = 0x55c4967fb350 "files"
        be_domain = 0x55c4967e8510 "implicit_files"
        srv_name = 0x55c4968058a0 "sssd[be[implicit_files]]"
        main_ctx = 0x55c49680ad80
        confdb_path = <optimized out>
        ret = 0
        uid = 0
        gid = 0
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x55c495e2b3c0 <poptHelpOptions>, val = 0, descrip = 0x55c495c17d50 "Help options:", argDescrip = 0x0}, {
            longName = 0x55c495c17d5e "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x55c495e2b4c8 <debug_level>, val = 0, descrip = 0x55c495c17d6a "Debug level", argDescrip = 0x0}, {
--Type <RET> for more, q to quit, c to continue without paging--
            longName = 0x55c495c17d76 "debug-to-files", shortName = 102 'f', argInfo = 1073741824, arg = 0x55c495e2b3a4 <debug_to_file>, val = 0, 
            descrip = 0x55c495c18528 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x55c495c17d85 "debug-to-stderr", shortName = 0 '\000', argInfo = 1073741824, 
            arg = 0x55c495e2b3a0 <debug_to_stderr>, val = 0, descrip = 0x55c495c18560 "Send the debug output to stderr directly.", argDescrip = 0x0}, {longName = 0x55c495c17d95 "debug-timestamps", 
            shortName = 0 '\000', argInfo = 2, arg = 0x55c495e2b488 <debug_timestamps>, val = 0, descrip = 0x55c495c17da6 "Add debug timestamps", argDescrip = 0x0}, {
            longName = 0x55c495c17dbb "debug-microseconds", shortName = 0 '\000', argInfo = 2, arg = 0x55c495e2b4a0 <debug_microseconds>, val = 0, descrip = 0x55c495c18590 "Show timestamps with microseconds", 
            argDescrip = 0x0}, {longName = 0x55c495c17dd2 "logger", shortName = 0 '\000', argInfo = 1, arg = 0x7ffcdf9d6f88, val = 0, descrip = 0x55c495c17dce "Set logger", 
            argDescrip = 0x55c495c17dd9 "stderr|files|journald"}, {longName = 0x55c495c17def "uid", shortName = 0 '\000', argInfo = 2, arg = 0x7ffcdf9d6f80, val = 0, 
            descrip = 0x55c495c185b8 "The user ID to run the server as", argDescrip = 0x0}, {longName = 0x55c495c17df3 "gid", shortName = 0 '\000', argInfo = 2, arg = 0x7ffcdf9d6f84, val = 0, 
            descrip = 0x55c495c185e0 "The group ID to run the server as", argDescrip = 0x0}, {longName = 0x55c495c18bad "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7ffcdf9d6f90, val = 0, 
            descrip = 0x55c495c18608 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, 
            argDescrip = 0x0}}
        __FUNCTION__ = "main"
(gdb) 
(gdb) 
(gdb) 
(gdb) detach
Detaching from program: /usr/libexec/sssd/sssd_be, process 24685
[Inferior 1 (process 24685) detached]
(gdb) quit

Comment 8 Alexey Tikhonov 2020-01-30 18:59:22 UTC
(In reply to Amith from comment #6)
> I copied the title from the automated test case, which represents the older
> bug 1051360. It was my mistake as the intention was to highlight the fix
> from the older bug.

But the description also says: "Assigning a random value to "ldap_search_base", i.e. "ldap_search_base = \$undefined_var", causes an SSSD crash."
Is this correct?
If it is, I would like to see a backtrace/core of the *crashing* process (the monitor doesn't crash but quits).
Otherwise, please correct the title.


> You are right, the main SSSD process gets killed every 6 seconds and
> attempts to start other sssd processes.

I am not sure it is the monitor ("sssd") that restarts itself. The monitor quits (sends SIGTERM to itself), but it is not expected to restart itself (IIUC).

> This is not the expected behavior,
> because on RHEL-8.1 sssd simply fails to start and throws an error.
> But on 8.2, the sssd process enters a loop and gets killed every 6 seconds.
> That's why I captured the backtrace for the main SSSD PID and not for
> sssd_be.

Could you please provide the corresponding journal log excerpts for both 8.1 and 8.2?

Comment 9 Alexey Tikhonov 2020-01-31 10:23:53 UTC
I think the reason for this change in behavior is https://pagure.io/SSSD/sssd/c/b1ea33eca64a0429513fcfe2ba7402ff56889b46
Justification is given in https://pagure.io/SSSD/sssd/issue/4040

Amith, why do you think it is a bug?

Comment 10 Sumit Bose 2020-01-31 10:56:41 UTC
(In reply to Alexey Tikhonov from comment #9)
> I think the reason of this change in behavior is
> https://pagure.io/SSSD/sssd/c/b1ea33eca64a0429513fcfe2ba7402ff56889b46
> Justification is given in https://pagure.io/SSSD/sssd/issue/4040

Hi,

maybe 'on-failure' is a bit too heavy for SSSD, since the sssd monitor will typically return an error during startup when it cannot start for some reason, e.g. a broken configuration. As long as that reason is not resolved, a restart will just trigger the same error. So maybe 'on-abnormal', i.e. restarting only on a real crash, would be more suitable and would satisfy the use case described in https://pagure.io/SSSD/sssd/issue/4040 as well?

bye,
Sumit

> 
> Amith, why do you think it is a bug?
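
As a sketch of what this suggestion could look like as a local unit override (the drop-in path below is illustrative, and an eventual upstream fix would more likely edit the shipped sssd.service instead):

# /etc/systemd/system/sssd.service.d/restart.conf
[Service]
# Restart only on unclean signals/core dumps/watchdog timeouts, not on a
# clean non-zero exit such as a startup failure caused by a broken sssd.conf.
Restart=on-abnormal

Then run "systemctl daemon-reload" for the override to take effect.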

Comment 11 Alexey Tikhonov 2020-01-31 13:40:39 UTC
I am changing the title to the best of my understanding of the issues. Please feel free to correct it if I got this wrong.

Comment 12 Alexey Tikhonov 2020-01-31 14:09:33 UTC
(In reply to Sumit Bose from comment #10)
> maybe 'on-failure' is a bit too heavy for SSSD, since the sssd monitor will
> typically return an error during startup when it cannot start for some
> reason, e.g. a broken configuration. As long as that reason is not
> resolved, a restart will just trigger the same error. So maybe
> 'on-abnormal', i.e. restarting only on a real crash, would be more suitable
> and would satisfy the use case described in
> https://pagure.io/SSSD/sssd/issue/4040 as well?

Overall idea looks good.

But the question is whether the monitor really returns an error code in this case.
As I wrote in comment 5, according to the backtrace, "monitor_quit() ends up with kill(-getpgrp(), SIGTERM) instead of doing monitor_cleanup()+exit()". I am not sure whether this will look different to systemd than a real crash. Need to check.
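
For reference, systemd.service(5) classifies these endings differently: a non-zero exit() is an unclean exit code (matched by Restart=on-failure but not by on-abnormal), SIGTERM is one of the signals systemd counts as a clean termination by default, and SIGABRT with a core dump is an unclean signal (matched by both policies). A standalone C sketch of the three paths, explicitly not SSSD code:

/* Illustration only, not SSSD code. Run as the main process of a test
 * unit and compare the "Result:" that systemd records for each path:
 *   argv[1] == "exit" -> exit(1):          result 'exit-code'
 *   argv[1] == "term" -> SIGTERM:          treated as a clean stop
 *   otherwise         -> abort()/SIGABRT:  result 'core-dump'
 */
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc > 1 && strcmp(argv[1], "exit") == 0) {
        exit(1);                   /* unclean exit code */
    }
    if (argc > 1 && strcmp(argv[1], "term") == 0) {
        kill(-getpgrp(), SIGTERM); /* the same call monitor_quit() makes */
        pause();                   /* wait for our own SIGTERM */
    }
    abort();                       /* unclean signal plus core dump */
}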

Comment 13 Amith 2020-02-04 12:59:35 UTC
(In reply to Alexey Tikhonov from comment #9)
> I think the reason of this change in behavior is
> https://pagure.io/SSSD/sssd/c/b1ea33eca64a0429513fcfe2ba7402ff56889b46
> Justification is given in https://pagure.io/SSSD/sssd/issue/4040
> 
> Amith, why do you think it is a bug?

Hi, my apologies for the delayed response. Here are the journalctl logs from RHEL-8.2 and RHEL-8.1.0:

-------- SSSD.CONF FILE ----------------------
[sssd]
config_file_version = 2
domains = LDAP
services = nss, pam

[domain/LDAP]
debug_level = 0xFFF0
id_provider = ldap
ldap_uri = ldap://ldapserver.example.com
ldap_search_base = \$undefined_var


Upon restarting the SSSD service, the following logs were generated on RHEL-8.2:
------------------------------------------------------------------------------------------------------
Feb 04 18:08:09 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: Stopping System Security Services Daemon...
-- Subject: Unit sssd.service has begun shutting down
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sssd.service has begun shutting down.
Feb 04 18:08:09 vm-idm-002.lab.eng.pnq.redhat.com sssd[be[LDAP]][23758]: Shutting down (status = 0)
Feb 04 18:08:09 vm-idm-002.lab.eng.pnq.redhat.com sssd[nss][23759]: Shutting down (status = 0)
Feb 04 18:08:09 vm-idm-002.lab.eng.pnq.redhat.com sssd[pam][23760]: Shutting down (status = 0)
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[be[implicit_files]][23757]: Shutting down (status = 0)
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: Stopped System Security Services Daemon.
-- Subject: Unit sssd.service has finished shutting down
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sssd.service has finished shutting down.
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: Starting System Security Services Daemon...
-- Subject: Unit sssd.service has begun start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sssd.service has begun starting up.
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: Starting up
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:294411 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_callback": 0x560f03a638a0
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:294569 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_timeout": 0x560f03a64f60
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:294620 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Running timer event 0x560f03a638a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:294710 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x560f03a64f60 "ldb_kv_timeout"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:294764 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x560f03a638a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:294867 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_callback": 0x560f03a638a0
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:294926 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_timeout": 0x560f03a64f60
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:294997 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Running timer event 0x560f03a638a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:295081 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x560f03a64f60 "ldb_kv_timeout"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:295134 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x560f03a638a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_callback": 0x560f03a638a0
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_timeout": 0x560f03a64f60
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Running timer event 0x560f03a638a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x560f03a64f60 "ldb_kv_timeout"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x560f03a638a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[be[LDAP]][23790]: Starting up
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[be[implicit_files]][23789]: Starting up
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:424650 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_callback": 0x55ab8e0e78a0
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:424790 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_timeout": 0x55ab8e0e8f60
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:424842 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Running timer event 0x55ab8e0e78a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:424928 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55ab8e0e8f60 "ldb_kv_timeout"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:425002 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55ab8e0e78a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:425111 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_callback": 0x55ab8e0e78a0
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:425166 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_timeout": 0x55ab8e0e8f60
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:425240 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Running timer event 0x55ab8e0e78a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:425325 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55ab8e0e8f60 "ldb_kv_timeout"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10:425376 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55ab8e0e78a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_callback": 0x55ab8e0e78a0
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_timeout": 0x55ab8e0e8f60
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Running timer event 0x55ab8e0e78a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55ab8e0e8f60 "ldb_kv_timeout"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:10 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55ab8e0e78a0 "ldb_kv_callback"
Feb 04 18:08:10 vm-idm-002.lab.eng.pnq.redhat.com sssd[be[LDAP]][23791]: Starting up
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:571399 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_callback": 0x55d34faeb8a0
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:571562 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_timeout": 0x55d34faecf60
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:571613 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Running timer event 0x55d34faeb8a0 "ldb_kv_callback"
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:571699 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55d34faecf60 "ldb_kv_timeout"
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:571750 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55d34faeb8a0 "ldb_kv_callback"
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:571852 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_callback": 0x55d34faeb8a0
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:571907 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_timeout": 0x55d34faecf60
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:571953 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Running timer event 0x55d34faeb8a0 "ldb_kv_callback"
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:572063 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55d34faecf60 "ldb_kv_timeout"
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12:572113 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55d34faeb8a0 "ldb_kv_callback"
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_callback": 0x55d34faeb8a0
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Added timed event "ldb_kv_timeout": 0x55d34faecf60
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Running timer event 0x55d34faeb8a0 "ldb_kv_callback"
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55d34faecf60 "ldb_kv_timeout"
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[23787]: (Tue Feb  4 18:08:12 2020) [sssd[be[LDAP]]] [ldb] (0x4000): Destroying timer event 0x55d34faeb8a0 "ldb_kv_callback"
Feb 04 18:08:12 vm-idm-002.lab.eng.pnq.redhat.com sssd[be[LDAP]][23794]: Starting up
Feb 04 18:08:15 vm-idm-002.lab.eng.pnq.redhat.com sssd[nss][23795]: Starting up
Feb 04 18:08:15 vm-idm-002.lab.eng.pnq.redhat.com sssd[pam][23796]: Starting up
Feb 04 18:08:15 vm-idm-002.lab.eng.pnq.redhat.com sssd[nss][23797]: Starting up
Feb 04 18:08:15 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: Created slice system-systemd\x2dcoredump.slice.
-- Subject: Unit system-systemd\x2dcoredump.slice has finished start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit system-systemd\x2dcoredump.slice has finished starting up.
-- 
-- The start-up result is done.
Feb 04 18:08:15 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: Started Process Core Dump (PID 23798/UID 0).
-- Subject: Unit systemd-coredump has finished start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit systemd-coredump has finished starting up.
-- 
-- The start-up result is done.
Feb 04 18:08:15 vm-idm-002.lab.eng.pnq.redhat.com restraintd[3663]: *** Current Time: Tue Feb 04 18:08:15 2020  Localwatchdog at:  * Disabled! *
Feb 04 18:08:16 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: Started Process Core Dump (PID 23804/UID 0).
-- Subject: Unit systemd-coredump has finished start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit systemd-coredump has finished starting up.
-- 
-- The start-up result is done.
Feb 04 18:08:16 vm-idm-002.lab.eng.pnq.redhat.com systemd-coredump[23799]: Process 23789 (sssd_be) of user 0 dumped core.
                                                                           
                                                                           Stack trace of thread 23789:
                                                                           #0  0x00007f26fa17c70f __GI_raise (libc.so.6)
                                                                           #1  0x00007f26fa166b25 __GI_abort (libc.so.6)
                                                                           #2  0x00007f26faa76de0 talloc_abort (libtalloc.so.2)
                                                                           #3  0x00007f26faa76f2a talloc_abort_unknown_value (libtalloc.so.2)
                                                                           #4  0x00007f26fdcd7731 sss_ptr_hash_check_type (libsss_util.so)
                                                                           #5  0x00007f26fdcd780d sss_ptr_hash_lookup_internal (libsss_util.so)
                                                                           #6  0x00007f26fdcd7c62 _sss_ptr_hash_lookup (libsss_util.so)
                                                                           #7  0x00007f26fb0c933f sbus_server_connection_has_name (libsss_sbus.so)
                                                                           #8  0x00007f26fb0c969f sbus_server_name_owner_changed (libsss_sbus.so)
                                                                           #9  0x00007f26fdcd762f sss_ptr_hash_delete_cb (libsss_util.so)
                                                                           #10 0x00007f26fae9ed7d hash_delete (libdhash.so.1)
                                                                           #11 0x00007f26fdcd7d86 sss_ptr_hash_delete (libsss_util.so)
                                                                           #12 0x00007f26fdcd7e31 sss_ptr_hash_spy_destructor (libsss_util.so)
                                                                           #13 0x00007f26faa7dc50 _tc_free_internal (libtalloc.so.2)
                                                                           #14 0x00007f26faa79034 _tc_free_internal (libtalloc.so.2)
                                                                           #15 0x00007f26fac953b9 tevent_common_invoke_timer_handler (libtevent.so.0)
                                                                           #16 0x00007f26fac9555e tevent_common_loop_timer_delay (libtevent.so.0)
                                                                           #17 0x00007f26fac967ab epoll_event_loop_once (libtevent.so.0)
                                                                           #18 0x00007f26fac9499b std_event_loop_once (libtevent.so.0)
                                                                           #19 0x00007f26fac8fb15 _tevent_loop_once (libtevent.so.0)
                                                                           #20 0x00007f26fac8fdbb tevent_common_loop_wait (libtevent.so.0)
                                                                           #21 0x00007f26fac9492b std_event_loop_wait (libtevent.so.0)
                                                                           #22 0x00007f26fdccb927 server_loop (libsss_util.so)
                                                                           #23 0x00005590fa7d662b main (sssd_be)
                                                                           #24 0x00007f26fa1686a3 __libc_start_main (libc.so.6)
                                                                           #25 0x00005590fa7d67ee _start (sssd_be)
-- Subject: Process 23789 (sssd_be) dumped core
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- Documentation: man:core(5)
-- 
-- Process 23789 (sssd_be) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Feb 04 18:08:16 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: sssd.service: Main process exited, code=dumped, status=6/ABRT
Feb 04 18:08:16 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: sssd.service: Failed with result 'core-dump'.
Feb 04 18:08:16 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: Failed to start System Security Services Daemon.
-- Subject: Unit sssd.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sssd.service has failed.
-- 
-- The result is failed.
Feb 04 18:08:16 vm-idm-002.lab.eng.pnq.redhat.com systemd-coredump[23805]: Process 23787 (sssd) of user 0 dumped core.
                                                                           
                                                                           Stack trace of thread 23787:
                                                                           #0  0x00007fd9145cd70f __GI_raise (libc.so.6)
                                                                           #1  0x00007fd9145b7b25 __GI_abort (libc.so.6)
                                                                           #2  0x00007fd914ec7de0 talloc_abort (libtalloc.so.2)
                                                                           #3  0x00007fd914ec7f2a talloc_abort_unknown_value (libtalloc.so.2)
                                                                           #4  0x00007fd918128731 sss_ptr_hash_check_type (libsss_util.so)
                                                                           #5  0x00007fd91812880d sss_ptr_hash_lookup_internal (libsss_util.so)
                                                                           #6  0x00007fd918128c62 _sss_ptr_hash_lookup (libsss_util.so)
                                                                           #7  0x00007fd91551a33f sbus_server_connection_has_name (libsss_sbus.so)
                                                                           #8  0x00007fd91551a69f sbus_server_name_owner_changed (libsss_sbus.so)
                                                                           #9  0x00007fd91812862f sss_ptr_hash_delete_cb (libsss_util.so)
                                                                           #10 0x00007fd9152efd7d hash_delete (libdhash.so.1)
                                                                           #11 0x00007fd918128d86 sss_ptr_hash_delete (libsss_util.so)
                                                                           #12 0x00007fd918128e31 sss_ptr_hash_spy_destructor (libsss_util.so)
                                                                           #13 0x00007fd914ecec50 _tc_free_internal (libtalloc.so.2)
                                                                           #14 0x00007fd914eca034 _tc_free_internal (libtalloc.so.2)
                                                                           #15 0x00007fd9150e63b9 tevent_common_invoke_timer_handler (libtevent.so.0)
                                                                           #16 0x00007fd9150e655e tevent_common_loop_timer_delay (libtevent.so.0)
                                                                           #17 0x00007fd9150e382f poll_event_loop_once (libtevent.so.0)
                                                                           #18 0x00007fd9150e0b15 _tevent_loop_once (libtevent.so.0)
                                                                           #19 0x00007fd9150e0dbb tevent_common_loop_wait (libtevent.so.0)
                                                                           #20 0x00007fd91811c927 server_loop (libsss_util.so)
                                                                           #21 0x0000558073d1775e main (sssd)
                                                                           #22 0x00007fd9145b96a3 __libc_start_main (libc.so.6)
                                                                           #23 0x0000558073d178ae _start (sssd)
-- Subject: Process 23787 (sssd) dumped core
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- Documentation: man:core(5)
-- 
-- Process 23787 (sssd) crashed and dumped core.
-- 
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
Feb 04 18:08:16 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: sssd.service: Service RestartSec=100ms expired, scheduling restart.
Feb 04 18:08:16 vm-idm-002.lab.eng.pnq.redhat.com systemd[1]: sssd.service: Scheduled restart job, restart counter is at 1.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: https://access.redhat.com/support

######################################################################################################


JOURNALCTL LOG GENERATED IN RHEL-8.1.0:
------------------------------------------------------------------------------------------------------

Feb 04 18:20:31 vm-idm-024.lab.eng.pnq.redhat.com restraintd[4710]: *** Current Time: Tue Feb 04 18:20:31 2020  Localwatchdog at:  * Disabled! *
Feb 04 18:21:29 vm-idm-024.lab.eng.pnq.redhat.com systemd[1]: Stopping System Security Services Daemon...
-- Subject: Unit sssd.service has begun shutting down
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sssd.service has begun shutting down.
Feb 04 18:21:29 vm-idm-024.lab.eng.pnq.redhat.com sssd[be[implicit_files]][880]: Shutting down
Feb 04 18:21:29 vm-idm-024.lab.eng.pnq.redhat.com sssd[nss][963]: Shutting down
Feb 04 18:21:29 vm-idm-024.lab.eng.pnq.redhat.com systemd[1]: Stopped System Security Services Daemon.
-- Subject: Unit sssd.service has finished shutting down
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sssd.service has finished shutting down.
Feb 04 18:21:29 vm-idm-024.lab.eng.pnq.redhat.com systemd[1]: Starting System Security Services Daemon...
-- Subject: Unit sssd.service has begun start-up
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sssd.service has begun starting up.
Feb 04 18:21:29 vm-idm-024.lab.eng.pnq.redhat.com sssd[15950]: Cannot read config file /etc/sssd/sssd.conf. Please check that the file is accessible only by the owner and owned by root.root.
Feb 04 18:21:29 vm-idm-024.lab.eng.pnq.redhat.com systemd[1]: sssd.service: Main process exited, code=exited, status=4/NOPERMISSION
Feb 04 18:21:29 vm-idm-024.lab.eng.pnq.redhat.com systemd[1]: sssd.service: Failed with result 'exit-code'.
Feb 04 18:21:29 vm-idm-024.lab.eng.pnq.redhat.com systemd[1]: Failed to start System Security Services Daemon.
-- Subject: Unit sssd.service has failed
-- Defined-By: systemd
-- Support: https://access.redhat.com/support
-- 
-- Unit sssd.service has failed.
-- 
-- The result is RESULT.



######################################################################################################

Comment 15 Alexey Tikhonov 2020-02-04 14:54:21 UTC
The crash in comment 13 is a duplicate of bz 1792331.

The main point of this ticket is in comment 10.

Comment 18 RHEL Program Management 2021-07-30 07:30:11 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Comment 19 Alexey Tikhonov 2022-03-23 19:45:03 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/6075

Comment 20 Alexey Tikhonov 2022-03-29 18:19:54 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/6075

* `master`
    * a049ac715a79ddfad0d67d48fc5c60408cf62127 - systemd: only relaunch after crashes and do not retry forever


Should be fixed in RHEL 8.7/9.1 via rebase.
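
In unit-file terms, "only relaunch after crashes and do not retry forever" corresponds to settings along these lines (an illustrative sketch; the exact directives and values used by the upstream commit may differ):

[Unit]
# Give up after a burst of failed starts instead of looping forever.
StartLimitIntervalSec=50s
StartLimitBurst=5

[Service]
# Relaunch only after abnormal terminations (unclean signals, core dumps,
# watchdog timeouts), not after clean non-zero exits such as config errors.
Restart=on-abnormal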

