Bug 1661183 - SSSD 2.0 has drastically lower sbus timeout than 1.x, this can result in time outs
Summary: SSSD 2.0 has drastically lower sbus timeout than 1.x, this can result in time outs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: SSSD Maintainers
QA Contact: Steeve Goveas
URL:
Whiteboard:
Depends On: 1654537
Blocks:
 
Reported: 2018-12-20 10:29 UTC by Jakub Hrozek
Modified: 2020-05-02 19:04 UTC (History)
10 users

Fixed In Version: sssd-2.0.0-39.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1654537
Environment:
Last Closed: 2019-06-13 23:11:51 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 4894 0 None closed SSSD 2.0 has drastically lower sbus timeout than 1.x, this can result in time outs 2020-11-20 09:38:07 UTC

Description Jakub Hrozek 2018-12-20 10:29:35 UTC
+++ This bug was initially created as a clone of Bug #1654537 +++

Sorry this is a bit vague for now, but I wanted to get it filed so other folks can take a look at it.

In the last 10 days or so, periodically the openQA FreeIPA tests are running into a failure when logging into a console on a domain member system as a domain user (who has the right to log into that system). What happens is the test enters the username and password, the system grinds away for a bit, then the message "System error" appears on the screen for a few seconds and it cycles back to the login prompt.

The logs don't provide a whole lot more detail, here's what I can see:

Nov 21 13:40:23 client002.domain.local audit[2993]: USER_ACCT pid=2993 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:local_login_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=? acct="test3" exe="/usr/bin/login" hostname=client002.domain.local addr=? terminal=tty4 res=failed'
Nov 21 13:40:23 client002.domain.local login[2993]: pam_sss(login:account): Access denied for user test3: 4 (System error)
Nov 21 13:40:23 client002.domain.local login[2993]: System error

that's from the client journal. The server journal has nothing at all around the same time. There are entries in the dirsrv access log, I'm not sure if any of them are of any interest.

This problem is intermittent; it happens on something like 1 in 3 or 1 in 4 tests. Often it doesn't happen and the tests pass just fine. Because it's intermittent, I can't pin down the initial cause, unfortunately: the *first* update for which I can see a failure like this was an update to xen, so it's pretty clear that isn't the cause.

I'm attaching tarballs of /var/log from both the client and the server for the earliest occurrence of this that I can find ATM.

Client test: https://openqa.fedoraproject.org/tests/310841
Server test: https://openqa.fedoraproject.org/tests/310838

There are two failed attempts to log in. The useful one was as 'test3' which occurs at 13:39:58 UTC with the "System error" being logged at 13:40:23 UTC. There is a later attempt to log in as 'test1', but that one's tricky to analyze as the timestamps are off - after the first failure, openQA loads a snapshot of a 'known good' state, which messes up the comparison between timestamps in the openQA log and the system logs. I'll attach the log tarballs just for the first failure, so things are clear.

--- Additional comment from Adam Williamson on 2018-11-29 02:59 UTC ---



--- Additional comment from Adam Williamson on 2018-11-29 03:00 UTC ---



--- Additional comment from Sumit Bose on 2018-11-29 08:41:25 UTC ---

Hi Adam,

do you have a chance to re-run the tests with 'debug_level=9' at least in the [pam] and [domain/...] section of sssd.conf?

'System error' is typically used by SSSD to handle some internal error.

bye,
Sumit

--- Additional comment from Adam Williamson on 2018-11-29 16:40:03 UTC ---

Thanks Sumit. I'll try tweaking the staging instance to do that. Then we'll have to wait until it happens to bite another update...or I can just re-run the test manually over and over till it happens, I guess.

--- Additional comment from Adam Williamson on 2018-11-30 17:31 UTC ---

OK, here's a log tarball of a case where it failed with the debug log level set to 9. This is from:

https://openqa.stg.fedoraproject.org/tests/415166

The failed login attempt as test1 is tried somewhere around 15:47:52 (UTC) and the 'System error' occurs at 15:48:17 (UTC). That's 10:47:52 and 10:48:17 in the sssd log files.

Hope this helps.

--- Additional comment from Adam Williamson on 2018-11-30 17:33:52 UTC ---

Hum, this in sssd_pam.log looks interesting:

(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_dp_send_req] (0x0100): Sending request with the following data:
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): command: SSS_PAM_ACCT_MGMT
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): domain: domain.local
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): user: test1
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): service: login
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): tty: tty2
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): ruser: not set
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): rhost: not set
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): authtok type: 0
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): newauthtok type: 0
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): priv: 1
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): cli_pid: 2308
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): logon name: test1
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_print_data] (0x0100): flags: 0
(Fri Nov 30 10:47:52 2018) [sssd[pam]] [pam_dom_forwarder] (0x0100): pam_dp_send_req returned 0
(Fri Nov 30 10:47:56 2018) [sssd[pam]] [pam_initgr_cache_remove] (0x2000): [test1] removed from PAM initgroup cache
(Fri Nov 30 10:47:57 2018) [sssd[pam]] [pam_initgr_cache_remove] (0x2000): [test1] removed from PAM initgroup cache
(Fri Nov 30 10:47:57 2018) [sssd[pam]] [pam_initgr_cache_remove] (0x2000): [test1] removed from PAM initgroup cache
(Fri Nov 30 10:48:17 2018) [sssd[pam]] [sbus_dispatch] (0x4000): Dispatching.
(Fri Nov 30 10:48:17 2018) [sssd[pam]] [sbus_reply_check] (0x4000): D-Bus error [org.freedesktop.DBus.Error.NoReply]: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
(Fri Nov 30 10:48:17 2018) [sssd[pam]] [pam_dp_send_req_done] (0x0020): PAM handler failed [1432158312]: No reply was received

--- Additional comment from Adam Williamson on 2018-12-04 20:01:27 UTC ---

ping?

--- Additional comment from Sumit Bose on 2018-12-05 11:11:07 UTC ---

Hi Adam,

sorry for the delay. 

I think the reason is the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1644919: the selinux_child needs a lot of time to finish.

To me it looks like the 'System error' is returned if selinux_child needs 15s or more and the request on the internal D-Bus connection times out, as you have shown in the log snippet above.

If I understand it correctly, we let libdbus decide about the timeout because we call e.g. dbus_connection_send_with_reply() with a timeout value of '-1'. Pavel, is this correct, and is there a way to tune this timeout?

Currently I cannot say what causes the delay in selinux_child. At about the time the selinux_child finishes there are the following messages in /var/log/messages:

Nov 30 10:48:15 client003 kernel: SELinux:  Class xdp_socket not defined in policy.
Nov 30 10:48:15 client003 kernel: SELinux: the above unknown classes and permissions will be allowed
Nov 30 10:48:16 client003 kernel: SELinux:  Context unconfined_u:unconfined_r:sandbox_t:s0-s0:c0.c1023 became invalid (unmapped).
Nov 30 10:48:16 client003 kernel: SELinux:  Context system_u:unconfined_r:sandbox_t:s0-s0:c0.c1023 became invalid (unmapped).
Nov 30 10:48:17 client003 kernel: SELinux:  policy capability network_peer_controls=1
Nov 30 10:48:17 client003 kernel: SELinux:  policy capability open_perms=1
Nov 30 10:48:17 client003 kernel: SELinux:  policy capability extended_socket_class=1
Nov 30 10:48:17 client003 kernel: SELinux:  policy capability always_check_network=0
Nov 30 10:48:17 client003 kernel: SELinux:  policy capability cgroup_seclabel=1
Nov 30 10:48:17 client003 kernel: SELinux:  policy capability nnp_nosuid_transition=1

I'm not sure if maybe the 'became invalid' messages are of importance here?

bye,
Sumit

--- Additional comment from Jakub Hrozek on 2018-12-05 11:39:34 UTC ---

(In reply to Sumit Bose from comment #8)
> Currently I cannot say what causes the delay in selinux_child. At about the
> time the selinux_child finishes there are the following messages in
> /var/log/messages:

I think the reason might be a fix for https://pagure.io/SSSD/sssd/issue/3819, see commit https://pagure.io/SSSD/sssd/c/945865a

tl;dr adding the user-role mapping to semanage is an expensive operation. Previously we only added the mapping if the user's SELinux label was different from the default (typically, different from unconfined), but this was breaking the case where the home directory is non-standard, because in that case you need the explicit mapping so that libselinux also knows that the files under this user's home directory should be labeled as home_dir_t.

It would be trivial to revert that fix and provide a test build to prove this theory.

This would also correlate with the other bug that Adam filed where the login suddenly takes longer. This is an issue I was sort of afraid of when I implemented the patch, but at the same time I didn't see any other way around the bug. Since explicitly adding the user to the selinux database is needed to create the mappings for the user's home directory, maybe we could also call some libsemanage/libselinux function to see if the home directory already has the home_t mapping and avoid the unconditional call.

--- Additional comment from Pavel Březina on 2018-12-05 11:55:15 UTC ---

If there is a real need for bigger timeout we can set a custom value instead of -1.

--- Additional comment from Sumit Bose on 2018-12-05 12:12:23 UTC ---

(In reply to Pavel Březina from comment #10)
> If there is a real need for bigger timeout we can set a custom value instead
> of -1.

I was mainly guessing that the D-Bus error was caused by a timeout; the message is a bit vague here. Do you know what libdbus would pick as the default timeout?

--- Additional comment from Pavel Březina on 2018-12-05 15:06:11 UTC ---

There is a 25 second delay before the reply to the dbus method is sent, so it is a timeout issue, although we almost made it through, as the default dbus timeout is 25 seconds (_DBUS_DEFAULT_TIMEOUT_VALUE in the dbus code). So it was actually a race as to which finishes sooner.

(Fri Nov 30 10:47:52 2018) [sssd[be[domain.local]]] [write_pipe_handler] (0x0400): All data has been sent!
(Fri Nov 30 10:48:17 2018) [sssd[be[domain.local]]] [child_sig_handler] (0x1000): Waiting for child [2316].
(Fri Nov 30 10:48:17 2018) [sssd[be[domain.local]]] [child_sig_handler] (0x0100): child [2316] finished successfully.
(Fri Nov 30 10:48:17 2018) [sssd[be[domain.local]]] [read_pipe_handler] (0x0400): EOF received, client finished
(Fri Nov 30 10:48:17 2018) [sssd[be[domain.local]]] [dp_req_done] (0x0400): DP Request [PAM SELinux #6]: Request handler finished [0]: Success
(Fri Nov 30 10:48:17 2018) [sssd[be[domain.local]]] [_dp_req_recv] (0x0400): DP Request [PAM SELinux #6]: Receiving request data.
(Fri Nov 30 10:48:17 2018) [sssd[be[domain.local]]] [dp_req_destructor] (0x0400): DP Request [PAM SELinux #6]: Request removed.
(Fri Nov 30 10:48:17 2018) [sssd[be[domain.local]]] [dp_req_destructor] (0x0400): Number of active DP request: 0
(Fri Nov 30 10:48:17 2018) [sssd[be[domain.local]]] [sbus_issue_request_done] (0x0400): sssd.dataprovider.pamHandler: Success

From selinux child:
(Fri Nov 30 10:47:52 2018) [[sssd[selinux_child[2316]]]] [seuser_needs_update] (0x0400): The SELinux user does need an update
(Fri Nov 30 10:48:17 2018) [[sssd[selinux_child[2316]]]] [pack_buffer] (0x0400): result [0]

I think that sc_set_seuser just takes too long and if possible we should make it faster or increase the timeout:
    needs_update = seuser_needs_update(username, ibuf->seuser,
                                       ibuf->mls_range);
    if (needs_update == true) {
        ret = sc_set_seuser(username, ibuf->seuser, ibuf->mls_range);
        if (ret != EOK) {
            DEBUG(SSSDBG_CRIT_FAILURE, "Cannot set SELinux login context.\n");
            goto fail;
        }
    }

--- Additional comment from Adam Williamson on 2018-12-05 16:42:25 UTC ---

Aha, thanks for the info, guys. Maybe we should do two builds: one with a *shorter* timeout and one with a *longer* timeout. Then I can verify that the bug happens more often (or all the time) with the short timeout, and run the build with a longer timeout through a few iterations and make sure the bug doesn't happen. That should confirm the theory?

--- Additional comment from Adam Williamson on 2018-12-07 03:08:45 UTC ---

So, I spent an hour trying to pick through the thicket of bits in sssd to work out where to change the timeout that side, then realized it'd be much easier to just change it in dbus :P

So I did a couple of scratch dbus builds, one with the timeout dropped to 15 secs, one with it bumped to 60 secs. I ran the tests twice with the reduced timeout and they failed both times; I ran twice with the bumped timeout and they passed both times. I can run a few more times to confirm the impression, but it does look a lot like the timeout is indeed the issue here.

--- Additional comment from Jakub Hrozek on 2018-12-07 08:06:50 UTC ---

(In reply to Adam Williamson from comment #14)
> So, I spent an hour trying to pick through the thicket of bits in sssd to
> work out where to change the timeout that side, then realized it'd be much
> easier to just change it in dbus :P
> 
> So I did a couple of scratch dbus builds, one with the timeout dropped to 15
> secs, one with it bumped to 60 secs. I ran the tests twice with the reduced
> timeout and they failed both times; I ran twice with the bumped timeout and
> they passed both times. I can run a few more times to confirm the
> impression, but it does look a lot like the timeout is indeed the issue here.

Great, thank you very much for testing.

Pavel, can you point us to the place where we can set the timeout (or send a PR before you leave for the break)?

--- Additional comment from Pavel Březina on 2018-12-07 10:00:30 UTC ---

$ git grep SBUS_MESSAGE_TIMEOUT
src/sbus/request/sbus_request.c:    subreq = sbus_message_send(state, conn, msg, SBUS_MESSAGE_TIMEOUT);
src/sbus/sbus_message.h:#define SBUS_MESSAGE_TIMEOUT -1
src/sbus/sbus_sync.h: * @see SBUS_MESSAGE_TIMEOUT
src/sbus/sync/sbus_sync.c:    ret = sbus_sync_message_send(NULL, conn, msg, SBUS_MESSAGE_TIMEOUT, NULL);
src/sbus/sync/sbus_sync_call.c:    ret = sbus_sync_message_send(tmp_ctx, conn, msg, SBUS_MESSAGE_TIMEOUT,

It is sufficient to change the definition of SBUS_MESSAGE_TIMEOUT in src/sbus/sbus_sync.h. Prior to 2.0, the timeout was SSS_CLI_SOCKET_TIMEOUT/2 (= 150 seconds), so we can set it to this value as well, although it is ridiculously long. This can be used as a hotfix, but we should check with selinux whether there is any way to speed the operation up.

--- Additional comment from Adam Williamson on 2018-12-13 06:37:42 UTC ---

I sent a PR that extends SBUS_MESSAGE_TIMEOUT, for now anyway:

https://github.com/SSSD/sssd/pull/715

CCing mgrepl for the SELinux question - mgrepl, see comments #8 and #9 for more detail on the expensive SELinux operation that's going on here, we want to know if it can be made...less expensive. :)

--- Additional comment from Fedora Update System on 2018-12-13 07:58:30 UTC ---

sssd-2.0.0-5.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-8f0282af53

--- Additional comment from Jakub Hrozek on 2018-12-13 13:30:13 UTC ---

(In reply to Adam Williamson from comment #17)
> I sent a PR that extends SBUS_MESSAGE_TIMEOUT, for now anyway:
> 
> https://github.com/SSSD/sssd/pull/715
> 
> CCing mgrepl for the SELinux question - mgrepl, see comments #8 and #9 for
> more detail on the expensive SELinux operation that's going on here, we want
> to know if it can be made...less expensive. :)

btw, when I was looking at the original bug I was in contact with Petr Lautrbach, not with mgrepl (just saying so that mgrepl knows who to redirect the needinfo to..)

--- Additional comment from Adam Williamson on 2018-12-13 14:55:22 UTC ---

aha, OK. mgrepl is just my selinux goto. :)

--- Additional comment from Fedora Update System on 2018-12-17 19:12:13 UTC ---

sssd-2.0.0-5.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 1 Jakub Hrozek 2018-12-20 10:30:58 UTC
Upstream ticket:
https://pagure.io/SSSD/sssd/issue/3909

Comment 2 Jakub Hrozek 2019-01-06 19:22:12 UTC
* master: e4469fbdb3d5c53294c6514280ac75b847b3c61c

Comment 5 Jakub Hrozek 2019-01-08 11:34:41 UTC
Steeve, can you consider giving qa_ack for this bug? I'm afraid there is no reliable reproducer. The only thing I can think of (and if you need that, I can prepare detailed steps) is to delay the execution of some PAM action by more than 25 seconds with gdb.

Comment 6 Steeve Goveas 2019-01-08 12:14:22 UTC
Jakub, if there is a way to test, then I think it will be good to try it. We can automate it as well.

Comment 13 Niranjan Mallapadi Raghavender 2019-02-21 14:47:34 UTC
Reproducing the issue on RHEL7 sssd
------------------------------------

Versions:

sssd-client-1.16.2-13.el7.x86_64
sssd-krb5-common-1.16.2-13.el7.x86_64
sssd-ldap-1.16.2-13.el7.x86_64
sssd-proxy-1.16.2-13.el7.x86_64
sssd-tools-1.16.2-13.el7.x86_64
sssd-libwbclient-1.16.2-13.el7.x86_64
sssd-common-1.16.2-13.el7.x86_64
sssd-common-pac-1.16.2-13.el7.x86_64
sssd-ad-1.16.2-13.el7.x86_64
sssd-krb5-1.16.2-13.el7.x86_64
sssd-1.16.2-13.el7.x86_64
python-sssdconfig-1.16.2-13.el7.noarch
sssd-ipa-1.16.2-13.el7.x86_64
sssd-dbus-1.16.2-13.el7.x86_64
sssd-kcm-1.16.2-13.el7.x86_64
sssd-winbind-idmap-1.16.2-13.el7.x86_64

1. Configure /etc/sssd/sssd.conf as below:

[root@host-8-243-176 sssd]# cat /etc/sssd/sssd.conf 
[sssd]
domains = EXAMPLE.TEST
config_file_version = 2
services = nss, pam, sudo, ifp

[domain/EXAMPLE.TEST]
enumerate = false
id_provider = ldap
ldap_uri = ldaps://host-8-243-176.host.centralci.eng.rdu2.redhat.com
ldap_search_base = dc=example,dc=test
ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem
auth_provider = krb5
krb5_server = host-8-243-176.host.centralci.eng.rdu2.redhat.com
krb5_kpasswd = host-8-243-176.host.centralci.eng.rdu2.redhat.com
krb5_realm = EXAMPLE.TEST
debug_level = 9

[nss]
debug_level = 9

[pam]
debug_level = 9

[secrets]
debug_level = 9

[kcm]
debug_level = 9

2. Run gdb and attach to the sssd backend process to put it to sleep, as shown below:



gdb -p $(pidof sssd_be) -ex "shell sleep 60" -ex detach -ex quit & 

Loaded symbols for /lib64/libgssapi_krb5.so.2
Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_files.so.2
Reading symbols from /usr/lib64/sssd/libsss_krb5.so...Reading symbols from /usr/lib64/sssd/libsss_krb5.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/sssd/libsss_krb5.so
Reading symbols from /lib64/libfreeblpriv3.so...Reading symbols from /lib64/libfreeblpriv3.so...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libfreeblpriv3.so
Reading symbols from /lib64/libnss_sss.so.2...Reading symbols from /lib64/libnss_sss.so.2...(no debugging symbols found)...done.
(no debugging symbols found)...done.
Loaded symbols for /lib64/libnss_sss.so.2


[1]+  Stopped                 gdb -p $(pidof sssd_be) -ex "shell sleep 60" -ex detach -ex quit
[root@host-8-243-176 sssd]# id foo0
id: foo0: no such user
[root@host-8-243-176 sssd]# id foo0




On RHEL8:

Versions:

sssd-krb5-common-2.0.0-43.el8.x86_64
sssd-tools-2.0.0-43.el8.x86_64
sssd-dbus-2.0.0-43.el8.x86_64
sssd-2.0.0-43.el8.x86_64
sssd-libwbclient-2.0.0-23.el8.x86_64
sssd-client-2.0.0-43.el8.x86_64
sssd-ipa-2.0.0-43.el8.x86_64
sssd-ldap-2.0.0-43.el8.x86_64
sssd-winbind-idmap-2.0.0-23.el8.x86_64
python3-sssdconfig-2.0.0-43.el8.noarch
sssd-common-pac-2.0.0-43.el8.x86_64
sssd-krb5-2.0.0-43.el8.x86_64
sssd-proxy-2.0.0-43.el8.x86_64
sssd-kcm-2.0.0-43.el8.x86_64
sssd-ad-2.0.0-43.el8.x86_64
sssd-nfs-idmap-2.0.0-23.el8.x86_64
sssd-common-2.0.0-43.el8.x86_64


1. Configure sssd.conf as below:

[root@foobar1 db]# cat /etc/sssd/sssd.conf
[sssd]
domains = EXAMPLE.TEST
config_file_version = 2
services = nss, pam, sudo, ifp

[domain/EXAMPLE.TEST]
enumerate = false
id_provider = ldap
ldap_uri = ldaps://foobar1.example.test
ldap_search_base = dc=example,dc=test
ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem
auth_provider = krb5
krb5_server = foobar1.example.test
krb5_kpasswd = foobar1.example.test
krb5_realm = EXAMPLE.TEST
debug_level = 9

[nss]
debug_level = 9

[pam]
debug_level = 9

[secrets]
debug_level = 9

[kcm]
debug_level = 9


[root@foobar1 db]# ps -ef | grep sssd
root      3759     1  0 19:48 ?        00:00:00 /usr/sbin/sssd -i --logger=files
root      3760  3759  0 19:48 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain implicit_files --uid 0 --gid 0 --logger=files
root      3761  3759  0 19:48 ?        00:00:00 /usr/libexec/sssd/sssd_be --domain EXAMPLE.TEST --uid 0 --gid 0 --logger=files
root      3762  3759  0 19:48 ?        00:00:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --logger=files
root      3763  3759  0 19:48 ?        00:00:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --logger=files
root      3764  3759  0 19:48 ?        00:00:00 /usr/libexec/sssd/sssd_sudo --uid 0 --gid 0 --logger=files
root      3765  3759  0 19:48 ?        00:00:00 /usr/libexec/sssd/sssd_ifp --uid 0 --gid 0 --logger=files


2. Run gdb and set backend to sleep:

gdb -p 3761 -ex "set confirm off" -ex "shell sleep 60" -ex detach  -ex quit &

Reading symbols from .gnu_debugdata for /usr/lib64/sasl2/libplain.so...
warning: Loadable section ".note.gnu.property" outside of ELF segments
(no debugging symbols found)...done.
(no debugging symbols found)...done.
Reading symbols from /usr/lib64/sssd/libsss_krb5.so...Missing separate debuginfo for /usr/lib64/sssd/libsss_krb5.so
Try: dnf --enablerepo='*debug*' install /usr/lib/debug/.build-id/e1/507d88594fa7f5d875f41ed7a3cc59104adb0d.debug
Reading symbols from .gnu_debugdata for /usr/lib64/sssd/libsss_krb5.so...
warning: Loadable section ".note.gnu.property" outside of ELF segments
(no debugging symbols found)...done.
(no debugging symbols found)...done.
0x00007fcd32de6ceb in epoll_wait () from /lib64/libc.so.6

[root@foobar1 db]# id foo0
Detaching from program: /usr/libexec/sssd/sssd_be, process 3761
[Inferior 1 (process 3761) detached]
uid=14583100(foo0) gid=14564100(ldapusers) groups=14564100(ldapusers)
[1]+  Done                    gdb -p 3761 -ex "set confirm off" -ex "shell sleep 60" -ex detach -ex quit

