Bug 2195919 - sssd-be tends to run out of system resources, hitting the maximum number of open files
Keywords:
Status: VERIFIED
Alias: None
Deadline: 2023-07-03
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.7
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Sumit Bose
QA Contact: Anuj Borah
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks:
Reported: 2023-05-06 15:57 UTC by Abhijit Roy
Modified: 2023-07-14 13:37 UTC

Fixed In Version: sssd-2.9.1-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | SSSD sssd issues 6744 | 0 | None | open | sssd-be tends to run out of system resources, hitting the maximum number of open files | 2023-05-22 19:40:04 UTC
Github | SSSD sssd pull 6745 | 0 | None | open | krb5: make sure sockets are closed on timeouts | 2023-05-23 08:22:13 UTC
Github | SSSD sssd pull 6760 | 0 | None | open | Tests: Add a fast, reliable, accurate ssh module | 2023-06-02 08:53:44 UTC
Red Hat Issue Tracker | RHELPLAN-156563 | 0 | None | None | None | 2023-05-06 15:59:12 UTC
Red Hat Issue Tracker | SSSD-6124 | 0 | None | None | None | 2023-05-23 07:26:26 UTC

Description Abhijit Roy 2023-05-06 15:57:58 UTC
Description of problem:

sssd-be tends to run out of system resources, hitting the maximum number of open files

(2023-05-04  9:26:39): [be[redhat.com]] [get_active_uid_linux] (0x4000): [RID#148299] get_uid_from_pid() failed.
(2023-05-04  9:26:39): [be[redhat.com]] [get_uid_from_pid] (0x0020): [RID#148299] open failed [/proc/4075832/status][24][Too many open files].
….
(2023-05-04  9:26:39): [be[redhat.com]] [be_resolve_server_process] (0x0200): [RID#148299] Found address for server idm03.redhat.com: [10.x.x.x] TTL 1200
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_kinit_kdc_resolved] (0x1000): [RID#148299] KDC resolved, attempting to get TGT...
(2023-05-04  9:26:39): [be[redhat.com]] [create_tgt_req_send_buffer] (0x0400): [RID#148299] buffer size: 86
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_fork_child] (0x0020): [RID#148299] pipe(from) failed [24][Too many open files].
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_get_tgt_send] (0x0020): [RID#148299] sdap_fork_child failed.
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_kinit_done] (0x0020): [RID#148299] child failed (24 [Too many open files])
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_cli_kinit_done] (0x0400): [RID#148299] Cannot get a TGT: ret [24](Too many open files)
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_cli_connect_recv] (0x0040): [RID#148299] Unable to establish connection [13]: Permission denied
$ cat etc/sssd/sssd.conf 
[domain/redhat.com]

id_provider = ipa
dns_discovery_domain = redhat.com
default_shell = /bin/bash
override_shell = /bin/bash
ipa_server = _srv_, xxx.redhat.com
ipa_domain = redhat.com
ipa_hostname = xxx.redhat.com
auth_provider = ipa
chpass_provider = ipa
access_provider = ipa
cache_credentials = True
ldap_tls_cacert = /etc/ipa/ca.crt
krb5_store_password_if_offline = True
debug_level = 9
[sssd]
services = nss, pam, ssh, sudo
enable_files_domain=false

domains = redhat.com
default_domain_suffix = xxx.local
full_name_format = %1$s
debug_level = 9
[nss]
homedir_substring = /home
debug_level = 9
[pam]
debug_level = 9

$ cat lsof |grep sssd|wc -l
1954
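
The lsof count above lumps all sssd processes together. A per-process count can be read straight from /proc, which makes it easier to tell whether sssd_be is the process holding the descriptors. A minimal sketch, assuming a Linux /proc filesystem and enough privileges to read the target's fd directory; the helper name count_fds is made up for illustration and is not part of SSSD:

```shell
#!/bin/sh
# count_fds PID: print the number of file descriptors PID currently has open.
# Hypothetical helper. Each entry in /proc/PID/fd is one open descriptor.
# Prints 0 if the directory does not exist or cannot be read.
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Example: descriptors held by the current shell.
echo "shell $$ has $(count_fds "$$") open descriptors"
```

On an affected host this could be called as `count_fds "$(pidof sssd_be)"` as root, assuming a single sssd_be process is running.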

WORKAROUND: Restart sssd service.

Version-Release number of selected component (if applicable):

sssd-2.7.3-4.el8_7.3.x86_64 

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

https://github.com/SSSD/sssd/blob/cd843dafe63589d0a77145445c454f6fc19dabae/src/providers/ldap/sdap_child_helpers.c#L86


    ret = pipe(pipefd_to_child);
    if (ret == -1) {
        ret = errno;
        DEBUG(SSSDBG_CRIT_FAILURE,
              "pipe(to) failed [%d][%s].\n", ret, strerror(ret)); <--
        goto fail;
    }

Comment 1 Abhijit Roy 2023-05-06 16:03:51 UTC
Created attachment 1962799 [details]
lsof

Comment 2 Abhijit Roy 2023-05-06 16:05:45 UTC
Created attachment 1962800 [details]
proc_pgrep sssd_be_fd

Comment 3 Sumit Bose 2023-05-08 06:24:34 UTC
Hi,

can you check with the 'ps' command if there are actually many ldap_child/krb5_child processes running? Can you share the ldap_child/krb5_child log file to check if there are any issues?

bye,
Sumit

Comment 7 Abhijit Roy 2023-05-09 17:51:44 UTC
Hello Sumit,

Log details: https://drive.google.com/file/d/1vLBrigx6tAn98PSyrB9mSfCdwnWHxTIA/view?usp=sharing

Comment 8 Abhijit Roy 2023-05-17 15:48:21 UTC
(In reply to Sumit Bose from comment #5)
> Hi,
> 
> some of the relevant log files are truncated in the sos-report. Please ask
> to create a tar ball with the logs from /var/log/sssd and attach it to the
> case.
> 
> bye,
> Sumit

Hi Sumit,

Did you get a chance to look into this issue?

For now, I have asked the customer to check the following:

1) Log in to the problematic host as root.

2) Edit /etc/security/limits.conf (for example with vi) and set the values below.

   --------------------------------------
   root     soft    nofile         20480
   root     hard    nofile         20480
   --------------------------------------

3) Log out, log back in as root, and check that the command below prints "20480".

    # ulimit -n

4) Restart SSSD service

    # systemctl restart sssd
    # systemctl status sssd

5) Check whether the issue still occurs.
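
One caveat: `ulimit -n` in step 3 reports the limit of root's login shell, which is not necessarily the limit the running sssd_be process received. On systemd-managed hosts a service's limits usually come from its unit (for example LimitNOFILE=) rather than from limits.conf, so the change in step 2 may not reach sssd_be at all. The effective values can be read from /proc after the restart in step 4. A minimal sketch, assuming Linux; nofile_limits is a hypothetical helper name:

```shell
#!/bin/sh
# nofile_limits PID: print the soft and hard "Max open files" limits of PID,
# as reported by the kernel in /proc/PID/limits. Hypothetical helper.
nofile_limits() {
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Example: limits of the current shell (soft limit first, then hard limit).
nofile_limits "$$"
```

For the backend this could be run as `nofile_limits "$(pidof sssd_be)"`, assuming a single sssd_be process.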

Comment 12 Sumit Bose 2023-05-22 19:56:58 UTC
Hi,

to reproduce the issue I replaced /usr/libexec/sssd/krb5_child with a shell script like

#!/bin/bash
sleep 10


to create a reliable timeout when the backend calls krb5_child. Then I ran authentications from one shell while watching the output of `ls -al /proc/$(pidof sssd_be)/fd` in another window.
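
The watching step can also be scripted so that growth in the descriptor count is easy to spot. A rough sketch, assuming Linux /proc and a single sssd_be process; sample_fds is a hypothetical helper, and the sample count and interval are arbitrary:

```shell
#!/bin/sh
# sample_fds PID COUNT INTERVAL: print COUNT lines of the form
# "epoch-seconds fd-count" for PID, sleeping INTERVAL seconds between samples.
sample_fds() {
    pid=$1
    count=$2
    interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # One entry per open descriptor; 0 if the directory is unreadable.
        n=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
        echo "$(date +%s) $n"
        i=$((i + 1))
        # No sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: three samples of the current shell, one second apart.
sample_fds "$$" 3 1
```

Run as `sample_fds "$(pidof sssd_be)" 60 1` while authentications are attempted; a count that rises with each attempt and never drops again would be consistent with descriptors being left open after the timeout.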

HTH

bye,
Sumit

Comment 13 John 2023-05-23 03:49:43 UTC
I also am seeing this after recent updates.

# cat /var/log/messages | grep -i "open files" | cut -f 5- -d ' '
sssd_be[4114231]: Could not open file [/var/log/sssd/krb5_child.log]. Error: [24][Too many open files]
...

Restarting sssd is necessary as a temporary fix.


Disgraceful to see bugs like this in an "enterprise" OS, but, this is what we have grown to expect from Red Hat, and from awful components like SSSD.

Comment 14 John 2023-05-23 03:52:44 UTC
For me, issue first observed in same version:
sssd-2.7.3-4.el8_7.3.x86_64

Comment 15 Alexey Tikhonov 2023-05-23 08:14:03 UTC
(In reply to John from comment #13)
> I also am seeing this after recent updates.

This bug (which is being fixed by https://github.com/SSSD/sssd/pull/6745) has been there for ages.
Something else must have changed so that this bug is now triggered in your environment.


(In reply to John from comment #14)
> For me, issue first observed in same version:
> sssd-2.7.3-4.el8_7.3.x86_64

Please look into 'krb5_child.log' to figure out why (if) it started failing often.

Comment 16 John 2023-05-23 08:31:50 UTC
(In reply to Alexey Tikhonov from comment #15)
> Please look into 'krb5_child.log' to figure out why (if) it started failing
> often.

Thanks for the suggestion but I would rather stab myself in the face with a fork.
Never, in the 10 years or so I've had occasion to look at SSSD logs, have I ever found SSSD logs useful to debug any of the many issues I've had with SSSD.

I have over 30 years of experience with a variety of unixes, and SSSD logs are the most obfuscated and useless logs I have ever seen.

It doesn't matter how low or high your debug level is, all you get is more and more misleading noise, with a million things failing that have no relevance to your issue, and which occur even when sssd is "working".
The logs never, ever, contain anything useful.

The only way for anyone to resolve any issues with sssd is just by randomly changing settings in the config file and praying.

I'll just ignore the problem and hope it goes away, since that seems to work for Red Hat on a regular basis, maybe it'll work for me just this once.

Comment 17 Alexey Tikhonov 2023-05-23 08:41:20 UTC
(In reply to John from comment #16)
> 
> I'll just ignore the problem and hope it goes away

Another option could be to reach out to your customer support point of contact.

Comment 18 Andre Boscatto 2023-05-23 09:51:50 UTC
Hi John,

Thank you for bringing your concerns/experience to our attention. 

We understand that you are encountering this issue with SSSD and are disappointed with the overall debugging experience. We apologize for any inconvenience you have experienced so far.

While we appreciate your feedback, it would be immensely helpful if you could provide us with more specific details about the problems you are facing. This will enable us to investigate the issue more effectively and find a resolution. We would like to work collaboratively with you, so any specific information or log excerpts you can share would greatly assist us in identifying the root cause.

In case you are still not interested in going through the logs, we would like to understand your experience in more detail. Could you please provide specific examples of how the logs and debugging experience were not useful and appeared obfuscated? This will help us understand what aspects you found challenging or missed during your troubleshooting process.

Of course, another option is to engage the customer support point of contact, as Alexey previously mentioned. However, it's important to note that simply reaching out to customer support may not necessarily contribute to improving the overall user experience, which is why we are asking for details about debugging/logs.

Software issues can be frustrating, but our teams are dedicated to continuously improving our OS and its components. Ignoring feedback is not part of our approach; instead, we actively seek ways to enhance our system based on input from users like you.

Please let us know if you can provide any further details or if you would like us to assist you in any specific way. We are here to help and ensure a positive user experience.

Best regards,

Andre Boscatto
Product Owner, Identity and Access Management Department

Comment 19 Alexey Tikhonov 2023-05-23 13:10:07 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/6745

Comment 21 Alexey Tikhonov 2023-05-26 10:54:56 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/6745

* `master`
    * 455611952f90ed0cefaff1e840623ea14ac06be1 - krb5: make sure sockets are closed on timeouts
* `sssd-2-9`
    * 4d2cf0b62bbf0386755550bfad684cf36b36eccd - krb5: make sure sockets are closed on timeouts

