Bug 2195919 - sssd-be tends to run out of system resources, hitting the maximum number of open files
Keywords:
Status: CLOSED ERRATA
Alias: None
Deadline: 2023-07-03
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sssd
Version: 8.7
Hardware: All
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Sumit Bose
QA Contact: Anuj Borah
URL:
Whiteboard: sync-to-jira
Depends On:
Blocks:
Reported: 2023-05-06 15:57 UTC by Abhijit Roy
Modified: 2023-11-14 18:08 UTC (History)

Fixed In Version: sssd-2.9.1-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-11-14 15:50:01 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | SSSD sssd issues 6744 | 0 | None | open | sssd-be tends to run out of system resources, hitting the maximum number of open files | 2023-05-22 19:40:04 UTC
Github | SSSD sssd pull 6745 | 0 | None | open | krb5: make sure sockets are closed on timeouts | 2023-05-23 08:22:13 UTC
Github | SSSD sssd pull 6760 | 0 | None | open | Tests: Add a fast, reliable, accurate ssh module | 2023-06-02 08:53:44 UTC
Red Hat Issue Tracker | RHEL-10039 | 0 | None | None | None | 2023-09-27 12:36:11 UTC
Red Hat Issue Tracker | RHELPLAN-156563 | 0 | None | None | None | 2023-05-06 15:59:12 UTC
Red Hat Issue Tracker | SSSD-6124 | 0 | None | None | None | 2023-05-23 07:26:26 UTC
Red Hat Product Errata | RHBA-2023:7127 | 0 | None | None | None | 2023-11-14 15:50:13 UTC

Description Abhijit Roy 2023-05-06 15:57:58 UTC
Description of problem:

sssd-be tends to run out of system resources, hitting the maximum number of open files

(2023-05-04  9:26:39): [be[redhat.com]] [get_active_uid_linux] (0x4000): [RID#148299] get_uid_from_pid() failed.
(2023-05-04  9:26:39): [be[redhat.com]] [get_uid_from_pid] (0x0020): [RID#148299] open failed [/proc/4075832/status][24][Too many open files].
….
(2023-05-04  9:26:39): [be[redhat.com]] [be_resolve_server_process] (0x0200): [RID#148299] Found address for server idm03.redhat.com: [10.x.x.x] TTL 1200
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_kinit_kdc_resolved] (0x1000): [RID#148299] KDC resolved, attempting to get TGT...
(2023-05-04  9:26:39): [be[redhat.com]] [create_tgt_req_send_buffer] (0x0400): [RID#148299] buffer size: 86
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_fork_child] (0x0020): [RID#148299] pipe(from) failed [24][Too many open files].
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_get_tgt_send] (0x0020): [RID#148299] sdap_fork_child failed.
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_kinit_done] (0x0020): [RID#148299] child failed (24 [Too many open files])
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_cli_kinit_done] (0x0400): [RID#148299] Cannot get a TGT: ret [24](Too many open files)
(2023-05-04  9:26:39): [be[redhat.com]] [sdap_cli_connect_recv] (0x0040): [RID#148299] Unable to establish connection [13]: Permission denied
$ cat etc/sssd/sssd.conf 
[domain/redhat.com]

id_provider = ipa
dns_discovery_domain = redhat.com
default_shell = /bin/bash
override_shell = /bin/bash
ipa_server = _srv_, xxx.redhat.com
ipa_domain = redhat.com
ipa_hostname = xxx.redhat.com
auth_provider = ipa
chpass_provider = ipa
access_provider = ipa
cache_credentials = True
ldap_tls_cacert = /etc/ipa/ca.crt
krb5_store_password_if_offline = True
debug_level = 9
[sssd]
services = nss, pam, ssh, sudo
enable_files_domain=false

domains = redhat.com
default_domain_suffix = xxx.local
full_name_format = %1$s
debug_level = 9
[nss]
homedir_substring = /home
debug_level = 9
[pam]
debug_level = 9

$ cat lsof |grep sssd|wc -l
1954
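
A raw lsof count does not show which kind of descriptor is piling up. The following is a minimal sketch, not part of the original report, that groups the entries under /proc/PID/fd by kind on a live system. The function name fd_summary is hypothetical, and the loop assumes pidof is available and that the script runs as root.

```shell
#!/bin/sh
# Summarize a process's open file descriptors by kind (Linux /proc only).
# Usage: fd_summary PID   (root is needed to inspect another user's process)
fd_summary() (
    for fd in /proc/"$1"/fd/*; do
        # Each entry is a symlink such as "socket:[12345]" or a file path.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            socket:*)     echo socket ;;
            pipe:*)       echo pipe ;;
            anon_inode:*) echo anon_inode ;;
            *)            echo file ;;
        esac
    done | sort | uniq -c
)

# Example: summarize every running sssd_be, if there is one.
for pid in $(pidof sssd_be 2>/dev/null); do
    echo "sssd_be[$pid]:"
    fd_summary "$pid"
done
```

A count dominated by one kind (for example sockets or pipes) would narrow down where descriptors are not being closed.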

WORKAROUND: Restart sssd service.

Version-Release number of selected component (if applicable):

sssd-2.7.3-4.el8_7.3.x86_64 

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

https://github.com/SSSD/sssd/blob/cd843dafe63589d0a77145445c454f6fc19dabae/src/providers/ldap/sdap_child_helpers.c#L86


    ret = pipe(pipefd_to_child);
    if (ret == -1) {
        ret = errno;
        DEBUG(SSSDBG_CRIT_FAILURE,
              "pipe(to) failed [%d][%s].\n", ret, strerror(ret)); <--
        goto fail;
    }

Comment 1 Abhijit Roy 2023-05-06 16:03:51 UTC
Created attachment 1962799 [details]
lsof

Comment 2 Abhijit Roy 2023-05-06 16:05:45 UTC
Created attachment 1962800 [details]
proc_pgrep sssd_be_fd

Comment 3 Sumit Bose 2023-05-08 06:24:34 UTC
Hi,

can you check with the 'ps' command if there are actually many ldap_child/krb5_child processes running? Can you share the ldap_child/krb5_child log file to check if there are any issues?

bye,
Sumit

Comment 7 Abhijit Roy 2023-05-09 17:51:44 UTC
Hello Sumit,

Log details: https://drive.google.com/file/d/1vLBrigx6tAn98PSyrB9mSfCdwnWHxTIA/view?usp=sharing

Comment 8 Abhijit Roy 2023-05-17 15:48:21 UTC
(In reply to Sumit Bose from comment #5)
> Hi,
> 
> some of the relevant log files are truncated in the sos-report. Please ask
> to create a tar ball with the logs from /var/log/sssd and attach it to the
> case.
> 
> bye,
> Sumit

Hi Sumit,

Did you get a chance to look into this issue?

For now, I have asked the customer to try the following:

1) Log in to the problematic host as root.

2) Edit /etc/security/limits.conf (e.g. with vi) and set the values below.

   --------------------------------------
   root     soft    nofile         20480
   root     hard    nofile         20480
   --------------------------------------

3) Log out, log back in as root, and check that the command below prints "20480".

    # ulimit -n

4) Restart SSSD service

    # systemctl restart sssd
    # systemctl status sssd

5) Check whether the issue still occurs.
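
One caveat with the steps above: /etc/security/limits.conf is applied by pam_limits to login sessions, so "ulimit -n" in a root shell does not necessarily reflect the limit of a daemon started by systemd. A small sketch for reading the limit that the running sssd_be actually has is shown here; nofile_limits is a hypothetical helper name, and the loop assumes pidof is available.

```shell
#!/bin/sh
# Print the soft and hard "Max open files" limits of a running process,
# as recorded by the kernel in /proc/PID/limits.
nofile_limits() (
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "/proc/$1/limits"
)

# Example: show the limits of every running sssd_be, if there is one.
for pid in $(pidof sssd_be 2>/dev/null); do
    echo "sssd_be[$pid]: $(nofile_limits "$pid")"
done
```

If the values are unchanged after the restart, the limit probably has to be raised on the service itself (systemd has a LimitNOFILE= setting for this); whether that is appropriate here is an assumption to verify, and it would only postpone the exhaustion rather than remove the leak.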

Comment 12 Sumit Bose 2023-05-22 19:56:58 UTC
Hi,

to reproduce the issue I replaced /usr/libexec/sssd/krb5_child with a shell script like

#!/bin/bash
sleep 10


to create a reliable timeout when the backend calls krb5_child. Then I ran authentications from one shell while watching the output of `ls -al /proc/$(pidof sssd_be)/fd` in another window.

HTH

bye,
Sumit
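
For anyone repeating the reproducer above, a sketch like the following can record the descriptor count over time instead of watching the ls output by eye. The function name watch_fds and its arguments are illustrative assumptions, not something from SSSD.

```shell
#!/bin/sh
# Print "HH:MM:SS <open fd count>" for a process at a fixed interval.
# Usage: watch_fds PID [INTERVAL_SECONDS] [SAMPLES]
watch_fds() (
    pid=$1; interval=${2:-1}; samples=${3:-10}; i=0
    while [ "$i" -lt "$samples" ]; do
        # Count the entries in the process's fd directory (Linux /proc only).
        printf '%s %s\n' "$(date +%T)" "$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)"
        sleep "$interval"
        i=$((i + 1))
    done
)

# Example (as root, while running authentications in another shell):
#   watch_fds "$(pidof sssd_be)" 1 60
```

A count that keeps growing with each timed-out authentication and never drops back would match the leak described in this bug.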

Comment 13 John 2023-05-23 03:49:43 UTC
I also am seeing this after recent updates.

# cat /var/log/messages | grep -i "open files" | cut -f 5- -d ' '
sssd_be[4114231]: Could not open file [/var/log/sssd/krb5_child.log]. Error: [24][Too many open files]
...

Necessary to restart sssd to temporarily fix.
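
To see which processes are hitting the limit and how often, the grep above can be extended into a per-process tally. This is a sketch that assumes the traditional syslog layout ("Mon DD HH:MM:SS host proc[pid]: message"); emfile_by_process is a hypothetical name.

```shell
#!/bin/sh
# Count "Too many open files" messages per process name in a syslog-style file.
# Assumes the process tag ("name[pid]:") is the fifth whitespace-separated field.
emfile_by_process() (
    grep -i 'too many open files' "$1" |
        awk '{ sub(/\[[0-9]+\]:$/, "", $5); print $5 }' |
        sort | uniq -c
)

# Example:
#   emfile_by_process /var/log/messages
```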


Disgraceful to see bugs like this in an "enterprise" OS, but, this is what we have grown to expect from Red Hat, and from awful components like SSSD.

Comment 14 John 2023-05-23 03:52:44 UTC
For me, issue first observed in same version:
sssd-2.7.3-4.el8_7.3.x86_64

Comment 15 Alexey Tikhonov 2023-05-23 08:14:03 UTC
(In reply to John from comment #13)
> I also am seeing this after recent updates.

This bug (that is being fixed by https://github.com/SSSD/sssd/pull/6745) has been there for ages.
Something else must have changed for it to be triggered now in your environment.


(In reply to John from comment #14)
> For me, issue first observed in same version:
> sssd-2.7.3-4.el8_7.3.x86_64

Please look into 'krb5_child.log' to figure out why (if) it started failing often.

Comment 16 John 2023-05-23 08:31:50 UTC
(In reply to Alexey Tikhonov from comment #15)
> Please look into 'krb5_child.log' to figure out why (if) it started failing
> often.

Thanks for the suggestion but I would rather stab myself in the face with a fork.
Never, in the 10 years or so I've had occasion to look at SSSD logs, have I ever found SSSD logs useful to debug any of the many issues I've had with SSSD.

I have over 30 years of experience with a variety of unixes, and SSSD logs are the most obfuscated and useless logs I have ever seen.

It doesn't matter how low or high your debug level is, all you get is more and more misleading noise, with a million things failing that have no relevance to your issue, and which occur even when sssd is "working".
The logs never, ever, contain anything useful.

The only way for anyone to resolve any issues with sssd is just by randomly changing settings in the config file and praying.

I'll just ignore the problem and hope it goes away, since that seems to work for Red Hat on a regular basis, maybe it'll work for me just this once.

Comment 17 Alexey Tikhonov 2023-05-23 08:41:20 UTC
(In reply to John from comment #16)
> 
> I'll just ignore the problem and hope it goes away

Another option could be to reach out to your customer support point of contact.

Comment 18 Andre Boscatto 2023-05-23 09:51:50 UTC
Hi John,

Thank you for bringing your concerns/experience to our attention. 

We understand that you are encountering this issue with SSSD and are disappointed with the overall debugging experience. We apologize for any inconvenience you have experienced so far.

While we appreciate your feedback, it would be immensely helpful if you could provide us with more specific details about the problems you are facing. This will enable us to investigate the issue more effectively and find a resolution. We would like to work collaboratively with you, so any specific information or log excerpts you can share would greatly assist us in identifying the root cause.

In case you are still not interested in going through the logs, we would like to understand your experience in more detail. Could you please provide specific examples of how the logs and debugging experience were not useful and appeared obfuscated? This will help us understand what aspects you found challenging or missed during your troubleshooting process.

Of course, another option is to engage the customer support point of contact, as Alexey previously mentioned. However, it's important to note that simply reaching out to customer support may not necessarily contribute to improving the overall user experience, the reason why we are asking for details about debugging/logs.

Software issues can be frustrating, but our teams are dedicated to continuously improving our OS and its components. Ignoring feedback is not part of our approach; instead, we actively seek ways to enhance our system based on input from users like you.

Please let us know if you can provide any further details or if you would like us to assist you in any specific way. We are here to help and ensure a positive user experience.

Best regards,

Andre Boscatto
Product Owner, Identity and Access Management Department

Comment 19 Alexey Tikhonov 2023-05-23 13:10:07 UTC
Upstream PR: https://github.com/SSSD/sssd/pull/6745

Comment 21 Alexey Tikhonov 2023-05-26 10:54:56 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/6745

* `master`
    * 455611952f90ed0cefaff1e840623ea14ac06be1 - krb5: make sure sockets are closed on timeouts
* `sssd-2-9`
    * 4d2cf0b62bbf0386755550bfad684cf36b36eccd - krb5: make sure sockets are closed on timeouts

Comment 38 errata-xmlrpc 2023-11-14 15:50:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7127

