Description of problem:

sssd-be tends to run out of system resources, hitting the maximum number of open files:

(2023-05-04 9:26:39): [be[redhat.com]] [get_active_uid_linux] (0x4000): [RID#148299] get_uid_from_pid() failed.
(2023-05-04 9:26:39): [be[redhat.com]] [get_uid_from_pid] (0x0020): [RID#148299] open failed [/proc/4075832/status][24][Too many open files].
….
(2023-05-04 9:26:39): [be[redhat.com]] [be_resolve_server_process] (0x0200): [RID#148299] Found address for server idm03.redhat.com: [10.x.x.x] TTL 1200
(2023-05-04 9:26:39): [be[redhat.com]] [sdap_kinit_kdc_resolved] (0x1000): [RID#148299] KDC resolved, attempting to get TGT...
(2023-05-04 9:26:39): [be[redhat.com]] [create_tgt_req_send_buffer] (0x0400): [RID#148299] buffer size: 86
(2023-05-04 9:26:39): [be[redhat.com]] [sdap_fork_child] (0x0020): [RID#148299] pipe(from) failed [24][Too many open files].
(2023-05-04 9:26:39): [be[redhat.com]] [sdap_get_tgt_send] (0x0020): [RID#148299] sdap_fork_child failed.
(2023-05-04 9:26:39): [be[redhat.com]] [sdap_kinit_done] (0x0020): [RID#148299] child failed (24 [Too many open files])
(2023-05-04 9:26:39): [be[redhat.com]] [sdap_cli_kinit_done] (0x0400): [RID#148299] Cannot get a TGT: ret [24](Too many open files)
(2023-05-04 9:26:39): [be[redhat.com]] [sdap_cli_connect_recv] (0x0040): [RID#148299] Unable to establish connection [13]: Permission denied

$ cat etc/sssd/sssd.conf
[domain/redhat.com]
id_provider = ipa
dns_discovery_domain = redhat.com
default_shell = /bin/bash
override_shell = /bin/bash
ipa_server = _srv_, xxx.redhat.com
ipa_domain = redhat.com
ipa_hostname = xxx.redhat.com
auth_provider = ipa
chpass_provider = ipa
access_provider = ipa
cache_credentials = True
ldap_tls_cacert = /etc/ipa/ca.crt
krb5_store_password_if_offline = True
debug_level = 9

[sssd]
services = nss, pam, ssh, sudo
enable_files_domain=false
domains = redhat.com
default_domain_suffix = xxx.local
full_name_format = %1$s
debug_level = 9

[nss]
homedir_substring = /home
debug_level = 9

[pam]
debug_level = 9

$ cat lsof |grep sssd|wc -l
1954

WORKAROUND: Restart sssd service.

Version-Release number of selected component (if applicable):
sssd-2.7.3-4.el8_7.3.x86_64

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

https://github.com/SSSD/sssd/blob/cd843dafe63589d0a77145445c454f6fc19dabae/src/providers/ldap/sdap_child_helpers.c#L86

    ret = pipe(pipefd_to_child);
    if (ret == -1) {
        ret = errno;
        DEBUG(SSSDBG_CRIT_FAILURE, "pipe(to) failed [%d][%s].\n", ret, strerror(ret));   <--
        goto fail;
    }
Created attachment 1962799 [details] lsof
Created attachment 1962800 [details] proc_pgrep sssd_be_fd
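The descriptor counts in the description say how many descriptors are open, but not what kind they are. A minimal Python sketch for grouping the entries of /proc/<pid>/fd by type could look like the following. The function names `classify_fd_target` and `summarize_fds` are made up for this illustration and are not part of SSSD or any of its tools.

```python
import os
from collections import Counter


def classify_fd_target(target: str) -> str:
    """Map a /proc/<pid>/fd symlink target to a coarse category."""
    if target.startswith("socket:"):
        return "socket"
    if target.startswith("pipe:"):
        return "pipe"
    if target.startswith("anon_inode:"):
        return "anon_inode"
    # Everything else is treated as a regular path (files, devices, ...).
    return "file"


def summarize_fds(pid: int) -> Counter:
    """Count the open descriptors of `pid` by category (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    summary = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        summary[classify_fd_target(target)] += 1
    return summary
```

Calling `summarize_fds()` with the PID of sssd_be (as root) returns a `Counter`; a category that keeps growing between two calls is the likely candidate for the leak.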
Hi, can you check with the 'ps' command if there are actually many ldap_child/krb5_child processes running? Can you share the ldap_child/krb5_child log file to check if there are any issues? bye, Sumit
Hello Sumit, Log details: https://drive.google.com/file/d/1vLBrigx6tAn98PSyrB9mSfCdwnWHxTIA/view?usp=sharing
(In reply to Sumit Bose from comment #5)
> Hi,
>
> some of the relevant log files are truncated in the sos-report. Please ask
> to create a tar ball with the logs from /var/log/sssd and attach it to the
> case.
>
> bye,
> Sumit

Hi Sumit,

Did you get a chance to look into this issue?

For now, I have asked the customer to check the following:

1) Log in to the problematic host as root.
2) Edit /etc/security/limits.conf and set the values below.
--------------------------------------
root soft nofile 20480
root hard nofile 20480
--------------------------------------
3) Log out, log in again as root, and check that the output of the command below shows "20480".
# ulimit -n
4) Restart the SSSD service.
# systemctl restart sssd
# systemctl status sssd
5) Check whether the issue is still observed.
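Note that `ulimit -n` reports the limit of the login shell. A service started by systemd does not necessarily inherit that value, so it may be worth checking the limit that the running sssd_be process actually has, which the kernel exposes in /proc/<pid>/limits. The following Python sketch parses the "Max open files" row; the helper names are hypothetical and chosen only for this example.

```python
from typing import Optional, Tuple


def _to_int(value: str) -> Optional[int]:
    """Convert a limits field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_nofile_limit(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) from the text of a /proc/<pid>/limits file."""
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return _to_int(soft), _to_int(hard)
    raise ValueError("no 'Max open files' row found")


def read_nofile_limit(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Read the open-files limit of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as limits_file:
        return parse_nofile_limit(limits_file.read())
```

If `read_nofile_limit()` for the sssd_be PID still shows the old value after the restart in step 4, the limits.conf change did not reach the service.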
Hi,

to reproduce the issue I replaced /usr/libexec/sssd/krb5_child with a shell script like

#!/bin/bash
sleep 10

to create a reliable timeout when the backend calls krb5_child. Then I ran authentications from one shell while watching the output of `ls -al /proc/$(pidof sssd_be)/fd` in another window.

HTH

bye,
Sumit
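Instead of watching the `ls` output by eye, the descriptor count can be sampled at intervals during the reproduction. This is only a sketch of that idea in Python; `count_fds`, `sample_counts` and `net_growth` are illustrative names, and the sampling interval and number of samples are arbitrary choices.

```python
import os
import time
from typing import Callable, List


def count_fds(pid: int) -> int:
    """Number of entries in /proc/<pid>/fd (Linux only, needs permission)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_counts(counter: Callable[[], int], samples: int, interval: float) -> List[int]:
    """Call `counter` `samples` times, sleeping `interval` seconds in between."""
    counts = []
    for i in range(samples):
        counts.append(counter())
        if i < samples - 1:
            time.sleep(interval)
    return counts


def net_growth(counts: List[int]) -> int:
    """Difference between the last and the first sample (0 if empty)."""
    return counts[-1] - counts[0] if counts else 0
```

For example, `sample_counts(lambda: count_fds(pid), samples=30, interval=1.0)` with the PID of sssd_be, run while authentications time out, should show a positive `net_growth()` if descriptors are being leaked.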
I also am seeing this after recent updates.

# cat /var/log/messages | grep -i "open files" | cut -f 5- -d ' '
sssd_be[4114231]: Could not open file [/var/log/sssd/krb5_child.log]. Error: [24][Too many open files]
...

Necessary to restart sssd to temporarily fix.

Disgraceful to see bugs like this in an "enterprise" OS, but, this is what we have grown to expect from Red Hat, and from awful components like SSSD.
For me, issue first observed in same version: sssd-2.7.3-4.el8_7.3.x86_64
(In reply to John from comment #13)
> I also am seeing this after recent updates.

This bug (which is being fixed by https://github.com/SSSD/sssd/pull/6745) has been there for ages. Something else must have changed so that it is now triggered in your environment.

(In reply to John from comment #14)
> For me, issue first observed in same version:
> sssd-2.7.3-4.el8_7.3.x86_64

Please look into 'krb5_child.log' to figure out why (or whether) it started failing often.
(In reply to Alexey Tikhonov from comment #15)
> Please look into 'krb5_child.log' to figure out why (if) it started failing
> often.

Thanks for the suggestion but I would rather stab myself in the face with a fork.

Never, in the 10 years or so I've had occasion to look at SSSD logs, have I ever found SSSD logs useful to debug any of the many issues I've had with SSSD.

I have over 30 years of experience with a variety of unixes, and SSSD logs are the most obfuscated and useless logs I have ever seen. It doesn't matter how high your debug level is, all you get is more and more misleading noise, with a million things failing that have no relevance to your issue, and which occur even when sssd is "working". The logs never, ever, contain anything useful. The only way for anyone to resolve any issues with sssd is just by randomly changing settings in the config file and praying.

I'll just ignore the problem and hope it goes away, since that seems to work for Red Hat on a regular basis, maybe it'll work for me just this once.
(In reply to John from comment #16)
>
> I'll just ignore the problem and hope it goes away

Another option could be to reach out to your customer support point of contact.
Hi John,

Thank you for bringing your concerns and experience to our attention. We understand that you are encountering this issue with SSSD and are disappointed with the overall debugging experience. We apologize for any inconvenience you have experienced so far.

While we appreciate your feedback, it would be immensely helpful if you could provide us with more specific details about the problems you are facing. This will enable us to investigate the issue more effectively and find a resolution. We would like to work collaboratively with you, so any specific information or log excerpts you can share would greatly assist us in identifying the root cause.

In case you are still not interested in going through the logs, we would like to understand your experience in more detail. Could you please provide specific examples of how the logs and debugging experience were not useful and appeared obfuscated? This will help us understand what aspects you found challenging or missed during your troubleshooting process.

Of course, another option is to engage the customer support point of contact, as Alexey previously mentioned. However, it's important to note that simply reaching out to customer support may not necessarily contribute to improving the overall user experience, which is why we are asking for details about debugging and logs.

Software issues can be frustrating, but our teams are dedicated to continuously improving our OS and its components. Ignoring feedback is not part of our approach; instead, we actively seek ways to enhance our system based on input from users like you.

Please let us know if you can provide any further details or if you would like us to assist you in any specific way. We are here to help and ensure a positive user experience.

Best regards,
Andre Boscatto
Product Owner, Identity and Access Management Department
Upstream PR: https://github.com/SSSD/sssd/pull/6745
Pushed PR: https://github.com/SSSD/sssd/pull/6745

* `master`
    * 455611952f90ed0cefaff1e840623ea14ac06be1 - krb5: make sure sockets are closed on timeouts
* `sssd-2-9`
    * 4d2cf0b62bbf0386755550bfad684cf36b36eccd - krb5: make sure sockets are closed on timeouts
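For readers unfamiliar with this class of bug, the general pattern behind "make sure sockets are closed on timeouts" can be illustrated in a few lines. The Python sketch below is not the actual change from the PR (SSSD is written in C); it only shows, under simplified assumptions, a wait with a timeout in which both pipe ends are closed on every exit path. The names `wait_for_reply` and `is_open` are invented for this example.

```python
import os
import select
from typing import Optional, Tuple


def wait_for_reply(timeout: float) -> Tuple[int, int, Optional[bytes]]:
    """Wait up to `timeout` seconds for data on a fresh pipe.

    Returns (read_fd, write_fd, data); data is None on timeout. The
    descriptor numbers are returned only so the caller can verify that
    they were closed; both are closed before returning in every case.
    """
    read_fd, write_fd = os.pipe()
    try:
        # Nothing writes to the pipe in this sketch, so select() times out.
        # In a real program a child process would hold the write end.
        ready, _, _ = select.select([read_fd], [], [], timeout)
        data = os.read(read_fd, 4096) if ready else None
    finally:
        # Without this cleanup, each timeout would leak two descriptors.
        os.close(read_fd)
        os.close(write_fd)
    return read_fd, write_fd, data


def is_open(fd: int) -> bool:
    """Return True if `fd` refers to an open descriptor in this process."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False
```

If the cleanup ran only on the success path, every timed-out request would leave descriptors behind until the process hit its open-files limit, which matches the symptom reported in this bug.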