918394 – sssd eats 99% CPU and runs out of file descriptors when clearing cache


Bug 918394 - sssd eats 99% CPU and runs out of file descriptors when clearing cache

Summary: sssd eats 99% CPU and runs out of file descriptors when clearing cache

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	sssd
Sub Component:
Version:	6.4
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Jakub Hrozek
QA Contact:	Kaushik Banerjee
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2013-03-06 07:24 UTC by Harald Strack
Modified:	2020-05-02 17:17 UTC
CC List:	8 users
Fixed In Version:	sssd-1.9.2-112.el6
Doc Type:	Bug Fix
Doc Text:	Cause: The SSSD did not close the file descriptor to the memory cache when the memory cache was reset with the sss_cache tool. Consequence: Running sss_cache resulted in a file descriptor leak. Fix: The SSSD was amended so that the file descriptor to the memory cache is closed correctly. Result: Running sss_cache no longer results in a file descriptor leak.
Clone Of:
Environment:
Last Closed:	2013-11-21 22:15:25 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	SSSD sssd issues 2868	0	None	closed	sssd etas 99% CPU and runs out of file descriptors when clearing cache	2020-08-13 12:48:32 UTC
Red Hat Product Errata	RHBA-2013:1680	0	normal	SHIPPED_LIVE	sssd bug fix and enhancement update	2013-11-20 21:52:37 UTC

Description Harald Strack 2013-03-06 07:24:12 UTC

Description of problem:
When we clear the SSSD cache with sss_cache -U, sss_cache -G or sss_cache -u <login>, the sssd_nss process opens a few more file descriptors each time. Once the process reaches its fd_limit, sssd runs at 99% CPU and the system becomes unresponsive for every user-related task.


Version-Release number of selected component (if applicable):
rpm -qa | grep sssd
sssd-tools-1.9.2-82.el6.x86_64
sssd-client-1.9.2-82.el6.x86_64
sssd-1.9.2-82.el6.x86_64


How reproducible:
Every time we run

sss_cache -U or sss_cache -u <login>

the number of open files increases, until it reaches fd_limit. Then sssd runs at 99% CPU and NSS lookups stop working entirely.

Steps to Reproduce:
1. service sssd start # start the service
2. watch "lsof -p $(pgrep -x sssd_nss) | wc -l" # watch the fd count of sssd_nss
3. sss_cache -U # clear the cache several times and watch the number of fds
  
Actual results:
Increasing number of fds for the sssd_nss process

Expected results:
Constant number of fds for the sssd_nss process
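The reproduction steps can also be scripted so that the counts are recorded rather than watched. This is a minimal sketch, assuming Linux with /proc and root privileges (sssd_nss runs as root, so its /proc/PID/fd directory is not readable by other users). `fd_count`, `check_fd_growth` and the `FD_CHECK_DELAY` variable are hypothetical names, not part of SSSD. The function runs a command several times and fails if the target process ends with more descriptors than it started with.

```shell
#!/bin/sh
# Hypothetical helper: number of open descriptors of PID, taken from
# its /proc/PID/fd directory (Linux only; needs root for sssd_nss).
fd_count() {
    ls /proc/"$1"/fd | wc -l | tr -d ' '
}

# Hypothetical helper: run COMMAND ITER times, printing the descriptor
# count of PID before and after each run. Returns 1 if the final count
# is higher than the initial one, 0 otherwise.
# Usage: check_fd_growth PID ITER COMMAND [ARGS...]
check_fd_growth() {
    pid=$1
    iter=$2
    shift 2
    start=$(fd_count "$pid")
    echo "initial: $start"
    now=$start
    i=1
    while [ "$i" -le "$iter" ]; do
        "$@" >/dev/null 2>&1
        # give the daemon time to react; the 1 s default is an assumption
        sleep "${FD_CHECK_DELAY:-1}"
        now=$(fd_count "$pid")
        echo "run $i: $now"
        i=$((i + 1))
    done
    [ "$now" -le "$start" ]
}
```

For example, `check_fd_growth "$(pgrep -x sssd_nss)" 5 sss_cache -U` should print a steadily increasing count and fail on an affected build such as sssd-1.9.2-82.el6. Comparing only the final count against the initial one keeps a briefly open client connection from being reported as a leak.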


Additional info:
All of the leaked fds point to these files (lsof output):

sssd_nss 2090 root 8176u   REG                8,1   6806312    3424241 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8177u   REG                8,1   5206312    3424243 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8178u   REG                8,1   6806312    3424242 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8179u   REG                8,1   5206312    3424245 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8180u   REG                8,1   6806312    3424247 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8181u   REG                8,1   6806312    3424244 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8182u   REG                8,1   5206312    3424246 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8183u   REG                8,1   5206312    3424248 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8184u   REG                8,1   5206312    3424250 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8185u   REG                8,1   6806312    3424251 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8186u   REG                8,1   5206312    3424252 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8187u   REG                8,1   6806312    3424253 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8188u   REG                8,1   5206312    3424254 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8189u   REG                8,1   6806312   11493377 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8190u   REG                8,1   6806312    3424255 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8191u   REG                8,1   5206312    3424256 /var/lib/sss/mc/group (deleted)
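To tell the leaked memory-cache descriptors apart from the daemon's normal ones, each entry in /proc/PID/fd can be resolved with readlink: Linux appends " (deleted)" to the link target once the file has been unlinked, which is the same marker lsof prints above. This is a minimal sketch, assuming Linux and root access; `count_deleted_fds` is a hypothetical name, and for this bug the directory argument would be /var/lib/sss/mc.

```shell
#!/bin/sh
# Hypothetical helper: count descriptors of PID that still point at
# files under DIR which have since been unlinked. On Linux, readlink
# on /proc/PID/fd/N shows such files with a " (deleted)" suffix.
# Usage: count_deleted_fds PID DIR
count_deleted_fds() {
    pid=$1
    dir=$2
    n=0
    for fd in /proc/"$pid"/fd/*; do
        # descriptors can close between listing and reading; skip those
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$dir"/*" (deleted)") n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}
```

Running `count_deleted_fds "$(pgrep -x sssd_nss)" /var/lib/sss/mc` after each sss_cache call should show a number that keeps growing on an affected build and stays flat on a fixed one.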


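The CPU usage comes from the error handling after epoll_wait(): accept() on the listening socket fails with EMFILE, the pending connection stays queued, so epoll_wait() immediately reports the socket as readable again and the loop spins. strace output: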

epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110])           = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110])           = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110])           = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110])           = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110])           = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110])           = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40376) = 1

Workaround:
We set fd_limit in the [nss] section of sssd.conf to a far higher value than should ever be needed,
and have our NMS restart sssd when the descriptor count approaches the limit.

[nss]
entry_negative_timeout = 0
debug_level = 0x1310
fd_limit=200000
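The monitoring half of this workaround can be approximated by comparing the number of open descriptors with the "Max open files" soft limit in /proc/PID/limits. This is a minimal sketch, assuming Linux and assuming that fd_limit ends up as the sssd_nss process's open-file soft limit; `fd_usage_percent`, `check_fd_usage` and the 80% default threshold are hypothetical, and an NMS would act on the non-zero exit status.

```shell
#!/bin/sh
# Hypothetical helper: percentage of PID's open-file soft limit in use.
# In /proc/PID/limits, field 4 of the "Max open files" line is the soft limit.
fd_usage_percent() {
    pid=$1
    soft=$(awk '/^Max open files/ { print $4 }' /proc/"$pid"/limits)
    used=$(ls /proc/"$pid"/fd | wc -l)
    echo $(( $used * 100 / $soft ))
}

# Hypothetical NMS-style check: returns 1 once usage reaches
# THRESHOLD percent (default 80), so the caller can restart sssd.
# Usage: check_fd_usage PID [THRESHOLD]
check_fd_usage() {
    pid=$1
    threshold=${2:-80}
    pct=$(fd_usage_percent "$pid")
    if [ "$pct" -ge "$threshold" ]; then
        echo "CRITICAL: PID $pid uses ${pct}% of its open-file limit"
        return 1
    fi
    echo "OK: PID $pid uses ${pct}% of its open-file limit"
    return 0
}
```

A cron job or NMS check could then run `check_fd_usage "$(pgrep -x sssd_nss)" 80 || service sssd restart`, which mirrors the restart described above without waiting for the process to hit EMFILE.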

The leak is not yet fixed in the packages from this repository:

[sssd-1.9-RHEL6.3]
name=SSSD 1.9.x built for latest stable RHEL
baseurl=http://repos.fedorapeople.org/repos/jhrozek/sssd/epel-6/$basearch/
enabled=1
skip_if_unavailable=1
gpgcheck=0

Comment 1 Jakub Hrozek 2013-03-06 10:19:48 UTC

I can reproduce. Thank you for the bug report.

Comment 2 Jakub Hrozek 2013-03-06 10:22:06 UTC

Upstream ticket:
https://fedorahosted.org/sssd/ticket/1826

Comment 3 Jakub Hrozek 2013-03-07 16:21:11 UTC

(In reply to comment #0)
> [sssd-1.9-RHEL6.3]
> name=SSSD 1.9.x built for latest stable RHEL
> baseurl=http://repos.fedorapeople.org/repos/jhrozek/sssd/epel-6/$basearch/
> enabled=1
> skip_if_unavailable=1
> gpgcheck=0

Harald, thank you for testing the packages from this repository. But the repository was intended just as a preview for testing purposes. Please do not rely on that repo for production systems.

Comment 4 Harald Strack 2013-03-07 16:33:15 UTC

We are not using these packages. I only wanted to point out that the problem is not yet fixed in your latest testing package.

Comment 5 Jakub Hrozek 2013-05-10 15:10:25 UTC

Fixed upstream.

Comment 11 Nirupama Karandikar 2013-10-25 12:14:21 UTC

Tested with sssd-1.9.2-128.el6.x86_64

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: sssd etas 99% CPU and runs out of file descriptors when clearing cache BZ 918394
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:: [   PASS   ] :: Running 'getent passwd sssduser1| grep sssduser1' (Expected 0, got 0)
:: [   PASS   ] :: sssd_nss is not leaking FDs 
:: [   PASS   ] :: sssd_nss is not leaking FDs 
:: [   PASS   ] :: sssd_nss is not leaking FDs 
:: [   PASS   ] :: sssd_nss is not leaking FDs 
:: [   PASS   ] :: sssd_nss is not leaking FDs 
:: [   LOG    ] :: Duration: 5s
:: [   LOG    ] :: Assertions: 6 good, 0 bad
:: [   PASS   ] :: RESULT: sssd etas 99% CPU and runs out of file descriptors when clearing cache BZ 918394

Comment 12 errata-xmlrpc 2013-11-21 22:15:25 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1680.html
