Cause: SSSD did not close the file descriptor to the memory cache when the memory cache was reset with the sss_cache tool.
Consequence: Running sss_cache resulted in a file descriptor leak in the sssd_nss process.
Fix: SSSD was amended so that the file descriptor to the memory cache is closed correctly.
Result: Running sss_cache no longer results in a file descriptor leak.
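For illustration, a minimal C sketch of the leak and the fix pattern follows. The names (mc_ctx, mc_reinit) are invented for this sketch and are not SSSD's actual internals; the point is that when sss_cache invalidates the memory cache, the responder must close the old descriptor before opening and mapping the recreated file.

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the responder's memory-cache state;
 * not SSSD's real data structure. */
struct mc_ctx {
    int fd;            /* descriptor backing the mmap'ed cache file */
    void *mmap_base;   /* mapped cache region */
    size_t mmap_size;
};

/* Re-attach to the cache file after sss_cache has unlinked and
 * recreated it. The close() below is the essence of the fix: without
 * it, every reset leaks one descriptor per cache file, which is
 * exactly the lsof pattern shown later in this report. */
static int mc_reinit(struct mc_ctx *mc, const char *path)
{
    struct stat st;

    if (mc->mmap_base != NULL) {
        munmap(mc->mmap_base, mc->mmap_size);
        mc->mmap_base = NULL;
    }
    if (mc->fd >= 0) {
        close(mc->fd);      /* releases the fd to the deleted inode */
        mc->fd = -1;
    }

    mc->fd = open(path, O_RDWR);
    if (mc->fd < 0) {
        return -1;
    }
    if (fstat(mc->fd, &st) < 0) {
        close(mc->fd);
        mc->fd = -1;
        return -1;
    }

    mc->mmap_size = (size_t)st.st_size;
    mc->mmap_base = mmap(NULL, mc->mmap_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, mc->fd, 0);
    if (mc->mmap_base == MAP_FAILED) {
        mc->mmap_base = NULL;
        close(mc->fd);
        mc->fd = -1;
        return -1;
    }
    return 0;
}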
Description of problem:
When we clear the SSSD cache using sss_cache -U, sss_cache -G, or sss_cache -u <login>, the sssd_nss process picks up a few more fds each time. When the process reaches its fd_limit, sssd runs at 99% CPU and the system becomes unresponsive for every user-related task.
Version-Release number of selected component (if applicable):
rpm -qa | grep sssd
sssd-tools-1.9.2-82.el6.x86_64
sssd-client-1.9.2-82.el6.x86_64
sssd-1.9.2-82.el6.x86_64
How reproducible:
Every time we run
sss_cache -U or sss_cache -u <login>
the number of open files increases, eventually reaching the fd_limit. Then sssd runs at 99% CPU and NSS lookups stop working.
Steps to Reproduce:
1. service sssd start #start service
2. watch "lsof -p `ps -ef | grep sssd_nss | grep -v grep | perl -l -a -n -F"\s+" -e 'print $F[1]'` | wc -l" #watch fds
3. sss_cache -U #clear cache several times and watch the number of fds
Actual results:
Increasing number of fds for the sssd_nss process
Expected results:
Constant number of fds for the sssd_nss process
Additional info:
The leaking fds all point to these files; sss_cache unlinks and recreates the cache files, so the stale descriptors keep the deleted inodes alive. lsof output:
sssd_nss 2090 root 8176u REG 8,1 6806312 3424241 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8177u REG 8,1 5206312 3424243 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8178u REG 8,1 6806312 3424242 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8179u REG 8,1 5206312 3424245 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8180u REG 8,1 6806312 3424247 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8181u REG 8,1 6806312 3424244 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8182u REG 8,1 5206312 3424246 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8183u REG 8,1 5206312 3424248 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8184u REG 8,1 5206312 3424250 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8185u REG 8,1 6806312 3424251 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8186u REG 8,1 5206312 3424252 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8187u REG 8,1 6806312 3424253 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8188u REG 8,1 5206312 3424254 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8189u REG 8,1 6806312 11493377 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8190u REG 8,1 6806312 3424255 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8191u REG 8,1 5206312 3424256 /var/lib/sss/mc/group (deleted)
The CPU usage comes from the error handling after epoll_wait(): accept() keeps failing with EMFILE, the pending connection is never dequeued from the listen backlog, and the level-triggered epoll immediately reports the socket readable again. strace output:
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40376) = 1
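To make the busy loop concrete, here is a minimal, hypothetical C accept loop showing the same failure mode. SSSD's real event loop is built on tevent; this sketch only demonstrates the mechanism visible in the strace above.

#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical accept loop, not SSSD code: shows why EMFILE plus
 * level-triggered epoll burns CPU. */
static void accept_loop(int epfd, int listen_fd)
{
    struct epoll_event ev;

    for (;;) {
        /* Level-triggered: while a pending connection sits in the
         * listen backlog, the socket stays readable, so this call
         * returns immediately every time. */
        if (epoll_wait(epfd, &ev, 1, -1) <= 0) {
            continue;
        }

        int cfd = accept(listen_fd, NULL, NULL);
        if (cfd < 0) {
            if (errno == EMFILE) {
                /* accept() failed, so the connection was NOT
                 * dequeued; the next epoll_wait() fires again at
                 * once and the process spins at ~99% CPU until
                 * descriptors are freed. */
            }
            continue;
        }

        /* ... serve the client, then: */
        close(cfd);
    }
}

A common defensive pattern is to hold a spare descriptor that can be closed on EMFILE so the pending connection can be accepted and immediately dropped, but the proper fix for this bug is to stop leaking the memory-cache descriptors in the first place.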
Workaround:
We set fd_limit in the [nss] section of sssd.conf to a far higher value than normally needed and have our NMS restart sssd when the process approaches the limit.
[nss]
entry_negative_timeout = 0
debug_level = 0x1310
fd_limit=200000
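For context, fd_limit is SSSD's option for the responder's descriptor limit; conceptually the daemon adjusts RLIMIT_NOFILE along these lines (a hypothetical sketch, not SSSD's actual code):

#include <sys/resource.h>

/* Hypothetical sketch of what honoring an fd_limit setting involves;
 * not SSSD's actual implementation. */
static int raise_fd_limit(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        return -1;
    }
    rl.rlim_cur = wanted;
    if (wanted > rl.rlim_max) {
        /* Raising the hard limit requires root/CAP_SYS_RESOURCE. */
        rl.rlim_max = wanted;
    }
    return setrlimit(RLIMIT_NOFILE, &rl);
}

Raising the limit only postpones exhaustion: because the leaked descriptors are never closed, any limit is eventually reached, hence the periodic restart via the NMS.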
This is not yet fixed in the packages in this repo:
[sssd-1.9-RHEL6.3]
name=SSSD 1.9.x built for latest stable RHEL
baseurl=http://repos.fedorapeople.org/repos/jhrozek/sssd/epel-6/$basearch/
enabled=1
skip_if_unavailable=1
gpgcheck=0
(In reply to comment #0)
> [sssd-1.9-RHEL6.3]
> name=SSSD 1.9.x built for latest stable RHEL
> baseurl=http://repos.fedorapeople.org/repos/jhrozek/sssd/epel-6/$basearch/
> enabled=1
> skip_if_unavailable=1
> gpgcheck=0
Harald, thank you for testing the packages from this repository. But the repository was intended just as a preview for testing purposes. Please do not rely on that repo for production systems.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-1680.html