Bug 918394
| Summary: | sssd eats 99% CPU and runs out of file descriptors when clearing cache | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Harald Strack <hstrack> |
| Component: | sssd | Assignee: | Jakub Hrozek <jhrozek> |
| Status: | CLOSED ERRATA | QA Contact: | Kaushik Banerjee <kbanerje> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.4 | CC: | grajaiya, hstrack, jgalipea, lamar.folsom, lnovich, mkosek, nkarandi, pbrezina |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | sssd-1.9.2-112.el6 | Doc Type: | Bug Fix |
| Doc Text: |
Cause: SSSD did not close the file descriptor to the memory cache when the memory cache was reset with the sss_cache tool.
Consequence: Running sss_cache resulted in a file descriptor leak.
Fix: SSSD was amended so that the file descriptor to the memory cache is closed correctly.
Result: Running sss_cache no longer results in a file descriptor leak.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2013-11-21 22:15:25 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
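As a minimal sketch of the failure mode described in the Doc Text (hypothetical demo code, not sssd's actual implementation): a process that reopens its cache file on every reset, without closing the previous generation's descriptor, leaks one fd per reset; and because the old file was unlinked, lsof shows it as "(deleted)", exactly as in the report below.

```python
import os
import tempfile

def open_fd_count():
    """Count this process's open descriptors via /proc/self/fd (Linux-only)."""
    return len(os.listdir("/proc/self/fd"))

def reset_cache(path):
    """Simulate one cache reset: a new cache file is opened and then
    unlinked, but the descriptor is never closed -- the bug's pattern."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.unlink(path)   # lingers as "... (deleted)" in lsof while fd is open
    return fd         # caller forgets to close(fd)

path = os.path.join(tempfile.mkdtemp(), "mc_demo")
before = open_fd_count()
fds = [reset_cache(path) for _ in range(5)]   # five simulated sss_cache runs
after = open_fd_count()
print("leaked descriptors:", after - before)  # -> 5, one per reset
```

Repeated often enough, this pattern walks the process up to its fd limit, which is where the 99% CPU symptom takes over (see the strace analysis in the description).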
I can reproduce. Thank you for the bug report.

Upstream ticket: https://fedorahosted.org/sssd/ticket/1826

(In reply to comment #0)
> [sssd-1.9-RHEL6.3]
> name=SSSD 1.9.x built for latest stable RHEL
> baseurl=http://repos.fedorapeople.org/repos/jhrozek/sssd/epel-6/$basearch/
> enabled=1
> skip_if_unavailable=1
> gpgcheck=0

Harald, thank you for testing the packages from this repository. However, the repository was intended only as a preview for testing purposes. Please do not rely on that repo for production systems.

We are not relying on these packages. I only wanted to point out that the problem is not yet fixed in your latest testing package.

Fixed upstream.

Tested with sssd-1.9.2-128.el6.x86_64:

```
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ LOG  ] :: sssd eats 99% CPU and runs out of file descriptors when clearing cache BZ 918394
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [ PASS ] :: Running 'getent passwd sssduser1| grep sssduser1' (Expected 0, got 0)
:: [ PASS ] :: sssd_nss is not leaking FDs
:: [ PASS ] :: sssd_nss is not leaking FDs
:: [ PASS ] :: sssd_nss is not leaking FDs
:: [ PASS ] :: sssd_nss is not leaking FDs
:: [ PASS ] :: sssd_nss is not leaking FDs
:: [ LOG  ] :: Duration: 5s
:: [ LOG  ] :: Assertions: 6 good, 0 bad
:: [ PASS ] :: RESULT: sssd eats 99% CPU and runs out of file descriptors when clearing cache BZ 918394
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1680.html
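The "not leaking FDs" checks above, and step 2 of the reproduction in the description (which goes through an lsof/grep/perl pipeline), both reduce to counting entries in /proc/<pid>/fd. A minimal sketch of such a check, assuming a Linux /proc filesystem (demo code, not the actual QA test harness):

```python
import os

def fd_count(pid):
    """Number of descriptors a process currently holds open (Linux /proc)."""
    return len(os.listdir(f"/proc/{pid}/fd"))

# Take a baseline, run the suspect operation (for sssd_nss, that would be
# `sss_cache -U` against the real daemon's pid), then verify no growth.
pid = os.getpid()          # own pid here, to keep the demo self-contained
baseline = fd_count(pid)
# ... operation under test would run here ...
assert fd_count(pid) == baseline, "process is leaking descriptors"
print("fd count stable at", baseline)
```

Against a real daemon you would read the pid once (e.g. from its pid file) and compare counts before and after each cache flush.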
Description of problem:
When we clear the sss cache using sss_cache -U, sss_cache -G, or sss_cache -u <login>, the sssd_nss process gains a few file descriptors each time. When the process reaches its fd limit, sssd runs at 99% CPU and the system becomes unresponsive for every user-related task.

Version-Release number of selected component (if applicable):

```
rpm -qa | grep sssd
sssd-tools-1.9.2-82.el6.x86_64
sssd-client-1.9.2-82.el6.x86_64
sssd-1.9.2-82.el6.x86_64
```

How reproducible:
Every time we run sss_cache -U or sss_cache -u <login>, the number of open files increases up to the fd limit. Then sssd runs at 99% CPU and NSS lookups stop working.

Steps to Reproduce:

```
1. service sssd start   # start service
2. watch "lsof -p `ps -ef | grep sssd_nss | grep -v grep | perl -l -a -n -F"\s+" -e 'print $F[1]'` | wc -l"   # watch fds
3. sss_cache -U         # clear cache several times and watch the number of fds
```

Actual results:
The number of open fds for the sssd_nss process keeps increasing.

Expected results:
The number of open fds for the sssd_nss process stays constant.

Additional info:
The leaked fds all point to the same files; lsof output:

```
sssd_nss 2090 root 8176u REG 8,1 6806312  3424241 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8177u REG 8,1 5206312  3424243 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8178u REG 8,1 6806312  3424242 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8179u REG 8,1 5206312  3424245 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8180u REG 8,1 6806312  3424247 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8181u REG 8,1 6806312  3424244 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8182u REG 8,1 5206312  3424246 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8183u REG 8,1 5206312  3424248 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8184u REG 8,1 5206312  3424250 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8185u REG 8,1 6806312  3424251 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8186u REG 8,1 5206312  3424252 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8187u REG 8,1 6806312  3424253 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8188u REG 8,1 5206312  3424254 /var/lib/sss/mc/group (deleted)
sssd_nss 2090 root 8189u REG 8,1 6806312 11493377 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8190u REG 8,1 6806312  3424255 /var/lib/sss/mc/passwd (deleted)
sssd_nss 2090 root 8191u REG 8,1 5206312  3424256 /var/lib/sss/mc/group (deleted)
```

The reason for the CPU usage is the error handling after epoll_wait(); strace output:

```
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40403) = 1
accept(23, 0x149b38e0, [110]) = -1 EMFILE (Too many open files)
epoll_wait(5, {{EPOLLIN, {u32=24633616, u64=24633616}}}, 1, 40376) = 1
```

Workaround:
We set fd_limit in the [nss] section of sssd.conf to a far higher value, and our NMS restarts sssd when the fd count approaches the limit:

```
[nss]
entry_negative_timeout = 0
debug_level = 0x1310
fd_limit = 200000
```

This is not yet fixed in the packages from this repo:

```
[sssd-1.9-RHEL6.3]
name=SSSD 1.9.x built for latest stable RHEL
baseurl=http://repos.fedorapeople.org/repos/jhrozek/sssd/epel-6/$basearch/
enabled=1
skip_if_unavailable=1
gpgcheck=0
```
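The strace loop above is the classic level-triggered epoll spin: a listening socket with a pending connection stays readable, accept() keeps failing with EMFILE without ever consuming the connection, so epoll_wait() returns immediately on every iteration and the process burns CPU. A minimal Linux-only reproduction of that mechanism (hypothetical demo, not sssd code):

```python
import errno
import os
import resource
import select
import socket

# Listening socket with one pending connection, watched by a
# level-triggered epoll instance (the default mode).
lsock = socket.socket()
lsock.bind(("127.0.0.1", 0))
lsock.listen(8)
client = socket.socket()
client.connect(lsock.getsockname())

ep = select.epoll()
ep.register(lsock.fileno(), select.EPOLLIN)

# Find the lowest free descriptor and set the soft fd limit to it, so
# the next accept() must fail with EMFILE, as in the strace output.
probe = os.open("/dev/null", os.O_RDONLY)
os.close(probe)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (probe, hard))

spins = 0
for _ in range(3):
    events = ep.poll(1.0)        # returns immediately: listener still readable
    try:
        lsock.accept()           # would need a new fd -> EMFILE
    except OSError as e:
        if e.errno == errno.EMFILE:
            spins += 1           # connection not consumed; next poll fires again

resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
print("busy-loop iterations:", spins)   # -> 3
```

Each iteration completes without blocking and without making progress, which is exactly the 99% CPU pattern in the report. Raising fd_limit, as in the workaround, only postpones the point at which this loop starts.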