Bug 1274633 - SSSD is not closing sockets properly
Summary: SSSD is not closing sockets properly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sssd
Version: 6.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Lukas Slebodnik
QA Contact: Namita Soman
URL:
Whiteboard:
Depends On:
Blocks: 1172231 1272422
 
Reported: 2015-10-23 08:27 UTC by Jakub Hrozek
Modified: 2020-05-02 18:10 UTC
13 users

Fixed In Version: sssd-1.13.2-1.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-10 20:24:48 UTC
Target Upstream Version:


Attachments (Terms of Use)
lsof output from cu rhel 7.2 system (65.76 KB, text/plain)
2016-03-30 14:43 UTC, kludhwan


Links
System ID Priority Status Summary Last Updated
Github SSSD sssd issues 3833 None closed SSSD is not closing sockets properly 2020-09-04 05:48:32 UTC
Red Hat Product Errata RHBA-2016:0782 normal SHIPPED_LIVE sssd bug fix and enhancement update 2016-05-10 22:36:00 UTC

Description Jakub Hrozek 2015-10-23 08:27:55 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/sssd/ticket/2792

After a certain period of time, the sssd_be process tends to run out of system resources, hitting the maximum number of open files. It turns out it leaves behind many sockets that were never properly closed:
{{{
[root@login05 ondrejv]# ls -l /proc/`pgrep sssd_be`/fd/
total 0
lrwx------ 1 root root 64 Sep 16 09:50 0 -> /dev/null
lrwx------ 1 root root 64 Sep 16 09:50 1 -> /dev/null
lr-x------ 1 root root 64 Sep 16 09:50 10 -> inotify
lrwx------ 1 root root 64 Sep 16 09:50 100 -> socket:[158904170]
lrwx------ 1 root root 64 Sep 16 09:50 101 -> socket:[174418463]
lrwx------ 1 root root 64 Sep 16 09:50 102 -> socket:[158905068]
lrwx------ 1 root root 64 Sep 16 09:50 103 -> socket:[158906757]
lrwx------ 1 root root 64 Sep 16 09:50 104 -> socket:[166590564]
lrwx------ 1 root root 64 Sep 16 09:50 105 -> socket:[166990886]
lrwx------ 1 root root 64 Sep 16 09:50 106 -> socket:[174408591]
lrwx------ 1 root root 64 Sep 16 09:50 107 -> socket:[166991626]
lrwx------ 1 root root 64 Sep 16 09:50 108 -> socket:[166992492]
lrwx------ 1 root root 64 Sep 16 09:50 109 -> socket:[166993970]
l-wx------ 1 root root 64 Sep 16 09:50 11 -> /var/log/sssd/sssd_default.log
lrwx------ 1 root root 64 Sep 16 09:50 110 -> socket:[168650278]
lrwx------ 1 root root 64 Sep 16 09:50 111 -> socket:[172443487]
lrwx------ 1 root root 64 Sep 16 09:50 112 -> socket:[187704428]
lrwx------ 1 root root 64 Sep 16 09:50 113 -> socket:[182234435]
lrwx------ 1 root root 64 Sep 16 09:50 114 -> socket:[179190714]
lrwx------ 1 root root 64 Sep 16 09:50 115 -> socket:[184308683]
lrwx------ 1 root root 64 Sep 16 09:50 116 -> socket:[189328893]
lrwx------ 1 root root 64 Sep 16 09:50 117 -> socket:[189353567]
lrwx------ 1 root root 64 Sep 16 09:50 118 -> socket:[189343260]
lrwx------ 1 root root 64 Sep 16 09:50 119 -> socket:[189374973]
lrwx------ 1 root root 64 Sep 16 09:50 12 -> [eventpoll]
lrwx------ 1 root root 64 Sep 16 09:50 120 -> socket:[189367203]
lrwx------ 1 root root 64 Sep 16 09:50 121 -> socket:[189407282]
lrwx------ 1 root root 64 Sep 16 09:50 122 -> socket:[189389422]
lrwx------ 1 root root 64 Sep 16 09:50 123 -> socket:[189398015]
lrwx------ 1 root root 64 Sep 16 09:50 124 -> socket:[189321826]
lrwx------ 1 root root 64 Sep 16 09:50 125 -> socket:[191857375]
lrwx------ 1 root root 64 Sep 16 09:50 126 -> socket:[191669942]
lrwx------ 1 root root 64 Sep 16 09:50 127 -> socket:[189869722]
lrwx------ 1 root root 64 Sep 16 09:50 128 -> socket:[191848106]
lrwx------ 1 root root 64 Sep 16 09:50 129 -> socket:[189870357]
lrwx------ 1 root root 64 Sep 16 09:50 13 -> /var/lib/sss/db/cache_default.ldb
lrwx------ 1 root root 64 Sep 16 09:50 130 -> socket:[195279172]
lrwx------ 1 root root 64 Sep 16 09:50 131 -> socket:[209696954]
lrwx------ 1 root root 64 Sep 16 09:50 132 -> socket:[198582238]
lrwx------ 1 root root 64 Sep 16 09:50 133 -> socket:[202870686]
lrwx------ 1 root root 64 Sep 16 09:50 134 -> socket:[202881663]
lrwx------ 1 root root 64 Sep 16 09:50 135 -> socket:[218798231]
lrwx------ 1 root root 64 Sep 16 09:50 136 -> socket:[215428278]
lrwx------ 1 root root 64 Sep 16 09:50 137 -> socket:[220534921]
lrwx------ 1 root root 64 Sep 16 09:50 138 -> socket:[218807007]
lrwx------ 1 root root 64 Sep 16 09:50 139 -> socket:[218817269]
lrwx------ 1 root root 64 Sep 16 09:50 14 -> socket:[22419983]
lrwx------ 1 root root 64 Sep 16 09:50 140 -> socket:[220525178]
lrwx------ 1 root root 64 Sep 16 09:50 141 -> socket:[220549570]
lrwx------ 1 root root 64 Sep 16 09:50 142 -> socket:[222279094]
lrwx------ 1 root root 64 Sep 16 09:50 143 -> socket:[230783349]
lrwx------ 1 root root 64 Sep 16 09:50 144 -> socket:[225518746]
lrwx------ 1 root root 64 Sep 16 09:50 145 -> socket:[230774051]
lrwx------ 1 root root 64 Sep 16 09:50 146 -> socket:[237821496]
lrwx------ 1 root root 64 Sep 16 09:50 147 -> socket:[237813267]
lrwx------ 1 root root 64 Sep 16 09:50 148 -> socket:[251038692]
lrwx------ 1 root root 64 Sep 16 09:50 149 -> socket:[239610812]
lrwx------ 1 root root 64 Sep 16 09:50 15 -> socket:[22419986]
lrwx------ 1 root root 64 Sep 16 09:50 150 -> socket:[239547005]
lrwx------ 1 root root 64 Sep 16 09:50 151 -> socket:[239621136]
lrwx------ 1 root root 64 Sep 16 09:50 152 -> socket:[242866383]
lrwx------ 1 root root 64 Sep 16 09:50 153 -> socket:[247910227]
lrwx------ 1 root root 64 Sep 16 09:50 154 -> socket:[248831180]
lrwx------ 1 root root 64 Sep 16 09:50 155 -> socket:[248050545]
lrwx------ 1 root root 64 Sep 16 09:50 156 -> socket:[248975173]
lrwx------ 1 root root 64 Sep 16 09:50 157 -> socket:[253786932]
lrwx------ 1 root root 64 Sep 16 09:50 158 -> socket:[253778043]
lrwx------ 1 root root 64 Sep 16 09:50 159 -> socket:[253633523]
lrwx------ 1 root root 64 Sep 16 09:50 16 -> socket:[22419990]
lrwx------ 1 root root 64 Sep 16 09:50 160 -> socket:[253782320]
lrwx------ 1 root root 64 Sep 16 09:50 161 -> socket:[264670302]
lrwx------ 1 root root 64 Sep 16 09:50 162 -> socket:[253946603]
lrwx------ 1 root root 64 Sep 16 09:50 163 -> socket:[258063353]
lrwx------ 1 root root 64 Sep 16 09:50 164 -> socket:[253946837]
lrwx------ 1 root root 64 Sep 16 09:50 165 -> socket:[253947206]
lrwx------ 1 root root 64 Sep 16 09:50 166 -> socket:[253947641]
lrwx------ 1 root root 64 Sep 16 09:50 167 -> socket:[260200790]
lrwx------ 1 root root 64 Sep 16 09:50 168 -> socket:[258786272]
lrwx------ 1 root root 64 Sep 16 09:50 169 -> socket:[259133490]
l-wx------ 1 root root 64 Sep 16 09:50 17 -> /var/log/sssd/ldap_child.log
lrwx------ 1 root root 64 Sep 16 09:50 170 -> socket:[266886828]
lrwx------ 1 root root 64 Sep 16 09:50 171 -> socket:[266890082]
lrwx------ 1 root root 64 Sep 16 09:50 172 -> socket:[270535370]
lrwx------ 1 root root 64 Sep 16 09:50 173 -> socket:[274378300]
lrwx------ 1 root root 64 Sep 16 09:50 174 -> socket:[270631237]
lrwx------ 1 root root 64 Sep 16 09:50 175 -> socket:[270819551]
....
}}}
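A quick way to watch for this leak is to count the socket file descriptors held by sssd_be over time, using the same /proc/<pid>/fd view as the listing above. A minimal sketch; the helper name count_socket_fds is hypothetical, not part of sssd:

```shell
# Count the socket file descriptors held by a given PID by scanning
# /proc/<pid>/fd (the same view shown in the listing above).
# count_socket_fds is a hypothetical helper name, not part of sssd.
count_socket_fds() {
    pid="$1"
    ls -l "/proc/$pid/fd" 2>/dev/null | grep -c 'socket:'
}

# On an affected system, run this against the backend and watch the
# number climb after each offline/online cycle:
#   count_socket_fds "$(pgrep sssd_be)"
```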

Comment 1 Jakub Hrozek 2015-10-23 08:29:29 UTC
Dev acking, fix available upstream.

Comment 2 Jakub Hrozek 2015-10-30 06:53:45 UTC
Please add steps to reproduce so that we can qa_ack.

Comment 3 Lukas Slebodnik 2015-10-30 11:36:05 UTC
My reproducer:
* two Active Directory servers with sites, so sssd sometimes connects to server A
  and sometimes to server B.
* block the connection to one server:
 
[root@host sssd]# iptables -n -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
DROP       tcp  --  0.0.0.0/0            10.12.0.158          tcp dpt:389
 
* force sssd to go offline: send signals to the sssd process to go offline (SIGUSR1) and back online (SIGUSR2).

You might be able to reproduce it even with plain LDAP, so testing with AD should not be necessary. In fact, it should be simpler to reproduce with LDAP, because sssd will not fail over to a different AD site and will instead try the blocked LDAP port every time.
 
It might also help to set the option values so that
  dns_resolver_timeout < ldap_network_timeout
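The reproducer steps above can be sketched as a small script. This is a dry-run sketch under my own assumptions: the LDAP_IP value is a placeholder taken from the iptables output above, and the run wrapper is mine, not part of sssd. Set DRY_RUN=0 and run as root on a disposable test box to execute it for real:

```shell
#!/bin/sh
# Dry-run sketch of the reproducer described above.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}
LDAP_IP=${LDAP_IP:-10.12.0.158}   # placeholder: the LDAP server to block

run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

# 1. Block outbound LDAP so every reconnect attempt fails
run iptables -A OUTPUT -p tcp -d "$LDAP_IP" --dport 389 -j DROP

# 2. Cycle the backend offline (SIGUSR1) and online (SIGUSR2) a few times
for i in 1 2 3; do
    run pkill -USR1 sssd
    run sleep 5
    run pkill -USR2 sssd
    run sleep 5
done

# 3. Count the backend's open sockets (should stay flat on a fixed build)
run sh -c 'ls -l /proc/$(pgrep sssd_be)/fd | grep -c socket:'

# 4. Remove the firewall rule again
run iptables -D OUTPUT -p tcp -d "$LDAP_IP" --dport 389 -j DROP
```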

Comment 4 Lukas Slebodnik 2015-11-05 09:39:48 UTC
master:
* a10f67d4c64f3b1243de5d86a996475361adf0ac 

sssd-1-13:
* db2fdba6f3cecd0612439988e61be60d5d8576bf 

sssd-1-12:
* 2136f71c94660bcdde83f80feb83734389d57674

Comment 8 Dan Lavu 2016-03-22 19:17:00 UTC
Verified against sssd-client-1.13.3-19.el6.x86_64; sockets are closing correctly.

[root@sssdqe5 ~]# nslookup ad2.domain.com
Server:		192.168.51.4
Address:	192.168.51.4#53

Name:	ad2.domain.com
Address: 192.168.51.5

[root@sssdqe5 ~]# iptables -A INPUT -s 192.168.51.5 -j DROP

Made the process go offline/online several times, and there are no additional sockets.


[root@sssdqe5 ~]# ls -l /proc/`pgrep sssd_be`/fd/
ls: cannot access 5515: No such file or directory
ls: cannot access 5524: No such file or directory
ls: cannot access 5537/fd/: No such file or directory
/proc/5420:
total 0
dr-xr-xr-x. 2 root root 0 Mar 22 15:11 attr
-rw-r--r--. 1 root root 0 Mar 22 15:11 autogroup
-r--------. 1 root root 0 Mar 22 15:11 auxv
-r--r--r--. 1 root root 0 Mar 22 15:11 cgroup
--w-------. 1 root root 0 Mar 22 15:11 clear_refs
-r--r--r--. 1 root root 0 Mar 22 15:05 cmdline
-rw-r--r--. 1 root root 0 Mar 22 15:11 comm
-rw-r--r--. 1 root root 0 Mar 22 15:11 coredump_filter
-r--r--r--. 1 root root 0 Mar 22 15:11 cpuset
lrwxrwxrwx. 1 root root 0 Mar 22 15:11 cwd -> /
-r--------. 1 root root 0 Mar 22 15:11 environ
lrwxrwxrwx. 1 root root 0 Mar 22 15:11 exe -> /usr/libexec/sssd/sssd_be
dr-x------. 2 root root 0 Mar 22 15:11 fd
dr-x------. 2 root root 0 Mar 22 15:11 fdinfo
-r--------. 1 root root 0 Mar 22 15:11 io
-rw-------. 1 root root 0 Mar 22 15:11 limits
-rw-r--r--. 1 root root 0 Mar 22 15:11 loginuid
-r--r--r--. 1 root root 0 Mar 22 15:11 maps
-rw-------. 1 root root 0 Mar 22 15:11 mem
-r--r--r--. 1 root root 0 Mar 22 15:11 mountinfo
-r--r--r--. 1 root root 0 Mar 22 15:11 mounts
-r--------. 1 root root 0 Mar 22 15:11 mountstats
dr-xr-xr-x. 5 root root 0 Mar 22 15:11 net
dr-x--x--x. 2 root root 0 Mar 22 15:11 ns
-r--r--r--. 1 root root 0 Mar 22 15:11 numa_maps
-rw-r--r--. 1 root root 0 Mar 22 15:11 oom_adj
-r--r--r--. 1 root root 0 Mar 22 15:11 oom_score
-rw-r--r--. 1 root root 0 Mar 22 15:11 oom_score_adj
-r--r--r--. 1 root root 0 Mar 22 15:11 pagemap
-r--r--r--. 1 root root 0 Mar 22 15:11 personality
lrwxrwxrwx. 1 root root 0 Mar 22 15:11 root -> /
-rw-r--r--. 1 root root 0 Mar 22 15:11 sched
-r--r--r--. 1 root root 0 Mar 22 15:11 schedstat
-r--r--r--. 1 root root 0 Mar 22 15:11 sessionid
-r--r--r--. 1 root root 0 Mar 22 15:11 smaps
-r--r--r--. 1 root root 0 Mar 22 15:11 stack
-r--r--r--. 1 root root 0 Mar 22 15:05 stat
-r--r--r--. 1 root root 0 Mar 22 15:11 statm
-r--r--r--. 1 root root 0 Mar 22 15:03 status
-r--r--r--. 1 root root 0 Mar 22 15:11 syscall
dr-xr-xr-x. 3 root root 0 Mar 22 15:11 task
-r--r--r--. 1 root root 0 Mar 22 15:11 wchan

Comment 9 kludhwan 2016-03-30 14:43:03 UTC
Created attachment 1141778 [details]
lsof output from cu rhel 7.2 system

Hello,  

I have a customer that seems to be facing a similar issue on his RHEL 7.2 system.

Do we have a Bugzilla for RHEL 7 as well?

Thanks,
Kushal

Comment 10 Jakub Hrozek 2016-03-30 15:15:45 UTC
(In reply to kludhwan from comment #9)
> Created attachment 1141778 [details]
> lsof output from cu rhel 7.2 system
> 
> Hello,  
> 
> I have a cu that seems to facing the similar issue on his rhel 7.2 system.
> 
> Do we have bugzilla for rhel 7 as well?
> 
> Thanks,
> Kushal

https://bugzilla.redhat.com/show_bug.cgi?id=1313014

Comment 12 errata-xmlrpc 2016-05-10 20:24:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0782.html

