Bug 884733
Summary: | sssd tries to reconnect to ldap provider too often, slow serving requests while retrying | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Orion Poplawski <orion> |
Component: | sssd | Assignee: | Jakub Hrozek <jhrozek> |
Status: | CLOSED NOTABUG | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Priority: | unspecified |
Version: | 17 | CC: | jhrozek, sbose, sgallagh, ssorce |
Hardware: | All | OS: | Linux |
Doc Type: | Bug Fix | Type: | Bug |
Last Closed: | 2012-12-09 14:31:09 UTC | | |
Description
Orion Poplawski
2012-12-06 16:13:56 UTC
Sorry, but I'm a little confused now. Does this timeout only happen for a user who is completely offline, or also when just one of the servers is not reachable? I'm confused because you stated "our offline laptop users", but the logs show that the server could be resolved and only the connect timeout fires.

Local or system users like pulse-rt might be good candidates to filter out completely using the "filter_users" configuration option, by the way.

Sorry, replace offline with offsite (and so no access to the ldap server) but network connected. I haven't really checked with completely offline use.

Why do items in /etc/passwd get searched in sss if:

    passwd: files sss
    shadow: files sss
    group: files sss

Shouldn't it get it from passwd? If filter_users is still useful for this situation it would be nice to have it automatically configured. I'm usually on irc (orionp) if it's helpful to chat about it there.

OK, so there are several problems at play.

Your ldap.cora. LDAP server address is resolvable from the Internet (unlike ldap2), so whenever a request comes in, the address is resolved and we attempt to connect. I assume the firewall on the machine drops packets rather than rejecting them? With reject, the SSSD should immediately detect the failure and move on. When the packets are dropped, the SSSD waits until the ldap_network_timeout fires, which is 6 seconds by default (so you can also experiment with lowering the value of ldap_network_timeout).

You're right that when a local group is requested AND it exists, the Name Service Switch would stop processing after the first hit, as the default action in nsswitch.conf is [success=return]. However, to my surprise, the pulse-rt group[1] doesn't exist at all on Fedora. The fact that pulse-rt is missing is a problem not only for the SSSD: nss_ldap, nss-pam-ldapd, or any other nss module would get queried as well.
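The lookup order discussed above can be pictured with a small model. This is only a sketch of the "files sss" chain with the default [success=return] action, not how glibc or the SSSD implement it. The names `lookup_group`, `make_files_source`, and `sss_source` are hypothetical, and the group entries are made-up sample data.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for an NSS module: maps a group name to an entry or None.
Source = Callable[[str], Optional[str]]


def make_files_source(local_groups: Dict[str, str]) -> Source:
    """Model libnss_files: answers only from a local table (like /etc/group)."""
    return local_groups.get


def sss_source(name: str) -> Optional[str]:
    """Model libnss_sss. In the real setup this is where the SSSD may have to
    contact the LDAP server; here it simply reports that nothing was found."""
    return None


def lookup_group(
    name: str, chain: List[Tuple[str, Source]]
) -> Tuple[Optional[str], List[str]]:
    """Walk the chain in order and stop at the first hit ([success=return]).

    Returns the entry (or None) and the labels of the sources consulted.
    """
    consulted: List[str] = []
    for label, source in chain:
        consulted.append(label)
        entry = source(name)
        if entry is not None:
            return entry, consulted
    return None, consulted


# Sample data (assumed): "wheel" exists locally, "pulse-rt" does not.
local_groups = {"wheel": "wheel:x:10:"}
chain = [("files", make_files_source(local_groups)), ("sss", sss_source)]

for group in ("wheel", "pulse-rt"):
    entry, consulted = lookup_group(group, chain)
    print(group, "->", entry, "via", consulted)
```

In this model an existing local group never reaches the sss step, while a group missing from the local table always falls through to it, which is the situation described for pulse-rt.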
I consider the missing group a bug in pulseaudio: the group should be present, but empty, because then the request would just shortcut from libnss_files. I filed: https://bugzilla.redhat.com/show_bug.cgi?id=885020

A short-term workaround of putting the group into SSSD's negative cache might do the trick. I'm not sure about putting pulse-rt into the default negcache list, because some admin might actually want to create the group in LDAP rather than on the clients in a completely centrally managed fashion.

In conclusion, I don't think there's much we can be doing differently in the SSSD.

[1] pulse-rt is a pulseaudio group. According to pulseaudio(1), membership in that group determines whether sounds played by the particular user should receive realtime priority.

Thanks for the analysis - I've changed our firewall to reject the ldap requests and will try creating the local pulse-rt group as well. It still seems to me that sssd should be able to serve nss requests quickly from cache even while trying to reconnect to the ldap server. But I'll be the first to admit that I don't know all of the issues surrounding that behavior.

(In reply to comment #4)
> Thanks for the analysis - I've changed our firewall to reject the ldap
> requests and will try creating the local pulse-rt group as well.

That sounds like the best approach to me. I haven't heard back from the pulseaudio devs on whether creating the local pulse-rt group has any disadvantages, but their man page doesn't make it sound like there are any.

> It still seems to me that sssd should be able to serve nss requests quickly
> from cache even while trying to reconnect to the ldap server. But I'll be
> the first to admit that I don't know all of the issues surrounding that
> behavior.

It does so for existing groups, during cache midpoint refresh -- see the entry_cache_nowait_percentage option in man sssd.conf. I'm not sure if it would be possible to do the same for non-existing entries like pulse-rt.
We need to connect to the remote server to reliably answer if the entry is there or not. The negative hits are cached by the negative cache, which is rather short-lived by default and non-persistent between reboots.
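To illustrate what "short-lived and non-persistent" means for negative hits, here is a minimal sketch of a time-limited in-memory negative cache. It is not the SSSD's code. The `NegativeCache` class and its methods are hypothetical, and the 15-second lifetime is used on the assumption that it matches the default of the entry_negative_timeout option in sssd.conf(5).

```python
import time
from typing import Callable, Dict


class NegativeCache:
    """Remembers names known NOT to exist, for a limited time only.

    The state lives in a plain dict, so it disappears when the process
    restarts, mirroring the non-persistent behaviour described above.
    """

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry can be exercised without sleeping
        self._expires: Dict[str, float] = {}

    def add(self, name: str) -> None:
        """Record a negative hit for `name`, valid for ttl_seconds from now."""
        self._expires[name] = self._clock() + self._ttl

    def contains(self, name: str) -> bool:
        """True while a negative hit for `name` is still fresh."""
        expiry = self._expires.get(name)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Entry expired: the next lookup has to ask the server again.
            del self._expires[name]
            return False
        return True


# Usage with a fake clock; 15 seconds is an assumed default, see the note above.
now = [0.0]
cache = NegativeCache(ttl_seconds=15, clock=lambda: now[0])
cache.add("pulse-rt")
fresh = cache.contains("pulse-rt")     # within the lifetime: answered locally
now[0] = 20.0
expired = cache.contains("pulse-rt")   # past the lifetime: server is needed again
```

Once the entry expires, a lookup for the missing group has to go back to the remote server, which is why requests for pulse-rt keep triggering connection attempts while the server is unreachable.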