Bug 728343
Summary: | SSSD taking 5 minutes to log in | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Matthew Mosesohn <mmosesoh> | ||||
Component: | sssd | Assignee: | Stephen Gallagher <sgallagh> | ||||
Status: | CLOSED ERRATA | QA Contact: | Chandrasekar Kannan <ckannan> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.2 | CC: | benl, dpal, grajaiya, jgalipea, jhrozek, jvcelak, kbanerje, msanders, pbinkows, prc | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | sssd-1.5.1-52.el6 | Doc Type: | Bug Fix | ||||
Doc Text: |
Cause: SSSD checked for an incorrect DBus return code
Consequence: Instead of detecting timeouts properly, the monitor process disconnected from the back end process resulting in failure to be notified about back end going online and visible lags
Fix: SSSD checks for a correct DBus code and handles timeouts on the DBus connection better
Result: No more unwanted monitor disconnects resulting in less lags
|
Story Points: | --- | ||||
Clone Of: | |||||||
: | 748864 (view as bug list) | Environment: | |||||
Last Closed: | 2011-12-06 16:39:20 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 743047, 746654, 748864, 748893 | ||||||
Attachments: |
|
Description
Matthew Mosesohn
2011-08-04 19:30:01 UTC
(In reply to comment #0) > Additional info: > See attached logs for more information I think you forgot to attach them? (In reply to comment #0) > Description of problem: > This is a duplicate of 726467 with another example. > If it is the same you should have 60 DNS servers in the resolv.conf It is not the same issue (although there are some DNS timeouts in the logs as well). This looks like improper handling of LDAP timeouts: (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP' (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.rdu.redhat.com' is 'working' (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.rdu.redhat.com' is 'neutral' (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.rdu.redhat.com' is 'working' (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [be_resolve_server_done] (4): Found address for server ldap.rdu.redhat.com: [10.11.255.5] (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_uri_callback] (6): Constructed uri 'ldap://ldap.rdu.redhat.com' (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_rootdse_send] (9): Getting rootdse (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (6): calling ldap_search_ext with [(objectclass=*)][]. (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [*] (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [altServer] (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [namingContexts] (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedControl] (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedExtension] (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedFeatures] (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedLDAPVersion] (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedSASLMechanisms] (Thu Aug 4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [defaultNamingContext] -------------------------------------------------------------------------------- (Thu Aug 4 11:00:58 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (3): ldap_search_ext failed: Can't contact LDAP server (Thu Aug 4 11:00:58 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (3): Connection error: (null) There's more than 50 seconds waiting for ldap_search_ext to time out. We suspect this issue is actually a libldap bug. However, we need additional data to debug this issue further. I have prepared a scratch build that is able to turn libldap debugging. It is the same RPM as you were using, with one additional patch: https://brewweb.devel.redhat.com/taskinfo?taskID=3546607 Would you mind testing it out? You would need to 1) install the RPMs 2) tell SSSD to raise the debug level by setting the SSSD_DEBUG_LDAP_SEARCH environment variable. I did so by putting: export SSSD_DEBUG_LDAP_SEARCH="0xffff" into /etc/init.d/sssd 3) service sssd restart When the timeout occurs again, please attach your logs. Thank you! Jakub, the brew scratch build failed or expired. Can you please build it again? There was an openldap update available: Aug 16 09:37:10 Updated: openldap-2.4.23-15.el6_1.1.x86_64 Aug 16 09:37:16 Updated: openldap-clients-2.4.23-15.el6_1.1.x86_64 Maybe this fixed it (In reply to comment #15) > There was an openldap update available: > > Aug 16 09:37:10 Updated: openldap-2.4.23-15.el6_1.1.x86_64 > Aug 16 09:37:16 Updated: openldap-clients-2.4.23-15.el6_1.1.x86_64 > > Maybe this fixed it Jan, is it possible that the update made an ldap_search_ext() timeout go away? The easiest way to be sure is either revert back to packages without the extra debugging or simply comment out the SSSD_DEBUG_LDAP_SEARCH environment variable to turn off the debugging and see if the bug strikes again. I don't think so. That update fixes just one memory leak (#717738), it should not influence behavior of ldap_search_ext(). Matthew, have you tried reverting to the packages without debugging enabled? Would the bug hit again? (That would confirm that it is a race condition in openldap-libs that doesn't occur when the extra debugging slows down the LDAP search operation) Created attachment 520856 [details]
logs with debug packages
See attachment. It has the issue reproducing itself using the debug packages provided
Okay, I think I've identified the problem here. What happened is that an error was received that caused the data provider backend to disconnect from the monitor process. As a result, the monitor was unable to inform the data provider when the network state had changed (which tells the data provider to refresh its DNS information). So the back-end was stuck in the offline state, with repeated attempts to reconnect being stymied by a combination of repeated DNS timeouts and slow LDAP connection attempts causing the visible lag. I've scratch-built this fix at http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3635153 Please test and let me know if it resolves your issue. I found an additional issue with the previous patch. New scratch-build is here: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3638970 Please verify this one instead. Sorry, bad merge of the patch from the master branch on that previous scratch build. http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3638997 should work now. Steps to verify: 1. # service sssd stop;rm -f /var/lib/sss/db/*;service sssd start Stopping sssd: [ OK ] Starting sssd: [ OK ] 2. # killall -STOP sssd_be 3. After backend restarts, # killall -CONT sssd_be 4. # getent -s sss passwd kau20 kau20:*:202020:202020:kaushik20:/home/kau20: Verified in version: # rpm -qi sssd | head Name : sssd Relocations: (not relocatable) Version : 1.5.1 Vendor: Red Hat, Inc. Release : 58.el6 Build Date: Sat 15 Oct 2011 12:14:26 AM IST Install Date: Mon 17 Oct 2011 12:00:06 PM IST Build Host: x86-001.build.bos.redhat.com Group : Applications/System Source RPM: sssd-1.5.1-58.el6.src.rpm Size : 3599114 License: GPLv3+ Signature : (none) Packager : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla> URL : http://fedorahosted.org/sssd/ Summary : System Security Services Daemon Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Cause: SSSD checked for an incorrect DBus return code Consequence: Instead of detecting timeouts properly, the monitor process disconnected from the back end process resulting in failure to be notified about back end going online and visible lags Fix: SSSD checks for a correct DBus code and handles timeouts on the DBus connection better Result: No more unwanted monitor disconnects resulting in less lags Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1529.html |