Bug 728343

Summary: SSSD taking 5 minutes to log in
Product: Red Hat Enterprise Linux 6 Reporter: Matthew Mosesohn <mmosesoh>
Component: sssdAssignee: Stephen Gallagher <sgallagh>
Status: CLOSED ERRATA QA Contact: Chandrasekar Kannan <ckannan>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.2CC: benl, dpal, grajaiya, jgalipea, jhrozek, jvcelak, kbanerje, msanders, pbinkows, prc
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sssd-1.5.1-52.el6 Doc Type: Bug Fix
Doc Text:
Cause: SSSD checked for an incorrect DBus return code Consequence: Instead of detecting timeouts properly, the monitor process disconnected from the back end process resulting in failure to be notified about back end going online and visible lags Fix: SSSD checks for a correct DBus code and handles timeouts on the DBus connection better Result: No more unwanted monitor disconnects resulting in less lags
Story Points: ---
Clone Of:
: 748864 (view as bug list) Environment:
Last Closed: 2011-12-06 16:39:20 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 743047, 746654, 748864, 748893    
Attachments:
Description Flags
logs with debug packages none

Description Matthew Mosesohn 2011-08-04 19:30:01 UTC
Description of problem:
This is a duplicate of 726467 with another example.

Version-Release number of selected component (if applicable):
sssd-1.5.1-34.el6_1.2.x86_64

How reproducible:
intermittently

Steps to Reproduce:
1. Lock screen
2. Drop VPN connection through bad connection
3. Attempt to unlock screen by typing password
  
Actual results:
The screen locks for 5 minutes before deferring to local password

Expected results:
The timeout should be a reasonable length of time

Additional info:
See attached logs for more information

Comment 2 Jakub Hrozek 2011-08-04 19:42:50 UTC
(In reply to comment #0)
> Additional info:
> See attached logs for more information

I think you forgot to attach them?

Comment 5 Dmitri Pal 2011-08-04 21:55:56 UTC
(In reply to comment #0)
> Description of problem:
> This is a duplicate of 726467 with another example.
> 

If it is the same you should have 60 DNS servers in the resolv.conf

Comment 6 Jakub Hrozek 2011-08-05 09:14:13 UTC
It is not the same issue (although there are some DNS timeouts in the logs as well). This looks like improper handling of LDAP timeouts:


(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP'
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.rdu.redhat.com' is 'working'
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.rdu.redhat.com' is 'neutral'
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.rdu.redhat.com' is 'working'
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [be_resolve_server_done] (4): Found address for server ldap.rdu.redhat.com: [10.11.255.5]
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_uri_callback] (6): Constructed uri 'ldap://ldap.rdu.redhat.com'
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_rootdse_send] (9): Getting rootdse
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (6): calling ldap_search_ext with [(objectclass=*)][].
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [*]
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [altServer]
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [namingContexts]
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedControl]
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedExtension]
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedFeatures]
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedLDAPVersion]
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedSASLMechanisms]
(Thu Aug  4 11:00:07 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (7): Requesting attrs: [defaultNamingContext]
--------------------------------------------------------------------------------
(Thu Aug  4 11:00:58 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (3): ldap_search_ext failed: Can't contact LDAP server
(Thu Aug  4 11:00:58 2011) [sssd[be[redhat.com]]] [sdap_get_generic_send] (3): Connection error: (null)

There's more than 50 seconds waiting for ldap_search_ext to time out.

Comment 7 Jakub Hrozek 2011-08-10 16:14:57 UTC
We suspect this issue is actually a libldap bug. However, we need additional data to debug this issue further.

I have prepared a scratch build that is able to turn libldap debugging. It is the same RPM as you were using, with one additional patch:
https://brewweb.devel.redhat.com/taskinfo?taskID=3546607

Would you mind testing it out? You would need to
1) install the RPMs
2) tell SSSD to raise the debug level by setting the SSSD_DEBUG_LDAP_SEARCH environment variable. I did so by putting:

export SSSD_DEBUG_LDAP_SEARCH="0xffff"

into /etc/init.d/sssd

3) service sssd restart

When the timeout occurs again, please attach your logs.

Thank you!

Comment 8 Matthew Mosesohn 2011-08-16 14:07:27 UTC
Jakub,

the brew scratch build failed or expired.  Can you please build it again?

Comment 9 Jakub Hrozek 2011-08-16 14:46:24 UTC
Of course:

https://brewweb.devel.redhat.com/taskinfo?taskID=3559904

Comment 15 Matthew Mosesohn 2011-08-23 14:00:11 UTC
There was an openldap update available:

Aug 16 09:37:10 Updated: openldap-2.4.23-15.el6_1.1.x86_64
Aug 16 09:37:16 Updated: openldap-clients-2.4.23-15.el6_1.1.x86_64

Maybe this fixed it

Comment 16 Jakub Hrozek 2011-08-23 14:10:34 UTC
(In reply to comment #15)
> There was an openldap update available:
> 
> Aug 16 09:37:10 Updated: openldap-2.4.23-15.el6_1.1.x86_64
> Aug 16 09:37:16 Updated: openldap-clients-2.4.23-15.el6_1.1.x86_64
> 
> Maybe this fixed it

Jan, is it possible that the update made an ldap_search_ext() timeout go away?

The easiest way to be sure is either revert back to packages without the extra debugging or simply comment out the SSSD_DEBUG_LDAP_SEARCH environment variable to turn off the debugging and see if the bug strikes again.

Comment 17 Jan Vcelak 2011-08-24 07:44:28 UTC
I don't think so. That update fixes just one memory leak (#717738), it should not influence behavior of ldap_search_ext().

Comment 18 Jakub Hrozek 2011-08-26 06:42:44 UTC
Matthew, have you tried reverting to the packages without debugging enabled? Would the bug hit again? (That would confirm that it is a race condition in openldap-libs that doesn't occur when the extra debugging slows down the LDAP search operation)

Comment 20 Matthew Mosesohn 2011-08-31 16:06:39 UTC
Created attachment 520856 [details]
logs with debug packages

See attachment.  It has the issue reproducing itself using the debug packages provided

Comment 23 Stephen Gallagher 2011-09-14 17:28:38 UTC
Okay, I think I've identified the problem here. What happened is that an error was received that caused the data provider backend to disconnect from the monitor process. As a result, the monitor was unable to inform the data provider when the network state had changed (which tells the data provider to refresh its DNS information).

So the back-end was stuck in the offline state, with repeated attempts to reconnect being stymied by a combination of repeated DNS timeouts and slow LDAP connection attempts causing the visible lag.

I've scratch-built this fix at http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3635153

Please test and let me know if it resolves your issue.

Comment 24 Stephen Gallagher 2011-09-15 15:42:59 UTC
I found an additional issue with the previous patch. New scratch-build is here: http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3638970

Please verify this one instead.

Comment 25 Stephen Gallagher 2011-09-15 15:59:28 UTC
Sorry, bad merge of the patch from the master branch on that previous scratch build. http://brewweb.devel.redhat.com/brew/taskinfo?taskID=3638997 should work now.

Comment 28 Kaushik Banerjee 2011-10-17 12:29:10 UTC
Steps to verify:

1. # service sssd stop;rm -f /var/lib/sss/db/*;service sssd start
Stopping sssd:                                             [  OK  ]
Starting sssd:                                             [  OK  ]

2. # killall -STOP sssd_be

3. After backend restarts,
   # killall -CONT sssd_be

4. # getent -s sss passwd kau20
kau20:*:202020:202020:kaushik20:/home/kau20:

Verified in version:
# rpm -qi sssd | head
Name        : sssd                         Relocations: (not relocatable)
Version     : 1.5.1                             Vendor: Red Hat, Inc.
Release     : 58.el6                        Build Date: Sat 15 Oct 2011 12:14:26 AM IST
Install Date: Mon 17 Oct 2011 12:00:06 PM IST      Build Host: x86-001.build.bos.redhat.com
Group       : Applications/System           Source RPM: sssd-1.5.1-58.el6.src.rpm
Size        : 3599114                          License: GPLv3+
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://fedorahosted.org/sssd/
Summary     : System Security Services Daemon

Comment 29 Jakub Hrozek 2011-10-27 15:06:53 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: SSSD checked for an incorrect DBus return code
Consequence: Instead of detecting timeouts properly, the monitor process disconnected from the back end process resulting in failure to be notified about back end going online and visible lags
Fix: SSSD checks for a correct DBus code and handles timeouts on the DBus connection better
Result: No more unwanted monitor disconnects resulting in less lags

Comment 30 errata-xmlrpc 2011-12-06 16:39:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1529.html