Bug 726467

Summary: SSSD takes 30+ seconds to login
Product: Red Hat Enterprise Linux 6
Component: sssd
Version: 6.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Reporter: Mason Sanders <msanders>
Assignee: Stephen Gallagher <sgallagh>
QA Contact: Chandrasekar Kannan <ckannan>
CC: benl, dpal, grajaiya, jgalipea, jhrozek, kbanerje, msanders, prc
Target Milestone: rc
Fixed In Version: sssd-1.8.0-2.el6.beta2
Doc Type: Bug Fix
Doc Text: No documentation needed
Last Closed: 2012-06-20 11:47:38 UTC
Bug Blocks: 637248, 736857, 756082

Attachments:
tar cfz sssd-msanders.tar.gz /etc/sssd/sssd.conf /var/log/messages* /var/log/sssd

Description Mason Sanders 2011-07-28 17:51:09 UTC
Created attachment 515776 [details]
tar cfz sssd-msanders.tar.gz /etc/sssd/sssd.conf /var/log/messages* /var/log/sssd

Description of problem:
After my computer has been on for a while, with repeated suspend/resume and dock/undock cycles, sssd goes from letting me log in within a few seconds to taking 30+ seconds to let me log in.  If I reboot, the problem goes away for a while.

Version-Release number of selected component (if applicable):
sssd-1.5.1-34.el6_1.2.x86_64

How reproducible:
always

Steps to Reproduce:
1. Use the laptop for a week, suspending/resuming and docking/undocking.
2. sssd will eventually start taking a long time to log in.
  
Actual results:
sssd takes 30+ seconds to login

Expected results:
sssd lets me log in immediately

Additional info:
Logs and config files attached.

Comment 2 Dmitri Pal 2011-07-28 20:04:54 UTC
I have seen this. It is usually the case when the VPN drops between the last two times SSSD talks to the server, or you change networks. For example, you were on a network that had direct access to the server, then you close the lid, suspend, and go to a place like Whole Foods or Panera and try to resume there. The network connection will be established pretty quickly if you go there from time to time and have non-expired certs, but SSSD might be confused into thinking it is online and try each server in the fail-over list before it gives up and goes offline.
Anyway, to troubleshoot the issue we would need SSSD logs. I suspect the debug_level should be at least 6 to see what is going on.

Comment 3 Mason Sanders 2011-07-28 20:11:37 UTC
Dmitri,

I attached the logs in the tar file I uploaded when I created the bug.  Let me know if you need something additional.

Mason

Comment 4 Jenny Severance 2011-07-29 12:45:29 UTC
I have also seen this when the VPN drops while my screen is locked due to prolonged inactivity or the machine has been suspended for an extended period of time.

Comment 5 Jakub Hrozek 2011-08-04 19:19:16 UTC
I haven't been able to reproduce the issue yet, but the investigation of the logs revealed a possible cause, which is our improper handling of DNS timeouts.

----------------------------------------------
(Thu Jul 28 13:41:20 2011) [sssd[be[redhat.com]]] [set_server_common_status] (4): Marking server 'kerberos.rdu.redhat.com' as 'resolving name'
(Thu Jul 28 13:41:21 2011) [sssd[be[redhat.com]]] [check_fd_timeouts] (9): Checking for DNS timeouts
(Thu Jul 28 13:41:25 2011) [sssd[be[redhat.com]]] [check_fd_timeouts] (9): Checking for DNS timeouts
(Thu Jul 28 13:41:30 2011) [sssd[be[redhat.com]]] [check_fd_timeouts] (9): Checking for DNS timeouts
(Thu Jul 28 13:41:31 2011) [sssd[be[redhat.com]]] [check_fd_timeouts] (9): Checking for DNS timeouts
(Thu Jul 28 13:41:36 2011) [sssd[be[redhat.com]]] [check_fd_timeouts] (9): Checking for DNS timeouts
----------------------------------------------

Our internal resolver library treats its timeout parameter as per-server. I suspect that in the above example, /etc/resolv.conf contained multiple nameserver records and the resolver waited 5 seconds for every one of them. The same happened for the second server configured in fail-over, doubling the total time.

This does not happen if the DNS server is down or unreachable, because the resolver would immediately detect that it can't connect and fail over.

I would like to try to reproduce the issue to be sure, but I think we need a mechanism to cancel the resolution after the timeout rather than relying on the resolver library's timeouts.
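
For illustration only (an editor's sketch, not SSSD code and not necessarily the eventual fix): SSSD's internal resolver is built on the c-ares library, where the timeout option is applied per nameserver and per try. Under the assumptions above (a 5-second timeout, two nameserver entries in /etc/resolv.conf, one try), a lookup against unresponsive DNS servers can block for roughly 2 x 5 = 10 seconds, and about twice that once the second fail-over server also has to be resolved. The sketch below shows one way an overall per-request deadline could bound the total wall-clock time by cancelling the in-flight lookup with ares_cancel(); the hostname is taken from the log excerpt above and the 6-second deadline is an arbitrary example value.

----------------------------------------------
/* Minimal c-ares example: the "timeout" option is per nameserver (and per
 * try), so the worst case grows with the number of nameservers.  An outer
 * deadline plus ares_cancel() keeps the whole request bounded. */
#include <ares.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <time.h>

static void lookup_done(void *arg, int status, int timeouts, struct hostent *host)
{
    (void)arg; (void)host;
    /* On cancellation the status is ARES_ECANCELLED. */
    printf("status=%s, per-server timeouts seen=%d\n",
           ares_strerror(status), timeouts);
}

int main(void)
{
    ares_channel channel;
    struct ares_options opts;
    int optmask = ARES_OPT_TIMEOUT | ARES_OPT_TRIES;

    memset(&opts, 0, sizeof(opts));
    opts.timeout = 5;   /* seconds *per nameserver*, not per request */
    opts.tries   = 1;

    ares_library_init(ARES_LIB_INIT_ALL);
    if (ares_init_options(&channel, &opts, optmask) != ARES_SUCCESS)
        return 1;

    ares_gethostbyname(channel, "kerberos.rdu.redhat.com", AF_INET,
                       lookup_done, NULL);

    /* Overall deadline for this request, no matter how many nameservers
     * /etc/resolv.conf lists (example value). */
    time_t deadline = time(NULL) + 6;

    for (;;) {
        fd_set rfds, wfds;
        struct timeval tv, max = { 1, 0 }, *tvp;
        int nfds;

        FD_ZERO(&rfds);
        FD_ZERO(&wfds);
        nfds = ares_fds(channel, &rfds, &wfds);
        if (nfds == 0)
            break;                    /* lookup finished */

        if (time(NULL) >= deadline) {
            /* Give up: pending queries are cancelled and the callback is
             * invoked with ARES_ECANCELLED; the next ares_fds() call then
             * reports no remaining work and the loop exits. */
            ares_cancel(channel);
            continue;
        }

        tvp = ares_timeout(channel, &max, &tv);   /* wake at least once a second */
        select(nfds, &rfds, &wfds, NULL, tvp);
        ares_process(channel, &rfds, &wfds);
    }

    ares_destroy(channel);
    ares_library_cleanup();
    return 0;
}
----------------------------------------------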

Comment 12 Jakub Hrozek 2012-04-03 17:21:10 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
No documentation needed

Comment 13 Kaushik Banerjee 2012-04-27 08:46:50 UTC
Verified with sssd-1.8.0-23.el6 that there is an improvement of ~30 seconds with the steps below:

Verification steps:

1. Set up BIND DNS on nameserver1 and nameserver2. Write an iptables rule to drop packets to port 53 on nameserver1.
2. Make the hosts ldap.example.com and krb.example.com resolvable by the BIND server.
3. On the client machine, add to /etc/resolv.conf:
   nameserver nameserver1
   nameserver nameserver2
4. In sssd.conf, the domain section is:
[domain/LDAP-KRB5]
id_provider = ldap
ldap_uri = ldap://invalid1.example.com,ldap://ldap.example.com
ldap_search_base = dc=example,dc=com
debug_level = 0xFFF0
auth_provider = krb5
krb5_server = invalid2.example.com,krb.example.com
krb5_realm = EXAMPLE.COM
5. Perform an auth.


Using sssd-1.5.1-66.el6_2.3:

# time ssh -l puser1 localhost
puser1@localhost's password: 
Last login: Fri Apr 27 13:37:05 2012 from localhost
-sh-4.1$ logout
Connection to localhost closed.

real	0m55.702s
user	0m0.007s
sys	0m0.039s


Using sssd-1.8.0-23.el6:

# time ssh -l puser1 localhost
puser1@localhost's password: 
Last login: Wed Apr 25 20:51:01 2012 from localhost
-sh-4.1$ logout
Connection to localhost closed.

real	0m23.047s
user	0m0.012s
sys	0m0.041s

Comment 15 errata-xmlrpc 2012-06-20 11:47:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0747.html