Bug 634592

Summary: console freezes while LDAP server is unavailable with SSSD
Product: [Fedora] Fedora
Reporter: Aaron Hagopian <airhead1>
Component: sssd
Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 13
CC: dpal, jhrozek, sbose, sgallagh, ssorce
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-05-23 18:49:29 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
sssd_default.log with debug level set to 6 (flags: none)

Description Aaron Hagopian 2010-09-16 13:25:15 UTC
Description of problem:

When I am at home and my work LDAP server is unavailable, my system periodically hangs for about 20 seconds.  It happens more often when I am using konsole, but it happens in other apps from time to time as well.  After every freeze, the following messages show up in /var/log/messages:

Sep 16 08:18:01 barfolomew sssd[be[default]]: LDAP connection error: (null)
Sep 16 08:18:07 barfolomew sssd[be[default]]: LDAP connection error: (null)
Sep 16 08:18:13 barfolomew sssd[be[default]]: LDAP connection error: (null)
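
The spacing between these messages can be checked mechanically. The following is a minimal Python sketch, assuming the classic syslog timestamp format shown above; that format carries no year, so the year is passed in as an assumption. `error_gaps` is a hypothetical helper name, not part of SSSD.

```python
from datetime import datetime

LOG_LINES = """\
Sep 16 08:18:01 barfolomew sssd[be[default]]: LDAP connection error: (null)
Sep 16 08:18:07 barfolomew sssd[be[default]]: LDAP connection error: (null)
Sep 16 08:18:13 barfolomew sssd[be[default]]: LDAP connection error: (null)
""".splitlines()


def error_gaps(lines, year=2010):
    """Return the seconds between consecutive 'LDAP connection error' lines."""
    stamps = []
    for line in lines:
        if "LDAP connection error" not in line:
            continue
        # Classic syslog lines start with a 15-character "Mon DD HH:MM:SS" stamp.
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        stamps.append(stamp)
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


print(error_gaps(LOG_LINES))  # prints [6.0, 6.0]
```

A steady 6-second gap would be consistent with a fixed per-attempt timeout, and three such attempts add up to roughly the length of the reported freeze. The log excerpt alone does not show which timeout is involved.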


Version-Release number of selected component (if applicable):


How reproducible:
I have rarely seen this problem when completely offline; it seems to happen when I am online but the LDAP server is unavailable.

Steps to Reproduce:
1. Set up F13 for LDAP authentication with SSSD
2. Log in as an LDAP user (into KDE)
3. Use konsole; it eventually freezes
  
Actual results:


Expected results:
Expect system not to freeze

Additional info:

Linux barfolomew.hra.local 2.6.34.6-54.fc13.x86_64 #1 SMP Sun Sep 5 17:16:27 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux

Name        : sssd
Arch        : x86_64
Version     : 1.2.2
Release     : 21.fc13

Comment 1 Stephen Gallagher 2010-11-05 18:25:40 UTC
Can you tell me if the problem persists with sssd-1.3.0-35.fc13 or later?

Comment 2 Aaron Hagopian 2010-11-05 20:36:19 UTC
(In reply to comment #1)
> Can you tell me if the problem persists with sssd-1.3.0-35.fc13 or later?

Yes it does.  I have been running 1.3.0-35.fc13 since about Oct 11 (according to my yum.log) and continue to have this problem when outside of my network.

The issue also seems to manifest itself when I try to unlock my screensaver: just getting the password prompt can sometimes take a full minute.  Again, unlocking the screensaver when connected to my network is nice and fast.

Some additional info, although I wouldn't think this is a factor, because SSSD uses TLS, and the hostname must match the cert common name, I've put an entry in my /etc/hosts file to point the cert common name (public DNS name where LDAP is not allowed through the firewall) to the local IP address, which obviously will never be available outside my network.

Comment 3 Stephen Gallagher 2010-11-07 12:22:39 UTC
(In reply to comment #2)
> Some additional info, although I wouldn't think this is a factor, because SSSD
> uses TLS, and the hostname must match the cert common name, I've put an entry
> in my /etc/hosts file to point the cert common name (public DNS name where LDAP
> is not allowed through the firewall) to the local IP address, which obviously
> will never be available outside my network.

This might be relevant, actually. SSSD doesn't actually read /etc/hosts for name/IP mapping. It relies only on entries from /etc/resolv.conf and DNS. (This is a shortcoming that we need to fix at some point).

So it's possible that the timeout you're experiencing is a DNS timeout trying to contact your DNS server(s) and then eventually giving up and switching to offline authentication.
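
One way to see where the time goes is to time name resolution and the TCP connect separately. The following is a minimal Python sketch under stated assumptions: `probe` is a hypothetical helper, and Python's `socket.getaddrinfo` uses the system resolver, so it consults /etc/hosts and does not reproduce SSSD's own lookup path. It only shows whether the host name or the connection is the slow step from the client's point of view.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Time name resolution and TCP connect separately; return a dict."""
    result = {"resolve_s": None, "connect_s": None, "error": None}

    start = time.monotonic()
    try:
        # getaddrinfo goes through the system resolver (nsswitch),
        # so it honours /etc/hosts.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        result["resolve_s"] = time.monotonic() - start
        result["error"] = f"resolve failed: {exc}"
        return result
    result["resolve_s"] = time.monotonic() - start

    # Try only the first address returned, to keep the sketch short.
    family, socktype, proto, _canon, addr = infos[0]
    start = time.monotonic()
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except OSError as exc:  # includes timeouts and "connection refused"
        result["error"] = f"connect failed: {exc}"
    result["connect_s"] = time.monotonic() - start
    return result


if __name__ == "__main__":
    # Replace "localhost" with the LDAP server name; 389 is the standard LDAP port.
    print(probe("localhost", 389))
```

A large `resolve_s` would point at DNS, while a `connect_s` close to the timeout would point at an unreachable or filtered address.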

Could you add the line
debug_level = 6
to the [domain/<DOMAIN>] section of your /etc/sssd/sssd.conf (replacing <DOMAIN> with the domain name appropriate to your setup, probably "default" if you used authconfig to set it up originally) and restart SSSD?

Next time you experience this issue, look at /var/log/sssd/sssd_<DOMAIN>.log and copy the log information for the relevant time period into this bug (sanitizing server names and IPs if necessary).

I can then use that to track down what's causing the long timeout.
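
For reference, domain sections in sssd.conf are written as [domain/<NAME>]. The following is a minimal Python sketch of where the debug_level line ends up, using the standard-library `configparser` on an in-memory string. The sample configuration is hypothetical and much shorter than a real one, and `set_debug_level` is an illustrative helper, not an SSSD tool.

```python
import configparser
import io

# Hypothetical minimal sssd.conf; a real file will contain more options.
SAMPLE_CONF = """\
[sssd]
services = nss, pam
domains = default

[domain/default]
id_provider = ldap
auth_provider = ldap
"""


def set_debug_level(conf_text, domain, level):
    """Return conf_text with debug_level set in the [domain/<domain>] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    section = f"domain/{domain}"
    if not parser.has_section(section):
        raise KeyError(f"no [{section}] section found")
    parser.set(section, "debug_level", str(level))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(set_debug_level(SAMPLE_CONF, "default", 6))
```

Note that `configparser` drops comments when it writes a file back, so on a real system it is usually simpler to add the line by hand and then restart SSSD.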

Comment 4 Jakub Hrozek 2010-11-07 15:51:14 UTC
(In reply to comment #3)
> This might be relevant, actually. SSSD doesn't actually read /etc/hosts for
> name/IP mapping. It relies only on entries from /etc/resolv.conf and DNS. (This
> is a shortcoming that we need to fix at some point).
> 

FWIW, this would be a nearly trivial patch; c-ares can read /etc/hosts.

Comment 5 Aaron Hagopian 2010-11-09 15:33:20 UTC
Created attachment 459148 [details]
sssd_default.log with debug level set to 6

Comment 6 Aaron Hagopian 2010-11-09 15:36:28 UTC
Although the attachment doesn't show a terrible case, it still took roughly 22 seconds for me to run "sudo ls" in konsole. Sometimes just running a non-sudo command will take up to a minute to process.

Without knowing too much about what is going on, it looks like sssd figured out that my LDAP server was unavailable within the first 4 seconds but decided to do a bunch more stuff before completing the request.  I do not think it's an issue with /etc/hosts, since the log does show the internal IP address from /etc/hosts. It's probably resolving that through nsswitch, right? "files" is set before "ldap".

Comment 7 Stephen Gallagher 2010-12-23 18:37:52 UTC
Please try out sssd-1.5.0-1.fc14 and let me know if this fixes your issue.

Comment 8 Aaron Hagopian 2011-01-08 02:03:24 UTC
Nope.  I upgraded sssd and sssd-client from updates-testing to the version you mentioned and still get the hangs on sudo commands, along with the "could not start TLS" error in /var/log/messages during each hang.

Comment 9 Dmitri Pal 2011-01-08 16:14:29 UTC
Aaron,

Does this happen right after some kind of network disruption, such as the VPN dropping, suspending and resuming the machine, or unplugging it from a docking station?
Or does it just happen periodically when you are working online but your VPN is not connected to the corporate network? How many servers do you have configured? If you could attach your sssd.conf, that would be great.

Does your experience look similar to this: https://fedorahosted.org/sssd/ticket/709 ?

Thanks
Dmitri

Comment 10 Stephen Gallagher 2011-04-13 14:22:10 UTC
Aaron, is this problem still persisting with SSSD 1.5.4 or later?

From the log posted above, it looks like an error contacting the LDAP server. This could be a certificate issue or a routing problem. Unfortunately, due to bug http://www.openldap.org/its/index.cgi/Incoming?id=6789 we don't get any information back from the ldap client libraries to explain the problem.

Please let us know whether you are still experiencing this issue.

Comment 11 Stephen Gallagher 2011-05-23 18:49:29 UTC
This bug has gone for more than a month without additional data from the reporter.

Please reopen if the requested information is provided.