Bug 746654

Summary: SSSD backend gets killed on slow systems
Product: Red Hat Enterprise Linux 6 Reporter: Jan Zeleny <jzeleny>
Component: sssd Assignee: Stephen Gallagher <sgallagh>
Status: CLOSED ERRATA QA Contact: IDM QE LIST <seceng-idm-qe-list>
Severity: unspecified Docs Contact:
Priority: high    
Version: 6.1 CC: grajaiya, jgalipea, jhrozek, kbanerje, prc, syeghiay
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sssd-1.5.1-59.el6 Doc Type: Bug Fix
Doc Text:
Do not document
Story Points: ---
Clone Of:
: 748893 Environment:
Last Closed: 2011-12-06 16:41:10 UTC Type: ---
Bug Depends On: 728343, 748864    
Bug Blocks: 748554, 748893    

Description Jan Zeleny 2011-10-17 12:31:00 UTC
On systems with slow network connections or under heavy load, the SSSD backend fails to respond to the monitor's pings over SBUS in time, so it gets killed repeatedly.

As a result, logins take several minutes. Sometimes the login process freezes altogether.

Comment 1 Stephen Gallagher 2011-10-17 12:33:46 UTC
The cause of this bug was that the timeout option is specified in seconds, and we were not multiplying it by 1000 before passing it to a function whose argument is in milliseconds. Thus the default of 10s was being treated as 10ms, which is far too short a time to reasonably expect a response.
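
The unit mismatch described in this comment can be illustrated with a small sketch. This is not SSSD code (SSSD is written in C); the function names and the 250 ms reply delay are hypothetical and chosen only to show the arithmetic, assuming the monitor treats a ping as failed when the reply takes longer than the timeout.

```python
MS_PER_SECOND = 1000


def buggy_ping_timeout_ms(timeout_seconds: int) -> int:
    """Mimic the defect: the configured seconds value is used as milliseconds."""
    return timeout_seconds


def fixed_ping_timeout_ms(timeout_seconds: int) -> int:
    """Convert the configured seconds value to milliseconds before use."""
    return timeout_seconds * MS_PER_SECOND


def ping_times_out(reply_delay_ms: int, timeout_ms: int) -> bool:
    """Assumed rule: a ping fails when the reply takes longer than the timeout."""
    return reply_delay_ms > timeout_ms


if __name__ == "__main__":
    default_timeout_s = 10   # default mentioned in the comment above
    reply_delay_ms = 250     # hypothetical delay of a backend on a loaded system

    for label, convert in (("buggy", buggy_ping_timeout_ms),
                           ("fixed", fixed_ping_timeout_ms)):
        timeout_ms = convert(default_timeout_s)
        timed_out = ping_times_out(reply_delay_ms, timeout_ms)
        print(f"{label}: timeout={timeout_ms} ms, ping timed out: {timed_out}")
```

With the conversion missing, any reply slower than 10 ms counts as a failed ping, which matches the repeated kills seen on slow or loaded systems.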

Comment 3 Jan Zeleny 2011-10-17 12:50:55 UTC
Steps to reproduce:
1. Set timeout=1 in the config file
2. Remove cache and logs
3. Start sssd
4. tail -f /var/log/sssd/sssd.log
5. Log in as a user stored on a remote server to trigger a backend operation

You should see messages in the log saying that the connection to the backend timed out. If the backend operation takes long enough, the monitor sends SIGTERM to the backend and it has to restart, potentially leaving ongoing auth requests hanging.

Comment 7 Kaushik Banerjee 2011-10-18 07:47:10 UTC
Verified that auth works appropriately and backend doesn't get killed on a slow system.

Verified in version:
# rpm -qi sssd | head
Name        : sssd                         Relocations: (not relocatable)
Version     : 1.5.1                             Vendor: Red Hat, Inc.
Release     : 59.el6                        Build Date: Mon 17 Oct 2011 04:59:48 PM EDT
Install Date: Tue 18 Oct 2011 01:22:41 AM EDT      Build Host: x86-003.build.bos.redhat.com
Group       : Applications/System           Source RPM: sssd-1.5.1-59.el6.src.rpm
Size        : 3615305                          License: GPLv3+
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://fedorahosted.org/sssd/
Summary     : System Security Services Daemon

Comment 8 Jakub Hrozek 2011-10-25 15:27:22 UTC
Upstream ticket:
https://fedorahosted.org/sssd/ticket/1059

Comment 9 Jakub Hrozek 2011-10-27 14:32:17 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Do not document

Comment 10 errata-xmlrpc 2011-12-06 16:41:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1529.html