Bug 795562
Summary: | Infinite loop checking Kerberos credentials | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Daniel Sands <dnsands> | ||||
Component: | sssd | Assignee: | Stephen Gallagher <sgallagh> | ||||
Status: | CLOSED ERRATA | QA Contact: | IDM QE LIST <seceng-idm-qe-list> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.2 | CC: | dpal, grajaiya, jgalipea, jhrozek, kbanerje, prc | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | sssd-1.8.0-12.el6 | Doc Type: | Bug Fix | ||||
Doc Text: |
Cause: The status of a server in the SSSD server list is reset after 30 seconds to allow retries. If a full cycle over the server list took more than 30 seconds, the cycle would start again.
Consequence: SSSD deployments using large server failover lists might loop indefinitely.
Fix: SSSD was fixed to loop over the failover list only once.
Result: If SSSD tries all the servers in the failover list without succeeding, the operation always fails.
|
Story Points: | --- | ||||
Clone Of: | |||||||
: | 828190 (view as bug list) | Environment: | |||||
Last Closed: | 2012-06-20 11:55:03 UTC | Type: | --- | ||||
Bug Blocks: | 828190 | ||||||
Attachments: |
|
Description
Daniel Sands
2012-02-20 22:13:27 UTC
If handle_child_done() is never called, that would mean the child never answered. Daniel, do you have logs available? The /var/log/sssd/sssd_$domain.log and /var/log/sssd/krb5_child.log files are of particular interest. You can generate logs by putting debug_level = 7 (or higher) into the domain section of sssd.conf.

Created attachment 565700
sssd_default logging (sanitized)

Here is the sssd_default log. There are no krb5_child logs to speak of because when krb5_child is killed with -9, it apparently fails to add any logging.
Just to clarify, I know that krb5_child finished its job because I was watching the network with Wireshark while krb5_child was being restarted every 15 seconds.

Upstream ticket: https://fedorahosted.org/sssd/ticket/1214

Thank you for providing the logs, Daniel. I identified the cause of the infinite loop from them; it is now being tracked in upstream ticket #1214.

However, I'm concerned about the empty krb5_child.log. Even if the child is killed with a signal, it should always log at least a level-7 debug message saying "krb5_child started.". I'll look into reproducing that issue locally.

(In reply to comment #6)
> However, I'm concerned about the empty krb5_child.log. Even if the child is
> killed with a signal, it should always log at least a level-7 debug message
> saying "krb5_child started.". I'll look into reproducing that issue locally.

There was actually a small bug where we would set up debugging to file *after* we attempted to print the "krb5_child started." message. There are also no tracing DEBUG messages in the child code, only error reporting, so unless there was an error, the child process might really not log anything. I suspect that was your case: the child called a blocking krb5 function, hung there, and was killed by the responder without having a chance to log any error.

That's plausible. In my case Wireshark showed that the KRB5 request was sent and a reply received. And once the sssd backend closed its send pipe, krb5_child seemed to exit normally.

Please add steps to reproduce this issue.

(In reply to comment #9)
> Please add steps to reproduce this issue

The process of connecting to the server must take at least 30 seconds, which is the (currently hardcoded) interval after which we reset the status of a server. This could be achieved by putting two krb5 servers into the failover list, setting krb5_auth_timeout to, say, 20 seconds, and starting a firewall with DROP rules on the servers.
The authentication should take 40 seconds, after which the status of the first server would be reset and authentication would cycle.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: The status of a server in the SSSD server list is reset after 30 seconds to allow retries. If a full cycle over the server list took more than 30 seconds, the cycle would start again.
Consequence: SSSD deployments using large server failover lists might loop indefinitely.
Fix: SSSD was fixed to loop over the failover list only once.
Result: If SSSD tries all the servers in the failover list without succeeding, the operation always fails.

Tested before fix: The first server status is always reset.
With the patch (sssd-1.8.0-23.el6): The cycle doesn't restart. The last server status is reset.

Verified in version:
# rpm -qi sssd | head
Name        : sssd                         Relocations: (not relocatable)
Version     : 1.8.0                             Vendor: Red Hat, Inc.
Release     : 23.el6                        Build Date: Fri 20 Apr 2012 11:30:39 PM IST
Install Date: Mon 23 Apr 2012 08:48:40 PM IST      Build Host: x86-003.build.bos.redhat.com
Group       : Applications/System           Source RPM: sssd-1.8.0-23.el6.src.rpm
Size        : 7874744                          License: GPLv3+
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://fedorahosted.org/sssd/
Summary     : System Security Services Daemon

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0747.html
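
The reproduction scenario from the comments (two unreachable servers, a 20-second krb5_auth_timeout, and the 30-second status reset) can be modelled with a small simulation on a virtual clock. This is only an illustrative sketch, not SSSD's actual failover code: the function name count_attempts, its parameters, and the max_attempts cap are hypothetical, and the sketch assumes the status timestamp is taken when an attempt starts, which may not match the real bookkeeping.

```python
from typing import List, Optional

RESET_INTERVAL = 30  # seconds; the hardcoded reset interval mentioned in the comments


def count_attempts(num_servers: int, auth_timeout: int,
                   single_pass: bool, max_attempts: int = 20) -> int:
    """Simulate failover against unreachable servers on a virtual clock.

    Returns the number of connection attempts made before the operation
    fails, or max_attempts if the loop is still cycling at that point.
    """
    # marked_at[i] is the virtual time at which server i was last marked
    # as not working; None means it has not been tried yet.
    marked_at: List[Optional[int]] = [None] * num_servers
    tried = set()
    now = 0
    attempts = 0

    while attempts < max_attempts:
        candidate = None
        for i in range(num_servers):
            if single_pass and i in tried:
                continue  # fixed behaviour: never revisit a server
            stamp = marked_at[i]
            if stamp is None or now - stamp >= RESET_INTERVAL:
                candidate = i  # untried, or status reset after the interval
                break
        if candidate is None:
            return attempts  # every server is marked bad: the operation fails
        # Assumption: the status timestamp is taken when the attempt starts.
        marked_at[candidate] = now
        tried.add(candidate)
        now += auth_timeout  # packets are dropped, so the attempt times out
        attempts += 1
    return attempts


if __name__ == "__main__":
    print(count_attempts(2, 20, single_pass=False))  # keeps cycling until the cap
    print(count_attempts(2, 20, single_pass=True))   # one pass, then fails
    print(count_attempts(2, 10, single_pass=False))  # cycle shorter than the reset interval
```

Under these assumptions, the unfixed loop with a 20-second timeout never runs out of candidates, because by the time the second server times out the first one is eligible again. With single_pass=True, or with a full cycle shorter than the reset interval, the operation gives up after trying each server once.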