Bug 1122873
Summary: | Failover does not always happen from SRV to hostname resolution(via /etc/hosts) | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Kaushik Banerjee <kbanerje> | ||||
Component: | sssd | Assignee: | Jakub Hrozek <jhrozek> | ||||
Status: | CLOSED ERRATA | QA Contact: | Kaushik Banerjee <kbanerje> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.6 | CC: | fahmed, grajaiya, jgalipea, lslebodn, mkosek, pbrezina, preichl | ||||
Target Milestone: | rc | Keywords: | Regression | ||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | sssd-1.11.6-13.el6 | Doc Type: | Bug Fix | ||||
Doc Text: |
Cause: Port status was not marked as not working when previous SRV lookup status failed but SSSD still did not try to resolv SRV record again.
Consequence: Failover did not cycle through all available servers and thus SSSD remained offline.
Fix: Port status is marked as not working.
Result: Failover mechanism will continue with next configured server.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2014-10-14 04:49:11 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Kaushik Banerjee
2014-07-24 09:24:16 UTC
Hi, I couldn't reproduce this bug. Do you see this also with 1.12 or current master? Can you please send me sssd.conf and the whole log file? Thanks. Created attachment 920861 [details]
Domain log
I can reproduce this with 1.11 version of sssd on RHEL6.6
The sssd.conf domain section has:
[domain/LDAP]
debug_level = 0xFFFF
id_provider = ldap
ldap_uri = _srv_ , ldap://<ldap server>
ldap_search_base = dc=example,dc=com
I will e-mail you my machine credentials privately in case you want to reproduce on my setup.
Hi, thanks for the info. It is reproducible by these steps only with sssd-1-11. The problem is hidden in master by "failover: Shorter retry time for failed SRV" 8ed6630fb25c7d5af6e2d9915786b94600198b01 commit, which lowers SRV retry timeout to 15 seconds instead of 14k if previous attempt to resolve SRV failed. However, this commit only hides the problem. The log shows that for some reason we only try SRV record and we do not cycle through all configured servers. I will investigate more. I have just a little tip for you test. In SSSD we have a mechanism called "fast reply" which makes the user request return immediately if SSSD is in offline mode and than continue with the request out of band. I.e. the last getent command will still fail since SSSD is still offline but the user data will be downloaded in background (which brings SSSD back online). So to make the test correct you should either grep SSSD logs to see if we are online or put another getent command at the end or signal SSSD with SIGUSR2 before the getent command (which will make it go online). Upstream ticket: https://fedorahosted.org/sssd/ticket/2390 For completeness, here are the commit hashes: master: 5490fa3ea4f6a4118a5188acb0e05daa302ed9d6 sssd-1-11: 8b0aeee7fe668d8863699d942519a0ac51304049 Verified in build sssd-1.11.6-14.el6 Failover works properly now. Log shows: <snip> (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'LDAP' as 'not resolved' (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (1432158226) (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [get_server_status] (0x1000): Status of server 'ldapserver.example.com' is 'name not resolved' (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [get_port_status] (0x1000): Port status of port 389 for server 'ldapserver.example.com' is 'neutral' </snip> Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1375.html |