Red Hat Bugzilla – Bug 1122873
Failover does not always happen from SRV to hostname resolution(via /etc/hosts)
Last modified: 2015-05-03 20:47:46 EDT
Description of problem: This test was written as part of the fix to bug 954275 The failover is not proper from srv lookup to hostname resolution(via /etc/hosts) Version-Release number of selected component (if applicable): sssd-1.11.6-4.el6 How reproducible: Always Steps to Reproduce: 1. sssd.conf has "ldap_uri=_srv_,ldap://server_hostname" 2. Empty /etc/resolv.conf and /etc/hosts 3. Perform lookup (Fails and sssd goes offline) 4. Update /etc/hosts with <server_ip> <server_hostname> . Verify that you can lookup the server. 5. Sleep for 60 seconds 6. Lookup user (Still fails) Actual results: Log shows: (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_get_account_info] (0x0100): Got request for [4097][1][name=testuser] (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_get_account_info] (0x0100): Request processed. Returned 1,11,Fast reply - offline (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_req_set_domain] (0x0400): Changing request domain from [LDAP] to [LDAP] (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [sdap_id_op_connect_step] (0x4000): beginning to connect (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [get_port_status] (0x1000): Port status of port 0 for server '(no name)' is 'not working' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [get_port_status] (0x0100): Reseting the status of port 0 for server '(no name)' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [resolve_srv_send] (0x0200): The status of SRV lookup is not resolved (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (5) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0040): Failed to find a server after 10 attempts (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_resolve_server_done] (0x1000): Server resolution failed: 5 (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error]) (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_mark_offline] (0x2000): Going offline! (Tue Jul 22 04:52:22 2014) [sssd[be[LDAP]]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks. Expected results: Failover works Additional info:
Hi, I couldn't reproduce this bug. Do you see this also with 1.12 or current master? Can you please send me sssd.conf and the whole log file? Thanks.
Created attachment 920861 [details] Domain log I can reproduce this with 1.11 version of sssd on RHEL6.6 The sssd.conf domain section has: [domain/LDAP] debug_level = 0xFFFF id_provider = ldap ldap_uri = _srv_ , ldap://<ldap server> ldap_search_base = dc=example,dc=com I will e-mail you my machine credentials privately in case you want to reproduce on my setup.
Hi, thanks for the info. It is reproducible by these steps only with sssd-1-11. The problem is hidden in master by "failover: Shorter retry time for failed SRV" 8ed6630fb25c7d5af6e2d9915786b94600198b01 commit, which lowers SRV retry timeout to 15 seconds instead of 14k if previous attempt to resolve SRV failed. However, this commit only hides the problem. The log shows that for some reason we only try SRV record and we do not cycle through all configured servers. I will investigate more. I have just a little tip for you test. In SSSD we have a mechanism called "fast reply" which makes the user request return immediately if SSSD is in offline mode and than continue with the request out of band. I.e. the last getent command will still fail since SSSD is still offline but the user data will be downloaded in background (which brings SSSD back online). So to make the test correct you should either grep SSSD logs to see if we are online or put another getent command at the end or signal SSSD with SIGUSR2 before the getent command (which will make it go online).
Upstream ticket: https://fedorahosted.org/sssd/ticket/2390
For completeness, here are the commit hashes: master: 5490fa3ea4f6a4118a5188acb0e05daa302ed9d6 sssd-1-11: 8b0aeee7fe668d8863699d942519a0ac51304049
Verified in build sssd-1.11.6-14.el6 Failover works properly now. Log shows: <snip> (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [set_srv_data_status] (0x0100): Marking SRV lookup of service 'LDAP' as 'not resolved' (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0080): Couldn't resolve server (SRV lookup meta-server), resolver returned (1432158226) (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [be_resolve_server_process] (0x1000): Trying with the next one! (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [get_server_status] (0x1000): Status of server 'ldapserver.example.com' is 'name not resolved' (Tue Aug 5 02:39:16 2014) [sssd[be[LDAP]]] [get_port_status] (0x1000): Port status of port 389 for server 'ldapserver.example.com' is 'neutral' </snip>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1375.html