Description of problem:
Logging in to GDM fails, or succeeds only with cached credentials, if the master IPA server is down and only the replica is available. Integrated DNS is installed on both master and replica.

<snip>
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [id_callback] (4): Got id ack and version (1) from Monitor
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [be_client_init] (4): Set-up Backend ID timeout [0x88adc38]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [be_client_init] (4): Set-up Backend ID timeout [0x88b0b30]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [client_registration] (4): Cancel DP ID timeout [0x88adc38]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [client_registration] (4): Added Frontend client [NSS]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [client_registration] (4): Cancel DP ID timeout [0x88b0b30]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [client_registration] (4): Added Frontend client [PAM]
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Got request for [4097][1][name=jennyg]
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (4): Trying to resolve service 'IPA'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolv_gethostbyname_send] (4): Trying to resolve A record of 'dhcp-100-18-190.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolve_srv_cont] (4): Searching for servers via SRV query '_ldap._tcp.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolv_getsrv_send] (4): Trying to resolve SRV record of '_ldap._tcp.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolve_srv_done] (1): SRV query failed: [Could not contact DNS servers]
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_set_port_status] (4): Marking port 0 of server '(no name)' as 'not working'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [set_srv_data_status] (4): Marking SRV lookup of service 'IPA' as 'not resolved'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (4): Trying to resolve service 'IPA'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [get_server_status] (4): Hostname resolution expired, reseting the server status of 'dhcp-100-18-10.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [set_server_common_status] (4): Marking server 'dhcp-100-18-10.testrelm' as 'name not resolved'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolv_gethostbyname_send] (4): Trying to resolve A record of 'dhcp-100-18-10.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [set_server_common_status] (4): Marking server 'dhcp-100-18-10.testrelm' as 'resolving name'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_done] (1): Failed to resolve server 'dhcp-100-18-10.testrelm': Could not contact DNS servers
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [set_server_common_status] (4): Marking server 'dhcp-100-18-10.testrelm' as 'not working'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (4): Trying to resolve service 'IPA'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (1): No available servers for service 'IPA'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [sdap_id_op_connect_done] (1): Failed to connect, going offline (5 [Input/output error])
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [be_run_offline_cb] (3): Going offline. Running callbacks.
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [acctinfo_callback] (4): Request processed. Returned 1,11,Offline
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Got request for [4097][1][name=jennyg]
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline
(Tue May 10 11:34:38 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Got request for [4097][1][name=jennyg]
(Tue May 10 11:34:38 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline
(Tue May 10 11:34:38 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Got request for [4097][1][name=jennyg]
</snip>

The client does not fail over to the replica but goes offline. /etc/resolv.conf contains both the master and replica name servers, master first and then replica. If I change the order to replica first and then master, it works.

Version-Release number of selected component (if applicable):
ipa-client-2.0.0-23.el6.i686
sssd-1.5.1-34.el6.i686

How reproducible:
always

Steps to Reproduce:
1. Install and configure an IPA master and replica, both with integrated DNS.
2. Install the IPA client and test authentication from GDM with an IPA user to cache credentials on the client. Make sure /etc/resolv.conf contains both DNS servers, first the master and then the replica.
3. Create a new IPA user and assign the user a password.
4. Bring the master IPA server down (ipactl stop).
5. Log in to the client GDM as the user with cached credentials - the credential cache is used even though the replica is available.
6. Log in to the client GDM as the new user - authentication fails and the user is not prompted to create a new password.

Actual results:
The replica is not found and the client goes offline.

Expected results:
The replica is used for authentication while the master is down.

Additional info:
Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Updating bug summary. The problem is not limited (or even related) to FreeIPA with integrated DNS. I have opened upstream ticket https://fedorahosted.org/sssd/ticket/867 to track the real issue.

We're not properly failing over to secondary DNS servers if the first server in the list is broken.

Steps to reproduce:
1. Set up a valid /etc/resolv.conf with a working primary DNS server.
2. Add "nameserver 127.0.0.2" above the working DNS entries (this simulates having an unreachable DNS server first in the list).
3. Enable debug logs and restart SSSD.

The debug log will contain

(Wed May 11 16:08:52 2011) [sssd[be[example.com]]] [fo_resolve_service_done] (1): Failed to resolve server 'ldap.example.com': Could not contact DNS servers

and SSSD will operate permanently in offline mode because it can never resolve the SRV records.

It's unclear right now whether the bug is in SSSD's async resolver or internal to the c-ares library.
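To make the reproduction setup concrete, here is a minimal Python sketch that reads resolv.conf-style text and lists the name servers in the order a resolver would see them. The function name parse_nameservers and the address 192.0.2.53 are placeholders chosen for illustration; neither comes from SSSD or from this report.

```python
def parse_nameservers(resolv_conf_text):
    """Return nameserver addresses in the order they appear in the text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments, which resolv.conf introduces with '#' or ';'.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Hypothetical file contents: the unreachable server from step 2 comes first,
# followed by a placeholder address standing in for the working DNS server.
SAMPLE = """\
# unreachable server placed first to simulate the failure
nameserver 127.0.0.2
nameserver 192.0.2.53
search example.com
"""

if __name__ == "__main__":
    print(parse_nameservers(SAMPLE))
```

With this ordering, a resolver that only ever asks the first entry never reaches the working server, which matches the behaviour described above.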
Verified in version:

# rpm -qi sssd | head
Name        : sssd                         Relocations: (not relocatable)
Version     : 1.5.1                        Vendor: Red Hat, Inc.
Release     : 49.el6                       Build Date: Mon 29 Aug 2011 08:26:38 PM IST
Install Date: Wed 31 Aug 2011 07:01:44 AM IST   Build Host: x86-010.build.bos.redhat.com
Group       : Applications/System          Source RPM: sssd-1.5.1-49.el6.src.rpm
Size        : 3549339                      License: GPLv3+
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://fedorahosted.org/sssd/
Summary     : System Security Services Daemon
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause: The internal resolver of SSSD was set to never retry the other name servers it reads from /etc/resolv.conf if the first one failed to resolve a host name.
Consequence: If resolution failed, SSSD switched to offline mode without asking the other configured name servers.
Fix: The resolver was configured so that it queries all name servers.
Result: Host name resolution now correctly retries until it either resolves the host name or has queried all the configured name servers.
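The difference between the old and the fixed behaviour can be modelled with a short Python sketch. This is not SSSD's code (SSSD is written in C and resolves names through c-ares); the names resolve_with_failover, fake_query and ResolveError, the try_all switch, and the 192.0.2.x addresses are all hypothetical and exist only to illustrate the Cause and Fix described above.

```python
class ResolveError(Exception):
    """Raised when a name server cannot answer a query."""


def resolve_with_failover(hostname, nameservers, query, try_all=True):
    """Ask name servers in order and return the first answer.

    query(server, hostname) must return an address or raise ResolveError.
    try_all=False models the old behaviour (only the first server is asked);
    try_all=True models the fixed behaviour (every server is asked in turn).
    """
    candidates = nameservers if try_all else nameservers[:1]
    last_error = None
    for server in candidates:
        try:
            return query(server, hostname)
        except ResolveError as exc:
            last_error = exc  # remember the failure and move to the next server
    raise ResolveError(f"Could not contact DNS servers for {hostname}") from last_error


def fake_query(server, hostname):
    """Stand-in for a real DNS query: 127.0.0.2 is treated as unreachable."""
    if server == "127.0.0.2":
        raise ResolveError("timeout")
    return "192.0.2.10"  # placeholder address for the resolved host


if __name__ == "__main__":
    servers = ["127.0.0.2", "192.0.2.53"]
    print(resolve_with_failover("ldap.example.com", servers, fake_query))
```

Calling the same function with try_all=False raises ResolveError for this server list, which corresponds to the client going offline even though a second name server was configured.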
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1529.html