Bug 703624 - SSSD's async resolver only tries the first nameserver in /etc/resolv.conf
Summary: SSSD's async resolver only tries the first nameserver in /etc/resolv.conf
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sssd
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Stephen Gallagher
QA Contact: Chandrasekar Kannan
URL:
Whiteboard:
Depends On:
Blocks: 707574 708352 748835
Reported: 2011-05-10 20:16 UTC by Jenny Severance
Modified: 2020-05-04 10:20 UTC (History)

Fixed In Version: sssd-1.5.1-35.el6
Doc Type: Bug Fix
Doc Text:
Cause: The internal resolver of SSSD was set to never retry the other name servers it reads from /etc/resolv.conf if the first one failed to resolve a host name.
Consequence: If resolution failed, SSSD switched to offline mode without asking the other configured name servers.
Fix: The resolver was configured so that it queries all name servers.
Result: Host name resolution now retries until it either resolves the host name or has queried all the configured name servers.
Clone Of:
: 707574 748835 (view as bug list)
Environment:
Last Closed: 2011-12-06 16:38:21 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | SSSD sssd issues 1909 | 0 | None | closed | The async resolver only tries the first nameserver in /etc/resolv.conf | 2020-08-06 03:46:38 UTC
Red Hat Product Errata | RHBA-2011:1529 | 0 | normal | SHIPPED_LIVE | sssd bug fix and enhancement update | 2011-12-06 00:50:20 UTC

Description Jenny Severance 2011-05-10 20:16:24 UTC
Description of problem:
Logging in through GDM fails, or succeeds only with cached credentials, when the master IPA server is down and only the replica is available. Integrated DNS is installed on both the master and the replica.

<snip>

(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [id_callback] (4): Got id ack and version (1) from Monitor
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [be_client_init] (4): Set-up Backend ID timeout [0x88adc38]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [be_client_init] (4): Set-up Backend ID timeout [0x88b0b30]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [client_registration] (4): Cancel DP ID timeout [0x88adc38]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [client_registration] (4): Added Frontend client [NSS]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [client_registration] (4): Cancel DP ID timeout [0x88b0b30]
(Tue May 10 11:34:28 2011) [sssd[be[testrelm]]] [client_registration] (4): Added Frontend client [PAM]
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Got request for [4097][1][name=jennyg]
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (4): Trying to resolve service 'IPA'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolv_gethostbyname_send] (4): Trying to resolve A record of 'dhcp-100-18-190.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolve_srv_cont] (4): Searching for servers via SRV query '_ldap._tcp.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolv_getsrv_send] (4): Trying to resolve SRV record of '_ldap._tcp.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolve_srv_done] (1): SRV query failed: [Could not contact DNS servers]
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_set_port_status] (4): Marking port 0 of server '(no name)' as 'not working'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [set_srv_data_status] (4): Marking SRV lookup of service 'IPA' as 'not resolved'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (4): Trying to resolve service 'IPA'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [get_server_status] (4): Hostname resolution expired, reseting the server status of 'dhcp-100-18-10.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [set_server_common_status] (4): Marking server 'dhcp-100-18-10.testrelm' as 'name not resolved'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolv_gethostbyname_send] (4): Trying to resolve A record of 'dhcp-100-18-10.testrelm'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [set_server_common_status] (4): Marking server 'dhcp-100-18-10.testrelm' as 'resolving name'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_done] (1): Failed to resolve server 'dhcp-100-18-10.testrelm': Could not contact DNS servers
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [set_server_common_status] (4): Marking server 'dhcp-100-18-10.testrelm' as 'not working'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (4): Trying to resolve service 'IPA'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [fo_resolve_service_send] (1): No available servers for service 'IPA'
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [sdap_id_op_connect_done] (1): Failed to connect, going offline (5 [Input/output error])
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [be_run_offline_cb] (3): Going offline. Running callbacks.
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [acctinfo_callback] (4): Request processed. Returned 1,11,Offline
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Got request for [4097][1][name=jennyg]
(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline
(Tue May 10 11:34:38 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Got request for [4097][1][name=jennyg]
(Tue May 10 11:34:38 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline
(Tue May 10 11:34:38 2011) [sssd[be[testrelm]]] [be_get_account_info] (4): Got request for [4097][1][name=jennyg]

</snip>
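
In the excerpt above, the failures are the lines logged at level (1). As a rough illustration, assuming the line layout seen in this excerpt (timestamp, process, function, level, message), they can be filtered with a short Python sketch. The regular expression and the `parse_line` and `failures` helpers are hypothetical, written for this excerpt only, and are not part of SSSD.

```python
import re

# Layout assumed from the excerpt in this report:
# (timestamp) [process] [function] (level): message
LINE_RE = re.compile(
    r"^\((?P<timestamp>[^)]+)\) "
    r"\[(?P<process>\S+)\] "
    r"\[(?P<function>[^\]]+)\] "
    r"\((?P<level>\d+)\): "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one debug line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["level"] = int(fields["level"])
    return fields


def failures(lines, level=1):
    """Keep only the parsed lines that were logged at the given level."""
    parsed = (parse_line(line) for line in lines)
    return [p for p in parsed if p is not None and p["level"] == level]


# Two lines copied from the excerpt above.
SAMPLE = [
    "(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolv_getsrv_send] (4): "
    "Trying to resolve SRV record of '_ldap._tcp.testrelm'",
    "(Tue May 10 11:34:32 2011) [sssd[be[testrelm]]] [resolve_srv_done] (1): "
    "SRV query failed: [Could not contact DNS servers]",
]

for entry in failures(SAMPLE):
    print(entry["function"], "->", entry["message"])
```

Run against the full log, a filter like this would surface the SRV failure, the failed A-record lookup, and the transition to offline mode without the surrounding level-4 noise.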

The client does not fail over to the replica; it goes offline instead.

/etc/resolv.conf contains both the master and replica nameservers, the master first and then the replica.

If I change the order to replica first and then master, it works.



Version-Release number of selected component (if applicable):

ipa-client-2.0.0-23.el6.i686
sssd-1.5.1-34.el6.i686

How reproducible:
always

Steps to Reproduce:
1. Install and configure an IPA master and replica, both with integrated DNS.
2. Install the IPA client and test authentication from GDM with an IPA user to cache credentials on the client. Make sure /etc/resolv.conf lists both DNS servers, the master first and then the replica.
3. Create a new IPA user and assign the user a password.
4. Bring the master IPA server down (ipactl stop).
5. Log in to the client GDM as the user with cached credentials. The credential cache is used even though the replica is available.
6. Log in to the client GDM as the new user. Authentication fails and the user is not prompted to create a new password.
  
Actual results:

Replica is not found and client goes offline

Expected results:

Replica would be used for authentication while master is down

Additional info:

Comment 2 RHEL Program Management 2011-05-11 06:00:29 UTC
Since RHEL 6.1 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 3 Stephen Gallagher 2011-05-11 20:22:17 UTC
Updating bug summary. The problem is not limited (or even related) to FreeIPA with integrated DNS.

I have opened upstream ticket https://fedorahosted.org/sssd/ticket/867 to track the real issue.


We're not properly failing over to secondary DNS servers if the first server in the list is broken.

Steps to reproduce:

    1. Set up a valid /etc/resolv.conf with a working primary DNS server
    2. Add "nameserver 127.0.0.2" above the working DNS entries (this simulates having an unreachable DNS server first in the list)
    3. Enable debug logs and restart SSSD 

The debug log will contain

(Wed May 11 16:08:52 2011) [sssd[be[example.com]]] [fo_resolve_service_done] (1): Failed to resolve server 'ldap.example.com': Could not contact DNS servers

and SSSD will operate permanently in offline mode because it can never resolve the SRV records.

It's unclear right now whether the bug is in SSSD's async resolver or internal to the c-ares library.
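
The reproduction setup in this comment can be sketched in Python. This is only an illustration: `parse_nameservers` is a hypothetical helper, and 192.0.2.53 is a documentation-range address standing in for whatever working DNS server the real file lists. The sketch shows the order in which a resolver would see the servers, with the unreachable 127.0.0.2 first.

```python
# Hypothetical /etc/resolv.conf contents for the reproduction.
# 192.0.2.53 is a placeholder for the real, working DNS server.
REPRO_RESOLV_CONF = """\
# unreachable entry added above the working server
nameserver 127.0.0.2
nameserver 192.0.2.53
"""


def parse_nameservers(text):
    """Return the nameserver addresses in the order they are listed."""
    servers = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


print(parse_nameservers(REPRO_RESOLV_CONF))
```

A resolver that stops after the first entry in this list never reaches the working server, which matches the permanent offline mode described above.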

Comment 7 Kaushik Banerjee 2011-09-07 17:15:46 UTC
Verified in version:

# rpm -qi sssd | head
Name        : sssd                         Relocations: (not relocatable)
Version     : 1.5.1                             Vendor: Red Hat, Inc.
Release     : 49.el6                        Build Date: Mon 29 Aug 2011 08:26:38 PM IST
Install Date: Wed 31 Aug 2011 07:01:44 AM IST      Build Host: x86-010.build.bos.redhat.com
Group       : Applications/System           Source RPM: sssd-1.5.1-49.el6.src.rpm
Size        : 3549339                          License: GPLv3+
Signature   : (none)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
URL         : http://fedorahosted.org/sssd/
Summary     : System Security Services Daemon

Comment 8 Jakub Hrozek 2011-10-26 16:17:53 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause: The internal resolver of SSSD was set to never retry the other name servers it reads from /etc/resolv.conf if the first one failed to resolve a host name.
Consequence: If resolution failed, SSSD switched to offline mode without asking the other configured name servers.
Fix: The resolver was configured so that it queries all name servers.
Result: Host name resolution now retries until it either resolves the host name or has queried all the configured name servers.
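
The difference between the behavior under "Cause" and the behavior under "Result" can be illustrated with a small Python sketch. It is not SSSD's code and does not reflect how the fix was implemented; every name here (`resolve_first_only`, `resolve_with_failover`, `fake_query`) and both 192.0.2.x addresses are assumptions made for the example, and the DNS query is simulated so that no network access is needed.

```python
def resolve_first_only(hostname, nameservers, query):
    """Behavior described under Cause: give up if the first server fails."""
    return query(nameservers[0], hostname)


def resolve_with_failover(hostname, nameservers, query):
    """Behavior described under Result: try each server until one answers."""
    errors = []
    for server in nameservers:
        try:
            return query(server, hostname)
        except OSError as exc:
            # Remember the failure and move on to the next server.
            errors.append(f"{server}: {exc}")
    raise LookupError(f"could not resolve {hostname}: " + "; ".join(errors))


def fake_query(server, hostname):
    """Stand-in for a DNS query; 127.0.0.2 plays the unreachable server."""
    if server == "127.0.0.2":
        raise TimeoutError("no response")
    return "192.0.2.10"  # made-up answer from the working server


NAMESERVERS = ["127.0.0.2", "192.0.2.53"]

try:
    resolve_first_only("ldap.example.com", NAMESERVERS, fake_query)
except OSError as exc:
    print("first-only lookup failed:", exc)

print("failover lookup:",
      resolve_with_failover("ldap.example.com", NAMESERVERS, fake_query))
```

Only when every server has failed does the second function report an error, which corresponds to the point at which going offline is justified.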

Comment 9 errata-xmlrpc 2011-12-06 16:38:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1529.html

