RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1851112 - LDAP bind can fail due to unconfigurable DNS server timeouts that inhibit SSSD failover [rhel-7.9.z]
Summary: LDAP bind can fail due to unconfigurable DNS server timeouts that inhibit SSS...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sssd
Version: 7.9
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: rc
: ---
Assignee: SSSD Maintainers
QA Contact: Dan Lavu
URL:
Whiteboard: sync-to-jira qetodo
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-06-25 15:54 UTC by David Ward
Modified: 2023-10-07 10:09 UTC (History)
12 users (show)

Fixed In Version: sssd-1.16.5-10.el7_9.6
Doc Type: If docs needed, set a value
Doc Text:
If this bug requires documentation, please select an appropriate Doc Type value.sssd-1.16.5-10.el7_9.5.x86_64
Clone Of:
Environment:
Last Closed: 2020-12-15 11:22:17 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github SSSD sssd issues 4250 0 None closed Conflicting default timeout values 2020-12-30 15:44:32 UTC
Github SSSD sssd pull 5207 0 None closed 1.16: failover: add dns_resolver_server_timeout option 2020-12-30 15:43:57 UTC
Red Hat Bugzilla 1382750 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Bugzilla 1480891 1 None None None 2021-01-20 06:05:38 UTC
Red Hat Issue Tracker SSSD-2603 0 None None None 2023-10-07 10:09:40 UTC

Description David Ward 2020-06-25 15:54:24 UTC
See the linked Bugzilla issues for RHEL 8. The same issue affects RHEL 7.


The fix for RHEL 8 (applied upstream) partly involved making two previously hard-coded DNS timeouts configurable:
 - 'dns_resolver_server_timeout' (response to a query by a single DNS server)
 - 'dns_resolver_op_timeout' (resolution of a query by any DNS server)

These same options need to be exposed in RHEL 7, so that the administrator can adjust them as necessary, in a way that ensures the ordering in [1] is met. Otherwise SSSD failover will be inhibited at certain levels.


The following upstream commits from the sssd-1-16 branch [2], which are very simple, can be added directly to the RHEL 7 sssd package to resolve this:
 - c317452 (config: add dns_resolver_op_timeout to option list)
 - f768f73 (man: fix description of dns_resolver_op_timeout)
 - 9b1f6f8 (man: fix description of dns_resolver_timeout)
 - 2c7c694 (failover: add dns_resolver_server_timeout option)

Please note that this does *not* change any default timeout values; this is backwards compatible. This simply makes it possible for the administrator to set these timeout values when it is necessary, especially in an AD environment.


[1] https://github.com/SSSD/sssd/blob/2c7c69485883ab0f408cf2f8cdcc4617462d68ec/src/man/include/failover.xml#L123-L129
[2] https://github.com/SSSD/sssd/pull/5207

Comment 2 Alexey Tikhonov 2020-06-25 16:57:02 UTC
* `sssd-1-16`
  * 2c7c69485883ab0f408cf2f8cdcc4617462d68ec - failover: add dns_resolver_server_timeout option
  * 9b1f6f88b1245bc8737a4b2616299a8427d822e0 - man: fix description of dns_resolver_timeout
  * f768f73903bf0c3d0d1b4361ebff6b111a918470 - man: fix description of dns_resolver_op_timeout



(In reply to David Ward from comment #0)
> 
> The following upstream commits from the sssd-1-16 branch [2], which are very
> simple, can be added directly to the RHEL 7 sssd package to resolve this:
>  - c317452 (config: add dns_resolver_op_timeout to option list)

Can't find this ^^ hash.
Did you mean https://github.com/SSSD/sssd/commit/7cb6a9f70dd65c88e0f9be97c79702566ab37afb ? This one was included with bz 1719718

>  - f768f73 (man: fix description of dns_resolver_op_timeout)
>  - 9b1f6f8 (man: fix description of dns_resolver_timeout)
>  - 2c7c694 (failover: add dns_resolver_server_timeout option)

Comment 3 David Ward 2020-06-25 17:08:46 UTC
(In reply to Alexey Tikhonov from comment #2)
> >  - c317452 (config: add dns_resolver_op_timeout to option list)
> 
> Can't find this ^^ hash.
> Did you mean
> https://github.com/SSSD/sssd/commit/7cb6a9f70dd65c88e0f9be97c79702566ab37afb
> ? This one was included with bz 1719718

Yes, my apologies. Please disregard that commit since it's covered in RHBZ#1719718.

Comment 6 Alexey Tikhonov 2020-07-02 15:21:29 UTC
Hi David,

could you please open a customer case to back up this request?

This would increase approval chances.

Comment 17 Dan Lavu 2020-12-07 17:02:45 UTC
Verified against sssd-ad-1.16.5-10.el7_9.6.x86_64

[root@rhel79 x86_64]# cat /etc/sssd/sssd.conf 

[sssd]
domains = example.com
config_file_version = 2
services = nss, pam

[domain/example.com]
debug_level = 9
ad_domain = example.com
krb5_realm = EXAMPLE.COM
realmd_tags = manages-system joined-with-samba 
cache_credentials = True
id_provider = ad
krb5_store_password_if_offline = True
default_shell = /bin/bash
ldap_id_mapping = True
use_fully_qualified_names = True
fallback_homedir = /home/%u@%d
access_provider = ad
dns_resolver_server_timeout = 300
dns_resolver_op_timeout = 5


The sssd-ad-1.16.5-10.el7_9.5.x86_64 complained about the new parameters
(2020-12-07  9:03:50:673718): [sssd] [sss_ini_call_validators] (0x0020): [rule/allowed_domain_options]: Attribute 'dns_resolver_server_timeout' is not allowed in section 'domain/example.com'. Check for typos.

ad1 is offline, logs containing dns lookups. 


(2020-12-07 10:14:55): [be[example.com]] [sdap_connect_host_resolv_done] (0x0400): Connecting to ldap://ad2.example.com:389
(2020-12-07 10:14:55): [be[example.com]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [25] for the connection.
(2020-12-07 10:14:55): [be[example.com]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
(2020-12-07 10:14:55): [be[example.com]] [sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldap://ad2.example.com:389/??base] with fd [25].
(2020-12-07 10:14:55): [be[example.com]] [sdap_connect_host_done] (0x0400): Successful connection to ldap://ad2.example.com:389
(2020-12-07 10:14:55): [be[example.com]] [sdap_print_server] (0x2000): Searching 192.168.75.5:389
(2020-12-07 10:14:55): [be[example.com]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(&(DnsDomain=example.com)(NtVer=\14\00\00\00))][].
(2020-12-07 10:14:55): [be[example.com]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [netlogon]
(2020-12-07 10:14:55): [be[example.com]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 1
(2020-12-07 10:14:55): [be[example.com]] [sdap_op_add] (0x2000): New operation 1 timeout 6
(2020-12-07 10:14:55): [be[example.com]] [sdap_process_result] (0x2000): Trace: sh[0x55f2720bc9c0], connected[1], ops[0x55f2720b8f90], ldap[0x55f2720c02e0]
(2020-12-07 10:14:55): [be[example.com]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_ENTRY]
(2020-12-07 10:14:55): [be[example.com]] [sdap_parse_entry] (0x1000): OriginalDN: [].
(2020-12-07 10:14:55): [be[example.com]] [sdap_parse_range] (0x2000): No sub-attributes for [netlogon]
(2020-12-07 10:14:55): [be[example.com]] [sdap_process_result] (0x2000): Trace: sh[0x55f2720bc9c0], connected[1], ops[0x55f2720b8f90], ldap[0x55f2720c02e0]
(2020-12-07 10:14:55): [be[example.com]] [sdap_process_message] (0x4000): Message type: [LDAP_RES_SEARCH_RESULT]
(2020-12-07 10:14:55): [be[example.com]] [sdap_get_generic_op_finished] (0x0400): Search result: Success(0), no errmsg set
(2020-12-07 10:14:55): [be[example.com]] [sdap_op_destructor] (0x2000): Operation 1 finished
(2020-12-07 10:14:55): [be[example.com]] [sdap_handle_release] (0x2000): Trace: sh[0x55f2720bc9c0], connected[1], ops[(nil)], ldap[0x55f2720c02e0], destructor_lock[0], release_memory[0]
(2020-12-07 10:14:55): [be[example.com]] [remove_connection_callback] (0x4000): Successfully removed connection callback.
(2020-12-07 10:14:55): [be[example.com]] [ad_get_client_site_done] (0x0400): Found site: Default-First-Site-Name
(2020-12-07 10:14:55): [be[example.com]] [ad_get_client_site_done] (0x0400): Found forest: example.com
(2020-12-07 10:14:55): [be[example.com]] [ad_srv_plugin_site_done] (0x0400): About to discover primary and backup servers
(2020-12-07 10:14:55): [be[example.com]] [fo_discover_servers_send] (0x0400): Looking up primary servers
(2020-12-07 10:14:55): [be[example.com]] [resolv_discover_srv_next_domain] (0x0400): SRV resolution of service 'ldap'. Will use DNS discovery domain 'Default-First-Site-Name._sites.example.com'
(2020-12-07 10:14:55): [be[example.com]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.Default-First-Site-Name._sites.example.com'
(2020-12-07 10:14:55): [be[example.com]] [schedule_request_timeout] (0x2000): Scheduling a timeout of 5 seconds
(2020-12-07 10:14:55): [be[example.com]] [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher
(2020-12-07 10:14:56): [be[example.com]] [check_fd_timeouts] (0x4000): Checking for DNS timeouts
(2020-12-07 10:14:56): [be[example.com]] [unschedule_timeout_watcher] (0x4000): Unscheduling DNS timeout watcher
(2020-12-07 10:14:56): [be[example.com]] [resolv_getsrv_done] (0x1000): Using TTL [600]
(2020-12-07 10:14:56): [be[example.com]] [request_watch_destructor] (0x0400): Deleting request watch
(2020-12-07 10:14:56): [be[example.com]] [fo_discover_srv_done] (0x0400): Got answer. Processing...
(2020-12-07 10:14:56): [be[example.com]] [fo_discover_srv_done] (0x0400): Got 3 servers
(2020-12-07 10:14:56): [be[example.com]] [fo_discover_servers_primary_done] (0x0400): Looking up backup servers
(2020-12-07 10:14:56): [be[example.com]] [resolv_discover_srv_next_domain] (0x0400): SRV resolution of service 'ldap'. Will use DNS discovery domain 'example.com'
(2020-12-07 10:14:56): [be[example.com]] [resolv_getsrv_send] (0x0100): Trying to resolve SRV record of '_ldap._tcp.example.com'
(2020-12-07 10:14:56): [be[example.com]] [schedule_request_timeout] (0x2000): Scheduling a timeout of 5 seconds
(2020-12-07 10:14:56): [be[example.com]] [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher
(2020-12-07 10:14:57): [be[example.com]] [fo_resolve_service_timeout] (0x0080): Service resolving timeout reached
(2020-12-07 10:14:57): [be[example.com]] [request_watch_destructor] (0x0400): Deleting request watch
(2020-12-07 10:14:57): [be[example.com]] [be_resolve_server_done] (0x1000): Server resolution failed: [14]: Bad address
(2020-12-07 10:14:57): [be[example.com]] [sdap_id_op_connect_done] (0x0020): Failed to connect, going offline (5 [Input/output error])
(2020-12-07 10:14:57): [be[example.com]] [be_mark_offline] (0x2000): Going offline!
(2020-12-07 10:14:57): [be[example.com]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
(2020-12-07 10:14:57): [be[example.com]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
(2020-12-07 10:14:57): [be[example.com]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 69 seconds from now [1607354166]
(2020-12-07 10:14:57): [be[example.com]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
(2020-12-07 10:14:57): [be[example.com]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
(2020-12-07 10:14:57): [be[example.com]] [ad_subdomains_refresh_connect_done] (0x0020): Unable to connect to LDAP [11]: Resource temporarily unavailable
(2020-12-07 10:14:57): [be[example.com]] [ad_subdomains_refresh_connect_done] (0x0080): No AD server is available, cannot get the subdomain list while offline
(2020-12-07 10:14:57): [be[example.com]] [be_ptask_done] (0x0040): Task [Subdomains Refresh]: failed with [1432158212]: SSSD is offline
(2020-12-07 10:14:57): [be[example.com]] [be_ptask_schedule] (0x0400): Task [Subdomains Refresh]: scheduling task 14400 seconds from now [1607368497]
(2020-12-07 10:14:57): [be[example.com]] [sdap_id_op_connect_done] (0x4000): notify offline to op #2
(2020-12-07 10:14:57): [be[example.com]] [ad_subdomains_refresh_connect_done] (0x0020): Unable to connect to LDAP [11]: Resource temporarily unavailable
(2020-12-07 10:14:57): [be[example.com]] [ad_subdomains_refresh_connect_done] (0x0080): No AD server is available, cannot get the subdomain list while offline
(2020-12-07 10:14:57): [be[example.com]] [dp_req_done] (0x0400): DP Request [Subdomains #0]: Request handler finished [0]: Success
(2020-12-07 10:14:57): [be[example.com]] [_dp_req_recv] (0x0400): DP Request [Subdomains #0]: Receiving request data.
(2020-12-07 10:14:57): [be[example.com]] [dp_req_reply_list_success] (0x0400): DP Request [Subdomains #0]: Finished. Success.
(2020-12-07 10:14:57): [be[example.com]] [dp_req_reply_std] (0x1000): DP Request [Subdomains #0]: Returning [Provider is Offline]: 1,1432158212,Offline
(2020-12-07 10:14:57): [be[example.com]] [dp_req_reply_std] (0x1000): DP Request [Subdomains #0]: Returning [Provider is Offline]: 1,1432158212,Offline
(2020-12-07 10:14:57): [be[example.com]] [dp_table_value_destructor] (0x0400): Removing [8:7:0000:<ALL>] from reply table
(2020-12-07 10:14:57): [be[example.com]] [dp_req_destructor] (0x0400): DP Request [Subdomains #0]: Request removed.
(2020-12-07 10:14:57): [be[example.com]] [dp_req_destructor] (0x0400): Number of active DP request: 0
(2020-12-07 10:14:57): [be[example.com]] [sdap_id_release_conn_data] (0x4000): releasing unused connection

Comment 22 errata-xmlrpc 2020-12-15 11:22:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5459


Note You need to log in before you can comment on or make changes to this bug.