Bug 1283798
Summary: | sssd failover does not work on connecting to non-responsive ldaps:// server | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Striker Leggette <striker> |
Component: | sssd | Assignee: | Tomas Halman <thalman> |
Status: | CLOSED ERRATA | QA Contact: | sssd-qe <sssd-qe> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 8.0 | CC: | cww, dbula, dpal, grajaiya, jgalipea, jhrozek, lslebodn, mhonek, mkosek, mniranja, mzidek, pbrezina, raines, sgoveas, thalman, tscherf |
Target Milestone: | pre-dev-freeze | Keywords: | Reopened |
Target Release: | 8.1 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | sssd-2.2.0-5.el8 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-11-05 22:33:53 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1310069, 1682305 | ||
Bug Blocks: | 1269194, 1461138, 1679810, 1689138 |
Description
Striker Leggette
2015-11-19 22:43:45 UTC
I corrected the title to make it clear this only happens with ldaps:// The SSSD-side fix is available, but there is still some strange behaviour within OpenLDAP that needs to be investigated. Upstream ticket: https://fedorahosted.org/sssd/ticket/2878 btw an easy workaround might be to use StartTLS instead of ldaps://. Please note that SSSD will also automatically try TLS for authentication -- cleartext auth is never permitted with SSSD. Update: I verified the same happens also when openldap-libs is compiled against OpenSSL, so as a next step, I involved the OpenLDAP maintainer to see if this might be an issue in the openldap-libs TLS manipulation. I'm afraid I need to cond-nak this bug until the dependency bug in openldap-libs (#1310069) is fixed. Since this bug depends on bugzilla #1310069 which is still not fixed, I need to move the flags to track rhel-6.10, sorry.. Since this bug depends on a fix for openldap which is not even ready for RHEL-7 and RHEL-6 is already in Production phase 3, I'm closing this bug report as CANTFIX. I had this happen to me last night. This is a highly critical bug for sssd to not be able to failover. I cannot believe it has been 2 years with no fix. The only workaround for now is to use insecure ldap instead of ldaps? sssd does not even recover once the primary LDAP server is back online. One has to manually restart sssd on the client. Did not affect all clients. On most I would see the error: sssd[be[default]]: Could not start TLS encryption. TLS error -5938:Encountered end of file but LDAP lookups still worked to the backup server. Maybe only ones in the middle of a lookup when the server went down got stuck. I only had CentOS7 clients be affected too. (In reply to Paul Raines from comment #14) > I had this happen to me last night. This is a highly critical bug for sssd > to not be able to failover. I cannot believe it has been 2 years with no > fix. The only workaround for now is to use insecure ldap instead of ldaps? StartTLS does not suffer from this issue. Only ldaps:// > sssd does not even recover once the primary LDAP server is back online. One > has to manually restart sssd on the client. > > Did not affect all clients. On most I would see the error: > > sssd[be[default]]: Could not start TLS encryption. TLS > error -5938:Encountered end of file > > but LDAP lookups still worked to the backup server. Maybe only ones in the > middle of a lookup when the server went down got stuck. > > I only had CentOS7 clients be affected too. Since this bugzilla was filed against RHEL-6 where we definitely won't be addressing the issue, I would recommend to track the RHEL-7 version of the OpenLDAP bug, which is a dependency of fixing sssd: https://bugzilla.redhat.com/show_bug.cgi?id=1310069 * master: 2d657df * sssd-1-16: 442cd65 Versions: sssd-client-2.2.0-16.el8.x86_64 sssd-dbus-2.2.0-16.el8.x86_64 sssd-krb5-common-2.2.0-16.el8.x86_64 sssd-winbind-idmap-2.2.0-16.el8.x86_64 sssd-tools-2.2.0-16.el8.x86_64 sssd-common-2.2.0-16.el8.x86_64 sssd-common-pac-2.2.0-16.el8.x86_64 python3-sssdconfig-2.2.0-16.el8.noarch sssd-krb5-2.2.0-16.el8.x86_64 sssd-proxy-2.2.0-16.el8.x86_64 sssd-ad-2.2.0-16.el8.x86_64 sssd-2.2.0-16.el8.x86_64 sssd-nfs-idmap-2.2.0-16.el8.x86_64 sssd-kcm-2.2.0-16.el8.x86_64 sssd-ldap-2.2.0-16.el8.x86_64 sssd-libwbclient-2.2.0-16.el8.x86_64 sssd-ipa-2.2.0-16.el8.x86_64 [root@ci-vm-10-0-145-194 ~]# [Server] Setup 2 directory server (389-ds) on 2 different systems serving users foo0 to foo10 with TLS configured and configured with same suffix dc=example,dc=test Directory servers were configured on 8.1 Version: 389-ds-base-1.4.1.3-5.module+el8.1.0+3776+ece1ae4c.x86_64 [client configuration] Cat /etc/sssd/sssd.conf as below: [sssd] config_file_version = 2 services = nss, pam domains = example1 [domain/example1] ldap_search_base = dc=example,dc=test id_provider = ldap auth_provider = ldap ldap_user_home_directory = /home/%u ldap_uri = ldaps://ci-vm-10-0-146-127.hosted.upshift.rdu2.redhat.com, ldaps://ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com ldap_tls_cacert = /etc/openldap/cacerts/cacert.pem use_fully_qualified_names = True debug_level = 9 Ran the below test cases: 1. Stop Directory server instance on DS server1(ci-vm-10-0-146-127.hosted.upshift.rdu2.redhat.com) Verify getent foo0@example1 user can be queried [root@ci-vm-10-0-145-194 sssd]# getent passwd foo0@example1 foo0@example1:*:14583100:14564100:foo0 User::/bin/bash <debug logs> (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_handle_release] (0x2000): Trace: sh[0x5572b5a19b80], connected[0], ops[(nil)], ldap[(nil)], destructor_lock[0], release_memory[0] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [_be_fo_set_port_status] (0x8000): Setting status: PORT_NOT_WORKING. Called from: src/providers/ldap/sdap_async_connection.c: sdap_cli_connect_done: 1591 (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [fo_set_port_status] (0x0100): Marking port 636 of server 'ci-vm-10-0-146-127.hosted.upshift.rdu2.redhat.com' as 'not working' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [fo_set_port_status] (0x0400): Marking port 636 of duplicate server 'ci-vm-10-0-146-127.hosted.upshift.rdu2.redhat.com' as 'not working' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'LDAP' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [get_server_status] (0x1000): Status of server 'ci-vm-10-0-146-127.hosted.upshift.rdu2.redhat.com' is 'working' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [get_port_status] (0x1000): Port status of port 636 for server 'ci-vm-10-0-146-127.hosted.upshift.rdu2.redhat.com' is 'not working' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues. (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [get_server_status] (0x1000): Status of server 'ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com' is 'name not resolved' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [get_port_status] (0x1000): Port status of port 636 for server 'ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com' is 'neutral' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 6 seconds (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [get_server_status] (0x1000): Status of server 'ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com' is 'name not resolved' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [resolv_is_address] (0x4000): [ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com] does not look like an IP address (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [resolv_gethostbyname_step] (0x2000): Querying files (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve A record of 'ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com' in files (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [set_server_common_status] (0x0100): Marking server 'ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com' as 'resolving name' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [resolv_gethostbyname_step] (0x2000): Querying files (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [resolv_gethostbyname_files_send] (0x0100): Trying to resolve AAAA record of 'ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com' in files (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [resolv_gethostbyname_next] (0x0200): No more address families to retry (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [resolv_gethostbyname_step] (0x2000): Querying DNS (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [resolv_gethostbyname_dns_query] (0x0100): Trying to resolve A record of 'ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com' in DNS (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [schedule_request_timeout] (0x2000): Scheduling a timeout of 3 seconds (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [schedule_timeout_watcher] (0x2000): Scheduling DNS timeout watcher (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [unschedule_timeout_watcher] (0x4000): Unscheduling DNS timeout watcher (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [resolv_gethostbyname_dns_parse] (0x1000): Parsing an A reply (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [request_watch_destructor] (0x0400): Deleting request watch (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [set_server_common_status] (0x0100): Marking server 'ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com' as 'name resolved' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [be_resolve_server_process] (0x0200): Found address for server ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com: [10.0.146.167] TTL 25948 (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_uri_callback] (0x0400): Constructed uri 'ldaps://ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com' (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [22] for the connection. (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_ldap_connect_callback_add] (0x1000): New LDAP connection to [ldaps://ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com:636/??base] with fd [22]. (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_rootdse_send] (0x4000): Getting rootdse (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_print_server] (0x2000): Searching 10.0.146.167:636 (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x0400): calling ldap_search_ext with [(objectclass=*)][]. (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [*] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [altServer] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [namingContexts] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [supportedControl] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [supportedExtension] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [supportedFeatures] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [supportedLDAPVersion] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [supportedSASLMechanisms] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [domainControllerFunctionality] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [defaultNamingContext] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [lastUSN] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x1000): Requesting attrs: [highestCommittedUSN] (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_get_generic_ext_step] (0x2000): ldap_search_ext called, msgid = 1 (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_op_add] (0x2000): New operation 1 timeout 6 (Fri Aug 23 05:28:06 2019) [sssd[be[example1]]] [sdap_process_result] (0x2000): Trace: sh[0x5572b5abda90], connected[1 </debug logs> 2. Verify ssh logins are working Login as another user (user which is not in cache yet): [root@ci-vm-10-0-145-194 sssd]# ssh foo4@example1@localhost foo4@example1@localhost's password: Activate the web console with: systemctl enable --now cockpit.socket Could not chdir to home directory : No such file or directory [foo4@example1@ci-vm-10-0-145-194 /]$ Login was successfull 3. Start the first Directory server and stop the second directory server and verify login are successfull (Stop ci-vm-10-0-146-167.hosted.upshift.rdu2.redhat.com). login with another user who is not in cache. (foo5@example1) [root@ci-vm-10-0-145-194 sssd]# ssh foo5@example1@localhost foo5@example1@localhost's password: Activate the web console with: systemctl enable --now cockpit.socket Last login: Fri Aug 23 04:41:09 2019 from ::1 Could not chdir to home directory : No such file or directory [foo5@example1@ci-vm-10-0-145-194 /]$ pwd / [foo5@example1@ci-vm-10-0-145-194 /]$ id uid=14583105(foo5@example1) gid=14564100(ldapusers@example1) groups=14564100(ldapusers@example1) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 [foo5@example1@ci-vm-10-0-145-194 /]$ With above results. Marking bug verified. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:3651 |