Bug 1920500

Summary: Authentication handshake (ldap_install_tls()) fails due to underlying openssl operation failing with EINTR
Product: Red Hat Enterprise Linux 8 Reporter: Alexey Tikhonov <atikhono>
Component: sssdAssignee: Iker Pedrosa <ipedrosa>
Status: CLOSED ERRATA QA Contact: Madhuri <mupadhye>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 8.4CC: abroy, chorn, dlavu, grajaiya, ipedrosa, jhrozek, lslebodn, mzidek, pbrezina, sgoveas, tscherf, yoguma
Target Milestone: rcKeywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: sssd-2.5.0-1.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-09 19:47:00 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Alexey Tikhonov 2021-01-26 12:24:41 UTC
This bug was initially created as a copy of Bug #1839972

I am copying this bug because: to track fix for RHEL8



Description of problem:
Customer seeing SSL Errors between RHEL-7.8 SSSD and IBM Directory Server (GLD1117E)
Customer running RHEL-7.8, which are using SSSD to authenticate users against LDAP, running IBM Directory Server.

error message shown on LDAP server
200505 18:46:12.069610 GLD1117E Unable to read SSL data from 1.8.14.218: 410 - SSL message format is incorrect.

After the SSL handshake the LDAP Server is receiving 5 bytes expected to be the TLS1.2 Record Application Data starting with x'170303' but instead receives 5 bytes x'30050201 01'.

It has been identified that the x30 in the first byte indicates an unencrypted LDAP requests with a length of x05. 

Version-Release number of selected component (if applicable):
sssd-1.16.4-37.el7_8.1.x86_64
sssd-ldap-1.16.4-37.el7_8.1.x86_64

How reproducible:
On customer system, it's reproducible by using some expect-script

---
spawn su - testuser
expect "ssword:"
send "secret\r"
expect "$ "
send "exit\r"
---

Steps to Reproduce:
1. login into system
2. run expect-script from above multiple times

Actual results:
Authentication sometimes fail

Expected results:
Authentication should work

Additional info:

In sssd debug logs, following error is shown.

1 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0200): Found address for server LDAP.example.com: [1.7.140.12] TTL 5501
2 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sdap_uri_callback] (0x0400): Constructed uri 'ldaps://LDAP.example.com/'
3 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [decide_tls_usage] (0x2000): [ldaps://LDAP.example.com/] is a secure channel. No need to run START_TLS
4 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [23] for the connection.
5 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
6 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sss_ldap_init_sys_connect_done] (0x0020): ldap_install_tls failed: [Connect error] [unknown error]
7 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sss_ldap_init_state_destructor] (0x0400): calling ldap_unbind_ext for ldap:[0x55ffa8c91080] sd:[23]
8 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sss_ldap_init_state_destructor] (0x0400): closing socket [23]
9 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [5]: Input/output error.
10 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sdap_handle_release] (0x2000): Trace: sh[0x55ffa8c921a0], connected[0], ops[(nil)], ldap[(nil)], destructor_lock[0], release_memory[0]
11 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [_be_fo_set_port_status] (0x8000): Setting status: PORT_NOT_WORKING. Called from: src/providers/ldap/sdap_async_connection.c: sdap_cli_connect_done: 1605
12 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [fo_set_port_status] (0x0100): Marking port 636 of server 'LDAP.example.com' as 'not working'
13 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [fo_set_port_status] (0x0400): Marking port 636 of duplicate server 'LDAP.example.com' as 'not working'

Comment 1 Alexey Tikhonov 2021-01-26 12:36:06 UTC
Very short summary of bz 1839972:

 - read(), interrupted by a WD signal, makes ldap_install_tls() to fail:
```
tls_read: want=5 error=Interrupted system call
TLS trace: SSL_connect:error in SSLv3/TLS write client hello
TLS: can't connect: .
...
 => ldap_install_tls failed: [Connect error] [unknown error]
```

 - re-try of failed() doesn't work (connection state either on client or on server side (or both) doesn't allow for this)

 - issue is reproducible on RHEL8


As a fix we should close connection and try to re-connect but:
     - only in case of this error / if WD signal was sent during operation (as an "evidence")
     - only once
     - make sure it is the same server (not next in the failover list)

Comment 3 Alexey Tikhonov 2021-04-20 09:57:35 UTC
Pushed PR: https://github.com/SSSD/sssd/pull/5532

* `master`
    * da55e3e69707de416b7949d08c165c950090bbb6 - ldap: retry ldap_install_tls() when watchdog interruption

Comment 9 Steeve Goveas 2021-06-14 09:41:57 UTC
On Server
[root@vm-idm-023 ~]# tc qdisc add dev ens2 root netem delay 3s


On Client
[root@auto-hv-01-guest01 ~]# rpm -q sssd
sssd-2.5.0-1.el8.x86_64

[root@auto-hv-01-guest01 offline]# systemctl stop sssd; rm -rf /var/lib/sss/db/*; rm -rf /var/log/sssd/*; systemctl start sssd
[root@auto-hv-01-guest01 offline]# date; su - foo1@LDAP
Mon Jun 14 05:37:40 EDT 2021
su: user foo1@LDAP does not exist

[root@auto-hv-01-guest01 ~]# tail -f /var/log/sssd/*.log | grep -i 'Assuming TLS handshake was interrupted'
(2021-06-14  5:37:42): [be[LDAP]] [sss_ldap_init_sys_connect_done] (0x0020): Assuming TLS handshake was interrupted

On server
[root@vm-idm-023 ~]# tc qdisc del dev ens2 root

On client
[root@auto-hv-01-guest01 offline]# date; su - foo1@LDAP
Mon Jun 14 05:41:24 EDT 2021
Last login: Mon Jun 14 04:31:27 EDT 2021 on pts/1
su: warning: cannot change directory to /home/foo1: No such file or directory
/usr/bin/id: cannot find name for group ID 10000
[foo1@auto-hv-01-guest01 offline]$ whoami
foo1
[foo1@auto-hv-01-guest01 offline]

Comment 11 errata-xmlrpc 2021-11-09 19:47:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sssd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4435

Comment 12 Alexey Tikhonov 2022-02-01 17:31:03 UTC
*** Bug 2049178 has been marked as a duplicate of this bug. ***