Bug 1839972
| Summary: | Authentication handshake (ldap_install_tls()) fails due to underlying openssl operation failing with EINTR [rhel-7.9.z] | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Steffen Froemer <sfroemer> | ||||||
| Component: | sssd | Assignee: | Iker Pedrosa <ipedrosa> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Madhuri <mupadhye> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | 7.8 | CC: | abokovoy, abroy, asakure, atikhono, chorn, ddas, dhellard, dlavu, fweimer, gopalv, grajaiya, ipedrosa, jhrozek, jreznik, lslebodn, mescanfe, mfalz, mreynolds, mzidek, ndavids, onatalen, pbrezina, pkis, pvlasin, rmeggins, sbose, spichugi, sssd-maint, tbordaz, thalman, tmihinto, tscherf, vashirov, yoguma | ||||||
| Target Milestone: | rc | Keywords: | Regression, Reproducer, Triaged, ZStream | ||||||
| Target Release: | --- | Flags: | pm-rhel:
mirror+
|
||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | sync-to-jira review | ||||||||
| Fixed In Version: | sssd-1.16.5-10.el7_9.8 | Doc Type: | If docs needed, set a value | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2021-06-08 22:34:10 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | 1900370 | ||||||||
| Bug Blocks: | |||||||||
| Attachments: |
|
||||||||
Created attachment 1692753 [details]
SSSD test build with ldap_library_debug_level
Hi,
please find attached a tar ball with a test build of SSSD with adds a new option ldap_library_debug_level.
Please un-tar it, change into the new directory and call
yum update *
which should update all installed SSSD sub-packages, but not install additional ones.
Then add
ldap_library_debug_level = -1
to the [domain/...] section of sssd.conf and restart SSSD. Now SSSD's domain log file should contain line with '[sss_ldap_debug] (0x4000): libldap:' folowing the OpenLDAP debug messages.
bye,
Sumit
https://github.com/SSSD/sssd/issues/5282 seems to be the same issue. Setting the Customer Escalation flag=Yes as result of internal escalation, EN-35411. Upstream ticket: https://github.com/SSSD/sssd/issues/5531 Created attachment 1763588 [details]
strace for battery of tests
Pushed PR: https://github.com/SSSD/sssd/pull/5532 * `master` * da55e3e69707de416b7949d08c165c950090bbb6 - ldap: retry ldap_install_tls() when watchdog interruption Backport to 1.16 branch - https://github.com/SSSD/sssd/pull/5595 Pushed PR: https://github.com/SSSD/sssd/pull/5595 * `sssd-1-16` * ee16c609497f29731c5a590821d27d0db0ffc91f - ldap: retry ldap_install_tls() when watchdog interruption Reproduce with:
[root@ci-vm-10-0-105-43 ~]# rpm -qa sssd
sssd-1.16.5-10.el7_9.7.x86_64
Reproduction steps:
1.
Install and configure ldap server and sssd client
2.
Add users in ldap server
3.
Simulate network latency in the server with "tc qdisc add dev eth0 root netem delay 3s"
[root@ci-vm-10-0-96-123 ~]# tc qdisc add dev eth0 root netem delay 3s
4.
su user from sssd client
[foo3@example1@ci-vm-10-0-96-157 ~]$ time su - foo1@example1
Password:
su: Authentication failure
real 0m7.368s
user 0m0.003s
sys 0m0.005s
[foo3@example1@ci-vm-10-0-96-157 ~]$ time su - foo1@example1
Password:
su: Authentication failure
real 0m16.020s
user 0m0.004s
sys 0m0.005s
[foo3@example1@ci-vm-10-0-96-157 ~]$ time su - foo1@example1
Password:
su: Authentication failure
real 0m11.340s
user 0m0.007s
sys 0m0.002s
[foo3@example1@ci-vm-10-0-96-157 ~]$ time su - foo2@example1
Password:
su: Authentication failure
real 0m16.817s
user 0m0.002s
sys 0m0.006s
[foo3@example1@ci-vm-10-0-96-157 ~]$ time su - foo2@example1
Password:
su: Authentication failure
real 0m7.305s
user 0m0.002s
sys 0m0.007s
with unpatch version, su failed after several seconds
5.
upgrade sssd to patch version,
[root@ci-vm-10-0-96-157 ~]# rpm -qa sssd
sssd-1.16.5-10.el7_9.8.x86_64
6.
with latency in server user is not able to login
[root@ci-vm-10-0-96-157 ~]# su - foo1@example1
su: user foo1@example1 does not exist
7.
Remove the domain log to get updated logs
[root@ci-vm-10-0-96-157 ~]# > /var/log/sssd/sssd_example1.log
8.a.
on Terminal first, check the su (on the client),
[root@ci-vm-10-0-96-157 ~]# su - foo3@example1
9.
on terminal second (on the client),
Grep the log messages 'Assuming TLS handshake was interrupted'
[root@ci-vm-10-0-96-157 ~]# grep 'Assuming TLS handshake was interrupted' /var/log/sssd/sssd_example1.log
[root@ci-vm-10-0-96-157 ~]# grep 'Assuming TLS handshake was interrupted' /var/log/sssd/sssd_example1.log
[root@ci-vm-10-0-96-157 ~]# grep 'Assuming TLS handshake was interrupted' /var/log/sssd/sssd_example1.log
(2021-05-20 6:59:28): [be[example1]] [sss_ldap_init_sys_connect_done] (0x0020): Assuming TLS handshake was interrupted
10.
on terminal third (on the server),
as soon as grep returns the log line remove the network latency on the server
[root@ci-vm-10-0-96-123 ~]# sudo tc qdisc del dev eth0 root
8.b.
on the terminal first, the started su should able to login, as we remove the latency on the server
after getting the required log message in the sssd log.
[root@ci-vm-10-0-96-157 ~]# su - foo3@example1
Last login: Thu May 20 06:58:35 EDT 2021 on pts/0
[foo3@example1@ci-vm-10-0-96-157 ~]$
Thus from above marking, this verified.
So, I have a system where I see this error - I had opened a sssd issue as well systemctl status sssd (output) May 26 17:09:59 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error May 26 17:10:49 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error May 26 17:16:29 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error May 26 17:16:39 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error May 26 17:16:49 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error Questions: 1. I don't see this release being publicly available. yum update sssd with the specific version does not work either. 2. What will I see after I update to the sssd-1.16.5-10.el7_9.8.x86_64 version, will the above warnings go away and there will be a retry? 3. Is there any other setting in sssd.conf required to trigger the code path in the above fix? So, I have a system where I see this error - I had opened a sssd issue as well systemctl status sssd (output) May 26 17:09:59 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error May 26 17:10:49 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error May 26 17:16:29 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error May 26 17:16:39 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error May 26 17:16:49 dashmpp-head-0 sssd[be[adldap]][3181121]: Could not start TLS encryption. unknown error Questions: 1. I don't see this release being publicly available. yum update sssd with the specific version does not work either. 2. What will I see after I update to the sssd-1.16.5-10.el7_9.8.x86_64 version, will the above warnings go away and there will be a retry? 3. Is there any other setting in sssd.conf required to trigger the code path in the above fix? Hi, (In reply to VG from comment #181) > > 1. I don't see this release being publicly available. yum update sssd with > the specific version does not work either. It's tentatively planned for a release in June. > 2. What will I see after I update to the sssd-1.16.5-10.el7_9.8.x86_64 > version, will the above warnings go away and there will be a retry? There will be one retry in case TLS handshake was interrupted by an internal WD. > 3. Is there any other setting in sssd.conf required to trigger the code path > in the above fix? No additional sssd.conf setting required. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (sssd bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2319 |
Description of problem: Customer seeing SSL Errors between RHEL-7.8 SSSD and IBM Directory Server (GLD1117E) Customer running RHEL-7.8, which are using SSSD to authenticate users against LDAP, running IBM Directory Server. error message shown on LDAP server 200505 18:46:12.069610 GLD1117E Unable to read SSL data from 1.8.14.218: 410 - SSL message format is incorrect. After the SSL handshake the LDAP Server is receiving 5 bytes expected to be the TLS1.2 Record Application Data starting with x'170303' but instead receives 5 bytes x'30050201 01'. It has been identified that the x30 in the first byte indicates an unencrypted LDAP requests with a length of x05. Version-Release number of selected component (if applicable): sssd-1.16.4-37.el7_8.1.x86_64 sssd-ldap-1.16.4-37.el7_8.1.x86_64 How reproducible: On customer system, it's reproducible by using some expect-script --- spawn su - testuser expect "ssword:" send "secret\r" expect "$ " send "exit\r" --- Steps to Reproduce: 1. login into system 2. run expect-script from above multiple times Actual results: Authentication sometimes fail Expected results: Authentication should work Additional info: In sssd debug logs, following error is shown. 1 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [be_resolve_server_process] (0x0200): Found address for server LDAP.example.com: [1.7.140.12] TTL 5501 2 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sdap_uri_callback] (0x0400): Constructed uri 'ldaps://LDAP.example.com/' 3 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [decide_tls_usage] (0x2000): [ldaps://LDAP.example.com/] is a secure channel. No need to run START_TLS 4 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sssd_async_socket_init_send] (0x4000): Using file descriptor [23] for the connection. 5 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting 6 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sss_ldap_init_sys_connect_done] (0x0020): ldap_install_tls failed: [Connect error] [unknown error] 7 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sss_ldap_init_state_destructor] (0x0400): calling ldap_unbind_ext for ldap:[0x55ffa8c91080] sd:[23] 8 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sss_ldap_init_state_destructor] (0x0400): closing socket [23] 9 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [5]: Input/output error. 10 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [sdap_handle_release] (0x2000): Trace: sh[0x55ffa8c921a0], connected[0], ops[(nil)], ldap[(nil)], destructor_lock[0], release_memory[0] 11 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [_be_fo_set_port_status] (0x8000): Setting status: PORT_NOT_WORKING. Called from: src/providers/ldap/sdap_async_connection.c: sdap_cli_connect_done: 1605 12 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [fo_set_port_status] (0x0100): Marking port 636 of server 'LDAP.example.com' as 'not working' 13 (Fri May 22 07:50:42 2020) [sssd[be[LDAP]]] [fo_set_port_status] (0x0400): Marking port 636 of duplicate server 'LDAP.example.com' as 'not working' Could this be related to the "FIXME" note in code [1] /* FIXME: take care that ldap_install_tls might block */ ret = ldap_install_tls(state->sh->ldap); if (ret != LDAP_SUCCESS) { [1]: https://github.com/SSSD/sssd/blob/d8d743870c459b5ff283c89d78b70d1684bd19a9/src/providers/ldap/sdap_async_connection.c#L403