1785229 – Auth fails with "Could not start TLS encryption. unknown error"

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	sssd
Sub Component:
Version:	31
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Assignee:	sssd-maintainers
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2019-12-19 12:48 UTC by Pierre Ossman
Modified:	2024-12-20 18:57 UTC
CC List:	13 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2020-11-24 18:45:29 UTC
Type:	Bug
Embargoed:
Dependent Products:

Description Pierre Ossman 2019-12-19 12:48:14 UTC

Description of problem:

Every now and then sssd starts failing authentication attempts and I can see this in the log:

> Dec 19 13:42:46 ossman.lkpg.cendio.se sssd[be[CENDIO]][11413]: Could not start TLS encryption. unknown error

After a short while it starts working again and I can authenticate. But it happens often enough that it is a major nuisance, especially since the error reported to the user is the same as if an incorrect password had been entered.


Version-Release number of selected component (if applicable):

sssd-2.2.2-3.fc30.x86_64


How reproducible:

I do not know what triggers it at this point.

Comment 1 Sumit Bose 2019-12-19 13:08:19 UTC

Hi,

do you see any messages on the LDAP server side around the time this happens on the client?

bye,
Sumit

Comment 2 Pierre Ossman 2019-12-19 13:12:19 UTC

No, I'm afraid not. slapd doesn't seem to log much at all.

The only thing remotely special here is that we have our own CA setup for certificates. But that CA is registered on the Fedora machine so it is normally trusted.

Any way to turn that "unknown error" to something helpful?

Comment 3 Pierre Ossman 2019-12-19 13:13:34 UTC

For reference, it took about 30 seconds until authentication succeeded:

> Dec 19 13:43:18 ossman.lkpg.cendio.se gdm-password][8618]: pam_sss(gdm-password:auth): authentication success; logname= uid=0 euid=0 tty=/dev/tty2 ruser= rhost= user=ossman

Some retry timeout that needs to be made more aggressive?

Comment 6 Sumit Bose 2020-01-10 10:11:32 UTC

(In reply to Pierre Ossman from comment #2)
> No, I'm afraid not. slapd doesn't seem to log much at all.
> 
> The only thing remotely special here is that we have our own CA setup for
> certificates. But that CA is registered on the Fedora machine so it is
> normally trusted.

Hi,

how do you make SSSD aware of the CA certificates for LDAP access? Are you setting ldap_tls_cacertdir or ldap_tls_cacert in sssd.conf, or do you rely on the settings in /etc/openldap/ldap.conf? Are the CA certificates stored in a directory or as a CA bundle in a PEM file?

bye,
Sumit

> 
> Any way to turn that "unknown error" to something helpful?

Comment 8 Pierre Ossman 2020-01-10 11:36:04 UTC

Neither. The CA is put in /etc/pki/ca-trust/source/anchors/ and /usr/bin/update-ca-trust is executed.

It is a single PEM file with a single certificate in it.

Comment 9 Pierre Ossman 2020-01-10 11:37:45 UTC

Another oddity is that the LDAP server is rather old. It's running RHEL 5. So old versions of openldap and openssl.

Comment 10 Florian Bezdeka 2020-01-22 13:57:22 UTC

I'm facing the same problem. I'm not 100% sure yet, but it looks like sssd (or rather openssl) is sometimes trying to use TLSv1.3.
You mentioned RHEL 5 servers running the LDAP part. They will not support that protocol.

I'm currently playing around with the following added to the sssd.conf (domain part):

ldap_tls_cipher_suite = HIGH:+TLSv1.2:-TLSv1.3

This should use HIGH ciphers only based on TLSv1.2 and should disable TLSv1.3.
The ldap part is hosted on some RHEL7 systems in my setup, so TLSv1.2 is supported, TLSv1.3 is not.

So far the problem has not come back, but more monitoring is needed...
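
To check which protocol version an LDAPS server actually negotiates, a small probe along the following lines may help. This is only a sketch using Python's standard `ssl` module, not anything sssd or libldap does internally. The function names `make_context` and `negotiated_version` are made up for illustration, and the probe assumes a plain `ldaps://` listener (typically port 636) whose certificate is trusted by the system store. It does not speak STARTTLS.

```python
import socket
import ssl
from typing import Optional


def make_context(max_version: ssl.TLSVersion) -> ssl.SSLContext:
    """Build a client context that will not negotiate above max_version."""
    ctx = ssl.create_default_context()  # uses the system trust store
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = max_version
    return ctx


def negotiated_version(host: str, port: int, ctx: ssl.SSLContext,
                       timeout: float = 5.0) -> Optional[str]:
    """Connect, complete the TLS handshake, and return e.g. 'TLSv1.2'."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()


# Example call (hypothetical host name, replace with the real server):
#   negotiated_version("ldap.example.com", 636,
#                      make_context(ssl.TLSVersion.TLSv1_2))
```

Running the probe once capped at TLSv1.2 and once at TLSv1.3 shows whether the server copes with both. A handshake failure surfaces as an `ssl.SSLError` (a subclass of `OSError`).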

Comment 11 Rainer Beyel 2020-02-19 08:25:47 UTC

One of my customers is experiencing similar issues [1]. The only way to reproduce it is to wait for it to happen.

I have built a simple test environment [2] and ran ssh (password) logins with 100 different users randomly: no issues with >800k ssh logins. Also, when one of the RHDS systems is taken offline, sssd will use the other one [3]. The "switch/failover" is instant and doesn't impact the logins; no warnings/errors are shown in the logs.

What I wasn't able to test yet is a load balancer/floating IP to connect to the RHDS systems.

@Florian: did you see any issues after excluding TLSv1.3?
@Pierre, Florian: are there any load balancers/floating IPs used to connect to the LDAP servers?

Thanks, Rainer

References:
  1) RHEL 7.7 with sssd.conf, multi-master RHDS 10.4, floating-ip/loadbalancer
  2) 2x RHEL 7.latest with sssd.conf, 2x multi-master RHDS 10.latest, custom rootCA and certs
  3) ldap_uri = ldaps://rhds10-1.example.com/,ldaps://rhds10-2.example.com/

Comment 12 Pierre Ossman 2020-02-19 11:23:19 UTC

No, we have an extremely simple setup with a single LDAP server that clients connect to directly. We have a dedicated IP on the machine for this service, but it is statically assigned.

Comment 13 Florian Bezdeka 2020-02-19 14:04:58 UTC

The situation was much better after disabling TLSv1.3, but the problem came back from time to time.

To fully solve this problem I changed my ldap_uri from ldaps://hostname to ldap://hostname and enabled STARTTLS by setting ldap_id_use_start_tls to true.
(both parameters are part of the sssd configuration)

The ldap servers have a floating ip managed by pacemaker.
But overwriting the hostname to one fixed IP address did not solve the problem.
I played around with IPv4 and IPv6 only, but the problem was always the same.

I'm using a certificate signed by the letsencrypt CA.
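
As a rough illustration of the change described in this comment, the domain section might look like the fragment embedded below. The two option names come from the comment. The domain name `EXAMPLE` and the host `ldap.example.com` are placeholders, and using Python's `configparser` to read the INI-style fragment is only a convenience for checking it, not how sssd parses its configuration.

```python
import configparser

# Placeholder domain and host; only the two option names are from the thread.
SSSD_CONF = """
[domain/EXAMPLE]
ldap_uri = ldap://ldap.example.com
ldap_id_use_start_tls = true
"""

parser = configparser.ConfigParser(delimiters=("=",))
parser.read_string(SSSD_CONF)
domain = parser["domain/EXAMPLE"]

# Plain ldap:// plus STARTTLS: the connection starts unencrypted and is
# upgraded to TLS afterwards, instead of starting TLS at connect time (ldaps://).
uses_starttls = (
    domain["ldap_uri"].startswith("ldap://")
    and domain.getboolean("ldap_id_use_start_tls")
)
print(uses_starttls)  # prints True
```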

Comment 14 Pierre Ossman 2020-03-09 07:54:54 UTC

FYI, this seemed to disappear after an upgrade to Fedora 31. However, I unfortunately got it again today after almost two weeks of peace and calm:

> Mar 09 08:51:26 ossman.lkpg.cendio.se sssd[be[CENDIO]][1231]: Could not start TLS encryption. unknown error
> Mar 09 08:52:02 ossman.lkpg.cendio.se sssd[be[CENDIO]][1231]: Backend is online
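
To quantify how long each outage lasts, log pairs like the one above can be turned into durations. The following is a minimal sketch that assumes syslog-style lines exactly as quoted in this report (without the leading `> `). The function name `outage_seconds` and the `year` parameter are made up for illustration. The year is needed because the timestamps omit it.

```python
from datetime import datetime

ERROR_MARK = "Could not start TLS encryption"
ONLINE_MARK = "Backend is online"


def outage_seconds(lines, year=2020):
    """Return seconds between each TLS error and the next 'Backend is online'."""
    outages = []
    failed_at = None
    for line in lines:
        # The first 15 characters hold the timestamp, e.g. "Mar 09 08:51:26".
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if ERROR_MARK in line and failed_at is None:
            failed_at = stamp  # remember the first error of this outage
        elif ONLINE_MARK in line and failed_at is not None:
            outages.append(int((stamp - failed_at).total_seconds()))
            failed_at = None
    return outages


log = [
    "Mar 09 08:51:26 ossman.lkpg.cendio.se sssd[be[CENDIO]][1231]: "
    "Could not start TLS encryption. unknown error",
    "Mar 09 08:52:02 ossman.lkpg.cendio.se sssd[be[CENDIO]][1231]: "
    "Backend is online",
]
print(outage_seconds(log))  # prints [36]
```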

Comment 15 R Bruce Hoffman 2020-04-14 18:49:08 UTC

plus one.

Arch is ppc64le partition, F30 on IBM Power 8 server.
Same message from SSSD:  sssd[be[ldap]][1865]: Could not start TLS encryption. unknown error

Random occurrences, from once or twice a day to multiple times a day.  It appears to clear itself up for a while between occurrences.  So far, it's just a major annoyance to the users connecting for mail (LDAP is used for authentication, but not ID).

In addition, the target is IBM i OS at V7R3 with Tivoli Directory Server... the messages in the job log show:
GLD0113 - 410... message format incorrect
GLD015C - records the client IP, that of the mail server in this case
GLD0154 - close of the connection

But as I said, it seems to self correct after a period of anywhere from 45 seconds to a few minutes and the client's connection completes and authentication occurs.  Users connecting with Outlook get a prompt for password and have to press enter to clear it and connect, so... annoying.

Cert in this case is signed by DigiCert.

Comment 16 Ben Cotton 2020-04-30 20:14:05 UTC

This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 17 Sumit Bose 2020-06-05 09:03:24 UTC

Hi,

it is possible that the issue is triggered by an unexpected interaction between SSSD's watchdog implementation and libldap. By default SSSD's watchdog sends a signal every 10s, and it looks like libldap, at least in some places, treats a read() interrupted by a signal as an error.

As a workaround you can try setting 'timeout = 20' in the [domain/...] section of sssd.conf, which tells the watchdog to send the signal only every 20s. This should make the issue happen less often. Please be careful when increasing 'timeout': the longer the timeout, the longer it takes SSSD to detect a deadlock in the process and try to restart it.

bye,
Sumit
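
The general pattern for tolerating a read() interrupted by a signal is to retry the call rather than report a failure. The sketch below illustrates that pattern only. It is not libldap or SSSD code, and `read_retrying` and `FlakyReader` are hypothetical names. Python itself has retried interrupted system calls automatically since 3.5 (PEP 475), so the stand-in reader raises `InterruptedError` explicitly to simulate EINTR.

```python
import errno


def read_retrying(read_once, max_retries=5):
    """Call read_once(), retrying when a signal interrupts it (EINTR)."""
    for _ in range(max_retries):
        try:
            return read_once()
        except InterruptedError:
            # A signal arrived mid-call; the connection itself is still fine.
            continue
    raise TimeoutError("read kept being interrupted")


class FlakyReader:
    """Hypothetical stand-in for a socket whose first reads hit EINTR."""

    def __init__(self, interruptions, payload):
        self.interruptions = interruptions
        self.payload = payload

    def __call__(self):
        if self.interruptions > 0:
            self.interruptions -= 1
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return self.payload


reader = FlakyReader(interruptions=2, payload=b"ok")
print(read_retrying(reader))  # prints b'ok'
```

A caller that instead treats the first `InterruptedError` as fatal would fail whenever a periodic signal happens to land during the read, which would match the intermittent behaviour described in this report.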

Comment 18 R Bruce Hoffman 2020-06-11 11:53:38 UTC

Changing to timeout = 20 reduced the occurrences per day by half.  Changing it to 30 didn't have a significant further impact; it reduced them by only a few more occurrences per day.  This is on F31.

Comment 19 Fedora Admin user for bugzilla script actions 2020-06-18 14:59:28 UTC

This package has changed maintainer in Fedora.
Reassigning to the new maintainer of this component.

Comment 20 VG 2020-08-19 16:28:48 UTC

Do we know if there is an openldap issue related to this?

Comment 21 Ben Cotton 2020-11-03 16:01:40 UTC

This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 22 Ben Cotton 2020-11-24 18:45:29 UTC

Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.
