Description of problem:
Every now and then sssd starts failing authentication attempts, and I can see this in the log:
> Dec 19 13:42:46 ossman.lkpg.cendio.se sssd[be[CENDIO]][11413]: Could not start TLS encryption. unknown error
After a short while it starts working again and I can authenticate. But it happens often enough that it is a major nuisance, especially since the error reported to the user is the same as if an incorrect password had been entered.
Version-Release number of selected component (if applicable):
sssd-2.2.2-3.fc30.x86_64
How reproducible:
Unknown. I do not know what triggers it at this point.
Hi, do you see any messages on the LDAP server side around the time this happens on the client? bye, Sumit
No, I'm afraid not. slapd doesn't seem to log much at all. The only thing remotely special here is that we have our own CA set up for certificates. But that CA is registered on the Fedora machine, so it is normally trusted. Is there any way to turn that "unknown error" into something helpful?
For reference, it took about 30 seconds until authentication succeeded:
> Dec 19 13:43:18 ossman.lkpg.cendio.se gdm-password][8618]: pam_sss(gdm-password:auth): authentication success; logname= uid=0 euid=0 tty=/dev/tty2 ruser= rhost= user=ossman
Some retry timeout that needs to be made more aggressive?
(In reply to Pierre Ossman from comment #2)
> No, I'm afraid not. slapd doesn't seem to log much at all.
>
> The only thing remotely special here is that we have our own CA setup for
> certificates. But that CA is registered on the Fedora machine so it is
> normally trusted.
Hi, how do you make SSSD aware of the CA certificates for LDAP access? Are you setting ldap_tls_cacertdir or ldap_tls_cacert in sssd.conf, or do you rely on the settings in /etc/openldap/ldap.conf? Are the CA certificates stored in a directory or as a CA bundle in a PEM file?
bye, Sumit
>
> Any way to turn that "unknown error" to something helpful?
Neither. The CA is put in /etc/pki/ca-trust/source/anchors/ and /usr/bin/update-ca-trust is executed. It is a single PEM file with a single certificate in it.
Another oddity is that the LDAP server is rather old. It's running RHEL 5. So old versions of openldap and openssl.
I'm facing the same problem. I'm not 100% sure yet, but it looks like sssd (or rather OpenSSL) sometimes tries to use TLSv1.3. You mentioned RHEL 5 servers running the LDAP part; they will not support that protocol. I'm currently experimenting with the following line added to the domain section of sssd.conf:
ldap_tls_cipher_suite = HIGH:+TLSv1.2:-TLSv1.3
This is intended to use only HIGH ciphers based on TLSv1.2 and to disable TLSv1.3. In my setup the LDAP part is hosted on RHEL 7 systems, so TLSv1.2 is supported but TLSv1.3 is not. So far the problem has not come back, but more monitoring is needed.
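For anyone who wants to try the same workaround, the placement of that option might look roughly like the sketch below. The domain name EXAMPLE and the provider lines are placeholders, not taken from the comment above; only the ldap_tls_cipher_suite value is. Whether this cipher string really keeps TLSv1.3 from being negotiated depends on the TLS library and has not been verified here.

```ini
# /etc/sssd/sssd.conf (fragment)
# "EXAMPLE" is a placeholder domain name; use the name of your own domain section.
[domain/EXAMPLE]
id_provider = ldap
auth_provider = ldap

# Cipher string quoted from the comment above; intended to limit
# connections to HIGH ciphers and avoid TLSv1.3 (untested assumption).
ldap_tls_cipher_suite = HIGH:+TLSv1.2:-TLSv1.3
```

sssd needs to be restarted (for example with systemctl restart sssd) before a changed sssd.conf takes effect.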
One of my customers is experiencing similar issues [1]. The only way to reproduce it is to wait for it to happen.
I have built a simple test environment [2] and ran ssh (password) logins with 100 different users at random: no issues with >800k ssh logins. Also, when one of the RHDS systems is taken offline, sssd uses the other one [3]. The switch/failover is instant and doesn't impact the logins; no warnings or errors are shown in the logs.
What I haven't been able to test yet is a load balancer/floating IP in front of the RHDS systems.
@Florian: did you see any issues after excluding TLSv1.3?
@Pierre, Florian: are any load balancers/floating IPs used to connect to the LDAP servers?
Thanks, Rainer
References:
1) RHEL 7.7 with sssd.conf, multi-master RHDS 10.4, floating-ip/loadbalancer
2) 2x RHEL 7.latest with sssd.conf, 2x multi-master RHDS 10.latest, custom rootCA and certs
3) ldap_uri = ldaps://rhds10-1.example.com/,ldaps://rhds10-2.example.com/
No, we have an extremely simple setup with a single LDAP server that clients connect to directly. We have a dedicated IP on the machine for this service, but it is statically assigned.
The situation was much better after disabling TLSv1.3, but the problem still came back from time to time. To fully solve it I changed my ldap_uri from ldaps://hostname to ldap://hostname and enabled STARTTLS by setting ldap_id_use_start_tls to true (both parameters are part of the sssd configuration). The LDAP servers have a floating IP managed by pacemaker, but overriding the hostname with one fixed IP address did not solve the problem. I also tried IPv4-only and IPv6-only, but the problem was always the same. I'm using a certificate signed by the Let's Encrypt CA.
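As a rough illustration of the change described above, the relevant part of sssd.conf could look like the following sketch. The domain name EXAMPLE and the host ldap.example.com are placeholders; the comment only says that the URI scheme was switched from ldaps:// to ldap:// and that ldap_id_use_start_tls was set to true.

```ini
# /etc/sssd/sssd.conf (fragment)
# "EXAMPLE" and "ldap.example.com" are placeholders.
[domain/EXAMPLE]
id_provider = ldap
auth_provider = ldap

# Before: ldap_uri = ldaps://ldap.example.com
# After: plain LDAP URI, with the connection upgraded via STARTTLS
ldap_uri = ldap://ldap.example.com
ldap_id_use_start_tls = true
```

With this setup the connection is still encrypted, but TLS is negotiated after connecting to the plain LDAP port instead of on a dedicated LDAPS port, so the server has to accept STARTTLS.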
FYI, this seemed to disappear after an upgrade to Fedora 31. Unfortunately, I got it again today after almost two weeks of peace and calm:
> Mar 09 08:51:26 ossman.lkpg.cendio.se sssd[be[CENDIO]][1231]: Could not start TLS encryption. unknown error
> Mar 09 08:52:02 ossman.lkpg.cendio.se sssd[be[CENDIO]][1231]: Backend is online
Plus one. The architecture is a ppc64le partition running F30 on an IBM Power 8 server. Same message from SSSD:
sssd[be[ldap]][1865]: Could not start TLS encryption. unknown error
It occurs at random, from once or twice a day to multiple times a day, and appears to clear itself up for a while between occurrences. So far it's just a major annoyance to the users connecting for mail (LDAP is used for authentication, but not ID).
In addition, the target is IBM i OS at V7R3 with Tivoli Directory Server. The messages in the job log show:
GLD0113 - 410... message format incorrect
GLD015C - records the client IP, that of the mail server in this case
GLD0154 - close of the connection
But as I said, it seems to self-correct after anywhere from 45 seconds to a few minutes, and then the client's connection completes and authentication occurs. Users connecting with Outlook get a password prompt and have to press Enter to clear it and connect, so... annoying. The cert in this case is signed by DigiCert.
This message is a reminder that Fedora 30 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '30'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 30 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Hi, it is possible that the issue is triggered by an unexpected interaction between SSSD's watchdog implementation and libldap. By default SSSD's watchdog sends a signal every 10s, and it looks like libldap, at least in some places, treats a read() interrupted by a signal as an error. As a workaround you can try setting 'timeout = 20' in the [domain/...] section of sssd.conf, which tells the watchdog to send the signal only every 20s and should make the issue happen less often. Please be careful when increasing 'timeout': the longer the timeout, the longer it takes SSSD to detect a deadlock in the process and try to restart it. bye, Sumit
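A minimal sketch of the suggested workaround, using the domain name CENDIO seen in the log lines earlier in this report (substitute your own domain section name). The value 20 is the one suggested above; it only makes the interruption less frequent and is not a fix.

```ini
# /etc/sssd/sssd.conf (fragment)
[domain/CENDIO]
# Default is 10 seconds. A larger value makes the watchdog signal less
# frequent, at the cost of slower detection of a hung backend process.
timeout = 20
```

Restart sssd afterwards (for example with systemctl restart sssd) so that the new value is picked up.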
Changing to timeout = 20 reduced the number of occurrences per day by half. Changing to 30 didn't have a significant further impact; it reduced them by only a few more occurrences per day. This is on F31.
This package has changed maintainer in the Fedora. Reassigning to the new maintainer of this component.
Do we know if there is an openldap issue related to this?
This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 31 changed to end-of-life (EOL) status on 2020-11-24. Fedora 31 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.