Description of problem:
sssd.service should depend on time-sync.target to make sure the system has the correct time. Otherwise krb5 and probably other authentication mechanisms won't work.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Configure sssd to use krb5 authentication
2. Make sure your hwclock is off by an hour or so
3. Configure ntpd/ntpdate or chronyd/chrony-wait
5. Try to log in
/lib/systemd/system/sssd.service needs to be changed to read:
This is a bit of a chicken and egg problem. time-sync.target (IIRC) is dependent on the network target. SSSD has to be available to handle requests before the network comes up. SSSD is capable of handling this because it has built-in caching (unlike nss_ldap which would in many cases hang for minutes during startup).
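For reference, the ordering being discussed would amount to something like the fragment below in the unit file. This is only an illustrative sketch, not the reporter's exact text: it assumes a plain ordering dependency, and it assumes that a time-synchronization service (ntpdate, chrony-wait) is what pulls time-sync.target into the boot.

```ini
# Hypothetical addition to /lib/systemd/system/sssd.service (assumption,
# not the text proposed in this report).
[Unit]
# Ordering only: do not start sssd until time-sync.target has been reached.
# The target itself must be pulled in by whichever service sets the clock.
After=time-sync.target
```

With an ordering like this, sssd would wait for the clock to be set, which is exactly what conflicts with SSSD needing to answer requests before the network is up.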
What you should be dealing with here is a race condition. If you try to log in during the period where the network is up but ntp has not yet started, SSSD will contact the KDC and get a clock-skew error. Once ntp adjusts the time, logins should work correctly. The window for this race should be *very* small.
If you're seeing this always fail, then there's a bug in NTP/chrony, not SSSD.
Stephen, what you say is right, but then we need to patch sssd to use offline authentication if we get a clock skew error. Failing authentication in this case is not the expected outcome.
I think we should open a ticket in sssd's trac if we fail authentication on clock skew.
chrony-wait.service does work, i.e. the time is corrected within a couple of seconds. The system's time is correct when I try to log in, but authentication still fails.
As soon as I restart sssd, authentication works.
Now tell me how this is a chrony bug and not an SSSD bug.
If the system time is correct but SSSD still fails, then something else is going on here.
Are you using cached credentials? Also, are you using GSSAPI to connect to the LDAP provider, or are you using the IPA provider?
What shows up in /var/log/secure when this happens?
We don't cache credentials because the user homes are on a CIFS share, so logging in while offline is useless in this case anyway.
Yes, we are using GSSAPI to connect to the LDAP provider.
I'll attach the secure log, the syslog (grep'ed for sss and chrony) and the output of two 'systemctl status sssd.service'.
I did the following:
2) login as (remote user) sandroma -> fail (3 times, just to make sure)
3) login as (local user) root
4) systemctl status sssd.service
5) systemctl restart sssd.service
6) login as sandroma -> success
7) systemctl status sssd.service
If you put the information from those logs into one timeline, you'll clearly see that chrony has synced the time well ahead of the user trying to log in. Oh, in case it matters somehow: I logged in on a tty, even though gdm was started.
Created attachment 535807
secure log of what's described in comment #6
Created attachment 535808
syslog of what's described in comment #6
Created attachment 535809
systemctl status sssd.service output as described in comment #6
Sorry for the long delay. It was a holiday in the US.
Sandro, could you set
debug_level = 9
in the [domain/<DOMAINNAME>] section of /etc/sssd/sssd.conf and then reboot to rerun the failing test?
Then please attach /var/log/sssd/sssd_<DOMAINNAME>.log and /var/log/sssd/ldap_child.log to this ticket (sanitized if needed).
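In context, the requested setting would look roughly like this. EXAMPLE.COM is a placeholder for the real domain name, and the existing provider options in that section are left out:

```ini
# /etc/sssd/sssd.conf (excerpt; EXAMPLE.COM is a placeholder)
[domain/EXAMPLE.COM]
# Temporary: verbose logging for this domain's back end while reproducing
debug_level = 9
```

The setting is only needed while reproducing the failure and can be removed once the logs are collected.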
Also, could you please try one more thing with your test? Please try waiting five minutes after the first failure before attempting a second login. I have a hunch that what may be happening is this:
SSSD comes up before NTP starts.
Something asks SSSD for a lookup before NTP has started.
SSSD tries to connect to LDAP and is denied the GSSAPI bind, so SSSD goes into "offline mode" to answer requests from the cache for two minutes.
The system finishes booting and you try to log in before those two minutes are complete. Since you don't have cached credentials, SSSD has to return failure (since from its perspective, there's no way to validate you until we return to online mode).
I should point out that in this configuration, the cached credentials would still be valuable, as you are technically "online" in a network sense, but "offline" from SSSD's perspective.
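If credential caching were enabled for that reason, it would be a small change in the same domain section. A minimal sketch, assuming the standard sssd.conf option `cache_credentials` and a placeholder domain name:

```ini
# /etc/sssd/sssd.conf (excerpt; EXAMPLE.COM is a placeholder)
[domain/EXAMPLE.COM]
# Store a hash of the user's password after a successful online login so
# that SSSD can still authenticate the user while it considers itself offline.
cache_credentials = True
```

Note that a cached credential only exists for users who have already logged in successfully once while SSSD was online.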
Anyway, I want to determine whether this is a timing issue or whether we're somehow entering an offline state from which we never return (which would be a serious issue).
Thanks for your help sorting this out.