Bug 1910169

Summary: sssd-kcm causing periodic auth failures after 2.4.0-2
Product: [Fedora] Fedora
Reporter: Rob Foehl <rwf>
Component: sssd
Assignee: Pavel Březina <pbrezina>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 33
CC: abokovoy, atikhono, jhrozek, johannespfau, lslebodn, mzidek, pbrezina, rharwood, sbose, ssorce, sssd-maintainers
Last Closed: 2021-03-05 13:13:41 UTC
Type: Bug

Description Rob Foehl 2020-12-22 23:46:55 UTC
sssd-kcm is causing periodic auth failures on builds newer than 2.4.0-2, presumably due to an issue introduced with the (surprising number of) patches applied in 2.4.0-3.  (Incidentally, CPU usage by the process has gotten substantially worse in these versions, as well.)

This manifests as all plaintext auth failing on Fedora 33 systems that run sssd 2.4.0-4, are joined to a FreeIPA domain, and authenticate domain users.  The affected hosts emit errors like the following examples for the lifetime of the sssd-kcm process that incurred the problem:

Dec 22 17:59:11 fedora33 systemd[1]: Starting SSSD Kerberos Cache Manager...
Dec 22 17:59:11 fedora33 systemd[1]: Started SSSD Kerberos Cache Manager.
Dec 22 17:59:11 fedora33 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=sssd-kcm comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 22 17:59:11 fedora33 kcm[10556]: Starting up
Dec 22 17:59:11 fedora33 krb5_child[10554][10554]: Generic error (see e-text)
Dec 22 17:59:11 fedora33 krb5_child[10554][10554]: Generic error (see e-text)
Dec 22 17:59:11 fedora33 auth[10552]: pam_sss(dovecot:auth): authentication failure; logname= uid=97 euid=97 tty=dovecot ruser=rob rhost=2001:db8::1 user=rob
Dec 22 17:59:11 fedora33 auth[10552]: pam_sss(dovecot:auth): received for user rob: 4 (System error)
Dec 22 17:59:19 fedora33 krb5_child[10559][10559]: Generic error (see e-text)
Dec 22 17:59:19 fedora33 krb5_child[10559][10559]: Generic error (see e-text)
Dec 22 17:59:19 fedora33 auth[10552]: pam_sss(dovecot:auth): authentication failure; logname= uid=97 euid=97 tty=dovecot ruser=rob rhost=2001:db8::1 user=rob
Dec 22 17:59:19 fedora33 auth[10552]: pam_sss(dovecot:auth): received for user rob: 4 (System error)
Dec 22 17:59:29 fedora33 krb5_child[10562][10562]: Generic error (see e-text)
Dec 22 17:59:29 fedora33 krb5_child[10562][10562]: Generic error (see e-text)
Dec 22 17:59:29 fedora33 auth[10552]: pam_sss(dovecot:auth): authentication failure; logname= uid=97 euid=97 tty=dovecot ruser=rob rhost=2001:db8::1 user=rob
Dec 22 17:59:29 fedora33 auth[10552]: pam_sss(dovecot:auth): received for user rob: 4 (System error)


Nothing else is logged anywhere.  sudo and the like are also affected, but authentication using Kerberos tickets is not.  Relatively short-lived workarounds include waiting it out (the sssd-kcm process exits, and the next auth attempt spawns a new one and succeeds) or rebooting the host.  Running klist as an affected user during these events reports that no KCM cache is available for the respective UID, and kinit also usually fails with the same 'Generic error' text until the sssd-kcm process goes away.  Downgrading the sssd* packages to 2.4.0-2 restores correct behavior.
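
For anyone trying to confirm the same failure signature on their own hosts, here is a rough Python sketch that counts the three kinds of lines seen in the excerpt above: kcm startups, krb5_child 'Generic error' messages, and pam_sss 'System error' results.  The function name, the counter keys, and the regular expression are my own and are only assumed to fit syslog-style lines formatted like the ones quoted here; they are not part of sssd.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   "Dec 22 17:59:11 fedora33 krb5_child[10554][10554]: Generic error (see e-text)"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s"   # timestamp, e.g. "Dec 22 17:59:11"
    r"(?P<host>\S+)\s"                    # hostname
    r"(?P<proc>[^\[:\s]+)(?:\[\d+\])*:\s" # process name plus optional [pid] groups
    r"(?P<msg>.*)$"                       # remainder of the line
)


def summarize(lines):
    """Count the failure-signature lines (hypothetical helper, not an sssd tool)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything not in the assumed layout
        proc, msg = match.group("proc"), match.group("msg")
        if proc == "kcm" and msg == "Starting up":
            counts["kcm_start"] += 1
        elif proc == "krb5_child" and "Generic error" in msg:
            counts["krb5_generic_error"] += 1
        elif "pam_sss" in msg and "System error" in msg:
            counts["pam_system_error"] += 1
    return counts


# A few lines copied from the report, used as sample input.
SAMPLE = [
    "Dec 22 17:59:11 fedora33 kcm[10556]: Starting up",
    "Dec 22 17:59:11 fedora33 krb5_child[10554][10554]: Generic error (see e-text)",
    "Dec 22 17:59:11 fedora33 krb5_child[10554][10554]: Generic error (see e-text)",
    "Dec 22 17:59:11 fedora33 auth[10552]: pam_sss(dovecot:auth): received for user rob: 4 (System error)",
]

if __name__ == "__main__":
    for key, value in sorted(summarize(SAMPLE).items()):
        print(key, value)
```

To use it on real data, the same function could be fed the lines of a saved log file instead of SAMPLE, assuming the log uses the same line format as the excerpt.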

Comment 1 Rob Foehl 2020-12-23 08:11:05 UTC
This appears to be due to a crashed LDAP instance on one of the FreeIPA servers, which at least would explain the intermittent issues; it doesn't account for why only the Fedora 33 clients with current 2.4.0-4 packages were having trouble, though.  I'll see whether I can reproduce on demand tomorrow.

Comment 2 Pavel Březina 2021-02-01 12:16:49 UTC
Hi Rob, are you still experiencing the issue?

Comment 3 Pavel Březina 2021-03-05 13:13:41 UTC
I'm closing this for inactivity. Please reopen the ticket if the issue still persists.