Created attachment 1485410 [details] sssd kcm and secrets debug Description of problem: I'm seeing from time to time the that kinit fails with KCM ccache. Especially on some veaker virtual test machines. # kinit alice kinit: Matching credential not found while getting default ccache # grep ccache /etc/krb5.conf default_ccache_name = KCM: If changing the cache to other type, kinit works. It looks like there is some garbage in KCM. Version-Release number of selected component (if applicable): krb5-libs-1.15.1-34.el7.x86_64 sssd-kcm-1.16.2-13.el7.x86_64 How reproducible: 50% Steps to Reproduce: 1. Just try to kinit quite a lot with kcm ccache and after some time the cache os secrets seem become corrupted 2. sss_cache oes not seem to help Actual results: (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_db_list_keys] (0x2000): Searching for [(type=simple)] at [cn=ccache,cn=0,cn=persistent,cn=kcm] with scope=subtree (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_dn_to_path] (0x2000): Secrets path for [cn=8a6d89f6-b031-4b8c-8bf6-9d6378b14319-0:55576,cn=ccache,cn=0,cn=persistent,cn=kcm] is [8a6d89f6-b031-4b8c-8bf6-9d6378b14319-0:55576] (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_dn_to_path] (0x2000): Secrets path for [cn=0d3b06aa-26a7-42a8-93e0-7f21d777078b-0,cn=ccache,cn=0,cn=persistent,cn=kcm] is [0d3b06aa-26a7-42a8-93e0-7f21d777078b-0] (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_dn_to_path] (0x2000): Secrets path for [cn=aa5b16d4-fc50-495c-84b6-0bdbe5e48df9-0:19774,cn=ccache,cn=0,cn=persistent,cn=kcm] is [aa5b16d4-fc50-495c-84b6-0bdbe5e48df9-0:19774] (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_dn_to_path] (0x2000): Secrets path for [cn=d64dc960-d292-4121-b4a5-fa6e4ce1b3dc-0:71239,cn=ccache,cn=0,cn=persistent,cn=kcm] is [d64dc960-d292-4121-b4a5-fa6e4ce1b3dc-0:71239] (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_dn_to_path] (0x2000): Secrets path for [cn=8e02c96d-bcb1-44ce-a33f-7d6e1636a46d-0:25077,cn=ccache,cn=0,cn=persistent,cn=kcm] is [8e02c96d-bcb1-44ce-a33f-7d6e1636a46d-0:25077] (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_dn_to_path] (0x2000): Secrets path for [cn=dfe3753e-0496-47d9-9a29-89615ae3b38e-0:30386,cn=ccache,cn=0,cn=persistent,cn=kcm] is [dfe3753e-0496-47d9-9a29-89615ae3b38e-0:30386] (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_dn_to_path] (0x2000): Secrets path for [cn=0e09e29a-190a-42c7-933b-50bb782e07ef-0:47879,cn=ccache,cn=0,cn=persistent,cn=kcm] is [0e09e29a-190a-42c7-933b-50bb782e07ef-0:47879] (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_dn_to_path] (0x2000): Secrets path for [cn=43823dd4-ff27-4f39-9c82-a8ea8d2bdce4-0:62303,cn=ccache,cn=0,cn=persistent,cn=kcm] is [43823dd4-ff27-4f39-9c82-a8ea8d2bdce4-0:62303] (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_dn_to_path] (0x2000): Secrets path for [cn=112ebbd0-5947-41db-851a-38c4ef4b0cc7-0:42104,cn=ccache,cn=0,cn=persistent,cn=kcm] is [112ebbd0-5947-41db-851a-38c4ef4b0cc7-0:42104] (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_db_list_keys] (0x1000): Returning 9 secrets (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [sec_http_reply_with_body] (0x1000): HTTP reply 200: OK (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [local_secret_req] (0x2000): Local secrets request done (Fri Sep 21 08:04:08 2018) [sssd[secrets]] [sec_provider_recv] (0x2000): Request finished Full debug attached. Expected results: kinit just works
There are two issues: 1) That if there is an UUID of a cache stored, but the UUID can't be resolved, KCM fails hard. I think we might want to just fall back to using the first cache as if there was no default rather than failing. But I will also check what does Heimdal do in this case. 2) For whatever reason the actual secrets do not contain a secret with the UUID that is stored in the /defaults record for the user. I'll try to find out why. btw was this a manual test or some automation?
It's an automated tests that exercise kerberos with different credential caches including kcm.
OK, I think I finally know what is going on. There are separate records that hold the actual ccache and a separate record that holds which ccache is the default. Suppose we have: default: ABC ccache: ABC ...contents ccache: XYZ ...contents Now, libkrb5 sends the operation "INITIALIZE" that is supposed to re-initialize a named ccache storage for the default ccache ABC. Because an initialized ccache should not contain any data (credentials), we first check if the ccache exists (it does) and then remove its contents. The removal operation is done in sssd-secrets and there is a timeout for the operation. If the removal takes too long (e.g. the machine is heavily saturated), the remove request in sssd-kcm times out and causes the whole INITIALIZE operation to fail, BUT in sssd-secrets it eventually finishes. So we end up with: default: ABC ccache: XYZ ...contents Next, if someone runs kinit, the first operation that runs is GET_DEFAULT which returns ABC, the next is GET_CCACHE which fails because ABC does not exist. I think there are several things to do: - GET_DEFAULT should fall back if the default entry points nowhere - for the sssd_secrets back end, ideally we should not do a separate delete and create, but update instead. But because the sssd_secrets API is mirroring the Custodia API, there is no PATCH, only POST (=create) and DELETE. So we would end up with two requests either way... To keep the code nice, we should probably still call mod_cc instead of del_cc && create_cc, but I don't know if we can solve this with the current secrets API. - for the secdb back end (default in sssd-2.0), we should wrap every KCM operation that does any writes in a transaction -
btw I don't know how to reproduce outside gdb but here is what I did: kinit user1 kinit user2 gdb $(pidof sssd_secrets), set breakpoint to the delete operation kinit user1 wait in the breakpoint until the tcurl operation in sssd_kcm times out, aborting the INITIALIZE operation klist
Upstream tickets filed: https://pagure.io/SSSD/sssd/issue/3838 https://pagure.io/SSSD/sssd/issue/3839 https://pagure.io/SSSD/sssd/issue/3840 I'll clone the first one against RHEL-8 as well..
Upstream ticket: https://pagure.io/SSSD/sssd/issue/3838
* master: 6bf5bca * sssd-1-16: 6c568c9
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:2177