Bug 2152695
Summary: | Kerberos auth not working after updating to gnome-online-accounts-3.46.0-2 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Daniel Rusek <drusek> |
Component: | gnome-online-accounts | Assignee: | Gwyn Ciesla <gwync> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 37 | CC: | gnome-sig, gwync, jistone, mail, mcatanza, rstrode, sam, vashirov |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | gnome-online-accounts-3.46.0-3.fc37 gnome-online-accounts-3.46.0-4.fc37 | Doc Type: | If docs needed, set a value |
Last Closed: | 2023-01-26 01:21:38 UTC | Type: | Bug |
Description
Daniel Rusek
2022-12-12 17:52:07 UTC
I assume your kerberos works with kinit/kdestroy? Do you get any other errors?

Yep, it seems to work fine with kinit. Nope, sadly no other errors or anything relevant in system journal.

https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/merge_requests/112 This is the upstream fix that causes the regression. It seems to be the only thing that gnome-online-accounts-3.46.0-2 added.

Might be worth running goa-identity-service with G_MESSAGES_DEBUG=all and collecting some debug logs?

Created attachment 1932408: goa-identity-service log from gnome-online-accounts-3.46.0-1

Created attachment 1932409: goa-identity-service log from gnome-online-accounts-3.46.0-2
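
For anyone following along, the log collection suggested above could look roughly like the sketch below. It only assumes that the service binary lives at /usr/libexec/goa-identity-service (the path used by the reproducer scripts later in this thread); the helper name collect_goa_debug_log and its arguments are made up for illustration and are not part of gnome-online-accounts.

```bash
#!/bin/bash
# Hypothetical helper: run the identity service with GLib debug output
# enabled for a fixed time window and save everything it prints.
#   $1 = log file to write
#   $2 = seconds to capture (default 30)
#   $3 = service binary (default is the path used elsewhere in this thread)
collect_goa_debug_log() {
    local log_file=$1
    local seconds=${2:-30}
    local service=${3:-/usr/libexec/goa-identity-service}
    local status=0

    # G_MESSAGES_DEBUG=all asks GLib to print debug-level messages for all
    # log domains. stdout and stderr are both captured because the messages
    # may land on either stream.
    G_MESSAGES_DEBUG=all timeout "$seconds" "$service" >"$log_file" 2>&1 || status=$?

    # timeout(1) exits with 124 when the window elapses, which is the
    # normal way for this capture to end, so it is not treated as an error.
    [ "$status" -eq 0 ] || [ "$status" -eq 124 ]
}

# Example (not run here): collect_goa_debug_log identity.log 60
```

An already running session instance of the service may get in the way of a second copy; the updated reproducer further down in this thread deals with that by signalling the old process with pkill before starting a new one.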
Interesting. I've updated my system to 3.46.0-2 and it works for me. How does this behave with other kerberos IDs/realms? If you have such access?

I sadly do not have access to other Kerberos IDs/realms.

Did you add this account to GOA manually or did you set it up via cli at some point and it detected it?

I added it manually using GUI (the Online Accounts tab of GNOME Settings).

uhhh nice

(process:4570): libgoaidentity-DEBUG: 19:19:39.690: GoaIdentityService: could not ensure credentials for account drusek.COM: GDBus.Error:org.gnome.OnlineAccounts.Error.NotAuthorized: Unknown error

I'm guessing I somehow broke kernel keyring kerberos when fixing kcm kerberos. will try to reproduce

Interesting. looks like it's this commit: https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/commit/4acfcc323e986526975ede981673dd173be4e267 It's "expected" the identifier will be NULL if it's a fresh cache.

actually there's still a lingering issue, where the newly initialized cache isn't switched to. one moment.

okay I think I got the issues now. Would someone mind putting a build together and testing it? I gotta run to my train.

Will do and I'll post a link to the build here.

115 and 116 don't apply cleanly to 3.46.0 with 112 applied, so I refactored a bit. Scratch: https://koji.fedoraproject.org/koji/taskinfo?taskID=95415387

thanks!

Sadly still the same issue when using the scratch build from #c18. See the attached log.

Created attachment 1933152: goa-identity-service log from gnome-online-accounts-3.46.0-3
okay, how about: Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=95464155 ?

(In reply to Ray Strode [halfline] from comment #22)
> okay, how about:
>
> Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=95464155
>
> ?

That build seems to work fine! I now have a valid Kerberos ticket. Thanks! :-)

FEDORA-2022-ec01c8fadb has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-ec01c8fadb

FEDORA-2022-ec01c8fadb has been pushed to the Fedora 37 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-ec01c8fadb` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-ec01c8fadb See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

(In reply to Ray Strode [halfline] from comment #22)
> okay, how about:
>
> Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=95464155
>
> ?

Thank you! What additional changes do I need to apply?

(Maybe we should do a new upstream release, so Gwyn doesn't have to guess what needs to be cherry-picked? Right now we have in Fedora only a few of the many Kerberos-related fixes that are present upstream.)

She would love that. :)

FEDORA-2022-ec01c8fadb has been pushed to the Fedora 37 stable repository. If problem still persists, please make note of it in this bug report.

I notice that with this update, gnome-online-accounts has just totally stopped renewing my Fedora kerberos ticket.

Hi Ray, let's plan to debug this after holiday break....

I have GOA kerberos accounts for both Fedora and internal Red Hat, and with 3.46.0-3, some Red Hat auth stopped working, like GSSAPI ssh. With "ssh -v", I see messages like this:

debug1: Next authentication method: gssapi-with-mic
debug1: No credentials were supplied, or the credentials were unavailable or inaccessible
Credentials cache keyring 'persistent:1000:krb_ccache_zIBIVq1' not found

With "klist -A", I see that same cache not found, but there are also good caches with valid tickets for each of my accounts. If I "kdestroy -A", then after a few moments "klist -A" comes back with two good caches and one not found again. This was working fine with 3.46.0-1, and does again if I downgrade. It does not show any caches not found in that case, though that may be a red herring.

I can confirm that after updating to gnome-online-accounts-3.46.0-3.fc37.x86_64 my ticket expires and is not renewed. Reproducer is the same as in https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/issues/79#note_1604664 For now I rolled back to gnome-online-accounts-3.46.0-1.fc37.x86_64

i've put a scratch build up with some of the other missing fixes from upstream. One of those fixes was a threading problem that could prevent renewal. scratch build is here: https://koji.fedoraproject.org/koji/taskinfo?taskID=96100798

Note the reproducer from https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/issues/79#note_1604664 isn't expected to have functioning renewal. It's intentionally using "weird" inputs to induce a bug in KCM credential cache handling. If you want working renewal, you need to give a bigger value for -r than -l. Normally -r would be some multiple of -l so you can renew e.g. 5 times before requiring a password or whatever.

let's reopen this...

That scratch build has the same problems for me.

I tried this scratch build -- didn't work. I also tried gnome-online-accounts from master -- also didn't work.

great thanks guys. Are you guys using KCM or KEYRING credential cache type? I'll spend some time today trying to reproduce. I guess gnome-online-accounts really needs some kerberos tests...

I'm using KEYRING.
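
On the -l/-r point made a few comments up: the relationship between the two kinit lifetimes can be sketched as below. This is only an illustration; kinit_renewable_args is a hypothetical helper, the "Ns" time format is the one the reproducer scripts in this thread pass to -l, and the default multiple of 5 simply mirrors the "renew e.g. 5 times" remark.

```bash
#!/bin/bash
# Hypothetical helper: print kinit options where the renewable lifetime (-r)
# is a whole multiple of the ticket lifetime (-l).
#   $1 = ticket lifetime in seconds
#   $2 = multiple to use for the renewable lifetime (default 5)
kinit_renewable_args() {
    local lifetime_s=$1
    local multiple=${2:-5}

    # A multiple below 2 would give -r <= -l, which is the "weird" input
    # the upstream reproducer uses on purpose and which cannot renew.
    if [ "$multiple" -lt 2 ]; then
        echo "multiple must be at least 2 so that -r is larger than -l" >&2
        return 1
    fi

    printf -- '-l %ds -r %ds\n' "$lifetime_s" "$((lifetime_s * multiple))"
}

# Example (not run here; the principal is a placeholder):
#   kinit $(kinit_renewable_args 600 5) <principal>
```

With a 600 second lifetime and the default multiple this prints `-l 600s -r 3000s`. Whether the KDC actually grants that renewable lifetime depends on its own policy, which this sketch does not check.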
And I'm using KCM.

Okay I think I found the problem. I believe it's not actually an issue with renewal (a feature built into the KDC that doesn't require a password at all to perform) but with automatic reinitialization (which requires a password that gets pulled from GNOME keyring). Can you try this build https://koji.fedoraproject.org/koji/taskinfo?taskID=96217939 ? Note it has the same nvr as the last scratch build, so you may need to --force install it.

Thanks, this works when I have just one principal<->cache mapping, but when I have 2 caches for the same principal, automatic reinitialization doesn't work for that principal. Another side effect is that it switches the default principal to the expired ticket instead of renewing it.

$ cat ./goa_test.sh
#!/bin/bash
echo '[+] destroy all kerberos tickets'
kdestroy -A
klist -l
echo '[+] obtain long-lived tgt'
echo Secret123 | kinit employee.ORG -c KCM:$UID:2 >/dev/null
klist -l
sleep 1
echo '[+] obtain short-lived tgt'
echo Secret123 | kinit employee.ORG -c KCM:$UID:3 -l 1s >/dev/null
sleep 1
klist -l
echo '[+] start goa-daemon and goa-identity-service'
/usr/libexec/goa-daemon --replace &
/usr/libexec/goa-identity-service &
sleep 1
echo '[+] list tickets'
klist -l

$ ./goa_test.sh
[+] destroy all kerberos tickets
Principal name Cache name
-------------- ----------
[+] obtain long-lived tgt
Principal name Cache name
-------------- ----------
employee.ORG KCM:1000:2
[+] obtain short-lived tgt
Principal name Cache name
-------------- ----------
employee.ORG KCM:1000:2
employee.ORG KCM:1000:3 (Expired)
[+] start goa-daemon and goa-identity-service
(process:94500): libgoaidentity-WARNING **: 23:59:08.277: GoaKerberosIdentityManager: Using polling for change notification for credential cache type 'KCM'
goa-daemon-Message: 23:59:08.317: goa-daemon version 3.46.0 starting
goa-daemon-Message: 23:59:08.325: goa-daemon version 3.46.0 exiting
[+] list tickets
Principal name Cache name
-------------- ----------
employee.ORG KCM:1000:3 (Expired)
employee.ORG KCM:1000:2

Thanks for the testing and feedback, I'll look into the issues you've uncovered.

I've made a few changes to your reproducer:

╎❯ cat goa_test.sh
#!/bin/bash
echo -e '\n[+] Stopping identity service'
pkill -f -STOP goa-identity-service
echo -e '\n[+] destroy all kerberos tickets'
kdestroy -A
sleep 1
klist -A
echo -e '\n[+] obtain long-lived tgt'
echo Secret123 | kinit employee.ORG -c KCM:$UID:long-term >/dev/null
sleep 2
klist
echo -e '\n[+] obtain short-lived tgt'
echo Secret123 | kinit employee.ORG -c KCM:$UID:short-term -l 5s >/dev/null
sleep 2
klist
echo -e '\n[+] Waiting 5 seconds for tgt to expire'
sleep 5
echo -e '\n[+] Demonstrating short-lived tgt is expired'
klist -l
echo -e '\n[+] starting goa-identity-service'
pkill -f -TERM goa-identity-service
pkill -f -CONT goa-identity-service
G_MESSAGES_DEBUG=all /usr/libexec/goa-identity-service >& identity.log &
sleep 10
echo -e '\n[+] list default ticket'
klist
echo -e '\n[+] list all tickets'
klist -l

The main 3 things I did were
1) stop the existing identity service if it's running before doing anything, so we can be sure it's not stepping on toes during the initial setup
2) Add more sleeps to give kcm and identity service more time to get through the changes we're looking for.
3) Don't bother bouncing goa-daemon. It's not really part of the change and having it restart complicates things.

The other big detail, of course, is the password has to be in the keyring. This means, in order for the test to work, the user has to have set up employee.ORG in control-center and saved Secret123 into the keyring. I assume you did that already, but I thought it should be explicitly pointed out for those following along.

I was able to reproduce the issue you were talking about and fixed it along with some related issues. The scratch build is here: https://koji.fedoraproject.org/koji/taskinfo?taskID=96263247

I apologize for not being clear, your assumptions are correct. And thanks for improving the test script. With the new build I see that default principal is switched to the valid one. But I still see the difference in behaviour between gnome-online-accounts-3.46.0-1 and gnome-online-accounts-3.46.0-4:

gnome-online-accounts-3.46.0-1
./goa_test.sh
...
[+] list all tickets
Principal name Cache name
-------------- ----------
employee.ORG KCM:1000:long-term
employee.ORG KCM:1000:short-term

gnome-online-accounts-3.46.0-4
./goa_test.sh
...
[+] list all tickets
Principal name Cache name
-------------- ----------
employee.ORG KCM:1000:long-term
employee.ORG KCM:1000:short-term (Expired)

With the new version it doesn't reinit expired tickets. While with the old one it would reinit all expired tickets. And if I destroy the valid ticket, leaving only expired tickets, it doesn't attempt to reinit them.

Created attachment 1939011: updated test script
With the updated test script:
gnome-online-accounts-3.46.0-1
[+] list all tickets
Principal name Cache name
-------------- ----------
employee.ORG KCM:1000:short-term
employee.ORG KCM:1000:another-short-term
gnome-online-accounts-3.46.0-4
[+] list all tickets
Principal name Cache name
-------------- ----------
employee.ORG KCM:1000:another-short-term (Expired)
employee.ORG KCM:1000:short-term (Expired)
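
Comparing runs like the two above by eye gets tedious, so here is a small sketch of how the check could be scripted. It assumes only the `klist -l` layout shown in this thread, where an expired cache has "(Expired)" as the last field of its row; list_expired_caches is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: read `klist -l` output on stdin and print the cache
# name of every row whose last field is "(Expired)". Header and separator
# rows never end in that marker, so they are skipped automatically.
list_expired_caches() {
    awk '$NF == "(Expired)" { print $(NF - 1) }'
}

# Example (not run here): klist -l | list_expired_caches
```

An empty result after the identity service has had time to act would correspond to the 3.46.0-1 behaviour shown above, while a non-empty one corresponds to the 3.46.0-4 output.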
(In reply to Viktor Ashirov from comment #44)
> With the new build I see that default principal is switched to the valid
> one. But I still see the difference in behaviour between
> gnome-online-accounts-3.46.0-1 and gnome-online-accounts-3.46.0-4:
...
> With the new version it doesn't reinit expired tickets. While with the old
> one it would reinit all expired tickets.

It's expected that it doesn't reinit all tickets. One of the changes in design was to look at the available credentials caches and figure out which one is the best, deeming it the "active" one for a given principal and ignoring the others.

> And if I destroy the valid ticket, leaving only expired tickets, it doesn't
> attempt to reinit them.

So this part is not expected. When the active one goes away or expires a new active one should get picked. I'll look into it, thanks.

Okay I think this should be settled now, hopefully, (well...we'll see ;-). Fixing this was a little more invasive than I would have liked, but it is what it is: https://koji.fedoraproject.org/scratch/rstrode/task_96368408/

Yes, with this build I have my tickets renewed :) Thank you!

Thank you all! If you submit a PR I'll merge and build ASAP.

Unfortunately, my problem remains: I still get two good caches for my two principals, plus one "not found", and ssh trips on the one not found. We can split mine into a separate bz though, if you prefer.

Josh, if you use kswitch does it start working and stay working? can you post your klist -A output?

It does not help to kswitch, even though klist (without -A) does show the expected principal afterward.

$ klist -A
Ticket cache: KEYRING:persistent:10719:krb_ccache_CoB10Fw
Default principal: jistone
Valid starting Expires Service principal
01/19/2023 12:27:26 01/19/2023 22:27:26 krbtgt/REDHAT.COM
klist: Credentials cache keyring 'persistent:10719:krb_ccache_eKFjpsv' not found
Ticket cache: KEYRING:persistent:10719:krb_ccache_qrLoYk1
Default principal: jistone
Valid starting Expires Service principal
01/19/2023 12:27:09 01/20/2023 12:27:09 krbtgt/FEDORAPROJECT.ORG
renew until 01/26/2023 12:27:09

And for example, ssh -v says:

debug1: Next authentication method: gssapi-with-mic
debug1: No credentials were supplied, or the credentials were unavailable or inaccessible
Credentials cache keyring 'persistent:10719:krb_ccache_eKFjpsv' not found
debug1: No credentials were supplied, or the credentials were unavailable or inaccessible
Credentials cache keyring 'persistent:10719:krb_ccache_eKFjpsv' not found
debug1: No more authentication methods to try.

... but ssh works if I force -o GSSAPIClientIdentity=jistone

Okay I switched from KCM to KEYRING again and immediately hit a crasher bug that might explain what's going on. Can you try: https://koji.fedoraproject.org/koji/taskinfo?taskID=96378150 ?

I haven't seen any crashes, but that build still has the problem.

if you kdestroy -c just the inaccessible cache does the problem go away? I guess it's possible ssh is just giving up when it hits a partial cache instead of trying the next one (or something). it's weird it's not trying the default cache first though. i'll see if i can make goa-identity-service clean it up.

Aha, yes, after "kdestroy -c KEYRING:..." it works!

alright, there was a bug with KEYRING where it was creating a credentials cache that it never initialized and then orphaned. This was confusing ssh. I think this scratch build should fix it: https://koji.fedoraproject.org/koji/taskinfo?taskID=96381837 I'll check back tomorrow, and assuming it resolves all the issues (which ... we're already on comment 56 so who knows...) I'll push it upstream and do a downstream pull request.

That build works for me, thanks!

(In reply to Ray Strode [halfline] from comment #57)
> I'll check back tomorrow, and assuming it resolves all the issues (which ...
> we're already on comment 56 so who knows...) I'll push it upstream and do a
> downstream pull request.

Let's talk to Emmanuele about doing an upstream 3.46.1 release. I might help with this.

So I think the best way forward is:
1. I'll do a new build right now, there's no reason to delay, especially since people have been living with a regression for a couple weeks now, and we already have scratch srpm tested...
2. You wrangle the release, though.
3. We can just drop the patch and do another update then.

FEDORA-2023-e966905644 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-e966905644

FEDORA-2023-e966905644 has been pushed to the Fedora 37 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-e966905644` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-e966905644 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2023-e966905644 has been pushed to the Fedora 37 stable repository. If problem still persists, please make note of it in this bug report.
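
For systems still on an affected build, the manual workaround from this thread (destroying just the cache that klist reports as not found) could be sketched as follows. Both function names are hypothetical. The error-line pattern is copied from the `klist -A` and `ssh -v` output quoted above, and prefixing the name with KEYRING: is an assumption based on the "Ticket cache: KEYRING:..." lines in that same output. The sketch only prints the kdestroy commands so they can be reviewed before anything is removed.

```bash
#!/bin/bash
# Hypothetical helper: read `klist -A` output (stdout and stderr merged) on
# stdin and print the name of each keyring cache reported as not found.
list_missing_keyring_caches() {
    sed -n "s/.*Credentials cache keyring '\([^']*\)' not found.*/\1/p" | sort -u
}

# Hypothetical helper: turn each cache name read from stdin into the
# kdestroy command that would remove it. Nothing is executed here.
# Assumption: the cache type prefix is KEYRING:, as in the klist output
# quoted in this thread.
print_kdestroy_commands() {
    local name
    while IFS= read -r name; do
        printf 'kdestroy -c KEYRING:%s\n' "$name"
    done
}

# Example (not run here):
#   klist -A 2>&1 | list_missing_keyring_caches | print_kdestroy_commands
```

With the update above installed this should normally print nothing, since the orphaned cache is no longer created.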