Bug 2152695

Summary: Kerberos auth not working after updating to gnome-online-accounts-3.46.0-2
Product: [Fedora] Fedora
Reporter: Daniel Rusek <drusek>
Component: gnome-online-accounts
Assignee: Gwyn Ciesla <gwync>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 37
CC: gnome-sig, gwync, jistone, mail, mcatanza, rstrode, sam, vashirov
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: gnome-online-accounts-3.46.0-3.fc37 gnome-online-accounts-3.46.0-4.fc37
Last Closed: 2023-01-26 01:21:38 UTC
Type: Bug
Attachments:
goa-identity-service log from gnome-online-accounts-3.46.0-1 (flags: none)
goa-identity-service log from gnome-online-accounts-3.46.0-2 (flags: none)
goa-identity-service log from gnome-online-accounts-3.46.0-3 (flags: none)
updated test script (flags: none)

Description Daniel Rusek 2022-12-12 17:52:07 UTC
Description of problem:
After updating to gnome-online-accounts-3.46.0-2, my Kerberos account (added using GNOME Settings) does not seem to work anymore. Everything works fine after downgrading to gnome-online-accounts-3.46.0-1.

Version-Release number of selected component (if applicable):
gnome-online-accounts-3.46.0-2.fc37.x86_64

How reproducible:
Every time.

Steps to Reproduce:
1. Add Kerberos account using GNOME Settings on a system with gnome-online-accounts-3.46.0-1 or older.
2. Confirm that the Kerberos account works fine (using klist or any tool/web page that uses Kerberos auth).
3. Upgrade gnome-online-accounts to 3.46.0-2 and reboot the system.
4. Check whether the Kerberos account still works (using klist or any tool/web page that uses Kerberos auth).

Actual results:
$ klist
klist: Credentials cache keyring 'persistent:1000:1000' not found

Expected results:
Kerberos auth works properly and klist shows a valid ticket.
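
The check in steps 2 and 4 can be scripted. The following is only a minimal sketch, assuming MIT Kerberos klist, whose -s option prints nothing and exits non-zero when the default cache cannot be read or has expired. The function name has_valid_ticket and the KLIST override variable are hypothetical and were not part of the original report.

```shell
#!/bin/bash
# Hypothetical helper: report whether the default credential cache
# currently holds a usable ticket.
# KLIST is an assumed override hook, e.g. to substitute a stub in tests.
KLIST=${KLIST:-klist}

has_valid_ticket() {
    # "klist -s" is silent; exit status 0 means the cache is readable
    # and its ticket has not expired.
    if "$KLIST" -s; then
        echo "valid ticket"
    else
        echo "no valid ticket"
        return 1
    fi
}
```

Calling has_valid_ticket once before the upgrade and once after the reboot gives a pass/fail answer for each step; with the behaviour described above, the second call would be expected to print "no valid ticket".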

Comment 1 Gwyn Ciesla 2022-12-12 18:11:34 UTC
I assume your kerberos works with kinit/kdestroy?

Do you get any other errors?

Comment 2 Daniel Rusek 2022-12-12 18:16:40 UTC
Yep, it seems to work fine with kinit. Nope, sadly no other errors or anything relevant in the system journal.

Comment 3 Daniel Rusek 2022-12-13 10:48:08 UTC
https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/merge_requests/112

This is the upstream fix that causes the regression. It seems to be the only thing that gnome-online-accounts-3.46.0-2 added.

Comment 4 Sam Morris 2022-12-13 12:07:48 UTC
Might be worth running goa-identity-service with G_MESSAGES_DEBUG=all and collecting some debug logs?

Comment 5 Daniel Rusek 2022-12-13 18:32:42 UTC
Created attachment 1932408
goa-identity-service log from gnome-online-accounts-3.46.0-1

Comment 6 Daniel Rusek 2022-12-13 18:33:10 UTC
Created attachment 1932409
goa-identity-service log from gnome-online-accounts-3.46.0-2

Comment 8 Gwyn Ciesla 2022-12-14 15:57:10 UTC
Interesting. I've updated my system to 3.46.0-2 and it works for me.

How does this behave with other Kerberos IDs/realms, if you have access to any?

Comment 9 Daniel Rusek 2022-12-14 16:24:12 UTC
I sadly do not have access to other Kerberos IDs/realms.

Comment 10 Gwyn Ciesla 2022-12-14 19:24:42 UTC
Did you add this account to GOA manually or did you set it up via cli at some point and it detected it?

Comment 11 Daniel Rusek 2022-12-14 21:52:55 UTC
I added it manually using GUI (the Online Accounts tab of GNOME Settings).

Comment 12 Ray Strode [halfline] 2022-12-15 19:04:49 UTC
uhhh nice

(process:4570): libgoaidentity-DEBUG: 19:19:39.690: GoaIdentityService: could not ensure credentials for account drusek.COM: GDBus.Error:org.gnome.OnlineAccounts.Error.NotAuthorized: Unknown error

I'm guessing I somehow broke kernel keyring kerberos when fixing kcm kerberos. will try to reproduce

Comment 13 Gwyn Ciesla 2022-12-15 19:43:56 UTC
Interesting.

Comment 14 Ray Strode [halfline] 2022-12-15 19:50:20 UTC
looks like it's this commit: https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/commit/4acfcc323e986526975ede981673dd173be4e267

It's "expected" the identifier will be NULL if it's a fresh cache.

Comment 15 Ray Strode [halfline] 2022-12-15 20:19:19 UTC
actually there's still a lingering issue, where the newly initialized cache isn't switched to. one moment.

Comment 16 Ray Strode [halfline] 2022-12-15 21:35:53 UTC
okay I think I got the issues now. Would someone mind putting a build together and testing it? I gotta run to my train.

Comment 17 Gwyn Ciesla 2022-12-15 21:40:18 UTC
Will do and I'll post a link to the build here.

Comment 18 Gwyn Ciesla 2022-12-15 22:09:07 UTC
115 and 116 don't apply cleanly to 3.46.0 with 112 applied, so I refactored a bit.

Scratch: https://koji.fedoraproject.org/koji/taskinfo?taskID=95415387

Comment 19 Ray Strode [halfline] 2022-12-16 14:09:08 UTC
thanks!

Comment 20 Daniel Rusek 2022-12-16 22:42:11 UTC
Sadly still the same issue when using the scratch build from #c18. See the attached log.

Comment 21 Daniel Rusek 2022-12-16 22:42:46 UTC
Created attachment 1933152
goa-identity-service log from gnome-online-accounts-3.46.0-3

Comment 22 Ray Strode [halfline] 2022-12-17 05:08:47 UTC
okay, how about:

Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=95464155

?

Comment 23 Daniel Rusek 2022-12-17 13:28:17 UTC
(In reply to Ray Strode [halfline] from comment #22)
> okay, how about:
> 
> Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=95464155
> 
> ?

That build seems to work fine! I now have a valid Kerberos ticket. Thanks! :-)

Comment 24 Fedora Update System 2022-12-18 04:18:47 UTC
FEDORA-2022-ec01c8fadb has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-ec01c8fadb

Comment 25 Fedora Update System 2022-12-19 02:15:34 UTC
FEDORA-2022-ec01c8fadb has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-ec01c8fadb`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-ec01c8fadb

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 26 Gwyn Ciesla 2022-12-19 14:26:56 UTC
(In reply to Ray Strode [halfline] from comment #22)
> okay, how about:
> 
> Task info: https://koji.fedoraproject.org/koji/taskinfo?taskID=95464155
> 
> ?

Thank you!

What additional changes do I need to apply?

Comment 27 Michael Catanzaro 2022-12-19 14:40:07 UTC
(Maybe we should do a new upstream release, so Gwyn doesn't have to guess what needs to be cherry-picked? Right now we have in Fedora only a few of the many Kerberos-related fixes that are present upstream.)

Comment 28 Gwyn Ciesla 2022-12-19 15:27:32 UTC
She would love that. :)

Comment 29 Fedora Update System 2022-12-20 01:28:36 UTC
FEDORA-2022-ec01c8fadb has been pushed to the Fedora 37 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 30 Michael Catanzaro 2022-12-23 18:32:59 UTC
I notice that with this update, gnome-online-accounts has just totally stopped renewing my Fedora kerberos ticket. Hi Ray, let's plan to debug this after holiday break....

Comment 31 Josh Stone 2023-01-04 21:14:57 UTC
I have GOA kerberos accounts for both Fedora and internal Red Hat, and with 3.46.0-3, some Red Hat auth stopped working, like GSSAPI ssh. With "ssh -v", I see messages like this:

debug1: Next authentication method: gssapi-with-mic
debug1: No credentials were supplied, or the credentials were unavailable or inaccessible
Credentials cache keyring 'persistent:1000:krb_ccache_zIBIVq1' not found

With "klist -A", I see that same cache not found, but there are also good caches with valid tickets for each of my accounts. If I "kdestroy -A", then after a few moments "klist -A" comes back with two good caches and one not found again.

This was working fine with 3.46.0-1, and does again if I downgrade. It does not show any caches not found in that case, though that may be a red herring.

Comment 32 Viktor Ashirov 2023-01-10 16:50:11 UTC
I can confirm that after updating to gnome-online-accounts-3.46.0-3.fc37.x86_64 my ticket expires and is not renewed.
The reproducer is the same as in https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/issues/79#note_1604664
For now I rolled back to gnome-online-accounts-3.46.0-1.fc37.x86_64.

Comment 33 Ray Strode [halfline] 2023-01-13 19:42:21 UTC
i've put a scratch build up with some of the other missing fixes from upstream.

One of those fixes was a threading problem that could prevent renewal.

scratch build is here:

https://koji.fedoraproject.org/koji/taskinfo?taskID=96100798

Note the reproducer from https://gitlab.gnome.org/GNOME/gnome-online-accounts/-/issues/79#note_1604664 isn't expected to have functioning renewal. It's intentionally using "weird" inputs to induce a bug in KCM credential cache handling.

If you want working renewal, you need to give a bigger value for -r than -l. Normally -r would be some multiple of -l so you can renew e.g. 5 times before requiring a password or whatever.
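
To make the -l/-r relationship concrete, here is a minimal sketch that checks the two values before calling kinit. It assumes durations written as a number followed by s, m, h or d, which is only a subset of what kinit accepts. The helper names to_seconds and renewal_window_ok are hypothetical, and the 10m/50m values are just an illustration.

```shell
# Convert a duration such as 5s, 10m, 8h or 7d to seconds.
# Anything else (including a bare number) is rejected in this sketch.
to_seconds() {
    local value unit
    value=${1%[smhd]}
    unit=${1#"$value"}
    case $value in
        ''|*[!0-9]*) return 1 ;;
    esac
    case $unit in
        s) echo "$value" ;;
        m) echo $((value * 60)) ;;
        h) echo $((value * 3600)) ;;
        d) echo $((value * 86400)) ;;
        *) return 1 ;;
    esac
}

# Succeed only when the renewable lifetime (-r) exceeds the ticket lifetime (-l).
renewal_window_ok() {
    local lifetime renewable
    lifetime=$(to_seconds "$1") || return 2
    renewable=$(to_seconds "$2") || return 2
    [ "$renewable" -gt "$lifetime" ]
}

# Example use (principal name left to the caller):
#   renewal_window_ok 10m 50m && kinit -l 10m -r 50m "$principal"
```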

Comment 34 Ray Strode [halfline] 2023-01-13 19:43:04 UTC
let's reopen this...

Comment 35 Josh Stone 2023-01-14 00:23:54 UTC
That scratch build has the same problems for me.

Comment 36 Viktor Ashirov 2023-01-15 14:36:21 UTC
I tried this scratch build -- didn't work. I also tried gnome-online-accounts from master -- also didn't work.

Comment 37 Ray Strode [halfline] 2023-01-16 15:30:32 UTC
great thanks guys. Are you guys using KCM or KEYRING credential cache type? I'll spend some time today trying to reproduce.

I guess gnome-online-accounts really needs some kerberos tests...

Comment 38 Josh Stone 2023-01-16 18:08:27 UTC
I'm using KEYRING.

Comment 39 Viktor Ashirov 2023-01-16 19:13:56 UTC
And I'm using KCM.

Comment 40 Ray Strode [halfline] 2023-01-16 20:40:37 UTC
Okay I think I found the problem. I believe it's not actually an issue with renewal (a feature built into the KDC that doesn't require a password at all to perform) but with automatic reinitialization (which requires a password that gets pulled from GNOME keyring).

Can you try this build https://koji.fedoraproject.org/koji/taskinfo?taskID=96217939 ? 

Note it has the same nvr as the last scratch build, so you may need to --force install it.

Comment 41 Viktor Ashirov 2023-01-16 23:02:06 UTC
Thanks, this works when I have just one principal<->cache mapping, but when I have 2 caches for the same principal, automatic reinitialization doesn't work for that principal. Another side effect is that it switches the default principal to the expired ticket instead of renewing it.

$ cat ./goa_test.sh
#!/bin/bash
echo '[+] destroy all kerberos tickets'
kdestroy -A
klist -l

echo '[+] obtain long-lived tgt'
echo Secret123 | kinit employee.ORG -c KCM:$UID:2 >/dev/null
klist -l
sleep 1

echo '[+] obtain short-lived tgt'
echo Secret123 | kinit employee.ORG -c KCM:$UID:3 -l 1s >/dev/null
sleep 1
klist -l

echo '[+] start goa-daemon and goa-identity-service'
/usr/libexec/goa-daemon --replace &
/usr/libexec/goa-identity-service &
sleep 1

echo '[+] list tickets'
klist -l


$ ./goa_test.sh
[+] destroy all kerberos tickets
Principal name                 Cache name
--------------                 ----------
[+] obtain long-lived tgt
Principal name                 Cache name
--------------                 ----------
employee.ORG     KCM:1000:2
[+] obtain short-lived tgt
Principal name                 Cache name
--------------                 ----------
employee.ORG     KCM:1000:2
employee.ORG     KCM:1000:3 (Expired)
[+] start goa-daemon and goa-identity-service

(process:94500): libgoaidentity-WARNING **: 23:59:08.277: GoaKerberosIdentityManager: Using polling for change notification for credential cache type 'KCM'
goa-daemon-Message: 23:59:08.317: goa-daemon version 3.46.0 starting
goa-daemon-Message: 23:59:08.325: goa-daemon version 3.46.0 exiting
[+] list tickets
Principal name                 Cache name
--------------                 ----------
employee.ORG     KCM:1000:3 (Expired)
employee.ORG     KCM:1000:2

Comment 42 Ray Strode [halfline] 2023-01-17 15:39:27 UTC
Thanks for the testing and feedback, I'll look into the issues you've uncovered.

Comment 43 Ray Strode [halfline] 2023-01-17 19:38:25 UTC
I've made a few changes to your reproducer:

$ cat goa_test.sh
#!/bin/bash
echo -e '\n[+] Stopping identity service'
pkill -f -STOP goa-identity-service

echo -e '\n[+] destroy all kerberos tickets'
kdestroy -A
sleep 1
klist -A

echo -e '\n[+] obtain long-lived tgt'
echo Secret123 | kinit employee.ORG -c KCM:$UID:long-term >/dev/null
sleep 2
klist

echo -e '\n[+] obtain short-lived tgt'
echo Secret123 | kinit employee.ORG -c KCM:$UID:short-term -l 5s >/dev/null
sleep 2
klist
echo -e '\n[+] Waiting 5 seconds for tgt to expire'
sleep 5

echo -e '\n[+] Demonstrating short-lived tgt is expired'
klist -l

echo -e '\n[+] starting goa-identity-service'
pkill -f -TERM goa-identity-service
pkill -f -CONT goa-identity-service
G_MESSAGES_DEBUG=all /usr/libexec/goa-identity-service >& identity.log &

sleep 10

echo -e '\n[+] list default ticket'
klist

echo -e '\n[+] list all tickets'
klist -l

The main 3 things I did were 

1) stop the existing identity service if it's running before doing anything, so we can be sure it's not stepping on toes during the initial setup
2) Add more sleeps to give kcm and identity service more time to get through the changes we're looking for.
3) Don't bother bouncing goa-daemon. It's not really part of the change and having it restart complicates things.

The other big detail, of course, is the password has to be in the keyring. This means, in order for the test to work, the user has to have set up employee.ORG in control-center and saved Secret123 into the keyring. I assume you did that already, but I thought it should be explicitly pointed out for those following along.

I was able to reproduce the issue you were talking about and fixed it along with some related issues. The scratch build is here:

https://koji.fedoraproject.org/koji/taskinfo?taskID=96263247

Comment 44 Viktor Ashirov 2023-01-18 18:34:04 UTC
I apologize for not being clear, your assumptions are correct. And thanks for improving the test script.

With the new build I see that default principal is switched to the valid one. But I still see the difference in behaviour between gnome-online-accounts-3.46.0-1 and gnome-online-accounts-3.46.0-4:

gnome-online-accounts-3.46.0-1
./goa_test.sh
...
[+] list all tickets
Principal name                 Cache name
--------------                 ----------
employee.ORG     KCM:1000:long-term
employee.ORG     KCM:1000:short-term

gnome-online-accounts-3.46.0-4
./goa_test.sh
...
[+] list all tickets
Principal name                 Cache name
--------------                 ----------
employee.ORG     KCM:1000:long-term
employee.ORG     KCM:1000:short-term (Expired)


With the new version it doesn't reinit expired tickets. While with the old one it would reinit all expired tickets.
And if I destroy the valid ticket, leaving only expired tickets, it doesn't attempt to reinit them.

Comment 45 Viktor Ashirov 2023-01-18 18:37:37 UTC
Created attachment 1939011
updated test script

With the updated test script:
gnome-online-accounts-3.46.0-1
[+] list all tickets
Principal name                 Cache name
--------------                 ----------
employee.ORG     KCM:1000:short-term
employee.ORG     KCM:1000:another-short-term


gnome-online-accounts-3.46.0-4
[+] list all tickets
Principal name                 Cache name
--------------                 ----------
employee.ORG     KCM:1000:another-short-term (Expired)
employee.ORG     KCM:1000:short-term (Expired)

Comment 46 Ray Strode [halfline] 2023-01-18 19:14:36 UTC
(In reply to Viktor Ashirov from comment #44)
> With the new build I see that default principal is switched to the valid
> one. But I still see the difference in behaviour between
> gnome-online-accounts-3.46.0-1 and gnome-online-accounts-3.46.0-4:
...
> With the new version it doesn't reinit expired tickets. While with the old
> one it would reinit all expired tickets.
It's expected that it doesn't reinit all tickets. One of the changes in design
was to look at the available credentials caches and figure out which one is the
best, deeming it the "active" one for a given principal and ignoring the others. 

> And if I destroy the valid ticket, leaving only expired tickets, it doesn't
> attempt to reinit them.
So this part is not expected. When the active one goes away or expires a new
active one should get picked. I'll look into it, thanks.
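
As a rough illustration of the "active cache" idea only: the real selection logic lives in goa-identity-service and is not shown in this thread. The sketch below assumes that "best" simply means "the first cache for the principal that klist -l does not mark as (Expired)", which is a simplification. The function name pick_active_cache is hypothetical.

```shell
# Read "klist -l" output on stdin and print the first cache for the given
# principal that is not marked "(Expired)". Exit status 1 if none qualifies.
# Assumes the two-line header that klist -l prints in the listings above.
pick_active_cache() {
    awk -v principal="$1" '
        NR > 2 && $1 == principal && $0 !~ /\(Expired\)/ {
            print $2
            found = 1
            exit
        }
        END { exit !found }
    '
}

# Example use:
#   klist -l | pick_active_cache "$principal"
```

With the 3.46.0-4 listing from comment 44 as input, this would print KCM:1000:long-term. When every cache for the principal is expired it prints nothing and fails, which corresponds to the case where a new active cache should be picked and reinitialized.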

Comment 47 Ray Strode [halfline] 2023-01-19 18:27:31 UTC
Okay I think this should be settled now, hopefully, (well...we'll see ;-). Fixing this was a little more invasive than I would have liked, but it is what it is:

https://koji.fedoraproject.org/scratch/rstrode/task_96368408/

Comment 48 Viktor Ashirov 2023-01-19 20:01:02 UTC
Yes, with this build I have my tickets renewed :)
Thank you!

Comment 49 Gwyn Ciesla 2023-01-19 20:12:26 UTC
Thank you all! If you submit a PR I'll merge and build ASAP.

Comment 50 Josh Stone 2023-01-19 20:21:29 UTC
Unfortunately, my problem remains: I still get two good caches for my two principals, plus one "not found", and ssh trips on the one not found. We can split mine into a separate bz though, if you prefer.

Comment 51 Ray Strode [halfline] 2023-01-19 20:27:36 UTC
Josh, if you use kswitch does it start working and stay working? can you post your klist -A output?

Comment 52 Josh Stone 2023-01-19 20:35:07 UTC
It does not help to kswitch, even though klist (without -A) does show the expected principal afterward.

$ klist -A
Ticket cache: KEYRING:persistent:10719:krb_ccache_CoB10Fw
Default principal: jistone

Valid starting       Expires              Service principal
01/19/2023 12:27:26  01/19/2023 22:27:26  krbtgt/REDHAT.COM

klist: Credentials cache keyring 'persistent:10719:krb_ccache_eKFjpsv' not found

Ticket cache: KEYRING:persistent:10719:krb_ccache_qrLoYk1
Default principal: jistone

Valid starting       Expires              Service principal
01/19/2023 12:27:09  01/20/2023 12:27:09  krbtgt/FEDORAPROJECT.ORG
        renew until 01/26/2023 12:27:09


And for example, ssh -v says:

debug1: Next authentication method: gssapi-with-mic
debug1: No credentials were supplied, or the credentials were unavailable or inaccessible
Credentials cache keyring 'persistent:10719:krb_ccache_eKFjpsv' not found


debug1: No credentials were supplied, or the credentials were unavailable or inaccessible
Credentials cache keyring 'persistent:10719:krb_ccache_eKFjpsv' not found


debug1: No more authentication methods to try.

... but ssh works if I force -o GSSAPIClientIdentity=jistone

Comment 53 Ray Strode [halfline] 2023-01-19 20:41:20 UTC
Okay I switched from KCM to KEYRING again and immediately hit a crasher bug that might explain what's going on.

Can you try:

https://koji.fedoraproject.org/koji/taskinfo?taskID=96378150

?

Comment 54 Josh Stone 2023-01-19 20:52:37 UTC
I haven't seen any crashes, but that build still has the problem.

Comment 55 Ray Strode [halfline] 2023-01-19 20:58:03 UTC
if you kdestroy -c just the inaccessible cache does the problem go away? I guess it's possible ssh is just giving up when it hits a partial cache instead of trying the next one (or something). it's weird it's not trying the default cache first though.

I'll see if I can make goa-identity-service clean it up.

Comment 56 Josh Stone 2023-01-19 21:04:53 UTC
Aha, yes, after "kdestroy -c KEYRING:..." it works!
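
For anyone hitting the same symptom, the manual workaround can be sketched as a small filter over klist -A output. This is an assumption-laden sketch: it relies on the exact "Credentials cache keyring '...' not found" wording shown in comment 52 and on that message being reachable via 2>&1. The function name list_missing_keyring_caches is hypothetical.

```shell
# Read "klist -A" output on stdin and print, one per line, the KEYRING cache
# names that klist reported as "not found".
list_missing_keyring_caches() {
    sed -n "s/^klist: Credentials cache keyring '\(.*\)' not found\$/KEYRING:\1/p"
}

# Example use: destroy each orphaned cache so that tools such as ssh
# no longer trip over it.
#   klist -A 2>&1 | list_missing_keyring_caches | while read -r cache; do
#       kdestroy -c "$cache"
#   done
```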

Comment 57 Ray Strode [halfline] 2023-01-19 22:03:42 UTC
alright, there was a bug with KEYRING where it was creating a credentials cache that it never initialized and then orphaned. This was confusing ssh.

I think this scratch build should fix it:

https://koji.fedoraproject.org/koji/taskinfo?taskID=96381837

I'll check back tomorrow, and assuming it resolves all the issues (which ... we're already on comment 56 so who knows...) I'll push it upstream and do a downstream pull request.

Comment 58 Josh Stone 2023-01-19 22:48:33 UTC
That build works for me, thanks!

Comment 59 Michael Catanzaro 2023-01-19 22:50:18 UTC
(In reply to Ray Strode [halfline] from comment #57)
> I'll check back tomorrow, and assuming it resolves all the issues (which ...
> we're already on comment 56 so who knows...) I'll push it upstream and do a
> downstream pull request.

Let's talk to Emmanuele about doing an upstream 3.46.1 release. I might help with this.

Comment 60 Ray Strode [halfline] 2023-01-20 14:49:21 UTC
So I think the best way forward is:

1. I'll do a new build right now, there's no reason to delay, especially since people have been living with a regression for a couple weeks now, and we already have scratch srpm tested...
2. You wrangle the release, though.
3. We can just drop the patch and do another update then

Comment 61 Fedora Update System 2023-01-20 15:05:44 UTC
FEDORA-2023-e966905644 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2023-e966905644

Comment 62 Fedora Update System 2023-01-22 02:41:01 UTC
FEDORA-2023-e966905644 has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2023-e966905644`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2023-e966905644

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 63 Fedora Update System 2023-01-26 01:21:38 UTC
FEDORA-2023-e966905644 has been pushed to the Fedora 37 stable repository.
If problem still persists, please make note of it in this bug report.