Bug 1017683 - Invalid UID in persistent keyring name
Summary: Invalid UID in persistent keyring name
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: David Howells
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedFreezeException
Depends On:
Blocks: F20BetaFreezeException 1017806 1018371
 
Reported: 2013-10-10 10:36 UTC by Vít Ondruch
Modified: 2014-03-03 10:30 UTC (History)

Fixed In Version: kernel-3.11.5-302.fc20
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1017806 1018371 (view as bug list)
Environment:
Last Closed: 2013-11-05 05:10:09 UTC
Type: Bug
Embargoed:


Attachments
Today's part of my journalctl output. (58.05 KB, text/plain)
2013-10-10 10:36 UTC, Vít Ondruch
keys-modalias (2.70 KB, text/plain)
2013-10-11 20:06 UTC, Josh Boyer
keys-modalias (2.04 KB, text/plain)
2013-10-12 01:19 UTC, Josh Boyer
kernel-3.11.5-302.fc20.x86_64 (55.69 KB, text/plain)
2013-10-18 12:02 UTC, Vít Ondruch
Fix keyring quota leak (3.22 KB, patch)
2013-10-18 16:59 UTC, David Howells

Description Vít Ondruch 2013-10-10 10:36:32 UTC
Created attachment 810441 [details]
Today's part of my journalctl output.

Description of problem:
My system had been running fine for 4 days, but suddenly I cannot log in using my Kerberos password. I see the following report in my journalctl output:

Oct 10 12:10:28 unused-4-115.brq.redhat.com [sssd[krb5_child[23818]: Invalid UID in persistent keyring name

but it was preceded by

Oct 10 07:56:49 unused-4-115.brq.redhat.com [sssd[krb5_child[19287]: Disk quota exceeded

which was probably my first (successful) login today.

Please see attached log.



Version-Release number of selected component (if applicable):
$ rpm -q sssd
sssd-1.11.1-2.fc20.x86_64
$ rpm -q krb5-libs 
krb5-libs-1.11.3-21.fc20.x86_64


How reproducible:
Don't know. 


Steps to Reproduce:
1.
2.
3.

Actual results:
Logging in with my Kerberos password does not work reliably.

Expected results:
I can always log in using my Kerberos password.


Additional info:
Not sure if this is related, but the sssd experience in F20 is not as smooth as one would expect: https://bugzilla.gnome.org/show_bug.cgi?id=709607

Comment 1 Jakub Hrozek 2013-10-10 10:42:45 UTC
I've seen this bug myself this week but could never reproduce it.

What is the value of $KRB5CCNAME when the bug strikes?

Comment 2 Vít Ondruch 2013-10-10 10:44:52 UTC
$ echo $KRB5CCNAME
KEYRING:persistent:16025

$ id
uid=16025(vondruch) gid=16025(vondruch) groups=16025(vondruch),10(wheel),135(mock) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
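
For readers following along: the ccache name above has the shape TYPE:anchor:collection, and the numeric collection is expected to match the caller's UID. The sketch below illustrates that split and comparison under the assumption of a plain colon-separated name. The helper names are hypothetical and this is not libkrb5's actual parser.

```python
def parse_keyring_ccname(ccname):
    """Split a name such as 'KEYRING:persistent:16025' into its parts.

    Returns (cc_type, anchor, collection, subsidiary); parts that are
    absent come back as None. Simplified illustration only.
    """
    parts = ccname.split(":", 3)
    parts += [None] * (4 - len(parts))  # pad so unpacking always works
    cc_type, anchor, collection, subsidiary = parts
    return cc_type, anchor, collection, subsidiary


def persistent_uid_matches(ccname, uid):
    """True if ccname is a persistent keyring ccache for the given UID."""
    cc_type, anchor, collection, _ = parse_keyring_ccname(ccname)
    return (
        cc_type == "KEYRING"
        and anchor == "persistent"
        and collection is not None
        and collection.isdigit()
        and int(collection) == uid
    )


print(parse_keyring_ccname("KEYRING:persistent:16025"))
print(persistent_uid_matches("KEYRING:persistent:16025", 16025))
```

With the values reported here the name and the UID agree, which suggests the "Invalid UID" message is misleading rather than literally true.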

Comment 3 Jakub Hrozek 2013-10-10 10:44:52 UTC
About the other bug, it would be nice to see /var/log/secure, the gdm logs only say the authentication failed, but nothing more.

Comment 4 Vít Ondruch 2013-10-10 11:22:33 UTC
With SSSD guys, we were able to get this backtrace:



(gdb) bt full
#0  __keyctl (arg5=140737333600352, arg4=0, arg3=4294967294, arg2=16025, cmd=22) at keyutils.c:62
No locals.
#1  keyctl (cmd=cmd@entry=22) at keyutils.c:78
        va = {{gp_offset = 4160607904, fp_offset = 32767, overflow_arg_area = 0x7fffffffbf40, reg_save_area = 0x7fffffffbf00}}
        arg2 = 16025
        arg3 = 4294967294
        arg4 = 0
        arg5 = 140737333600352
#2  0x00007ffff72b3b10 in keyctl_get_persistent (uid=uid@entry=16025, id=id@entry=-2) at keyutils.c:234
No locals.
#3  0x00007ffff7b34149 in get_persistent_real (uid=16025) at cc_keyring.c:446
        key = <optimized out>
#4  get_collection (anchor_name=<optimized out>, collection_name=<optimized out>, collection_id_out=collection_id_out@entry=0x7fffffffbf8c) at cc_keyring.c:627
        ret = <optimized out>
        anchor_id = <optimized out>
        possess_id = 0
        ckname = 0x555500000001 <Address 0x555500000001 out of bounds>
        uidnum = <optimized out>
#5  0x00007ffff7b3687c in krb5_krcc_resolve (context=<optimized out>, id=0x7fffffffc020, residual=<optimized out>) at cc_keyring.c:1061
        ret = 0
        collection_id = 0
        cache_id = <optimized out>
        anchor_name = 0x5555558d34f0 "persistent"
        collection_name = 0x5555558d33a0 "16025"
        subsidiary_name = 0x0
#6  0x00007ffff7b2b81a in krb5_cc_resolve (context=0x55555575c830, name=<optimized out>, cache=0x7fffffffc020) at ccbase.c:241
        pfx = 0x5555558d34f0 "persistent"
        cp = <optimized out>
        resid = 0x555555760d08 "persistent:16025"
        pfxlen = <optimized out>
        err = 0
        ops = 0x7ffff7dd4aa0 <krb5_krcc_ops>
#7  0x0000555555558142 in do_ccache_name (name=0x0) at klist.c:444
        code = <optimized out>
        cache = 0x0
#8  0x00005555555562d5 in main (argc=<optimized out>, argv=<optimized out>) at klist.c:248
        retval = 0
        c = <optimized out>
        name = 0x0
        mode = 0



keyctl_get_persistent returns errno 122.

$ rpm -q keyutils
keyutils-1.5.8-1.fc20.x86_64
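
As a side note on the error number: on x86_64 Linux, errno 122 is EDQUOT, which lines up with the "Disk quota exceeded" line in the journal. A minimal sketch for translating a numeric errno into its symbolic name and message follows. describe_errno is a hypothetical helper, and both the numeric value and the message text can differ on other platforms or C libraries.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic_name, message) for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 122 is the value reported by keyctl_get_persistent in this comment.
print(describe_errno(122))
# Looking the constant up by name avoids hard-coding the number.
print(describe_errno(errno.EDQUOT))
```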

Comment 5 Simo Sorce 2013-10-10 15:41:03 UTC
I think this is happening because either your kernel or your keyutils does not support persistent/big_key.
In the libkrb5 code we try to help the user and fall back to using the user keyring with user keys.
However, this fallback has one drawback: it uses the normal user quota for keys.
The quota is set very low by default, around 12/15 KiB, so it is easy to fill it up quickly when you have a full ccache in the keyring.

The simplest workaround is to just change your ccache type until you can use a kernel that has proper persistent/big_key support.

Simo.

Comment 6 Jakub Hrozek 2013-10-10 17:46:45 UTC
I can't speak for Vit, but I was hit by the issue earlier this week as well, just couldn't gather the required data as the bug locked me out completely.

I am running kernel-3.11.3-301.fc20.x86_64 and the ccache was working well most of the day, then suddenly stopped working.

Comment 7 Vít Ondruch 2013-10-11 10:14:04 UTC
(In reply to Simo Sorce from comment #5)
I installed F20 Alpha together with SSSD more than two weeks ago, and it was working. In my previous session it worked for 4 days of uptime and then suddenly stopped. After a restart, it works OK again.

These are kernels I was using during that period:

$ rpm -q kernel
kernel-3.11.1-300.fc20.x86_64
kernel-3.11.2-301.fc20.x86_64
kernel-3.11.3-301.fc20.x86_64

Comment 8 Stephen Gallagher 2013-10-11 12:11:01 UTC
All kernels newer than kernel-3.11.0-3.fc20.x86_64 are supposed to have the persistent/big_key support in them.

David, we need your help investigating this please. Can you identify the reasons why this error code (122) could be returned here?

Comment 9 Stephen Gallagher 2013-10-11 19:14:45 UTC
Ok, so a bunch of us did some digging today on IRC and discovered that the root cause of this error is that the keyring was being created as a "user" keyring type, not a "big_key" type.

With some investigation, we realized that this is because the big_key functionality in the kernel was compiled as a module and not loaded by default. The fallback to the user keyring type has very limited storage space and thus might hit the EDQUOT error condition.

We discussed this with Josh Boyer, who determined that the following needs to be done: "[A]dd a modalias and have keyctl call out to load the module if it isn't there".

So for systems that aren't using big_key, we don't need to load the module, but for those that are we will have the kernel load it automatically.
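
To check whether the module is currently loaded on an affected system, one option is to look for big_key in /proc/modules. The sketch below assumes the usual format where the module name is the first whitespace-separated field; the function names are hypothetical. Note that a big_key built into the kernel (rather than built as a module) would not appear in this list at all.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # first field is the module name
    return names


def is_module_loaded(name, proc_modules_text):
    """True if the named module appears in the given text."""
    return name in loaded_modules(proc_modules_text)


def read_proc_modules(path="/proc/modules"):
    """Read the live module list; only meaningful on Linux."""
    with open(path) as handle:
        return handle.read()


# Illustrative sample text, not output captured from the reporter's machine.
sample = "big_key 12672 0 - Live 0x0000000000000000\n"
print(is_module_loaded("big_key", sample))
```

On a real system, is_module_loaded("big_key", read_proc_modules()) performs the same check against the running kernel.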

Comment 10 Josh Boyer 2013-10-11 20:06:22 UTC
Created attachment 811375 [details]
keys-modalias

To be clear, I also said that if big_key wasn't really useful as a module then it shouldn't be configurable as such.

Anyway, the attached patch is along the lines of what I was thinking.  I discussed it a bit with Kyle McMartin as well, so credit should also go to him.

Here's a scratch build for people to test with (I haven't tested it myself yet):

http://koji.fedoraproject.org/koji/taskinfo?taskID=6051781

Comment 11 Josh Boyer 2013-10-11 21:47:59 UTC
(In reply to Josh Boyer from comment #10)
> Created attachment 811375 [details]
> keys-modalias
> 
> To be clear, I also said that if big_key wasn't really useful as a module
> then it shouldn't be configurable as such.
> 
> Anyway, the attached patch is along the lines of what I was thinking.  I
> discussed it a bit with Kyle McMartin as well, so credit should also go to
> him.
> 
> Here's a scratch build for people to test with (I haven't tested it myself
> yet):
> 
> http://koji.fedoraproject.org/koji/taskinfo?taskID=6051781

That might not work.  I'm working on a slightly different patch.

Comment 12 Josh Boyer 2013-10-12 01:19:13 UTC
Created attachment 811489 [details]
keys-modalias

This one actually works, at least in terms of getting the module auto-loaded.

[jwboyer@vader ~]$ modinfo big_key
filename:       /lib/modules/3.11.4-301.3.fc20.x86_64/kernel/security/keys/big_key.ko
alias:          keys-big_key
license:        GPL
depends:        
intree:         Y
vermagic:       3.11.4-301.3.fc20.x86_64 SMP mod_unload 
signer:         Fedora kernel signing key
sig_key:        A8:61:C4:8B:C1:2C:39:8F:87:89:F8:55:F9:17:B6:4D:92:58:8A:61
sig_hashalgo:   sha256
[jwboyer@vader ~]$ lsmod | grep big_key
[jwboyer@vader ~]$ cat bar.c
/* cc -o bar bar.c -lkeyutils */
#include <stdlib.h>
#include <keyutils.h>
int main(void) {
	return request_key("big_key", "big_key", NULL, KEY_REQKEY_DEFL_DEFAULT);
}

[jwboyer@vader ~]$ gcc -o bar bar.c -lkeyutils
[jwboyer@vader ~]$ ./bar 
[jwboyer@vader ~]$ lsmod | grep big_key
big_key                12672  0 
[jwboyer@vader ~]$ 

Scratch build:

http://koji.fedoraproject.org/koji/taskinfo?taskID=6052333

Comment 13 Stephen Gallagher 2013-10-14 12:05:57 UTC
I installed this kernel scratch-build this morning.

Good News/Bad News:

Bad news: When I tried to log in using SSSD/Kerberos, I got an AVC denial[1]. This resulted in the module not loading and kerberos again falling back to the user keyring.

Good news: When I added policy to allow SSSD's krb5_child process to call module_request, it worked fine (so thanks, Josh!).

There are two ways (that I know of) to address this in the SELinux policy.

1) We can enable the domain_kernel_load_modules boolean to allow any process to trigger the loading of modules in the kernel. This is certainly not ideal and probably won't fly.

2) The other approach would be to whitelist applications that are allowed to trigger this request. The problem with this approach is that we may not be able to adequately determine which applications are at risk. The obvious ones are 'krb5_child', 'kinit', 'sshd' and 'rpc.gssd', but there may be others. However, given that it can fall back to the user keyring when it gets this denial, it may be acceptable to catch them as we go (since the fallback will still work until the number of retrieved keys grows large). I'm adding Dan Walsh and Miroslav Grepl to the CC list to chime in on this.







[1]
type=AVC msg=audit(1381751549.195:514): avc:  denied  { module_request } for  pid=2746 comm="krb5_child" kmod="keys-big_key" scontext=system_u:system_r:sssd_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=system


type=SYSCALL msg=audit(1381751549.195:514): arch=x86_64 syscall=add_key success=no exit=ENODEV a0=7f70a73bc688 a1=7f70a93d9360 a2=7f70a93daeb0 a3=1f1 items=0 ppid=818 pid=2746 auid=4294967295 uid=13041 gid=13041 euid=13041 suid=13041 fsuid=13041 egid=13041 sgid=13041 fsgid=13041 ses=4294967295 tty=(none) comm=krb5_child exe=/usr/libexec/sssd/krb5_child subj=system_u:system_r:sssd_t:s0 key=(null)
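
For anyone collecting similar denials from other applications, the AVC record above can be broken into its fields with a small parser. This is a rough sketch that assumes the "avc: denied { perms } for key=value ..." layout shown in [1]; parse_avc is a hypothetical helper, not part of any audit tooling.

```python
import re

# Matches the 'avc: denied { ... } for ...' portion of an audit record.
AVC_RE = re.compile(
    r"avc:\s+(?P<result>denied|granted)\s+\{(?P<perms>[^}]*)\}\s+for\s+(?P<rest>.*)"
)
# Matches key=value pairs where the value may be double-quoted.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_avc(line):
    """Return a dict describing an AVC record, or None if it does not match."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    fields = {}
    for key, quoted, bare in FIELD_RE.findall(match.group("rest")):
        fields[key] = quoted if quoted else bare
    return {
        "result": match.group("result"),
        "permissions": match.group("perms").split(),
        "fields": fields,
    }


record = (
    'type=AVC msg=audit(1381751549.195:514): avc:  denied  { module_request } '
    'for  pid=2746 comm="krb5_child" kmod="keys-big_key" '
    'scontext=system_u:system_r:sssd_t:s0 '
    'tcontext=system_u:system_r:kernel_t:s0 tclass=system'
)
parsed = parse_avc(record)
print(parsed["permissions"], parsed["fields"]["comm"], parsed["fields"]["kmod"])
```

Grouping such records by comm would give a first list of processes that hit the module_request denial.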

Comment 14 Simo Sorce 2013-10-14 21:15:26 UTC
Any third party application can do a 'kinit'-like operation, and they should be allowed to do so.
I think we should build this into the kernel rather than as a module if we can't easily allow any user to load it, or we'll keep having odd issues.

Comment 15 Josh Boyer 2013-10-15 11:32:52 UTC
We discussed this on IRC.  Apparently fixing SELinux here is untenable, so we'll build BIG_KEYS into the kernel.  A follow up patch to make it a bool in Kconfig will come later.

Comment 16 Fedora Blocker Bugs Application 2013-10-15 11:52:23 UTC
Proposed as a Freeze Exception for 20-beta by Fedora user sgallagh using the blocker tracking app because:

 Without this kernel patch, users relying on Kerberos for authentication and SSO will intermittently receive failures and errors (notably with misleading error messages).

Comment 17 Fedora Update System 2013-10-15 15:40:34 UTC
kernel-3.11.5-301.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.11.5-301.fc20

Comment 18 Adam Williamson 2013-10-16 17:45:16 UTC
Discussed at 2013-10-16 freeze exception review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-10-16/f20beta-blocker-review-4.2013-10-16-16.02.log.txt . Accepted as a freeze exception issue: this is a serious problem for users doing authentication via Kerberos that could interfere with their ability to access and use the system at all, and the fix is small and straightforward (simply build a capability into the kernel rather than as a module).

Comment 19 Fedora Update System 2013-10-17 20:27:07 UTC
Package kernel-3.11.5-302.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.5-302.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-19165/kernel-3.11.5-302.fc20
then log in and leave karma (feedback).

Comment 20 Vít Ondruch 2013-10-18 12:02:33 UTC
Created attachment 813758 [details]
kernel-3.11.5-302.fc20.x86_64

So after the update, the state of this issue is worse than before! Previously, it took approximately 4 days until this bug appeared; now I can't log in using my krb password 2 hours after boot.

One piece of positive news is that I can't see the "[sssd[krb5_child[19287]: Disk quota exceeded" message in my log anymore.


$ kinit
kinit: Invalid UID in persistent keyring name while getting default ccache

$ rpm -q kernel
kernel-3.11.3-301.fc20.x86_64
kernel-3.11.4-301.fc20.x86_64
kernel-3.11.5-302.fc20.x86_64

$ uname -a
Linux localhost 3.11.5-302.fc20.x86_64 #1 SMP Wed Oct 16 18:09:11 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ rpm -q keyutils
keyutils-1.5.8-1.fc20.x86_64

$ rpm -q sssd
sssd-1.11.1-5.fc20.x86_64

$ rpm -q krb5-libs 
krb5-libs-1.11.3-24.fc20.x86_64

Comment 21 Jakub Hrozek 2013-10-18 12:11:38 UTC
(In reply to Vít Ondruch from comment #20)
> One positive news is that I can't see the "[sssd[krb5_child[19287]: Disk
> quota exceeded" message in my log anymore.

FWIW, this error message was just the strerror() representation of the error code that bubbled up from keyutils to libkrb5 and then to sssd.

Comment 22 David Howells 2013-10-18 16:59:37 UTC
Created attachment 813856 [details]
Fix keyring quota leak

There are two leaks in keyring quota handling:

 (1) If a key is replaced in a keyring, the keyring allocates four extra bytes of the user's quota, even though the keyring did not change in size.

 (2) When a key is unlinked from a keyring, the quota recovered is not credited to the user's account.

This patch should fix both of those.

This patch will need to go to James Morris's security tree next branch, not Linus's tree.

Note that "keyctl clear <keyring>" will recover the quota because the unrecovered/excessive quota is still recorded on the keyring.
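
To make the effect of the two leaks concrete, here is a toy model of per-user quota accounting. It is not the kernel code and not the patch: the class, the 8-byte slot cost, and the 1200-byte limit are illustrative assumptions. Only the four extra bytes per replacement and the missing credit on unlink are taken from the description above.

```python
import errno
import os


class ToyQuota:
    """Toy per-user quota counter; 'leaky' enables both described leaks."""

    def __init__(self, max_bytes, leaky, slot_bytes=8):
        self.max_bytes = max_bytes
        self.leaky = leaky
        self.slot_bytes = slot_bytes  # assumed cost of one keyring link
        self.used = 0

    def _charge(self, nbytes):
        if self.used + nbytes > self.max_bytes:
            raise OSError(errno.EDQUOT, os.strerror(errno.EDQUOT))
        self.used += nbytes

    def link(self):
        self._charge(self.slot_bytes)

    def replace(self):
        # Leak (1): four extra bytes charged although the size is unchanged.
        if self.leaky:
            self._charge(4)

    def unlink(self):
        # Leak (2): the recovered quota is never credited back to the user.
        if not self.leaky:
            self.used -= self.slot_bytes


def cycles_until_quota(leaky, max_bytes=1200, limit=10000):
    """Count completed link/replace/unlink cycles before EDQUOT, else None."""
    quota = ToyQuota(max_bytes, leaky)
    for completed in range(limit):
        try:
            quota.link()
            quota.replace()
            quota.unlink()
        except OSError as exc:
            if exc.errno != errno.EDQUOT:
                raise
            return completed
    return None


print(cycles_until_quota(leaky=True), cycles_until_quota(leaky=False))
```

In this model the leaky variant eventually fails with EDQUOT even though no keys remain linked, while the corrected variant stays at zero usage indefinitely. That matches the symptom of logins working for a while and then failing.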

Comment 23 Josh Boyer 2013-10-18 18:54:22 UTC
Patch applied in git.

Comment 24 Adam Williamson 2013-10-20 15:20:31 UTC
Same result as Vit Ondruch. I don't know if I'm mis-reading comment #22, but 'keyctl clear @u' doesn't help - I still can't use my password after issuing that.

Comment 25 Adam Williamson 2013-10-20 15:23:54 UTC
jwb: I see the "Build BIG_KEYS into the kernel (rhbz 1017683)" commit in f20, but not in master, and in master's config-generic, there's still:

CONFIG_BIG_KEYS=m

Does that change need to go to master too?
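
For checking a given kernel config for this setting, a small sketch follows. It assumes the standard Kconfig output lines ("SYMBOL=y", "SYMBOL=m", or "# SYMBOL is not set"); kconfig_state is a hypothetical helper, and the /boot/config-<release> path mentioned afterwards is the usual Fedora location rather than something stated in this bug.

```python
def kconfig_state(config_text, symbol):
    """Return the value of a Kconfig symbol ('y', 'm', ...) or None if unset."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line == "# {} is not set".format(symbol):
            return None
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None  # symbol not mentioned at all


def describe_big_keys(config_text):
    """Summarise how CONFIG_BIG_KEYS is configured in the given text."""
    state = kconfig_state(config_text, "CONFIG_BIG_KEYS")
    if state == "y":
        return "built in"
    if state == "m":
        return "module (must be loaded before big_key can be used)"
    return "not enabled"


print(describe_big_keys("CONFIG_BIG_KEYS=m\n"))
print(describe_big_keys("CONFIG_BIG_KEYS=y\n"))
```

On an installed system the text could be read from a file such as /boot/config-<release>, where <release> is the output of uname -r.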

Comment 26 Josh Boyer 2013-10-20 23:50:04 UTC
(In reply to Adam Williamson from comment #24)
> Same result as Vit Ondruch. I don't know if I'm mis-reading comment #22, but
> 'keyctl clear @u' doesn't help - I still can't use my password after issuing
> that.

If you aren't testing with 3.11.6-300, you don't have the fix from David. The original kernel update for this only included the config change; we then added the patch in a later commit (referred to in comment #23). That got built with 3.11.6-300.

http://koji.fedoraproject.org/koji/buildinfo?buildID=472397

(In reply to Adam Williamson from comment #25)
> jwb: I see the "Build BIG_KEYS into the kernel (rhbz 1017683)" commit in
> f20, but not in master, and in master's config-generic, there's still:
> 
> CONFIG_BIG_KEYS=m
> 
> Does that change need to go to master too?

Yes.  Done now.  Thanks for the reminder.

Comment 27 Vít Ondruch 2013-10-21 11:29:38 UTC
I have been running kernel 3.11.6-300 for 2.5h and so far so good (in comparison, this morning with the 3.11.5-302 kernel, 2.5h was enough to run out of keyring space). /proc/key-users reports constant values.

@Josh I assume that the new kernel update has not been filed because you are waiting for the previous kernel to get into stable, is that right? Or is it just an omission?
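
For others who want to watch the same counters, here is a sketch of a /proc/key-users parser. It assumes the per-line layout described in keyrings(7): "uid: usage nkeys/nikeys qnkeys/maxkeys qnbytes/maxbytes". The names are hypothetical and the sample numbers are illustrative, not values from this machine.

```python
from collections import namedtuple

KeyUserEntry = namedtuple(
    "KeyUserEntry",
    "uid usage nkeys nikeys qnkeys maxkeys qnbytes maxbytes",
)


def parse_key_users(text):
    """Parse /proc/key-users-style text into {uid: KeyUserEntry}."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        uid = int(fields[0].rstrip(":"))
        usage = int(fields[1])
        nkeys, nikeys = (int(v) for v in fields[2].split("/"))
        qnkeys, maxkeys = (int(v) for v in fields[3].split("/"))
        qnbytes, maxbytes = (int(v) for v in fields[4].split("/"))
        entries[uid] = KeyUserEntry(
            uid, usage, nkeys, nikeys, qnkeys, maxkeys, qnbytes, maxbytes
        )
    return entries


def bytes_free(entry):
    """Remaining byte quota for one user entry."""
    return entry.maxbytes - entry.qnbytes


# Illustrative sample; only the UID is taken from this bug report.
sample = "    0:    10 9/9 2/1000000 22/25000000\n16025:     9 9/9 8/200 106/20000\n"
users = parse_key_users(sample)
print(users[16025].qnbytes, users[16025].maxbytes, bytes_free(users[16025]))
```

Sampling this periodically and comparing qnbytes over time would show whether the quota usage is creeping upward (as with the leak) or staying constant.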

Comment 28 Josh Boyer 2013-10-21 11:52:41 UTC
(In reply to Vít Ondruch from comment #27)
> I have booted kernel 3.11.6-300 for 2.5h and so far so good (in comparison
> to the morning, when using 3.11.5-302 kernel, 2.5h was enough to get out of
> the keyring space). /proc/key-users reports constant values.
> 
> @Josh I assume that new kernel update is not filled since you are waiting
> for  previous kernel to get into stable, is that right? Or is it just
> omission?

Simply a timing issue.  I'll file it today.

Comment 29 Fedora Update System 2013-10-21 12:07:44 UTC
kernel-3.11.6-300.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/kernel-3.11.6-300.fc20

Comment 30 Fedora Update System 2013-10-21 18:27:46 UTC
Package kernel-3.11.6-300.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.11.6-300.fc20'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-19611/kernel-3.11.6-300.fc20
then log in and leave karma (feedback).

Comment 31 Adam Williamson 2013-10-21 18:31:35 UTC
Fix is looking good here so far too.

Comment 32 Fedora Update System 2013-10-22 05:38:53 UTC
kernel-3.11.6-300.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 33 Daniel Walsh 2013-10-22 14:32:08 UTC
Do we have this fix in Rawhide?

Comment 34 Josh Boyer 2013-10-22 14:40:45 UTC
(In reply to Daniel Walsh from comment #33)
> Do we have this fix in Rawhide?

In the kernel repo, yes.  Should be in a built kernel in rawhide tomorrow with this build:

http://koji.fedoraproject.org/koji/buildinfo?buildID=472963

(the 3.12.0-0.rc6-git0.1 build has the patch, but big_keys was still a module).

Comment 35 Stephen Gallagher 2013-10-24 17:56:21 UTC
Re-opening. In order for this fix to be complete, keyutils-1.5.8-1.fc20 also needs to land in the stable repo (it's been queued for it since the day of Beta Freeze, but apparently missed the cutoff).

Comment 36 Fedora Update System 2013-10-24 18:08:15 UTC
keyutils-1.5.8-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/FEDORA-2013-18334/keyutils-1.5.8-1.fc20

Comment 37 Adam Williamson 2013-10-24 18:36:57 UTC
I'll put it in the next F20 stable push request.

Comment 38 Fedora Update System 2013-11-05 03:39:57 UTC
keyutils-1.5.8-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 39 Adam Williamson 2013-11-05 05:10:09 UTC
This can be closed, now.

Comment 40 Fedora Update System 2013-11-10 08:04:58 UTC
kernel-3.11.5-302.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

