Created attachment 810441 [details]
Today's part of my journalctl output.
Description of problem:
My system had been running fine for 4 days, but suddenly I cannot log in using my Kerberos password. I see the following report in my journalctl output:
Oct 10 12:10:28 unused-4-115.brq.redhat.com [sssd[krb5_child[23818]: Invalid UID in persistent keyring name
but it was preceded by
Oct 10 07:56:49 unused-4-115.brq.redhat.com [sssd[krb5_child[19287]: Disk quota exceeded
which was probably my first (successful) login today. Please see the attached log.
Version-Release number of selected component (if applicable):
$ rpm -q sssd
sssd-1.11.1-2.fc20.x86_64
$ rpm -q krb5-libs
krb5-libs-1.11.3-21.fc20.x86_64
How reproducible:
Don't know.
Steps to Reproduce:
1.
2.
3.
Actual results:
Login does not work reliably.
Expected results:
I can always log in using my Kerberos password.
Additional info:
Not sure if it is related, but the SSSD experience in F20 is not as smooth as one would expect: https://bugzilla.gnome.org/show_bug.cgi?id=709607
I've seen this bug myself this week but could never reproduce it. What is the value of $KRB5CCNAME when the bug strikes?
$ echo $KRB5CCNAME
KEYRING:persistent:16025
$ id
uid=16025(vondruch) gid=16025(vondruch) groups=16025(vondruch),10(wheel),135(mock) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
About the other bug, it would be nice to see /var/log/secure; the gdm logs only say that the authentication failed, nothing more.
With the SSSD guys, we were able to get this backtrace:
(gdb) bt full
#0  __keyctl (arg5=140737333600352, arg4=0, arg3=4294967294, arg2=16025, cmd=22) at keyutils.c:62
No locals.
#1  keyctl (cmd=cmd@entry=22) at keyutils.c:78
        va = {{gp_offset = 4160607904, fp_offset = 32767, overflow_arg_area = 0x7fffffffbf40, reg_save_area = 0x7fffffffbf00}}
        arg2 = 16025
        arg3 = 4294967294
        arg4 = 0
        arg5 = 140737333600352
#2  0x00007ffff72b3b10 in keyctl_get_persistent (uid=uid@entry=16025, id=id@entry=-2) at keyutils.c:234
No locals.
#3  0x00007ffff7b34149 in get_persistent_real (uid=16025) at cc_keyring.c:446
        key = <optimized out>
#4  get_collection (anchor_name=<optimized out>, collection_name=<optimized out>, collection_id_out=collection_id_out@entry=0x7fffffffbf8c) at cc_keyring.c:627
        ret = <optimized out>
        anchor_id = <optimized out>
        possess_id = 0
        ckname = 0x555500000001 <Address 0x555500000001 out of bounds>
        uidnum = <optimized out>
#5  0x00007ffff7b3687c in krb5_krcc_resolve (context=<optimized out>, id=0x7fffffffc020, residual=<optimized out>) at cc_keyring.c:1061
        ret = 0
        collection_id = 0
        cache_id = <optimized out>
        anchor_name = 0x5555558d34f0 "persistent"
        collection_name = 0x5555558d33a0 "16025"
        subsidiary_name = 0x0
#6  0x00007ffff7b2b81a in krb5_cc_resolve (context=0x55555575c830, name=<optimized out>, cache=0x7fffffffc020) at ccbase.c:241
        pfx = 0x5555558d34f0 "persistent"
        cp = <optimized out>
        resid = 0x555555760d08 "persistent:16025"
        pfxlen = <optimized out>
        err = 0
        ops = 0x7ffff7dd4aa0 <krb5_krcc_ops>
#7  0x0000555555558142 in do_ccache_name (name=0x0) at klist.c:444
        code = <optimized out>
        cache = 0x0
#8  0x00005555555562d5 in main (argc=<optimized out>, argv=<optimized out>) at klist.c:248
        retval = 0
        c = <optimized out>
        name = 0x0
        mode = 0
keyctl_get_persistent returns errno 122.
$ rpm -q keyutils
keyutils-1.5.8-1.fc20.x86_64
I think this is happening because either your kernel or your keyutils does not support persistent/big_key. In the libkrb5 code we try to help the user and fall back to using the user keyring with user keys. However, this fallback has one drawback: it uses the normal user quota for keys. That quota is set very low by default, around 12/15KiB, so it is easy to fill it up quickly when you have a full ccache in the keyring. The simplest workaround is to just change your ccache type until you can use a kernel that has proper persistent/big_key support. Simo.
I can't speak for Vit, but I was hit by the issue earlier this week as well; I just couldn't gather the required data because the bug locked me out completely. I am running kernel-3.11.3-301.fc20.x86_64 and the ccache was working well most of the day, then suddenly stopped working.
(In reply to Simo Sorce from comment #5)
I installed the F20 alpha together with SSSD more than two weeks ago. It was working. In my previous session it worked for 4 days and then suddenly stopped working (after 4 days of uptime). After a restart, it works OK again. These are the kernels I was using during that period:
$ rpm -q kernel
kernel-3.11.1-300.fc20.x86_64
kernel-3.11.2-301.fc20.x86_64
kernel-3.11.3-301.fc20.x86_64
All kernels newer than kernel-3.11.0-3.fc20.x86_64 are supposed to have the persistent/big_key support in them. David, we need your help investigating this please. Can you identify the reasons why this error code (122) could be returned here?
Ok, so a bunch of us did some digging today on IRC and discovered that the root cause of this error is that the keyring was being created as a "user" keyring type, not a "big_key" type. With some investigation, we realized that this is because the big_key functionality in the kernel was compiled as a module and not loaded by default. The fallback to the user keyring type has very limited storage space and thus might hit the EDQUOT error condition. We discussed this with Josh Boyer, who determined that the following needs to be done: "[A]dd a modalias and have keyctl call out to load the module if it isn't there". So for systems that aren't using big_key, we don't need to load the module, but for those that are, we will have the kernel load it automatically.
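For anyone who wants to check which key type their ccache keys ended up with, one option is to look at the TYPE column of /proc/keys. The following is only a minimal Python sketch: `key_type` and `count_key_types` are hypothetical helper names, and the column layout is assumed to be the one described in the kernel keys documentation (SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY).

```python
from collections import Counter
from typing import Iterable

# Assumed column layout of /proc/keys (per the kernel keys documentation):
# SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY
TYPE_COLUMN = 7


def key_type(line: str) -> str:
    """Return the TYPE column (e.g. 'user', 'big_key', 'keyring') of one line."""
    # Split only as far as needed; the description may itself contain spaces.
    fields = line.split(None, TYPE_COLUMN + 1)
    if len(fields) <= TYPE_COLUMN:
        raise ValueError(f"unexpected /proc/keys line: {line!r}")
    return fields[TYPE_COLUMN]


def count_key_types(lines: Iterable[str]) -> Counter:
    """Count the keys of each type, skipping blank lines."""
    return Counter(key_type(line) for line in lines if line.strip())


if __name__ == "__main__":
    try:
        with open("/proc/keys") as f:
            counts = count_key_types(f)
    except (OSError, ValueError) as exc:
        print(f"could not read /proc/keys: {exc}")
    else:
        for name, count in sorted(counts.items()):
            print(f"{name:10} {count}")
```

If the fallback described above is in effect, the Kerberos credentials would be expected to show up as "user" keys rather than "big_key" keys. /proc/keys only lists keys the calling process is allowed to view, so this should be run as the affected user.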
Created attachment 811375 [details] keys-modalias To be clear, I also said that if big_key wasn't really useful as a module then it shouldn't be configurable as such. Anyway, the attached patch is along the lines of what I was thinking. I discussed it a bit with Kyle McMartin as well, so credit should also go to him. Here's a scratch build for people to test with (I haven't tested it myself yet): http://koji.fedoraproject.org/koji/taskinfo?taskID=6051781
(In reply to Josh Boyer from comment #10)
> Created attachment 811375 [details]
> keys-modalias
>
> To be clear, I also said that if big_key wasn't really useful as a module
> then it shouldn't be configurable as such.
>
> Anyway, the attached patch is along the lines of what I was thinking. I
> discussed it a bit with Kyle McMartin as well, so credit should also go to
> him.
>
> Here's a scratch build for people to test with (I haven't tested it myself
> yet):
>
> http://koji.fedoraproject.org/koji/taskinfo?taskID=6051781
That might not work. I'm working on a slightly different patch.
Created attachment 811489 [details]
keys-modalias
This one actually works, at least in terms of getting the module auto-loaded.
[jwboyer@vader ~]$ modinfo big_key
filename:       /lib/modules/3.11.4-301.3.fc20.x86_64/kernel/security/keys/big_key.ko
alias:          keys-big_key
license:        GPL
depends:
intree:         Y
vermagic:       3.11.4-301.3.fc20.x86_64 SMP mod_unload
signer:         Fedora kernel signing key
sig_key:        A8:61:C4:8B:C1:2C:39:8F:87:89:F8:55:F9:17:B6:4D:92:58:8A:61
sig_hashalgo:   sha256
[jwboyer@vader ~]$ lsmod | grep big_key
[jwboyer@vader ~]$ cat bar.c
/* cc -o foo foo.c -lkeyutils */
#include <stdlib.h>
#include <keyutils.h>

int main(void)
{
        return request_key("big_key", "big_key", NULL, KEY_REQKEY_DEFL_DEFAULT);
}
[jwboyer@vader ~]$ gcc -o bar bar.c -lkeyutils
[jwboyer@vader ~]$ ./bar
[jwboyer@vader ~]$ lsmod | grep big_key
big_key                12672  0
[jwboyer@vader ~]$
Scratch build: http://koji.fedoraproject.org/koji/taskinfo?taskID=6052333
I installed this kernel scratch-build this morning. Good News/Bad News:
Bad news: When I tried to log in using SSSD/Kerberos, I got an AVC denial[1]. This resulted in the module not loading and kerberos again falling back to the user keyring.
Good news: When I added policy to allow SSSD's krb5_child process to call module_request, it worked fine (so thanks, Josh!).
There are two ways (that I know of) to address this in the SELinux policy.
1) We can enable the domain_kernel_load_modules boolean to allow any process to trigger the loading of modules in the kernel. This is certainly not ideal and probably won't fly.
2) The other approach would be to whitelist applications that are allowed to trigger this request. The problem with this approach is that we may not be able to adequately determine which applications are at risk. The obvious ones are 'krb5_child', 'kinit', 'sshd' and 'rpc.gssd', but there may be others. However, given that it can fall back to the user keyring when it gets this denial, it may be acceptable to catch them as we go (since the fallback will still work until the number of retrieved keys grows large).
I'm adding Dan Walsh and Miroslav Grepl to the CC list to chime in on this.
[1]
type=AVC msg=audit(1381751549.195:514): avc: denied { module_request } for pid=2746 comm="krb5_child" kmod="keys-big_key" scontext=system_u:system_r:sssd_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclass=system
type=SYSCALL msg=audit(1381751549.195:514): arch=x86_64 syscall=add_key success=no exit=ENODEV a0=7f70a73bc688 a1=7f70a93d9360 a2=7f70a93daeb0 a3=1f1 items=0 ppid=818 pid=2746 auid=4294967295 uid=13041 gid=13041 euid=13041 suid=13041 fsuid=13041 egid=13041 sgid=13041 fsgid=13041 ses=4294967295 tty=(none) comm=krb5_child exe=/usr/libexec/sssd/krb5_child subj=system_u:system_r:sssd_t:s0 key=(null)
Any third party application can do a 'kinit'-like operation, and they should be allowed to do so. I think we should switch the module to be not modular if we can't easily allow any user to load this kernel module, or we'll keep having odd issues.
We discussed this on IRC. Apparently fixing SELinux here is untenable, so we'll build BIG_KEYS into the kernel. A follow-up patch to make it a bool in Kconfig will come later.
Proposed as a Freeze Exception for 20-beta by Fedora user sgallagh using the blocker tracking app because: Without this kernel patch, users relying on Kerberos for authentication and SSO will intermittently receive failures and errors (notably with misleading error messages).
kernel-3.11.5-301.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.11.5-301.fc20
Discussed at 2013-10-16 freeze exception review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2013-10-16/f20beta-blocker-review-4.2013-10-16-16.02.log.txt . Accepted as a freeze exception issue: this is a serious problem for users doing authentication via Kerberos that could interfere with their ability to access and use the system at all, and the fix is small and straightforward (simply build a capability into the kernel rather than as a module).
Package kernel-3.11.5-302.fc20: * should fix your issue, * was pushed to the Fedora 20 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.11.5-302.fc20' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2013-19165/kernel-3.11.5-302.fc20 then log in and leave karma (feedback).
Created attachment 813758 [details]
kernel-3.11.5-302.fc20.x86_64
So after the update, the state of this issue is worse than before! Previously, it took approximately 4 days until this bug appeared; now I can't log in using my krb password 2 hours after boot.
The one piece of good news is that I can't see the "[sssd[krb5_child[19287]: Disk quota exceeded" message in my log anymore.
$ kinit
kinit: Invalid UID in persistent keyring name while getting default ccache
$ rpm -q kernel
kernel-3.11.3-301.fc20.x86_64
kernel-3.11.4-301.fc20.x86_64
kernel-3.11.5-302.fc20.x86_64
$ uname -a
Linux localhost 3.11.5-302.fc20.x86_64 #1 SMP Wed Oct 16 18:09:11 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$ rpm -q keyutils
keyutils-1.5.8-1.fc20.x86_64
$ rpm -q sssd
sssd-1.11.1-5.fc20.x86_64
$ rpm -q krb5-libs
krb5-libs-1.11.3-24.fc20.x86_64
(In reply to Vít Ondruch from comment #20)
> One positive news is that I can't see the "[sssd[krb5_child[19287]: Disk
> quota exceeded" message in my log anymore.
FWIW, this error message was just a strerror() representation of the error code that bubbled up from keyutils to libkrb5 and then to sssd.
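To illustrate the mapping between the errno value seen earlier in this bug (122) and the journal message, here is a small Python sketch. `describe_errno` is a hypothetical helper name; the numeric value of EDQUOT and the exact message text depend on the platform and C library, so nothing here is specific to keyutils or libkrb5.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Format an errno value as 'NAME (code): message', using the C library's strerror()."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    # 122 is the value reported for keyctl_get_persistent in this bug;
    # on Linux x86_64 it corresponds to EDQUOT.
    print(describe_errno(122))
```

On a glibc-based Linux system the message part should match the "Disk quota exceeded" text from the journal, which is why the log line alone does not say which quota was hit.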
Created attachment 813856 [details]
Fix keyring quota leak
There are two leaks in keyring quota handling:
(1) If a key is replaced in a keyring, the keyring allocates four extra bytes of the user's quota, even though the keyring did not change in size.
(2) When a key is unlinked from a keyring, the quota recovered is not credited to the user's account.
This patch should fix both of those. This patch will need to go to James Morris's security tree next branch, not Linus's tree.
Note that "keyctl clear <keyring>" will recover the quota because the unrecovered/excessive quota is still recorded on the keyring.
Patch applied in git.
Same result as Vit Ondruch. I don't know if I'm mis-reading comment #22, but 'keyctl clear @u' doesn't help - I still can't use my password after issuing that.
jwb: I see the "Build BIG_KEYS into the kernel (rhbz 1017683)" commit in f20, but not in master, and in master's config-generic, there's still:
CONFIG_BIG_KEYS=m
Does that change need to go to master too?
(In reply to Adam Williamson from comment #24)
> Same result as Vit Ondruch. I don't know if I'm mis-reading comment #22, but
> 'keyctl clear @u' doesn't help - I still can't use my password after issuing
> that.
If you aren't testing with 3.11.6-300, you don't have the fix from David. The original kernel update for this was just to build the config change in, then we added the patch with a later commit (referred to in comment #23). That got built with 3.11.6-300.
http://koji.fedoraproject.org/koji/buildinfo?buildID=472397
(In reply to Adam Williamson from comment #25)
> jwb: I see the "Build BIG_KEYS into the kernel (rhbz 1017683)" commit in
> f20, but not in master, and in master's config-generic, there's still:
>
> CONFIG_BIG_KEYS=m
>
> Does that change need to go to master too?
Yes. Done now. Thanks for the reminder.
I have been running kernel 3.11.6-300 for 2.5h and so far so good (in comparison to this morning, when 2.5h on the 3.11.5-302 kernel was enough to run out of keyring space). /proc/key-users reports constant values. @Josh I assume that the new kernel update has not been filed because you are waiting for the previous kernel to get into stable, is that right? Or is it just an omission?
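For others who want to watch the same numbers, the following is a minimal Python sketch that reads the per-user key quota from /proc/key-users. `KeyQuota`, `parse_key_users_line` and `quota_for_uid` are hypothetical names; the field order is assumed from the kernel keys documentation (UID, usage, total/instantiated keys, quota keys/max keys, quota bytes/max bytes).

```python
import os
from typing import NamedTuple, Optional


class KeyQuota(NamedTuple):
    uid: int
    usage: int      # structure refcount
    nkeys: int      # total number of keys
    nikeys: int     # number of instantiated keys
    qnkeys: int     # keys counted against the quota
    maxkeys: int    # key count limit
    qnbytes: int    # bytes counted against the quota
    maxbytes: int   # byte limit


def parse_key_users_line(line: str) -> KeyQuota:
    """Parse one line such as '16025:     5 4/4 3/200 1500/20000'."""
    uid_text, rest = line.split(":", 1)
    usage, keys, key_quota, byte_quota = rest.split()
    nkeys, nikeys = map(int, keys.split("/"))
    qnkeys, maxkeys = map(int, key_quota.split("/"))
    qnbytes, maxbytes = map(int, byte_quota.split("/"))
    return KeyQuota(int(uid_text), int(usage), nkeys, nikeys,
                    qnkeys, maxkeys, qnbytes, maxbytes)


def quota_for_uid(uid: int, path: str = "/proc/key-users") -> Optional[KeyQuota]:
    """Return the quota entry for uid, or None if that user has no entry."""
    with open(path) as f:
        for line in f:
            if line.strip():
                entry = parse_key_users_line(line)
                if entry.uid == uid:
                    return entry
    return None


if __name__ == "__main__":
    try:
        entry = quota_for_uid(os.getuid())
    except (OSError, ValueError) as exc:
        print(f"could not read /proc/key-users: {exc}")
    else:
        if entry is None:
            print("no key quota entry for this UID")
        else:
            print(f"bytes {entry.qnbytes}/{entry.maxbytes}, "
                  f"keys {entry.qnkeys}/{entry.maxkeys}")
```

Running this periodically (for example after each login or ticket renewal) should show whether the byte count keeps climbing, as it would with the quota leak, or stays constant as reported here. The sample line in the docstring uses made-up values.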
(In reply to Vít Ondruch from comment #27) > I have booted kernel 3.11.6-300 for 2.5h and so far so good (in comparison > to the morning, when using 3.11.5-302 kernel, 2.5h was enough to get out of > the keyring space). /proc/key-users reports constant values. > > @Josh I assume that new kernel update is not filled since you are waiting > for previous kernel to get into stable, is that right? Or is it just > omission? Simply a timing issue. I'll file it today.
kernel-3.11.6-300.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/kernel-3.11.6-300.fc20
Package kernel-3.11.6-300.fc20: * should fix your issue, * was pushed to the Fedora 20 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing kernel-3.11.6-300.fc20' as soon as you are able to, then reboot. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2013-19611/kernel-3.11.6-300.fc20 then log in and leave karma (feedback).
Fix is looking good here so far too.
kernel-3.11.6-300.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
Do we have this fix in Rawhide?
(In reply to Daniel Walsh from comment #33) > Do we have this fix in Rawhide? In the kernel repo, yes. Should be in a built kernel in rawhide tomorrow with this build: http://koji.fedoraproject.org/koji/buildinfo?buildID=472963 (the 3.12.0-0.rc6-git0.1 build has the patch, but big_keys was still a module).
Re-opening. In order for this fix to be complete, keyutils-1.5.8-1.fc20 also needs to land in the stable repo (it's been queued for it since the day of Beta Freeze, but apparently missed the cutoff).
keyutils-1.5.8-1.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/FEDORA-2013-18334/keyutils-1.5.8-1.fc20
I'll put it in the next F20 stable push request.
keyutils-1.5.8-1.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
This can be closed, now.
kernel-3.11.5-302.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.