Description of problem:
kernel BUG at kernel/cred.c:168!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex 745
RIP: 0010:[<ffffffff81069881>] [<ffffffff81069881>] __put_cred+0xc/0x45
RSP: 0018:ffff88019e7e9eb8 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
FS: 00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
Stack:
ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
<0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
<0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
Call Trace:
[<ffffffff810698cd>] put_cred+0x13/0x15
[<ffffffff81069b45>] commit_creds+0x16b/0x175
[<ffffffff8106aace>] set_current_groups+0x47/0x4e
[<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00 48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b 04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
RIP [<ffffffff81069881>] __put_cred+0xc/0x45
RSP <ffff88019e7e9eb8>
---[ end trace df391256a100ebdd ]---

Version-Release number of selected component (if applicable):
kernel 2.6.33.3-85.fc13.x86_64

How reproducible:
don't know

Steps to Reproduce:
1. run system

Actual results:
kernel BUG

Expected results:
no kernel BUG

Additional info:
The BUG seems to be in ksm, but there are no kvm processes running that could have their pages merged.
hi, could you please share the kernel config file? thanks, jirka
Created attachment 415388: kernel config
I've attached the requested kernel config file. The process that gets killed always seems to be the postfix master process.
thanks, so you can reproduce the issue? if that's the case, could you please give more details?
I don't have a recipe to reproduce the problem. Sometimes it happens a few minutes after boot, but I've had it a few times when I was still logged in but not using the computer (at 2 am). The only processes running on the system at that time were firefox and a few konsole processes with an idle ssh session. The affected PID always seems to be the postfix master process. When the BUG has occurred the system still works OK but I have to restart postfix. If there's a debug kernel or postfix I can run for a few days, I would be willing to run it.
that's great, I made a debug kernel and switched on some credentials debug output. You can download the kernel from http://people.redhat.com/jolsa/cred/ I'm not sure how much I'll be online next week, but I will definitely get back to you the week after. thanks, jirka
I'm running the debug kernel now, so let's hope we see the BUG. The kernel does spew a lot of debug output into my messages file. Let's hope my disk doesn't fill up before the bug is triggered...
hopefully we can trigger the bug with all the output info. You could play with logrotate and cron.hourly to keep the output at a reasonable size. Please let me know if you need help with that; I've played with it before, but would need some man page reading to refresh :)
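In case it helps, here is a rough sketch of the kind of thing an hourly cron job could do. It is not a tested logrotate setup: `rotate_if_large` is a made-up helper name, and the log path and size limit in the usage line are only assumptions. It copies the log aside and truncates the live file once it passes a byte limit, which is the same idea as logrotate's copytruncate option.

```shell
#!/bin/sh
# Hypothetical helper (not part of logrotate): keep one previous copy
# and truncate the live log once it grows past max_bytes.
rotate_if_large() {
    log=$1
    max_bytes=$2
    # Nothing to do if the log does not exist yet.
    [ -f "$log" ] || return 0
    # Arithmetic expansion strips any padding wc may print.
    size=$(($(wc -c < "$log")))
    if [ "$size" -gt "$max_bytes" ]; then
        # Copy first, then truncate in place so the writer keeps its
        # open file handle. Lines written between the two steps are lost.
        cp "$log" "$log.1" && : > "$log"
    fi
}
```

A script dropped into /etc/cron.hourly could then call, for example, `rotate_if_large /var/log/messages 104857600` (path and 100 MB limit assumed). Only one old copy is kept, so the most recent debug output before a BUG should still be in either the live file or the `.1` copy.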
looks like no results over the last week.. any news?
The bug hasn't occurred with the debug kernel :-(. Maybe it was fixed somewhere between 2.6.33.3-85 and 2.6.33.4-95, or in the debug kernel? Does the line "last sysfs file: /sys/kernel/mm/ksm/run" in the bug report indicate any relation with ksm? Or is this line added to all BUG printouts now? I had a similar BUG on my home computer (running F12) this weekend (not in cred.c) and it also showed the 'last sysfs file' line.
the "last sysfs file" line is common to BUG reports, so it's probably not ksm related AFAICS. Can you see the issue in the latest Fedora kernel?
I'll boot the 2.6.33.4-95.fc13.x86_64 kernel and see if the bug shows up. It used to show up with the 2.6.33.3-85.fc13.x86_64 kernel.
After running the debug kernel without triggering the bug for a while, I've now booted 2.6.33.5-112.fc13.x86_64 and the bug has already triggered. Once again it's the postfix master process that gets killed.

[ 778.589226] [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on tmds encoder (output 2)
[30469.713179] ------------[ cut here ]------------
[30469.713184] kernel BUG at kernel/cred.c:168!
[30469.713187] invalid opcode: 0000 [#1] SMP
[30469.713191] last sysfs file: /sys/kernel/mm/ksm/run
[30469.713193] CPU 1
[30469.713198] Pid: 2231, comm: master Not tainted 2.6.33.5-112.fc13.x86_64 #1 0HR330/OptiPlex 745
[30469.713202] RIP: 0010:[<ffffffff81069875>] [<ffffffff81069875>] __put_cred+0xc/0x45
[30469.713212] RSP: 0018:ffff8801a4b7beb8 EFLAGS: 00010202
[30469.713215] RAX: 0000000000000001 RBX: ffff880196e153c0 RCX: 00000000ffffffff
[30469.713219] RDX: 00000000ffffffff RSI: ffff880196e15180 RDI: ffff880196e15180
[30469.713222] RBP: ffff8801a4b7beb8 R08: 00000000000000d0 R09: 0000000000000000
[30469.713225] R10: 0000000000000001 R11: 0000000000000040 R12: ffff880196e15180
[30469.713228] R13: ffff8801a4b5dd40 R14: 00007fff23da3b7c R15: 0000000000000001
[30469.713232] FS: 00007f0f84a487c0(0000) GS:ffff880007440000(0000) knlGS:0000000000000000
[30469.713236] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[30469.713239] CR2: 00007ff53afc8000 CR3: 00000001b3495000 CR4: 00000000000006e0
[30469.713242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[30469.713246] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[30469.713249] Process master (pid: 2231, threadinfo ffff8801a4b7a000, task ffff8801a4b5dd40)
[30469.713252] Stack:
[30469.713254] ffff8801a4b7bec8 ffffffff810698c1 ffff8801a4b7bef8 ffffffff81069b39
[30469.713259] <0> ffff880196e15480 ffff880196e153c0 ffff880196e15480 0000000000000000
[30469.713264] <0> ffff8801a4b7bf28 ffffffff8106aac2 0000000000000001 0000000000000246
[30469.713270] Call Trace:
[30469.713275] [<ffffffff810698c1>] put_cred+0x13/0x15
[30469.713279] [<ffffffff81069b39>] commit_creds+0x16b/0x175
[30469.713284] [<ffffffff8106aac2>] set_current_groups+0x47/0x4e
[30469.713288] [<ffffffff8106ac7d>] sys_setgroups+0xf6/0x105
[30469.713294] [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
[30469.713297] Code: 48 8d 71 ff e8 e2 4c 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 53 49 15 00 48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b 04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
[30469.713338] RIP [<ffffffff81069875>] __put_cred+0xc/0x45
[30469.713343] RSP <ffff8801a4b7beb8>
[30469.713347] ---[ end trace 0b23929def5cb677 ]---
[39357.914793] Slow work thread pool: Starting up
[39357.914908] Slow work thread pool: Ready
[39357.915003] FS-Cache: Loaded
[39357.951002] FS-Cache: Netfs 'nfs' registered for caching
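The paste above carries a few unrelated lines around the oops (drm, FS-Cache). As a small sketch for anyone collecting further traces: the oops can be cut out of a saved log by its delimiters. `extract_oops` is a hypothetical helper name, and it assumes the report is bracketed by the "cut here" and "end trace" marker lines as in this log.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print only the
# lines from a "[ cut here ]" marker through the matching
# "[ end trace ... ]" marker, inclusive.
extract_oops() {
    sed -n '/\[ cut here \]/,/\[ end trace /p'
}
```

For example, `dmesg | extract_oops > oops.txt` would keep just the BUG report for attaching here.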
I managed to write a reproducer, hopefully the fix will follow soon ;) thanks for the provided info
please try the kernel from http://people.redhat.com/jolsa/cred/
I've booted the kernel you provided and so far no issues :-). Only booted it 1.5 hours ago. I will let you know if the issue comes up.
Created attachment 424394: proposed fix
Created attachment 424395: reproducer

Run the program with the current user's group number. In another terminal, run the command that is printed as the reproducer output.
When running the reproducer under 2.6.33.5-129.cred.fc13.x86_64, I get a segmentation fault:

[rtheys@lucifer download]$ ./exs
run: while [ 1 ]; do cat /proc/3912/status; done
Segmentation fault

Is that the expected behaviour on a "fixed" kernel? I also tried it on an F13 system running 2.6.33.5-112.fc13.x86_64 and it also segfaults there. I compiled the program using: gcc -o exs exs.c
you need to pass your group number as a parameter to the exs program
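A sketch of the invocation, for anyone else trying the reproducer. It assumes the attachment was built as ./exs (as in the gcc command above) and takes the numeric group ID as its only argument; the helper names are made up for illustration. Note that setgroups() needs CAP_SETGID, so the printed command has to be run as root.

```shell
#!/bin/sh
# Hypothetical helpers; ./exs is the attached reproducer built with gcc.

# Print the reproducer command line. Uses the caller's primary group
# (id -g) unless a numeric gid is given as $1.
exs_command() {
    gid=${1:-$(id -g)}
    case $gid in
        ''|*[!0-9]*)
            echo "group number must be numeric: $gid" >&2
            return 1
            ;;
    esac
    printf './exs %s\n' "$gid"
}

# Print the loop to run in a second terminal against the reproducer's
# pid (the reproducer prints the same command itself after "run:").
status_loop() {
    printf 'while [ 1 ]; do cat /proc/%s/status; done\n' "$1"
}
```

`exs_command` with no argument prints the command for your own primary group, and `status_loop 3912` prints the loop from the earlier comment.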
When running the program as non-root, I get a setgroups: Operation not permitted error. Running it as root seems to work. I started the printed command in another terminal and kept it running. So far it hasn't triggered the bug yet. How fast should I see it trigger the bug? I'm currently running 2.6.33.5-129.cred.fc13.x86_64
hm, in a few seconds for me. Are you sure you're not running the fixed kernel? :) I sent out the patch and got some ACKs, but the patch has not been applied so far. jirka
you can see the status in here: http://marc.info/?l=linux-security-module&m=127747281817689&w=2
Upstream fix: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff_plain;h=de09a9771a5346029f4d11e4ac886be7f9bfdd75
The fix will be in 2.6.34.2-33
kernel-2.6.34.2-34.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/kernel-2.6.34.2-34.fc13
kernel-2.6.34.2-34.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.34.2-34.fc13
kernel-2.6.34.3-37.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/kernel-2.6.34.3-37.fc13
kernel-2.6.34.3-37.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.34.3-37.fc13
2.6.34 kernel has been withdrawn.