Bug 591015 - kernel BUG at kernel/cred.c:168
Summary: kernel BUG at kernel/cred.c:168
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 620380
Reported: 2010-05-11 08:21 UTC by Rik Theys
Modified: 2010-09-03 13:12 UTC
CC List: 9 users

Fixed In Version: 2.6.34.6-47.fc13
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 620380
Environment:
Last Closed: 2010-09-03 13:12:47 UTC
Type: ---
Embargoed:


Attachments
kernel config (102.32 KB, application/octet-stream)
2010-05-20 11:11 UTC, Rik Theys
proposed fix (683 bytes, patch)
2010-06-16 10:14 UTC, Jiri Olsa
reproducer (396 bytes, text/x-csrc)
2010-06-16 10:15 UTC, Jiri Olsa

Description Rik Theys 2010-05-11 08:21:48 UTC
Description of problem:

kernel BUG at kernel/cred.c:168!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0 
Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex 745                 
RIP: 0010:[<ffffffff81069881>]  [<ffffffff81069881>] __put_cred+0xc/0x45
RSP: 0018:ffff88019e7e9eb8  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
FS:  00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
Stack:
 ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
<0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
<0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
Call Trace:
 [<ffffffff810698cd>] put_cred+0x13/0x15
 [<ffffffff81069b45>] commit_creds+0x16b/0x175
 [<ffffffff8106aace>] set_current_groups+0x47/0x4e
 [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00 48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b 04 25 00 cc 00 00 48 3b b8 58 04 00 00 75 
RIP  [<ffffffff81069881>] __put_cred+0xc/0x45
 RSP <ffff88019e7e9eb8>
---[ end trace df391256a100ebdd ]---
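A note on reading the trace above: the Code: bytes leading up to the faulting instruction (55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b) load the first word of the structure passed in RDI, test it, and execute ud2 (the BUG) at __put_cred+0xc when it is non-zero; RAX holds 1 at that point. That is consistent with __put_cred() finding the credential's usage count (presumably the first field) back at 1 after put_cred() had dropped what it believed was the last reference. The sketch below is a hypothetical userspace model using C11 atomics, not kernel/cred.c: it replays one interleaving that would produce this, in which a concurrent reader takes a reference with a plain increment between the final decrement and the check. All model_* and reader_* names are invented for illustration, and the interleaving is an assumption, not something this report confirms.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of a refcounted credential: only the usage counter is
 * modelled. This is an illustration, not the code in kernel/cred.c. */
struct model_cred {
    atomic_int usage;
};

/* Models the check that fires in the trace: by the time the object is
 * released, nobody may hold a reference. Returns false where the kernel
 * would execute BUG(). */
static bool model___put_cred(struct model_cred *c)
{
    int usage = atomic_load(&c->usage);
    if (usage != 0) {
        fprintf(stderr, "model BUG: usage=%d in __put_cred\n", usage);
        return false;
    }
    return true; /* the kernel would queue the free at this point */
}

/* Callback run between the final decrement and the check, so that an
 * interleaving between two CPUs can be replayed deterministically. */
typedef void (*race_hook)(struct model_cred *);

/* Models put_cred(): drop one reference and release on the last one. */
static bool model_put_cred(struct model_cred *c, race_hook between)
{
    assert(c != NULL);
    /* atomic_fetch_sub returns the previous value; 1 means this was the last reference */
    if (atomic_fetch_sub(&c->usage, 1) == 1) {
        if (between != NULL)
            between(c);
        return model___put_cred(c);
    }
    return true;
}

/* Hypothetical racing reader: takes a reference unconditionally, which can
 * bring a count that already reached zero back to 1. */
static void reader_resurrects(struct model_cred *c)
{
    atomic_fetch_add(&c->usage, 1);
}

/* Safer reader: take a reference only while the count is still non-zero
 * (the same idea as the kernel's atomic_inc_not_zero()). */
static bool model_get_if_live(struct model_cred *c)
{
    int old = atomic_load(&c->usage);
    while (old != 0) {
        /* on failure, old is reloaded with the current value and we retry */
        if (atomic_compare_exchange_weak(&c->usage, &old, old + 1))
            return true;
    }
    return false;
}

static void reader_checks_liveness(struct model_cred *c)
{
    (void)model_get_if_live(c);
}
```

With reader_resurrects as the hook, model_put_cred() returns false with the count back at 1, mirroring the oops; with reader_checks_liveness it returns true and the count stays at 0. Whether the attached proposed fix takes the increment-if-non-zero approach is not shown in this report.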


Version-Release number of selected component (if applicable):

kernel 2.6.33.3-85.fc13.x86_64

How reproducible:

don't know

Steps to Reproduce:
1. run system

Actual results:

kernel BUG

Expected results:

no kernel BUG

Additional info:

The BUG seems to be in KSM, but there are no KVM processes running that could have their pages merged.

Comment 1 Jiri Olsa 2010-05-20 10:23:52 UTC
hi,
could you please share the kernel config file?

thanks,
jirka

Comment 2 Rik Theys 2010-05-20 11:11:59 UTC
Created attachment 415388 [details]
kernel config

Comment 3 Rik Theys 2010-05-20 11:12:47 UTC
I've attached the requested kernel config file. The process that gets killed always seems to be the postfix master process.

Comment 4 Jiri Olsa 2010-05-20 13:52:05 UTC
thanks,
so you can reproduce the issue?
if that's the case, could you please provide more details?

Comment 5 Rik Theys 2010-05-21 08:10:03 UTC
I don't have a recipe to reproduce the problem. Sometimes it happens a few minutes after boot, but I've had it a few times when I was still logged in but not using the computer (at 2 am). The only processes running on the system at that time were firefox and a few konsole processes with an idle ssh session.

The affected PID always seems to be the postfix master process. When the BUG has occurred the system still works OK but I have to restart postfix.

If there's a debug kernel or postfix I can run for a few days, I would be willing to run it.

Comment 6 Jiri Olsa 2010-05-21 12:11:36 UTC
that's great,

I made a debug kernel and switched on some credentials debug output.
you can download the kernel from

http://people.redhat.com/jolsa/cred/

I'm not sure how much I'll be online next week, but I will definitely get back
to you the week after

thanks,
jirka

Comment 7 Rik Theys 2010-05-21 12:42:14 UTC
I'm running the debug kernel now, so let's hope we see the BUG. The kernel does spew a lot of debug output into my messages file. Let's hope my disk doesn't fill up before the bug is triggered...

Comment 8 Jiri Olsa 2010-05-21 14:30:42 UTC
hopefully we can trigger the bug with all the debug output enabled,

you could play with logrotate and cron.hourly to keep the output
at a reasonable size

please let me know if you need help with that; I've played with it before,
but I'd need to reread the man pages to refresh my memory :)

Comment 9 Jiri Olsa 2010-05-31 06:32:53 UTC
looks like there were no results over the last week... any news?

Comment 10 Rik Theys 2010-05-31 08:13:56 UTC
The bug hasn't occurred with the debug kernel :-(. Maybe it was fixed somewhere between 2.6.33.3-85 and 2.6.33.4-95, or is the debug kernel masking it?

Does the line

last sysfs file: /sys/kernel/mm/ksm/run

in the bug report indicate any relation with ksm? Or is this line added in all BUG printouts now?

I had a similar BUG on my home computer (running F12) this weekend (not in cred.c), and it also showed the 'last sysfs file' line.

Comment 11 Jiri Olsa 2010-05-31 10:07:34 UTC
the "last sysfs file" line is common to all BUG reports, so it's probably not
KSM-related AFAICS

can you see the issue in the latest fedora kernel?

Comment 12 Rik Theys 2010-05-31 11:11:24 UTC
I'll boot the 2.6.33.4-95.fc13.x86_64 kernel and see if the bug shows up. It used to show up with the 2.6.33.3-85.fc13.x86_64 kernel.

Comment 13 Rik Theys 2010-05-31 11:30:49 UTC
I'll boot the 2.6.33.4-95.fc13.x86_64 kernel and see if the bug shows up. It used to show up with the 2.6.33.3-85.fc13.x86_64 kernel.

Comment 14 Rik Theys 2010-06-11 07:45:50 UTC
After running the debug kernel without triggering the bug for a while, I've now booted 2.6.33.5-112.fc13.x86_64 and the bug has already triggered. Once again it's the postfix master process that gets killed.


[  778.589226] [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on tmds encoder (output 2)
[30469.713179] ------------[ cut here ]------------
[30469.713184] kernel BUG at kernel/cred.c:168!
[30469.713187] invalid opcode: 0000 [#1] SMP 
[30469.713191] last sysfs file: /sys/kernel/mm/ksm/run
[30469.713193] CPU 1 
[30469.713198] Pid: 2231, comm: master Not tainted 2.6.33.5-112.fc13.x86_64 #1 0HR330/OptiPlex 745                 
[30469.713202] RIP: 0010:[<ffffffff81069875>]  [<ffffffff81069875>] __put_cred+0xc/0x45
[30469.713212] RSP: 0018:ffff8801a4b7beb8  EFLAGS: 00010202
[30469.713215] RAX: 0000000000000001 RBX: ffff880196e153c0 RCX: 00000000ffffffff
[30469.713219] RDX: 00000000ffffffff RSI: ffff880196e15180 RDI: ffff880196e15180
[30469.713222] RBP: ffff8801a4b7beb8 R08: 00000000000000d0 R09: 0000000000000000
[30469.713225] R10: 0000000000000001 R11: 0000000000000040 R12: ffff880196e15180
[30469.713228] R13: ffff8801a4b5dd40 R14: 00007fff23da3b7c R15: 0000000000000001
[30469.713232] FS:  00007f0f84a487c0(0000) GS:ffff880007440000(0000) knlGS:0000000000000000
[30469.713236] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[30469.713239] CR2: 00007ff53afc8000 CR3: 00000001b3495000 CR4: 00000000000006e0
[30469.713242] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[30469.713246] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[30469.713249] Process master (pid: 2231, threadinfo ffff8801a4b7a000, task ffff8801a4b5dd40)
[30469.713252] Stack:
[30469.713254]  ffff8801a4b7bec8 ffffffff810698c1 ffff8801a4b7bef8 ffffffff81069b39
[30469.713259] <0> ffff880196e15480 ffff880196e153c0 ffff880196e15480 0000000000000000
[30469.713264] <0> ffff8801a4b7bf28 ffffffff8106aac2 0000000000000001 0000000000000246
[30469.713270] Call Trace:
[30469.713275]  [<ffffffff810698c1>] put_cred+0x13/0x15
[30469.713279]  [<ffffffff81069b39>] commit_creds+0x16b/0x175
[30469.713284]  [<ffffffff8106aac2>] set_current_groups+0x47/0x4e
[30469.713288]  [<ffffffff8106ac7d>] sys_setgroups+0xf6/0x105
[30469.713294]  [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
[30469.713297] Code: 48 8d 71 ff e8 e2 4c 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 53 49 15 00 48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b 04 25 00 cc 00 00 48 3b b8 58 04 00 00 75 
[30469.713338] RIP  [<ffffffff81069875>] __put_cred+0xc/0x45
[30469.713343]  RSP <ffff8801a4b7beb8>
[30469.713347] ---[ end trace 0b23929def5cb677 ]---
[39357.914793] Slow work thread pool: Starting up
[39357.914908] Slow work thread pool: Ready
[39357.915003] FS-Cache: Loaded
[39357.951002] FS-Cache: Netfs 'nfs' registered for caching

Comment 15 Jiri Olsa 2010-06-14 10:22:11 UTC
I managed to write a reproducer; hopefully the fix will follow soon ;)
thanks for the info you provided

Comment 16 Jiri Olsa 2010-06-14 21:52:45 UTC
please try the kernel from http://people.redhat.com/jolsa/cred/

Comment 17 Rik Theys 2010-06-15 07:49:56 UTC
I've booted the kernel you provided and so far there have been no issues :-). I only booted it 1.5 hours ago, though. I will let you know if the issue comes up.

Comment 18 Jiri Olsa 2010-06-16 10:14:11 UTC
Created attachment 424394 [details]
proposed fix

Comment 19 Jiri Olsa 2010-06-16 10:15:49 UTC
Created attachment 424395 [details]
reproducer

run the program with the current user's group number as its argument;
in another terminal, run the command that the reproducer prints
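The attached reproducer's source is not included in this report. From comments 19 to 22, its writer side can be sketched as follows, assuming it prints a reader command for its own PID and then calls setgroups() in a loop with the group number given on the command line. Each setgroups() call goes through set_current_groups() and commit_creds(), the path in both oopses, while the printed shell loop keeps reading /proc/<pid>/status, which reports the task's Uid, Gid and Groups and therefore reads its credentials. format_reader_cmd() and hammer_setgroups() are hypothetical names.

```c
#define _DEFAULT_SOURCE /* for the setgroups() declaration in <grp.h> */
#include <assert.h>
#include <grp.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Builds the reader command in the form the reproducer prints (comment 20).
 * Returns the snprintf() result: the length the full command needs. */
static int format_reader_cmd(char *buf, size_t len, pid_t pid)
{
    assert(buf != NULL);
    return snprintf(buf, len, "while [ 1 ]; do cat /proc/%d/status; done", (int)pid);
}

/* Hypothetical writer loop: each successful setgroups() installs a new set
 * of credentials for this task. A negative iteration count means "forever".
 * Returns 0 on success, or -1 with errno set by setgroups(), e.g. EPERM
 * when run without CAP_SETGID. */
static int hammer_setgroups(gid_t gid, long iterations)
{
    for (long i = 0; iterations < 0 || i < iterations; i++) {
        if (setgroups(1, &gid) != 0)
            return -1;
    }
    return 0;
}
```

A main() built on these would print "run:", print format_reader_cmd() for getpid(), and then call hammer_setgroups(gid, -1). As comment 22 found, the loop needs root (CAP_SETGID); as an ordinary user, setgroups() fails with EPERM.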

Comment 20 Rik Theys 2010-06-16 12:24:54 UTC
When running the reproducer under 2.6.33.5-129.cred.fc13.x86_64, I get a segmentation fault:

[rtheys@lucifer download]$ ./exs
run:
while [ 1 ]; do cat /proc/3912/status; done
Segmentation fault

Is that the expected behaviour on a "fixed" kernel?

I also tried it on an F13 system running 2.6.33.5-112.fc13.x86_64 and it segfaults there too.

I compiled the program using:

gcc -o exs exs.c

Comment 21 Jiri Olsa 2010-06-16 14:22:31 UTC
you need to pass your group number as a parameter to the exs program

Comment 22 Rik Theys 2010-06-24 07:28:15 UTC
When running the program as non-root, I get a

setgroups: Operation not permitted

error.

Running it as root seems to work.

I started the printed command in another terminal and kept it running. So far it hasn't triggered the bug. How quickly should I expect it to trigger?

I'm currently running 2.6.33.5-129.cred.fc13.x86_64

Comment 23 Jiri Olsa 2010-06-30 16:14:34 UTC
hm, it triggers within a few seconds for me.
are you sure you're not running the fixed kernel? :)

I sent out the patch and got some ACKs, but the patch hasn't been applied
so far

jirka

Comment 24 Jiri Olsa 2010-06-30 16:17:51 UTC
you can see the status in here:

http://marc.info/?l=linux-security-module&m=127747281817689&w=2

Comment 26 Chuck Ebbert 2010-08-04 14:09:30 UTC
The fix will be in 2.6.34.2-33

Comment 27 Fedora Update System 2010-08-07 05:00:32 UTC
kernel-2.6.34.2-34.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/kernel-2.6.34.2-34.fc13

Comment 28 Fedora Update System 2010-08-07 23:28:27 UTC
kernel-2.6.34.2-34.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.34.2-34.fc13

Comment 29 Fedora Update System 2010-08-10 23:53:32 UTC
kernel-2.6.34.3-37.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/kernel-2.6.34.3-37.fc13

Comment 30 Fedora Update System 2010-08-11 07:25:53 UTC
kernel-2.6.34.3-37.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.34.3-37.fc13

Comment 31 Chuck Ebbert 2010-08-18 09:46:34 UTC
2.6.34 kernel has been withdrawn.

