Description of problem:
We are running the MRG kernel on Dell PowerEdge R610 (Nehalem) servers. When we run "cset set -l", the server panics and hangs. We cannot do anything except power-cycle the server.

Version-Release number of selected component (if applicable):
Original MRG kernel: kernel-rt-2.6.24.7-108.el5rt
We also upgraded to: kernel-rt-2.6.24.7-137.el5rt
cpuset-1.5.1-1.1

How reproducible:
Out of 4 reboots, it crashes at least once.

Steps to Reproduce:
1. Boot the server with the MRG kernel.
2. Run "cset set -l" or "cset set".

Actual results:
[<ffffffff8106d725>] cgroup_iter_next+0x11/0x39
PGD 63c55a067 PUD 62f8a7067 PMD 0
Oops: 0000 [1] PREEMPT SMP
CPU 0
Modules linked in: ipv6 nfs lockd nfs_acl sunrpc dm_mirror dm_multipath scsi_dh dm_mod video output sbs sbshc battery ac parport_pc lp parport joydd
Pid: 7223, comm: cset.base Not tainted 2.6.24.7-137.el5rt #1
RIP: 0010:[<ffffffff8106d725>] [<ffffffff8106d725>] cgroup_iter_next+0x11/0x39
RSP: 0018:ffff8106395a7d70 EFLAGS: 00010286
RAX: 0000000000100100 RBX: 0000000000000000 RCX: ffff81033c1dff00
RDX: 0000000000100100 RSI: ffff8106395a7da8 RDI: ffff81033a890028
RBP: ffff8106395a7d78 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000003 R11: ffff8106395a7d58 R12: ffff8106395a7da8
R13: ffff81063a112720 R14: 0000000000000000 R15: 0000000000000153
FS: 00007f538f4676e0(0000) GS:ffffffff813f5100(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000100100 CR3: 000000063cc37000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process cset.base (pid: 7223, threadinfo ffff8106395a6000, task ffff81062cc64280)
Stack: ffff81062f83c800 ffff8106395a7df8 ffffffff8106fc3b ffff8106395a7d98
ffff81062f9337c0 00000001395a7dd8 ffff81033a890028 ffff81033c1dff00
0000000000100100 0000000000000010 ffff81033a48f160 0000000000000000
Call Trace:
[<ffffffff8106fc3b>] cgroup_tasks_open+0xe9/0x1a8
[<ffffffff8106f8e1>] ? cgroup_file_open+0x0/0x49
[<ffffffff8106f921>] cgroup_file_open+0x40/0x49
[<ffffffff810af1c5>] __dentry_open+0x139/0x212
[<ffffffff810af336>] nameidata_to_filp+0x2d/0x3f
[<ffffffff810af37e>] do_filp_open+0x36/0x46
[<ffffffff810abc28>] ? kmem_cache_alloc+0xbb/0xe9
[<ffffffff810af071>] ? get_unused_fd_flags+0x113/0x121
[<ffffffff810af3df>] do_sys_open+0x51/0xd2
[<ffffffff810af489>] sys_open+0x1b/0x1d
[<ffffffff8100c23e>] system_call_ret+0x0/0x5
Code: 8b 51 20 48 8d 42 18 48 39 42 18 74 dd 48 89 0e 48 8b 42 18 48 89 46 08 c9 c3 55 48 8b 46 08 48 89 e5 53 31 db 48 83 3e 00 74 22 <48> 8b 10 4
RIP [<ffffffff8106d725>] cgroup_iter_next+0x11/0x39
RSP <ffff8106395a7d70>
CR2: 0000000000100100
Kernel panic - not syncing: Fatal exception
Pid: 7223, comm: cset.base Tainted: G D 2.6.24.7-137.el5rt #1
Call Trace:
[<ffffffff8103dcec>] panic+0xaf/0x160
[<ffffffff8100c886>] ? retint_kernel+0x26/0x30
[<ffffffff8128aa24>] ? oops_end+0x3d/0x5d
[<ffffffff8128aa3b>] oops_end+0x54/0x5d
[<ffffffff8128c574>] do_page_fault+0x67e/0x76d
[<ffffffff8105f3dd>] ? try_to_take_rw_read+0x4ae/0x5a8
[<ffffffff81060486>] ? rt_read_slowlock+0x7c/0x302
[<ffffffff81060486>] ? rt_read_slowlock+0x7c/0x302
[<ffffffff8128a6c9>] error_exit+0x0/0x51
[<ffffffff8106d725>] ? cgroup_iter_next+0x11/0x39
[<ffffffff8106fc3b>] ? cgroup_tasks_open+0xe9/0x1a8
[<ffffffff8106f8e1>] ? cgroup_file_open+0x0/0x49
[<ffffffff8106f921>] ? cgroup_file_open+0x40/0x49
[<ffffffff810af1c5>] ? __dentry_open+0x139/0x212
[<ffffffff810af336>] ? nameidata_to_filp+0x2d/0x3f
[<ffffffff810af37e>] ? do_filp_open+0x36/0x46
[<ffffffff810abc28>] ? kmem_cache_alloc+0xbb/0xe9
[<ffffffff810af071>] ? get_unused_fd_flags+0x113/0x121
[<ffffffff810af3df>] ? do_sys_open+0x51/0xd2
[<ffffffff810af489>] ? sys_open+0x1b/0x1d
[<ffffffff8100c23e>] ? system_call_ret+0x0/0x5

Expected results:

Additional info:
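Both RAX and CR2 in the oops contain 0x100100, and the faulting bytes <48> 8b 10 decode to mov (%rax),%rdx, so the crash is a load through RAX. In 2.6.24-era kernels, list_del() overwrites a removed entry's next and prev pointers with LIST_POISON1 (0x00100100) and LIST_POISON2 (0x00200200) from include/linux/poison.h, so this value suggests, but does not prove, that cgroup_iter_next followed a list pointer taken from an entry that had already been unlinked. The Python sketch below is a hypothetical triage helper, classify_fault_address, that checks a CR2 value against those constants. It assumes no POISON_POINTER_DELTA offset is added, which later kernels may do, and its near-NULL cut-off is an arbitrary heuristic.

```python
# Hypothetical triage helper; not part of any kernel or Red Hat tool.
# Poison values that list_del() stores (include/linux/poison.h) on
# 2.6.24-era kernels. Assumption: no POISON_POINTER_DELTA offset is added.
LIST_POISON1 = 0x00100100  # written to entry->next
LIST_POISON2 = 0x00200200  # written to entry->prev

# Heuristic cut-off for "looks like a NULL pointer plus a small offset".
NEAR_NULL_LIMIT = 0x1000


def classify_fault_address(cr2: int) -> str:
    """Return a short hint about what a page-fault address may indicate."""
    if cr2 == LIST_POISON1:
        return "LIST_POISON1: dereferenced ->next of an entry removed by list_del()"
    if cr2 == LIST_POISON2:
        return "LIST_POISON2: dereferenced ->prev of an entry removed by list_del()"
    if cr2 < NEAR_NULL_LIMIT:
        return "near-NULL pointer dereference"
    return "no known poison pattern"


if __name__ == "__main__":
    # CR2 value copied from the oops in this report.
    print(classify_fault_address(0x0000000000100100))
```

An exact match is used deliberately: a poisoned pointer that is dereferenced at a structure-member offset would fault at poison + offset, so a fuller helper might also accept small offsets from each poison value.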
I tried this on a Dell R610 with 16 CPUs (2 x 4-core, hyperthreaded). I downloaded libbitmask and libcpuset from:
ftp://oss.sgi.com/projects/cpusets/download/libbitmask-2.0.tar.bz2
ftp://oss.sgi.com/projects/cpusets/download/libcpuset-1.0.tar.bz2
I built and installed them, then installed cpuset 1.5.2 and ran:

[root@dell-r610-1 cpuset-1.5.2]# ./cset set
cset:
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root       0-15 y     0-1 y   458    0 /
[root@dell-r610-1 cpuset-1.5.2]# ./cset set -l
cset:
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root       0-15 y     0-1 y   458    0 /

I ran both of the above several times. Then, just to experiment, I ran:

[root@dell-r610-1 cpuset-1.5.2]# ./cset set -c 8-15 test
cset: --> created cpuset "test"
[root@dell-r610-1 cpuset-1.5.2]# ./cset set -l
cset:
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root       0-15 y     0-1 y   458    1 /
         test       8-15 n       0 n     0    0 /test
[root@dell-r610-1 cpuset-1.5.2]# ./cset set -m 1 test
cset: --> modified cpuset "test"
[root@dell-r610-1 cpuset-1.5.2]# ./cset set -l
cset:
         Name       CPUs-X    MEMs-X Tasks Subs Path
 ------------ ---------- - ------- - ----- ---- ----------
         root       0-15 y     0-1 y   458    1 /
         test       8-15 n       1 n     0    0 /test

Everything worked fine, and I could not reproduce the bug. Perhaps some other configuration is needed to trigger it. Can you please run sosreport and attach the resulting file?
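The call trace shows cset.base crashing inside open() on a cgroup tasks file (sys_open → cgroup_file_open → cgroup_tasks_open). One possible way to exercise that path far more often than a single cset invocation does is to open and read every tasks file under the cpuset mount in a loop. The Python sketch below does that; read_all_tasks and stress are hypothetical helpers, not part of cset, and the default mount point /cpusets is an assumption that should be replaced with the real one on the affected host.

```python
import os
import sys
from typing import Dict, List

# Assumption: the cpuset hierarchy is mounted at /cpusets on the affected
# host; pass the real mount point as the first command-line argument.
DEFAULT_ROOT = "/cpusets"


def read_all_tasks(root: str) -> Dict[str, List[int]]:
    """Open and read every 'tasks' file found under root.

    On a cgroup/cpuset mount, each open() of a 'tasks' file goes through
    cgroup_file_open() and cgroup_tasks_open(), the path seen in the oops.
    """
    result: Dict[str, List[int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "tasks" not in filenames:
            continue
        path = os.path.join(dirpath, "tasks")
        with open(path) as f:
            # One PID per line; skip blank lines defensively.
            result[path] = [int(line) for line in f if line.strip()]
    return result


def stress(root: str, iterations: int) -> None:
    """Repeat the full scan, printing a one-line summary per pass."""
    for i in range(iterations):
        tasks = read_all_tasks(root)
        total = sum(len(pids) for pids in tasks.values())
        print(f"pass {i}: {len(tasks)} cpusets, {total} task entries")


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_ROOT
    iterations = int(sys.argv[2]) if len(sys.argv) > 2 else 10
    stress(root, iterations)
```

On the affected host this might be run as root with something like python3 read_tasks.py /cpusets 1000 (the script name is hypothetical). Reading the files directly takes cset out of the picture, which could help show whether the crash depends on cset itself or only on the kernel's tasks-file open path.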
I have provided the sosreport that was attached to the IT.
Created attachment 375927: sosreport for RT system with cgroup oops
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2010-0041.html
Event posted on 03-12-2010 11:04am CST by jbrier

Customer doesn't have a vmcore. Has Engineering made any progress on this?

===
We are using the "cset set -l" command, and it crashes/panics the system. I don't have a core at this time. Meanwhile, can you please investigate why the patch that you recommended is not working?
===

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by streeter
issue 371974
I haven't been able to reproduce this on our 2.6.33-based kernel.