While running the cgroup controller tests on F16 PPC64 (Fedora-20111017-ppc64-DVD.iso) on P7 Juno-L systems, I noticed the following test case failure and OOM killer activity. During this time the system was not responding. To recover it I had to hard-reboot it from the HMC.

FAILED COMMAND File: /opt/ltp/output/LTP_RUN_ON-2011_Oct_24-06h_41m_29s.failed

Running tests.......
<<<test_start>>>
tag=cgroup stime=1319452890
cmdline=" cgroup_regression_test.sh"
contacts=""
analysis=exit
<<<test_output>>>
cgroup_regression_test 1 TPASS : no kernel bug was found
/opt/ltp/testcases/bin/cgroup_regression_test.sh: line 99: 1032 Terminated ./fork_processes
cgroup_regression_test 2 TPASS : notify_on_release is inherited
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test 3 TFAIL : Failed to mount cpu subsys
cgroup_regression_test 4 TCONF : CONFIG_LOCKDEP is not enabled
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test 5 TFAIL : mount perf_event and blkio failed
cgroup_regression_test 6 TCONF : CONFIG_CGROUP_NS
/opt/ltp/testcases/bin/cgroup_regression_test.sh: line 360: 11042 Terminated sleep 100 < cgroup/0

I also got the following call traces in dmesg and /var/log/messages:

[ 426.746939] Memory cgroup out of memory: Kill process 23182 (memcg_test_1) score 1 or sacrifice child
[ 426.746948] Killed process 23182 (memcg_test_1) total-vm:3840kB, anon-rss:640kB, file-rss:0kB
[ 426.761928] Memory cgroup out of memory: Kill process 23183 (memcg_test_1) score 1 or sacrifice child
[ 426.761936] Killed process 23183 (memcg_test_1) total-vm:3840kB, anon-rss:640kB, file-rss:0kB
[ 434.408312] Adding 6225856k swap on /dev/mapper/vg_elm17f131-lv_swap. Priority:-1 extents:1 across:6225856k
[ 476.177845] oom_kill_process: 83 callbacks suppressed
[ 476.177869] memcg_process invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[ 476.177891] memcg_process cpuset=/ mems_allowed=0
[ 476.177898] Call Trace:
[ 476.177907] [c0000000ebc9b240] [c000000000014c64] .show_stack+0x94/0x144 (unreliable)
[ 476.177922] [c0000000ebc9b300] [c00000000066df0c] .dump_stack+0x24/0x2c
[ 476.177933] [c0000000ebc9b380] [c00000000015aaac] .dump_header+0xac/0x1dc
[ 476.177943] [c0000000ebc9b490] [c00000000015aedc] .oom_kill_process+0x68/0x2b0
[ 476.177955] [c0000000ebc9b570] [c00000000015b200] .mem_cgroup_out_of_memory+0xdc/0x11c
[ 476.177966] [c0000000ebc9b620] [c0000000001aeac8] .mem_cgroup_handle_oom+0x1ec/0x328
[ 476.177978] [c0000000ebc9b720] [c0000000001aefa8] .__mem_cgroup_try_charge.constprop.15+0x3a4/0x4e8
[ 476.177990] [c0000000ebc9b860] [c0000000001af150] .mem_cgroup_charge_common+0x64/0x94
[ 476.178001] [c0000000ebc9b920] [c0000000001afdb8] .mem_cgroup_newpage_charge+0x74/0x8c
[ 476.178013] [c0000000ebc9b9c0] [c00000000017d8c8] .handle_pte_fault+0x290/0xb84
[ 476.178023] [c0000000ebc9bac0] [c00000000017f180] .handle_mm_fault+0x1a4/0x1b4
[ 476.178034] [c0000000ebc9bb80] [c0000000006677ac] .do_page_fault+0x474/0x704
[ 476.178045] [c0000000ebc9be30] [c000000000006438] handle_page_fault+0x20/0x74
[ 476.178054] Task in /11 killed as a result of limit of /11
[ 476.178064] memory: usage 64kB, limit 64kB, failcnt 8
[ 476.178071] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
[ 476.178078] Mem-Info:
[ 476.178082] Node 0 DMA per-cpu:
[ 476.178090] CPU 0: hi: 6, btch: 1 usd: 5
[ 476.178096] CPU 1: hi: 6, btch: 1 usd: 0
[ 476.178103] CPU 2: hi: 6, btch: 1 usd: 5
[ 476.178109] CPU 3: hi: 6, btch: 1 usd: 3
[ 476.178124] active_anon:1963 inactive_anon:2106 isolated_anon:0
[ 476.178126] active_file:666 inactive_file:2630 isolated_file:0
[ 476.178128] unevictable:0 dirty:9 writeback:0 unstable:0
[ 476.178130] free:55567 slab_reclaimable:241 slab_unreclaimable:810
[ 476.178132] mapped:518 shmem:2106 pagetables:227 bounce:0
[ 476.178151] Node 0 DMA free:3556288kB min:8128kB low:10112kB high:12160kB active_anon:125632kB inactive_anon:134784kB active_file:42624kB inactive_file:168320kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4190720kB mlocked:0kB dirty:576kB writeback:0kB mapped:33152kB shmem:134784kB slab_reclaimable:15424kB slab_unreclaimable:51840kB kernel_stack:2784kB pagetables:14528kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 476.178198] lowmem_reserve[]: 0 0 0
[ 476.178209] Node 0 DMA: 221*64kB 134*128kB 122*256kB 57*512kB 19*1024kB 4*2048kB 5*4096kB 1*8192kB 208*16384kB = 3555904kB
[ 476.178238] 5400 total pagecache pages
[ 476.178243] 0 pages in swap cache
[ 476.178249] Swap cache stats: add 828, delete 828, find 3/5
[ 476.178255] Free swap = 6225856kB
[ 476.178260] Total swap = 6225856kB
[ 476.179384] 65536 pages RAM
[ 476.179390] 805 pages reserved
[ 476.179396] 2735 pages shared
[ 476.179400] 8341 pages non-shared
[ 476.179407] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[ 476.179431] [23504] 0 23504 61 19 0 0 0 memcg_process
[ 476.179443] Memory cgroup out of memory: Kill process 23504 (memcg_process) score 1 or sacrifice child
[ 476.179455] Killed process 23504 (memcg_process) total-vm:3904kB, anon-rss:576kB, file-rss:640kB
[ 480.241991] memcg_process invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[ 480.242013] memcg_process cpuset=/ mems_allowed=0

NOTE: I have attached two files, nohup.out and dmesg.out, for reference. They contain the test execution output and the dmesg output respectively.

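For anyone going through a saved dmesg dump such as the attached dmesg.out, the memcg OOM kills can be tallied per process name with a small shell helper. This is only a rough sketch: the function name count_memcg_oom_kills is made up for illustration, and it assumes the "Killed process PID (name)" line format seen in the trace above.

```shell
#!/bin/sh
# Hypothetical helper (not part of LTP): count OOM kills per process name
# in a saved dmesg dump.
# Usage: count_memcg_oom_kills dmesg.out
count_memcg_oom_kills() {
    # Match lines such as:
    #   [ 426.746948] Killed process 23182 (memcg_test_1) total-vm:3840kB, ...
    # and count them by the name found between the parentheses.
    awk '
        /Killed process [0-9]+ \(/ {
            name = $0
            sub(/^.*Killed process [0-9]+ \(/, "", name)  # drop everything up to "("
            sub(/\).*$/, "", name)                        # drop ")" and the rest
            kills[name]++
        }
        END {
            for (n in kills) printf "%s %d\n", n, kills[n]
        }
    ' "$1" | sort
}
```

Running it as `count_memcg_oom_kills dmesg.out` prints one "name count" pair per line, which makes it easier to see which test binaries the OOM killer hit most often.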
--- More information ----

[root@elm17f131 ~]# uname -a
Linux elm17f131.xxx.xxx.xxx 3.1.0-0.rc9.git0.2.fc16.kh.ppc64 #1 SMP Wed Oct 12 22:41:01 UTC 2011 ppc64 ppc64 ppc64 GNU/Linux

[root@elm17f131 ~]# lscpu
Architecture:          ppc64
Byte Order:            Big Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Model:                 IBM,8231-E2B
L1d cache:             32K
L1i cache:             32K
NUMA node0 CPU(s):     0-3

[root@elm17f131 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          4045        586       3459          0         27        365
-/+ buffers/cache:        193       3851
Swap:         6079          0       6079

--- Steps to reproduce ---

1) Install the latest LTP tarball.
2) # PRE-REQUISITES:
   - Kernel >= 2.6.24 with proper config options:
     CONFIG_CGROUPS=y
     CONFIG_CGROUP_DEBUG=y
     CONFIG_CGROUP_NS=y
     CONFIG_CPUSETS=y
     CONFIG_PROC_PID_CPUSET=y
     CONFIG_GROUP_SCHED=y
     CONFIG_FAIR_GROUP_SCHED=y
     CONFIG_RT_GROUP_SCHED=y
   # CPUSET SPECIFIC PRE-REQUISITES:
   - Kernel >= 2.6.28
   - At least 4 CPUs
   - At least 3 memory nodes
3) Go to /opt/ltp and run the test: ./runltp -f controllers

I booted F16 alpha with the updated kernel from http://ppc.koji.fedoraproject.org/packages/kernel/3.1.0/7.fc16/ppc64/kernel-3.1.0-7.fc16.ppc64.rpm and could reproduce the cgroup failure by running the controllers test.

[root@elm17f131 ~]# cat /opt/ltp/results/LTP_RUN_ON-2011_Nov_14-05h_24m_13s.log
Test Start Time: Mon Nov 14 05:24:14 2011
LTP_RUN_ON-2011_Nov_14-05h_24m_13s.log
-----------------------------------------
Testcase           Result   Exit Value
--------           ------   ----------
cgroup             FAIL     1
memcg_regression   PASS     0
memcg_function     PASS     0

The following output was collected when the test started:

cgroup_regression_test 1 TPASS : no kernel bug was found
/opt/ltp/testcases/bin/cgroup_regression_test.sh: line 118: 26101 Terminated ./fork_processes
cgroup_regression_test 2 TPASS : notify_on_release is inherited
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test 3 TFAIL : Failed to mount cpu subsys
cgroup_regression_test 4 TCONF : CONFIG_LOCKDEP is not enabled
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test 5 TFAIL : mount perf_event and blkio failed

There are two failure lines (tests 3 and 5).

# uname -a
Linux elm17f131.xxx.xxx.xxx 3.1.0-7.fc16.ppc64 #1 SMP Wed Nov 2 10:04:55 UTC 2011 ppc64 ppc64 ppc64 GNU/Linux

This looks somewhat similar to Red Hat Bug 612805, but the failing test is not doing a remount. The test script is attached.
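
Since both TFAIL lines are preceded by "already mounted or cgroup/ busy", one thing worth checking before the run is whether the cpu, perf_event and blkio controllers are already attached to some other cgroup hierarchy. That is only an assumption about where to look, not a confirmed cause. The sketch below uses a made-up function name, find_cgroup_mounts, and relies on the standard /proc/mounts layout (device, mount point, filesystem type, options).

```shell
#!/bin/sh
# Hypothetical helper: print the mount point of every cgroup hierarchy
# that already carries the given controller.
#   $1 = controller name, e.g. cpu, perf_event, blkio
#   $2 = mounts file to read (defaults to /proc/mounts)
find_cgroup_mounts() {
    awk -v ctrl="$1" '
        $3 == "cgroup" {
            # Field 4 is the comma-separated option list, which includes
            # the names of the controllers bound to this hierarchy.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == ctrl) { print $2; break }
        }
    ' "${2:-/proc/mounts}"
}
```

For example, `find_cgroup_mounts cpu` prints nothing when the cpu controller is free, and prints the existing mount point when it is already bound elsewhere.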
Created attachment 538667 [details] nohup.out
Created attachment 538668 [details] dmesg.out
Created attachment 538669 [details] cgroup_regression_test.sh
Created attachment 542593 [details] cgroup_LTP_failures.failed ------- Comment (attachment only) From maknayak.com 2011-12-08 10:11 EDT-------
Created attachment 542594 [details] cgroup_LTP_testrun.log ------- Comment (attachment only) From maknayak.com 2011-12-08 10:12 EDT-------
Created attachment 542595 [details] cgroup_LTP_testrun.out ------- Comment (attachment only) From maknayak.com 2011-12-08 10:12 EDT-------
F16 for ppc was released. If you can recreate this with the latest updates, please report it upstream.
------- Comment From maknayak.com 2012-05-08 13:38 EDT-------
Hi All,

Verified on F17 PPC64 Alpha and could reproduce the issue. Four failures were noticed, as shown below.

#cat LTP_RUN_ON-2012_May_02-08h_46m_26s.log
Test Start Time: Wed May 2 08:46:26 2012
Testcase                         Result   Exit Value
--------                         ------   ----------
cgroup                           FAIL     1
memcg_max_usage_in_bytes         FAIL     3
memcg_move_charge_at_immigrate   FAIL     6
memcg_memsw_limit_in_bytes       PASS     0
memcg_stat                       PASS     0
memcg_use_hierarchy              PASS     0
memcg_usage_in_bytes             FAIL     1
memcg_stress                     PASS     0
memcg_control                    PASS     0
cgroup_fj                        PASS     0
controllers                      PASS     0
-----------------------------------------------
Total Tests: 13
Total Failures: 4
Kernel Version: 3.3.2-8.fc17.ppc64
Machine Architecture: ppc64
Hostname: elm17f130.beaverton.ibm.com

For the detailed test logs, please see the attached files named "LTP_cgroup-RUN_testresult-summary-log.txt" and "cgroup-test-execution-F17.log".

Thanks...
Manas
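
When comparing runs across kernels, the failing testcases can be pulled out of an LTP summary log like the one above with a short filter. This is a minimal sketch, assuming the three-column "Testcase Result Exit Value" layout shown in the log. The function name list_ltp_failures is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "testcase exit_value" for every FAIL row
# in an LTP summary log.
# Usage: list_ltp_failures LTP_RUN_ON-2012_May_02-08h_46m_26s.log
list_ltp_failures() {
    # Result rows have exactly three fields: name, result, exit value.
    # The NF check skips headers and the "Total Failures:" summary line.
    awk 'NF == 3 && $2 == "FAIL" { print $1, $3 }' "$1"
}
```

The output of two runs can then be compared with diff to see whether the set of failing testcases changed between kernel versions.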
Created attachment 669116 [details] ltp_run_on_dec23 ------- Comment (attachment only) From maheshhi.com 2012-12-26 07:16 EDT-------
Created attachment 669117 [details] dmesg ------- Comment (attachment only) From maheshhi.com 2012-12-26 07:17 EDT-------
Created attachment 669118 [details] nohup ------- Comment (attachment only) From maheshhi.com 2012-12-26 07:18 EDT-------
Created attachment 669119 [details] LTP_RUN_ON-2012_Dec_23-09h_26m_55s.failed ------- Comment (attachment only) From maheshhi.com 2012-12-26 07:19 EDT-------