Bug 758789 - cgroup controller test failed on F16 PPC64
Summary: cgroup controller test failed on F16 PPC64
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 16
Hardware: ppc64
OS: All
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-11-30 17:43 UTC by IBM Bug Proxy
Modified: 2012-12-26 07:22 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-03-01 15:45:37 UTC
Type: ---
Embargoed:


Attachments
nohup.out (8.45 KB, application/octet-stream)
2011-11-30 17:43 UTC, IBM Bug Proxy
dmesg.out (123.76 KB, application/octet-stream)
2011-11-30 17:43 UTC, IBM Bug Proxy
cgroup_regression_test.sh (13.83 KB, application/x-sh)
2011-11-30 17:43 UTC, IBM Bug Proxy
cgroup_LTP_failures.failed (34 bytes, application/octet-stream)
2011-12-08 15:22 UTC, IBM Bug Proxy
cgroup_LTP_testrun.log (710 bytes, text/x-log)
2011-12-08 15:22 UTC, IBM Bug Proxy
cgroup_LTP_testrun.out (74.41 KB, application/octet-stream)
2011-12-08 15:22 UTC, IBM Bug Proxy
ltp_run_on_dec23 (1002 bytes, text/plain)
2012-12-26 07:21 UTC, IBM Bug Proxy
dmesg (249.19 KB, text/plain)
2012-12-26 07:21 UTC, IBM Bug Proxy
nohup (194.15 KB, text/plain)
2012-12-26 07:22 UTC, IBM Bug Proxy
LTP_RUN_ON-2012_Dec_23-09h_26m_55s.failed (234 bytes, text/plain)
2012-12-26 07:22 UTC, IBM Bug Proxy


Links
System ID Private Priority Status Summary Last Updated
IBM Linux Technology Center 76186 0 None None None Never

Description IBM Bug Proxy 2011-11-30 17:43:05 UTC
While running the cgroup controller test on F16 PPC64 (Fedora-20111017-ppc64-DVD.iso) on P7 Juno-L systems, I noticed the following test case failures and OOM-killer activity. During this time the system was not responding. To recover the system, I had to hard reboot it from the HMC.

FAILED COMMAND File: /opt/ltp/output/LTP_RUN_ON-2011_Oct_24-06h_41m_29s.failed
Running tests.......
<<<test_start>>>
tag=cgroup stime=1319452890
cmdline="	cgroup_regression_test.sh"
contacts=""
analysis=exit
<<<test_output>>>
cgroup_regression_test    1  TPASS  :  no kernel bug was found
/opt/ltp/testcases/bin/cgroup_regression_test.sh: line 99:  1032 Terminated              ./fork_processes
cgroup_regression_test    2  TPASS  :  notify_on_release is inherited
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test    3  TFAIL  :  Failed to mount cpu subsys
cgroup_regression_test    4  TCONF  :  CONFIG_LOCKDEP is not enabled
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test    5  TFAIL  :  mount perf_event and blkio failed
cgroup_regression_test    6  TCONF  :  CONFIG_CGROUP_NS
/opt/ltp/testcases/bin/cgroup_regression_test.sh: line 360: 11042 Terminated              sleep 100 < cgroup/0
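
The result lines above can be tallied mechanically. The following is a minimal sketch, assuming the `<test name> <number> <status> : <message>` layout seen in this output; the function names and the list of status keywords are illustrative choices, not part of LTP itself.

```python
import re
from collections import Counter

# Matches LTP result lines such as:
#   cgroup_regression_test    3  TFAIL  :  Failed to mount cpu subsys
# The status list is an assumption; only TPASS, TFAIL and TCONF occur here.
RESULT_RE = re.compile(
    r"^(\S+)\s+(\d+)\s+(TPASS|TFAIL|TCONF|TBROK|TWARN)\s+:\s+(.*)$"
)

def parse_ltp_results(text):
    """Return a list of (test, number, status, message) tuples."""
    results = []
    for line in text.splitlines():
        m = RESULT_RE.match(line.strip())
        if m:
            results.append((m.group(1), int(m.group(2)), m.group(3), m.group(4)))
    return results

def summarize(results):
    """Count how many results ended in each status."""
    return Counter(status for _, _, status, _ in results)

# Output copied from the failed-command file quoted in this report.
SAMPLE = """\
cgroup_regression_test    1  TPASS  :  no kernel bug was found
/opt/ltp/testcases/bin/cgroup_regression_test.sh: line 99:  1032 Terminated              ./fork_processes
cgroup_regression_test    2  TPASS  :  notify_on_release is inherited
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test    3  TFAIL  :  Failed to mount cpu subsys
cgroup_regression_test    4  TCONF  :  CONFIG_LOCKDEP is not enabled
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test    5  TFAIL  :  mount perf_event and blkio failed
cgroup_regression_test    6  TCONF  :  CONFIG_CGROUP_NS
/opt/ltp/testcases/bin/cgroup_regression_test.sh: line 360: 11042 Terminated              sleep 100 < cgroup/0
"""

if __name__ == "__main__":
    for test, number, status, message in parse_ltp_results(SAMPLE):
        if status == "TFAIL":
            print(f"{test} #{number}: {message}")
    print(dict(summarize(parse_ltp_results(SAMPLE))))
```

Shell job-control lines and `mount:` errors do not match the pattern, so only the numbered test results are counted.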


I also got the following call traces in dmesg and /var/log/messages:
[  426.746939] Memory cgroup out of memory: Kill process 23182 (memcg_test_1) score 1 or sacrifice child
[  426.746948] Killed process 23182 (memcg_test_1) total-vm:3840kB, anon-rss:640kB, file-rss:0kB
[  426.761928] Memory cgroup out of memory: Kill process 23183 (memcg_test_1) score 1 or sacrifice child
[  426.761936] Killed process 23183 (memcg_test_1) total-vm:3840kB, anon-rss:640kB, file-rss:0kB
[  434.408312] Adding 6225856k swap on /dev/mapper/vg_elm17f131-lv_swap.  Priority:-1 extents:1 across:6225856k 
[  476.177845] oom_kill_process: 83 callbacks suppressed
[  476.177869] memcg_process invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[  476.177891] memcg_process cpuset=/ mems_allowed=0
[  476.177898] Call Trace:
[  476.177907] [c0000000ebc9b240] [c000000000014c64] .show_stack+0x94/0x144 (unreliable)
[  476.177922] [c0000000ebc9b300] [c00000000066df0c] .dump_stack+0x24/0x2c
[  476.177933] [c0000000ebc9b380] [c00000000015aaac] .dump_header+0xac/0x1dc
[  476.177943] [c0000000ebc9b490] [c00000000015aedc] .oom_kill_process+0x68/0x2b0
[  476.177955] [c0000000ebc9b570] [c00000000015b200] .mem_cgroup_out_of_memory+0xdc/0x11c
[  476.177966] [c0000000ebc9b620] [c0000000001aeac8] .mem_cgroup_handle_oom+0x1ec/0x328
[  476.177978] [c0000000ebc9b720] [c0000000001aefa8] .__mem_cgroup_try_charge.constprop.15+0x3a4/0x4e8
[  476.177990] [c0000000ebc9b860] [c0000000001af150] .mem_cgroup_charge_common+0x64/0x94
[  476.178001] [c0000000ebc9b920] [c0000000001afdb8] .mem_cgroup_newpage_charge+0x74/0x8c
[  476.178013] [c0000000ebc9b9c0] [c00000000017d8c8] .handle_pte_fault+0x290/0xb84
[  476.178023] [c0000000ebc9bac0] [c00000000017f180] .handle_mm_fault+0x1a4/0x1b4
[  476.178034] [c0000000ebc9bb80] [c0000000006677ac] .do_page_fault+0x474/0x704
[  476.178045] [c0000000ebc9be30] [c000000000006438] handle_page_fault+0x20/0x74
[  476.178054] Task in /11 killed as a result of limit of /11
[  476.178064] memory: usage 64kB, limit 64kB, failcnt 8
[  476.178071] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
[  476.178078] Mem-Info:
[  476.178082] Node 0 DMA per-cpu:
[  476.178090] CPU    0: hi:    6, btch:   1 usd:   5
[  476.178096] CPU    1: hi:    6, btch:   1 usd:   0
[  476.178103] CPU    2: hi:    6, btch:   1 usd:   5
[  476.178109] CPU    3: hi:    6, btch:   1 usd:   3
[  476.178124] active_anon:1963 inactive_anon:2106 isolated_anon:0
[  476.178126]  active_file:666 inactive_file:2630 isolated_file:0
[  476.178128]  unevictable:0 dirty:9 writeback:0 unstable:0
[  476.178130]  free:55567 slab_reclaimable:241 slab_unreclaimable:810
[  476.178132]  mapped:518 shmem:2106 pagetables:227 bounce:0
[  476.178151] Node 0 DMA free:3556288kB min:8128kB low:10112kB high:12160kB active_anon:125632kB inactive_anon:134784kB active_file:42624kB inactive_file:168320kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:4190720kB mlocked:0kB dirty:576kB writeback:0kB mapped:33152kB shmem:134784kB slab_reclaimable:15424kB slab_unreclaimable:51840kB kernel_stack:2784kB pagetables:14528kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  476.178198] lowmem_reserve[]: 0 0 0
[  476.178209] Node 0 DMA: 221*64kB 134*128kB 122*256kB 57*512kB 19*1024kB 4*2048kB 5*4096kB 1*8192kB 208*16384kB = 3555904kB
[  476.178238] 5400 total pagecache pages
[  476.178243] 0 pages in swap cache
[  476.178249] Swap cache stats: add 828, delete 828, find 3/5
[  476.178255] Free swap  = 6225856kB
[  476.178260] Total swap = 6225856kB
[  476.179384] 65536 pages RAM
[  476.179390] 805 pages reserved
[  476.179396] 2735 pages shared
[  476.179400] 8341 pages non-shared
[  476.179407] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[  476.179431] [23504]     0 23504       61       19   0       0             0 memcg_process
[  476.179443] Memory cgroup out of memory: Kill process 23504 (memcg_process) score 1 or sacrifice child
[  476.179455] Killed process 23504 (memcg_process) total-vm:3904kB, anon-rss:576kB, file-rss:640kB
[  480.241991] memcg_process invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
[  480.242013] memcg_process cpuset=/ mems_allowed=0
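
In the trace above, the kills are reported as "Memory cgroup out of memory" with the group at its limit (usage 64kB, limit 64kB) while the node still has free memory, so they come from a memory cgroup limit rather than from system-wide exhaustion. A minimal sketch for pulling these events out of a saved dmesg file follows; it assumes the message wording shown in this log, and the function name is hypothetical.

```python
import re

# Patterns follow the wording of the kernel messages quoted in this report;
# other kernel versions may phrase these lines differently.
KILL_RE = re.compile(
    r"Killed process (\d+) \(([^)]+)\) "
    r"total-vm:(\d+)kB, anon-rss:(\d+)kB, file-rss:(\d+)kB"
)
# "memory: usage" does not match the "memory+swap: usage" line.
LIMIT_RE = re.compile(r"memory: usage (\d+)kB, limit (\d+)kB, failcnt (\d+)")

def parse_oom_events(dmesg_text):
    """Return (kills, limits) found in dmesg text.

    kills  : list of (pid, name, total_vm_kb, anon_rss_kb, file_rss_kb)
    limits : list of (usage_kb, limit_kb, failcnt)
    """
    kills = [
        (int(pid), name, int(vm), int(anon), int(file_rss))
        for pid, name, vm, anon, file_rss in KILL_RE.findall(dmesg_text)
    ]
    limits = [tuple(int(v) for v in m) for m in LIMIT_RE.findall(dmesg_text)]
    return kills, limits

# Lines copied from the dmesg excerpt in this report.
SAMPLE_DMESG = """\
[  426.746948] Killed process 23182 (memcg_test_1) total-vm:3840kB, anon-rss:640kB, file-rss:0kB
[  426.761936] Killed process 23183 (memcg_test_1) total-vm:3840kB, anon-rss:640kB, file-rss:0kB
[  476.178054] Task in /11 killed as a result of limit of /11
[  476.178064] memory: usage 64kB, limit 64kB, failcnt 8
[  476.178071] memory+swap: usage 0kB, limit 9007199254740991kB, failcnt 0
[  476.179455] Killed process 23504 (memcg_process) total-vm:3904kB, anon-rss:576kB, file-rss:640kB
"""

if __name__ == "__main__":
    kills, limits = parse_oom_events(SAMPLE_DMESG)
    for pid, name, *_ in kills:
        print(f"killed {name} (pid {pid})")
    for usage, limit, failcnt in limits:
        print(f"cgroup usage {usage}kB of {limit}kB, failcnt {failcnt}")
```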


NOTE: I have attached two files for reference, nohup.out and dmesg.out, which contain the output of the test execution and the dmesg output, respectively.

--- More information ----

[root@elm17f131 ~]# uname -a
Linux elm17f131.xxx.xxx.xxx 3.1.0-0.rc9.git0.2.fc16.kh.ppc64 #1 SMP Wed Oct 12 22:41:01 UTC 2011 ppc64 ppc64 ppc64 GNU/Linux

[root@elm17f131 ~]# lscpu 
Architecture:          ppc64
Byte Order:            Big Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Model:                 IBM,8231-E2B
L1d cache:             32K
L1i cache:             32K
NUMA node0 CPU(s):     0-3
[root@elm17f131 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:          4045        586       3459          0         27        365
-/+ buffers/cache:        193       3851
Swap:         6079          0       6079

--- Steps to reproduce ---
1) Install ltp latest tar ball
2) # PRE-REQUISITES:
- Kernel >= 2.6.24 with proper config options:
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
CONFIG_CGROUP_NS=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
# CPUSET SPECIFIC PRE-REQUISITES:
- Kernel >= 2.6.28
- At least 4 CPUs
- At least 3 memory nodes

3) Go to /opt/ltp and run the test:
./runltp -f controllers
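
The kernel configuration prerequisites in step 2 can be checked before running the suite. This is a minimal sketch; it assumes the running kernel's config is available as a text file at `/boot/config-<release>`, which may not hold on every system, and the function name is hypothetical.

```python
import platform

# Options listed in the prerequisites of this report.
REQUIRED = [
    "CONFIG_CGROUPS",
    "CONFIG_CGROUP_DEBUG",
    "CONFIG_CGROUP_NS",
    "CONFIG_CPUSETS",
    "CONFIG_PROC_PID_CPUSET",
    "CONFIG_GROUP_SCHED",
    "CONFIG_FAIR_GROUP_SCHED",
    "CONFIG_RT_GROUP_SCHED",
]

def missing_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as "# CONFIG_FOO is not set" and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]

if __name__ == "__main__":
    # Assumed location of the running kernel's config file.
    path = "/boot/config-" + platform.release()
    try:
        with open(path) as f:
            print("missing:", missing_options(f.read()))
    except OSError as err:
        print(f"could not read {path}: {err}")
```

An option missing from this list would be expected to show up as a TCONF result rather than a TFAIL, as with tests 4 and 6 in the output above.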


I booted with the updated kernel from http://ppc.koji.fedoraproject.org/packages/kernel/3.1.0/7.fc16/ppc64/kernel-3.1.0-7.fc16.ppc64.rpm on F16 alpha. I could reproduce the cgroup failure by running the controllers test.
            
[root@elm17f131 ~]# cat /opt/ltp/results/LTP_RUN_ON-2011_Nov_14-05h_24m_13s.log  
                                   
Test Start Time: Mon Nov 14 05:24:14 2011
-----------------------------------------
Testcase                       Result     Exit Value
--------                       ------     ----------
cgroup                         FAIL       1
memcg_regression               PASS       0
memcg_function                 PASS       0

The following output was collected when the test started:

cgroup_regression_test    1  TPASS  :  no kernel bug was found
/opt/ltp/testcases/bin/cgroup_regression_test.sh: line 118: 26101 Terminated              ./fork_processes
cgroup_regression_test    2  TPASS  :  notify_on_release is inherited
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test    3  TFAIL  :  Failed to mount cpu subsys
cgroup_regression_test    4  TCONF  :  CONFIG_LOCKDEP is not enabled
mount: xxx already mounted or cgroup/ busy
cgroup_regression_test    5  TFAIL  :  mount perf_event and blkio failed

There are two TFAIL lines, for tests 3 and 5; both follow a "mount: xxx already mounted or cgroup/ busy" error.

# uname -a
Linux elm17f131.xxx.xxx.xxx 3.1.0-7.fc16.ppc64 #1 SMP Wed Nov 2 10:04:55 UTC 2011 ppc64 ppc64 ppc64 GNU/Linux

This looks somewhat similar to Red Hat Bug 612805, but the failing test is not doing a remount. The test script is attached.
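
One thing that may be worth checking, though this report does not confirm it as the cause, is whether the cpu, perf_event and blkio controllers are already attached to a hierarchy when the test tries to mount them. On kernels of this era, `/proc/cgroups` lists each controller with a hierarchy ID that is 0 when the controller is not attached anywhere. The sketch below parses that file; the function name and the sample content are made up for illustration.

```python
def bound_controllers(proc_cgroups_text):
    """Map controller name -> hierarchy ID for controllers that are
    already attached to a mounted hierarchy (hierarchy ID != 0)."""
    bound = {}
    for line in proc_cgroups_text.splitlines():
        # Skip the "#subsys_name hierarchy num_cgroups enabled" header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name, hierarchy = fields[0], int(fields[1])
        if hierarchy != 0:
            bound[name] = hierarchy
    return bound

# Hypothetical content, NOT taken from the affected machine.
HYPOTHETICAL_SAMPLE = """\
#subsys_name hierarchy num_cgroups enabled
cpuset 0 1 1
cpu 2 1 1
cpuacct 2 1 1
memory 0 1 1
"""

if __name__ == "__main__":
    try:
        with open("/proc/cgroups") as f:
            text = f.read()
    except OSError:
        text = HYPOTHETICAL_SAMPLE
    for name, hierarchy in sorted(bound_controllers(text).items()):
        print(f"{name} is already attached to hierarchy {hierarchy}")
```

If the controllers named in the failing tests appear in this output before the test runs, that would be consistent with the "already mounted or busy" error, but it would not by itself prove the cause.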

Comment 1 IBM Bug Proxy 2011-11-30 17:43:19 UTC
Created attachment 538667 [details]
nohup.out

Comment 2 IBM Bug Proxy 2011-11-30 17:43:31 UTC
Created attachment 538668 [details]
dmesg.out

Comment 3 IBM Bug Proxy 2011-11-30 17:43:43 UTC
Created attachment 538669 [details]
cgroup_regression_test.sh

Comment 4 IBM Bug Proxy 2011-12-08 15:22:09 UTC
Created attachment 542593 [details]
cgroup_LTP_failures.failed


------- Comment (attachment only) From maknayak.com 2011-12-08 10:11 EDT-------

Comment 5 IBM Bug Proxy 2011-12-08 15:22:20 UTC
Created attachment 542594 [details]
cgroup_LTP_testrun.log


------- Comment (attachment only) From maknayak.com 2011-12-08 10:12 EDT-------

Comment 6 IBM Bug Proxy 2011-12-08 15:22:30 UTC
Created attachment 542595 [details]
cgroup_LTP_testrun.out


------- Comment (attachment only) From maknayak.com 2011-12-08 10:12 EDT-------

Comment 7 Josh Boyer 2012-03-01 15:45:37 UTC
F16 for ppc was released.  If you can recreate this with the latest updates, please report it upstream.

Comment 8 IBM Bug Proxy 2012-05-08 13:40:37 UTC
------- Comment From maknayak.com 2012-05-08 13:38 EDT-------
Hi All,
I verified this on F17 PPC64 Alpha and could reproduce the issue.

Four failures were noticed, as shown below.

#cat LTP_RUN_ON-2012_May_02-08h_46m_26s.log

Test Start Time: Wed May  2 08:46:26 2012
Testcase                       Result     Exit Value
--------                       ------     ----------
cgroup                         FAIL       1
memcg_max_usage_in_bytes       FAIL       3
memcg_move_charge_at_immigrate FAIL       6
memcg_memsw_limit_in_bytes     PASS       0
memcg_stat                     PASS       0
memcg_use_hierarchy            PASS       0
memcg_usage_in_bytes           FAIL       1
memcg_stress                   PASS       0
memcg_control                  PASS       0
cgroup_fj                      PASS       0
controllers                    PASS       0

-----------------------------------------------
Total Tests: 13
Total Failures: 4
Kernel Version: 3.3.2-8.fc17.ppc64
Machine Architecture: ppc64
Hostname: elm17f130.beaverton.ibm.com
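
The failing test cases can be pulled out of a results log like the one above with a few lines of code. This is a minimal sketch, assuming the three-column `Testcase / Result / Exit Value` layout shown here and only the PASS and FAIL result values that appear in this log; the function names are hypothetical.

```python
def parse_summary(log_text):
    """Return (testcase, result, exit_value) tuples from a results log."""
    rows = []
    for line in log_text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields; the header and separator
        # lines are rejected by the result and exit-value checks.
        if len(parts) == 3 and parts[1] in ("PASS", "FAIL") and parts[2].isdigit():
            rows.append((parts[0], parts[1], int(parts[2])))
    return rows

def failed_tests(rows):
    """Return the names of test cases whose result is FAIL."""
    return [name for name, result, _ in rows if result == "FAIL"]

# Rows copied from the F17 results log quoted in this comment.
SAMPLE_LOG = """\
Testcase                       Result     Exit Value
--------                       ------     ----------
cgroup                         FAIL       1
memcg_max_usage_in_bytes       FAIL       3
memcg_move_charge_at_immigrate FAIL       6
memcg_memsw_limit_in_bytes     PASS       0
memcg_stat                     PASS       0
memcg_use_hierarchy            PASS       0
memcg_usage_in_bytes           FAIL       1
memcg_stress                   PASS       0
memcg_control                  PASS       0
cgroup_fj                      PASS       0
controllers                    PASS       0
"""

if __name__ == "__main__":
    for name in failed_tests(parse_summary(SAMPLE_LOG)):
        print(name)
```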

For the detailed test log, please see the attached files named "LTP_cgroup-RUN_testresult-summary-log.txt" and "cgroup-test-execution-F17.log".

Thanks...
Manas

Comment 9 IBM Bug Proxy 2012-12-26 07:21:24 UTC
Created attachment 669116 [details]
ltp_run_on_dec23


------- Comment (attachment only) From maheshhi.com 2012-12-26 07:16 EDT-------

Comment 10 IBM Bug Proxy 2012-12-26 07:21:50 UTC
Created attachment 669117 [details]
dmesg


------- Comment (attachment only) From maheshhi.com 2012-12-26 07:17 EDT-------

Comment 11 IBM Bug Proxy 2012-12-26 07:22:08 UTC
Created attachment 669118 [details]
nohup


------- Comment (attachment only) From maheshhi.com 2012-12-26 07:18 EDT-------

Comment 12 IBM Bug Proxy 2012-12-26 07:22:28 UTC
Created attachment 669119 [details]
LTP_RUN_ON-2012_Dec_23-09h_26m_55s.failed


------- Comment (attachment only) From maheshhi.com 2012-12-26 07:19 EDT-------

