Problem Description ------------------------------------ While running flail on ppc64 and s390x, the machine generated call traces. Call Trace as seen in ppc64: ++++++++++++++++++++++++++++ Dec 31 19:00:31 kernel: [17576.810786] vmalloc: allocation failure: 70368406243264 bytes Dec 31 19:00:31 kernel: [17576.810801] flail: page allocation failure: order:0, mode:0xd2 Dec 31 19:00:31 kernel: [17576.810805] Call Trace: Dec 31 19:00:31 kernel: [17576.810813] [c00000006c603b00] [c000000000015b80] .show_stack+0x130/0x200 (unreliable) Dec 31 19:00:31 kernel: [17576.810840] [c00000006c603bd0] [c0000000001b6a90] .warn_alloc_failed+0x100/0x160 Dec 31 19:00:31 kernel: [17576.810847] [c00000006c603c80] [c0000000001f734c] .__vmalloc_node_range+0x22c/0x2e0 Dec 31 19:00:31 kernel: [17576.810853] [c00000006c603d40] [c00000000011a0a8] .SyS_init_module+0xa8/0x150 Dec 31 19:00:31 kernel: [17576.810859] [c00000006c603e30] [c000000000009e54] syscall_exit+0x0/0x98 Dec 31 19:00:31 kernel: [17576.810863] Mem-Info: Dec 31 19:00:31 kernel: [17576.810867] Node 1 DMA per-cpu: Dec 31 19:00:31 kernel: [17576.810870] CPU 0: hi: 6, btch: 1 usd: 5 Dec 31 19:00:31 kernel: [17576.810874] CPU 1: hi: 6, btch: 1 usd: 4 Dec 31 19:00:31 kernel: [17576.810877] CPU 2: hi: 6, btch: 1 usd: 0 Dec 31 19:00:31 kernel: [17576.810881] CPU 3: hi: 6, btch: 1 usd: 5 Dec 31 19:00:31 kernel: [17576.810884] CPU 4: hi: 6, btch: 1 usd: 5 Dec 31 19:00:31 kernel: [17576.810888] CPU 5: hi: 6, btch: 1 usd: 5 Dec 31 19:00:31 kernel: [17576.810891] CPU 6: hi: 6, btch: 1 usd: 3 Dec 31 19:00:31 kernel: [17576.810894] CPU 7: hi: 6, btch: 1 usd: 5 Dec 31 19:00:31 kernel: [17576.810901] active_anon:14320 inactive_anon:45 isolated_anon:0 Dec 31 19:00:31 kernel: [17576.810901] active_file:12623 inactive_file:10538 isolated_file:0 Dec 31 19:00:31 kernel: [17576.810901] unevictable:190 dirty:58 writeback:0 unstable:0 Dec 31 19:00:31 kernel: [17576.810901] free:5888 slab_reclaimable:1472 slab_unreclaimable:18266 Dec 31 19:00:31 kernel: [17576.810901] mapped:3435 shmem:199 pagetables:564 bounce:0 Dec 31 19:00:31 kernel: [17576.810901] free_cma:0 Dec 31 19:00:31 kernel: [17576.810915] Node 1 DMA free:376832kB min:8064kB low:10048kB high:12096kB active_anon:916480kB inactive_anon:2880kB active_file:807872kB inactive_file:674432kB unevictable:12160kB isolated(anon):0kB isolated(file):0kB present:4194304kB managed:4110720kB mlocked:12224kB dirty:3712kB writeback:0kB mapped:219840kB shmem:12736kB slab_reclaimable:94208kB slab_unreclaimable:1169024kB kernel_stack:6512kB pagetables:36096kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Dec 31 19:00:31 kernel: [17576.810936] lowmem_reserve[]: 0 0 0 Dec 31 19:00:31 kernel: [17576.810941] Node 1 DMA: 30*64kB (UEM) 57*128kB (UEM) 13*256kB (UEM) 18*512kB (UEM) 39*1024kB (UM) 24*2048kB (UEM) 13*4096kB (UM) 6*8192kB (U) 10*16384kB (UR) = 377088kB Dec 31 19:00:31 kernel: [17576.810964] 23441 total pagecache pages Dec 31 19:00:31 kernel: [17576.810967] 0 pages in swap cache Dec 31 19:00:31 kernel: [17576.810970] Swap cache stats: add 0, delete 0, find 0/0 Dec 31 19:00:31 kernel: [17576.810973] Free swap = 1138624kB Dec 31 19:00:31 kernel: [17576.810976] Total swap = 1138624kB Dec 31 19:00:31 kernel: [17576.811609] 65536 pages RAM Dec 31 19:00:31 kernel: [17576.811614] 921 pages reserved Dec 31 19:00:31 kernel: [17576.811616] 94636 pages shared Dec 31 19:00:31 kernel: [17576.811619] 44273 pages non-shared Dec 31 19:00:32 AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality. Dec 31 19:00:32 AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.YlQo'] Dec 31 19:00:35 AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality. Dec 31 19:00:35 AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.k69Y'] Dec 31 19:00:42 AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality. Dec 31 19:00:42 AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.HrnX'] Dec 31 19:00:45 AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality. Dec 31 19:00:45 AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.6Nri'] Dec 31 19:00:49 AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality. Dec 31 19:00:49 AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.2Gqj'] +++++++++++++++++++++++++++ Call Trace as seen in s390x: +++++++++++++++++++++++++++ Jun 17 12:00:34 h4215025 kernel: [885692.785453] User process fault: interruption code 0x20010 Jun 17 12:00:34 h4215025 kernel: [885692.785466] failing address: 80001000 Jun 17 12:00:34 h4215025 kernel: [885692.785471] CPU: 0 Not tainted 3.9.4-300.fc19.s390x #1 Jun 17 12:00:34 h4215025 kernel: [885692.785475] Process flail (pid: 16611, task: 0000000031100000, ksp: 000000003110beb0) Jun 17 12:00:34 h4215025 kernel: [885692.785480] User PSW : 0705100180000000 0000000080001634 (0x80001634) Jun 17 12:00:34 h4215025 kernel: [885692.785504] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:1 PM:0 EA:3 Jun 17 12:00:34 h4215025 kernel: [885692.785504] User GPRS: fffffffffffff001 000000000000005b 0000000000000000 00000000800051e8 Jun 17 12:00:34 h4215025 kernel: [885692.785516] 0000000000000001 0000000080000ed8 0000000080000ed8 000003ffffe00310 Jun 17 12:00:34 h4215025 kernel: [885692.785523] 0000000080000da8 000000009166b5f0 000003ffffe00010 000003ffffdffe80 Jun 17 12:00:34 h4215025 kernel: [885692.785530] 00000000800051e8 0000000000000000 0000000080001634 000003ffffdffe80 Jun 17 12:00:34 h4215025 kernel: [885692.785539] User Code: Bad PSW. Jun 17 12:00:34 h4215025 kernel: [885692.785546] Last Breaking-Event-Address: Jun 17 12:00:34 h4215025 kernel: [885692.785553] [<000003fffd34ae04>] 0x3fffd34ae04 Jun 17 12:00:34 h4215025 AutotestCrashHandler: Application flail, PID 16611 crashed Jun 17 12:00:34 h4215025 AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/crash.flail.16611'] Jun 17 12:00:34 h4215025 kernel: [885692.850955] User process fault: interruption code 0x6 Jun 17 12:00:34 h4215025 kernel: [885692.850972] CPU: 0 Not tainted 3.9.4-300.fc19.s390x #1 Jun 17 12:00:34 h4215025 kernel: [885692.851071] Process flail (pid: 16670, task: 0000000031d03080, ksp: 0000000031d13eb0) Jun 17 12:00:34 h4215025 kernel: [885692.851074] User PSW : 0705038180000000 000003ff00000080 (0x3ff00000080) Jun 17 12:00:34 h4215025 kernel: [885692.851090] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 PM:3 EA:3 Jun 17 12:00:34 h4215025 kernel: [885692.851090] User GPRS: 0000000000000006 0000000000000001 000003fffd3f5000 000003fffd4c54da Jun 17 12:00:34 h4215025 kernel: [885692.851099] 000003ffffdffee8 000003fffd3f5000 0000000000000000 0000000080001a20 Jun 17 12:00:34 h4215025 kernel: [885692.851105] 000003ffffe00310 0000000080000da8 000000009166b5f0 000000009166b3e0 Jun 17 12:00:34 h4215025 kernel: [885692.851110] 000003ffffdfff48 000003fffd3f5000 0000000000000000 00000000800017e2 Jun 17 12:00:34 h4215025 kernel: [885692.851119] User Code: Bad PSW. Jun 17 12:00:34 h4215025 kernel: [885692.851125] Last Breaking-Event-Address: Jun 17 12:00:34 h4215025 kernel: [885692.851131] [<00000000007451d8>] sysc_restore+0x22/0x26 Jun 17 12:00:34 h4215025 kernel: [885692.908353] warning: process `flail' used the obsolete bdflush system call Jun 17 12:00:34 h4215025 kernel: [885692.908367] Fix your initscripts? Jun 17 12:00:34 h4215025 AutotestCrashHandler: Application flail, PID 16670 crashed Jun 17 12:00:34 h4215025 AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/crash.flail.16670'] Jun 17 12:00:40 h4215025 kernel: [885698.944116] User process fault: interruption code 0x6003B in flail[80000000+3000] +++++++++++++++++++++++++++ == Comment: #- Vaishnavi Bhat <vaish123.com> - == Hi Sravan, What's the current value for vm.min_free_kbytes? # sysctl -n vm.min_free_kbytes Can you please try increasing this value and see if it shows any improvement? Please start with the double of the current value. # sysctl -w vm.min_free_kbytes=<new value> Thanks. == Comment: # - Sravan V. Dodla <sradodla.com> -== (In reply to comment #2) > Hi Sravan, > > What's the current value for vm.min_free_kbytes? > > # sysctl -n vm.min_free_kbytes > > Can you please try increasing this value and see if it shows any > improvement? Please start with the double of the current value. > > # sysctl -w vm.min_free_kbytes=<new value> > > Thanks. I have doubled the current value of vm.min_free_kbytes but could see the same call trace. FYI, +++++++++++ On ppc64: [root@ltcfbl6fb13 acl]# sysctl -n vm.min_free_kbytes 8118 [root@ltcfbl6fb13 acl]# sysctl -w vm.min_free_kbytes=16000 vm.min_free_kbytes = 16000 Following is the snap from the core: Program: /usr/local/autotest-0.15.1/client/tmp/flail/src/flail PID: 13455 Signal: 11 Hostname: ltcfbl6fb13.austin.ibm.com Time of the crash (according to kernel): Thu Jun 27 04:44:23 2013 Program backtrace: [New LWP 13455] Core was generated by `/usr/local/autotest-0.15.1/client/tmp/flail/src/flail iso9660 1 '. Program terminated with signal 11, Segmentation fault. #0 0x000000805fe2fa80 in _dl_argc () from /lib64/ld64.so.1 #0 0x000000805fe2fa80 in _dl_argc () from /lib64/ld64.so.1 No symbol table info available. #1 0x0000000000000001 in ?? () No symbol table info available. #2 0x000000805fe8432c in generic_start_main (main=@0x100214e0: 0x10001894, argc=<optimized out>, ubp_av=0x3fffc8ff9208, auxvec=0x3fffc8ff9318, init=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>, fini=<optimized out>) at ../csu/libc-start.c:258 self = 0xfffb510 result = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {2718873438491630834, 551366638688, 2718871816924697574, 0 <repeats 13 times>, 551364525568, 551364529384, 551364524672, 0, 0, -3724539870, 0 <repeats 42 times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x3fffc8ff9100, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 16383}}} not_first_call = <optimized out> #3 0x000000805fe84554 in __libc_start_main (argc=<optimized out>, ubp_av=<optimized out>, ubp_ev=<optimized out>, auxvec=<optimized out>, rtld_fini=<optimized out>, stinfo=<optimized out>, stack_on_entry=<optimized out>) at ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:91 No locals. #4 0x0000000000000000 in ?? () No symbol table info available. +++++++++++ s390x: +++++++++++ [root@h4215025 tests]# sysctl -n vm.min_free_kbytes 3982 [root@h4215025 tests]# sysctl -w vm.min_free_kbytes=7800 vm.min_free_kbytes = 7800 [root@h4215025 tests]# sysctl -n vm.min_free_kbytes 7800 +++++++++++
Created attachment 767618 [details] s390x var_log_msgs
Created attachment 767619 [details] ppc64 var_log_msgs
Huh, why is this assigned to Python?
------- Comment From vaish123.com 2013-07-02 08:05 EDT------- (In reply to comment #7) > Huh, why is this assigned to Python? Sorry, my bad. This has to be directed to security.
Created attachment 813314 [details] dmesg output ------- Comment on attachment From iranna.ankad.com 2013-10-17 12:53 EDT------- This bug is still reproducible in Fedora20 Alpha. FYI, [root@oc4352187340 bugs]# sysctl -n vm.min_free_kbytes 67584 [root@oc4352187340 bugs]# uname -a Linux oc4352187340.ibm.com 2.6.32-279.19.1.el6.x86_64 #1 SMP Wed Nov 28 19:02:25 CET 2012 x86_64 x86_64 x86_64 GNU/Linux [root@oc4352187340 bugs]# Note: attaching fresh dmesg output taken while running flail test
------- Comment From iranna.ankad.com 2013-10-17 13:02 EDT------- (In reply to comment #9) > Created attachment 82823 [details] > dmesg output > This bug is still reproducible in Fedora20 Alpha. > FYI, > [root@oc4352187340 bugs]# sysctl -n vm.min_free_kbytes > 67584 > [root@oc4352187340 bugs]# uname -a > Linux oc4352187340.ibm.com 2.6.32-279.19.1.el6.x86_64 #1 SMP Wed Nov 28 > 19:02:25 CET 2012 x86_64 x86_64 x86_64 GNU/Linux > [root@oc4352187340 bugs]# Please ignore o/p of above two commands, I realised that by mistake I ran them on my RH6.3 Thinkpad. Here is the correct o/p from test system. [root@ test]# uname -a Linux 3.11.0-300.fc20.ppc64p7 #1 SMP Thu Sep 5 16:13:50 MST 2013 ppc64 ppc64 ppc64 GNU/Linux [root@ test]# sysctl -n vm.min_free_kbytes 28930 [root@ test]#
------- Comment From gusld.com 2013-10-22 19:08 EDT------- This bug is very confusing. I could not understand what is being asked here. The only call traces I could find in the logs are either flail segfaulting or the kernel complaining it could not complete a request to allocate several terabytes of memory. There is clearly nothing wrong about the kernel call traces. Regarding the flail segfaults, flail is not part of Fedora, so this should be handled somewhere else other than the Fedora bugzilla. Can we close this bug?
------- Comment From iranna.ankad.com 2013-11-16 20:11 EDT------- (In reply to comment #13) > This bug is very confusing. I could not understand what is being asked here. > The only call traces I could find in the logs are either flail segfaulting > or the kernel complaining it could not complete a request to allocate > several terabytes of memory. > There is clearly nothing wrong about the kernel call traces. Regarding the > flail segfaults, flail is not part of Fedora, so this should be handled > somewhere else other than the Fedora bugzilla. > Can we close this bug? Hello Gustavo, Though I could recreate this bug even in F20 beta, sort of I agree that above call traces give any sense of something wrong happening here. Even I wonder why vmalloc is trying to allocate tera bytes of memory. Still..I tried to rerun flail test by increasing the allocation capacity of vmalloc to 7TB (70368000M in this case) and also tried setting the vm.min_free_kbytes to 1.6 TB. Nothing ..helped, above errors are still recreating. Since flail is a system call security fuzzing test case, I would mainly look out for any system crash or kernel oops out of this test, so I am not much worried about above errors. I am OK to close this for now. Thanks! [root@als0153 ~]# cat /proc/cmdline BOOT_IMAGE=/vmlinuz-3.11.6-301.fc20.ppc64p7 root=UUID=be312012-4d7f-442e-8a02-7de8bb0e4323 ro vconsole.keymap=us vconsole.font=latarcyrheb-sun16 LANG=en_US.UTF-8 vmalloc=70368000M [root@als0153 ~]# sysctl -n vm.min_free_kbytes 16354000
------- Comment From vaish123.com 2013-11-18 06:18 EDT------- (In reply to comment #13) > This bug is very confusing. I could not understand what is being asked here. > The only call traces I could find in the logs are either flail segfaulting > or the kernel complaining it could not complete a request to allocate > several terabytes of memory. > There is clearly nothing wrong about the kernel call traces. Regarding the > flail segfaults, flail is not part of Fedora, so this should be handled > somewhere else other than the Fedora bugzilla. > Can we close this bug? -- Closing the bug on the IBM side --
This message is a notice that Fedora 19 is now at end of life. Fedora has stopped maintaining and issuing updates for Fedora 19. It is Fedora's policy to close all bug reports from releases that are no longer maintained. Approximately 4 (four) weeks from now this bug will be closed as EOL if it remains open with a Fedora 'version' of '19'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 19 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.