Bug 980324 - Fedora19: Call Traces while running flail
Fedora19: Call Traces while running flail
Status: CLOSED EOL
Product: Fedora
Classification: Fedora
Component: python (Show other bugs)
19
All All
unspecified Severity medium
: ---
: ---
Assigned To: Bohuslav "Slavek" Kabrda
Fedora Extras Quality Assurance
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2013-07-02 01:30 EDT by IBM Bug Proxy
Modified: 2015-02-17 10:46 EST (History)
7 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-17 10:46:46 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
s390x var_log_msgs (14.21 KB, text/plain)
2013-07-02 01:31 EDT, IBM Bug Proxy
no flags Details
ppc64 var_log_msgs (434.69 KB, text/plain)
2013-07-02 01:31 EDT, IBM Bug Proxy
no flags Details
dmesg output (103.59 KB, text/plain)
2013-10-17 09:01 EDT, IBM Bug Proxy
no flags Details


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 95368 None None None Never

  None (edit)
Description IBM Bug Proxy 2013-07-02 01:30:42 EDT
Problem Description
------------------------------------
While running flail on ppc64 and s390x, the machine generated call traces.
Call Trace as seen in ppc64:
++++++++++++++++++++++++++++
Dec 31 19:00:31  kernel: [17576.810786] vmalloc: allocation failure: 70368406243264 bytes
Dec 31 19:00:31  kernel: [17576.810801] flail: page allocation failure: order:0, mode:0xd2
Dec 31 19:00:31  kernel: [17576.810805] Call Trace:
Dec 31 19:00:31  kernel: [17576.810813] [c00000006c603b00] [c000000000015b80] .show_stack+0x130/0x200 (unreliable)
Dec 31 19:00:31  kernel: [17576.810840] [c00000006c603bd0] [c0000000001b6a90] .warn_alloc_failed+0x100/0x160
Dec 31 19:00:31  kernel: [17576.810847] [c00000006c603c80] [c0000000001f734c] .__vmalloc_node_range+0x22c/0x2e0
Dec 31 19:00:31  kernel: [17576.810853] [c00000006c603d40] [c00000000011a0a8] .SyS_init_module+0xa8/0x150
Dec 31 19:00:31  kernel: [17576.810859] [c00000006c603e30] [c000000000009e54] syscall_exit+0x0/0x98
Dec 31 19:00:31  kernel: [17576.810863] Mem-Info:
Dec 31 19:00:31  kernel: [17576.810867] Node 1 DMA per-cpu:
Dec 31 19:00:31  kernel: [17576.810870] CPU    0: hi:    6, btch:   1 usd:   5
Dec 31 19:00:31  kernel: [17576.810874] CPU    1: hi:    6, btch:   1 usd:   4
Dec 31 19:00:31  kernel: [17576.810877] CPU    2: hi:    6, btch:   1 usd:   0
Dec 31 19:00:31  kernel: [17576.810881] CPU    3: hi:    6, btch:   1 usd:   5
Dec 31 19:00:31  kernel: [17576.810884] CPU    4: hi:    6, btch:   1 usd:   5
Dec 31 19:00:31  kernel: [17576.810888] CPU    5: hi:    6, btch:   1 usd:   5
Dec 31 19:00:31  kernel: [17576.810891] CPU    6: hi:    6, btch:   1 usd:   3
Dec 31 19:00:31  kernel: [17576.810894] CPU    7: hi:    6, btch:   1 usd:   5
Dec 31 19:00:31  kernel: [17576.810901] active_anon:14320 inactive_anon:45 isolated_anon:0
Dec 31 19:00:31  kernel: [17576.810901]  active_file:12623 inactive_file:10538 isolated_file:0
Dec 31 19:00:31  kernel: [17576.810901]  unevictable:190 dirty:58 writeback:0 unstable:0
Dec 31 19:00:31  kernel: [17576.810901]  free:5888 slab_reclaimable:1472 slab_unreclaimable:18266
Dec 31 19:00:31  kernel: [17576.810901]  mapped:3435 shmem:199 pagetables:564 bounce:0
Dec 31 19:00:31  kernel: [17576.810901]  free_cma:0
Dec 31 19:00:31  kernel: [17576.810915] Node 1 DMA free:376832kB min:8064kB low:10048kB high:12096kB active_anon:916480kB inactive_anon:2880kB active_file:807872kB inactive_file:674432kB unevictable:12160kB isolated(anon):0kB isolated(file):0kB present:4194304kB managed:4110720kB mlocked:12224kB dirty:3712kB writeback:0kB mapped:219840kB shmem:12736kB slab_reclaimable:94208kB slab_unreclaimable:1169024kB kernel_stack:6512kB pagetables:36096kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Dec 31 19:00:31  kernel: [17576.810936] lowmem_reserve[]: 0 0 0
Dec 31 19:00:31  kernel: [17576.810941] Node 1 DMA: 30*64kB (UEM) 57*128kB (UEM) 13*256kB (UEM) 18*512kB (UEM) 39*1024kB (UM) 24*2048kB (UEM) 13*4096kB (UM) 6*8192kB (U) 10*16384kB (UR) = 377088kB
Dec 31 19:00:31  kernel: [17576.810964] 23441 total pagecache pages
Dec 31 19:00:31  kernel: [17576.810967] 0 pages in swap cache
Dec 31 19:00:31  kernel: [17576.810970] Swap cache stats: add 0, delete 0, find 0/0
Dec 31 19:00:31  kernel: [17576.810973] Free swap  = 1138624kB
Dec 31 19:00:31  kernel: [17576.810976] Total swap = 1138624kB
Dec 31 19:00:31  kernel: [17576.811609] 65536 pages RAM
Dec 31 19:00:31  kernel: [17576.811614] 921 pages reserved
Dec 31 19:00:31  kernel: [17576.811616] 94636 pages shared
Dec 31 19:00:31  kernel: [17576.811619] 44273 pages non-shared
Dec 31 19:00:32  AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality.
Dec 31 19:00:32  AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.YlQo']
Dec 31 19:00:35  AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality.
Dec 31 19:00:35  AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.k69Y']
Dec 31 19:00:42  AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality.
Dec 31 19:00:42  AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.HrnX']
Dec 31 19:00:45  AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality.
Dec 31 19:00:45  AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.6Nri']
Dec 31 19:00:49  AutotestCrashHandler: Unable to unpack parameters passed to the script. Operating with limited functionality.
Dec 31 19:00:49  AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/core.2Gqj']
+++++++++++++++++++++++++++

Call Trace as seen in s390x:
+++++++++++++++++++++++++++
Jun 17 12:00:34 h4215025 kernel: [885692.785453] User process fault: interruption code 0x20010
Jun 17 12:00:34 h4215025 kernel: [885692.785466] failing address: 80001000
Jun 17 12:00:34 h4215025 kernel: [885692.785471] CPU: 0 Not tainted 3.9.4-300.fc19.s390x #1
Jun 17 12:00:34 h4215025 kernel: [885692.785475] Process flail (pid: 16611, task: 0000000031100000, ksp: 000000003110beb0)
Jun 17 12:00:34 h4215025 kernel: [885692.785480] User PSW : 0705100180000000 0000000080001634 (0x80001634)
Jun 17 12:00:34 h4215025 kernel: [885692.785504]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:1 PM:0 EA:3
Jun 17 12:00:34 h4215025 kernel: [885692.785504] User GPRS: fffffffffffff001 000000000000005b 0000000000000000 00000000800051e8
Jun 17 12:00:34 h4215025 kernel: [885692.785516]            0000000000000001 0000000080000ed8 0000000080000ed8 000003ffffe00310
Jun 17 12:00:34 h4215025 kernel: [885692.785523]            0000000080000da8 000000009166b5f0 000003ffffe00010 000003ffffdffe80
Jun 17 12:00:34 h4215025 kernel: [885692.785530]            00000000800051e8 0000000000000000 0000000080001634 000003ffffdffe80
Jun 17 12:00:34 h4215025 kernel: [885692.785539] User Code: Bad PSW.
Jun 17 12:00:34 h4215025 kernel: [885692.785546] Last Breaking-Event-Address:
Jun 17 12:00:34 h4215025 kernel: [885692.785553]  [<000003fffd34ae04>] 0x3fffd34ae04
Jun 17 12:00:34 h4215025 AutotestCrashHandler: Application flail, PID 16611 crashed
Jun 17 12:00:34 h4215025 AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/crash.flail.16611']
Jun 17 12:00:34 h4215025 kernel: [885692.850955] User process fault: interruption code 0x6
Jun 17 12:00:34 h4215025 kernel: [885692.850972] CPU: 0 Not tainted 3.9.4-300.fc19.s390x #1
Jun 17 12:00:34 h4215025 kernel: [885692.851071] Process flail (pid: 16670, task: 0000000031d03080, ksp: 0000000031d13eb0)
Jun 17 12:00:34 h4215025 kernel: [885692.851074] User PSW : 0705038180000000 000003ff00000080 (0x3ff00000080)
Jun 17 12:00:34 h4215025 kernel: [885692.851090]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:0 PM:3 EA:3
Jun 17 12:00:34 h4215025 kernel: [885692.851090] User GPRS: 0000000000000006 0000000000000001 000003fffd3f5000 000003fffd4c54da
Jun 17 12:00:34 h4215025 kernel: [885692.851099]            000003ffffdffee8 000003fffd3f5000 0000000000000000 0000000080001a20
Jun 17 12:00:34 h4215025 kernel: [885692.851105]            000003ffffe00310 0000000080000da8 000000009166b5f0 000000009166b3e0
Jun 17 12:00:34 h4215025 kernel: [885692.851110]            000003ffffdfff48 000003fffd3f5000 0000000000000000 00000000800017e2
Jun 17 12:00:34 h4215025 kernel: [885692.851119] User Code: Bad PSW.
Jun 17 12:00:34 h4215025 kernel: [885692.851125] Last Breaking-Event-Address:
Jun 17 12:00:34 h4215025 kernel: [885692.851131]  [<00000000007451d8>] sysc_restore+0x22/0x26
Jun 17 12:00:34 h4215025 kernel: [885692.908353] warning: process `flail' used the obsolete bdflush system call
Jun 17 12:00:34 h4215025 kernel: [885692.908367] Fix your initscripts?
Jun 17 12:00:34 h4215025 AutotestCrashHandler: Application flail, PID 16670 crashed
Jun 17 12:00:34 h4215025 AutotestCrashHandler: Writing core files to ['/usr/local/autotest-0.15.1/client/results/default/flail/debug/crash.flail.16670']
Jun 17 12:00:40 h4215025 kernel: [885698.944116] User process fault: interruption code 0x6003B in flail[80000000+3000]
+++++++++++++++++++++++++++

== Comment: #- Vaishnavi Bhat <vaish123@in.ibm.com> -  ==
Hi Sravan,

What's the current value for vm.min_free_kbytes?

# sysctl -n vm.min_free_kbytes

Can you please try increasing this value and see if it shows any improvement? Please start with the double of the current value.

# sysctl -w vm.min_free_kbytes=<new value>

Thanks.

== Comment: # - Sravan V. Dodla <sradodla@in.ibm.com> -==
(In reply to comment #2)
> Hi Sravan,
> 
> What's the current value for vm.min_free_kbytes?
> 
> # sysctl -n vm.min_free_kbytes
> 
> Can you please try increasing this value and see if it shows any
> improvement? Please start with the double of the current value.
> 
> # sysctl -w vm.min_free_kbytes=<new value>
> 
> Thanks.

I have doubled the current value of vm.min_free_kbytes but could see the same call trace.

FYI, 
+++++++++++
On ppc64:

[root@ltcfbl6fb13 acl]# sysctl -n vm.min_free_kbytes 
8118

[root@ltcfbl6fb13 acl]# sysctl -w vm.min_free_kbytes=16000
vm.min_free_kbytes = 16000

Following is the snap from the core:
Program: /usr/local/autotest-0.15.1/client/tmp/flail/src/flail
PID: 13455
Signal: 11
Hostname: ltcfbl6fb13.austin.ibm.com
Time of the crash (according to kernel): Thu Jun 27 04:44:23 2013
Program backtrace:
[New LWP 13455]
Core was generated by `/usr/local/autotest-0.15.1/client/tmp/flail/src/flail iso9660 1 '.
Program terminated with signal 11, Segmentation fault.
#0  0x000000805fe2fa80 in _dl_argc () from /lib64/ld64.so.1
#0  0x000000805fe2fa80 in _dl_argc () from /lib64/ld64.so.1
No symbol table info available.
#1  0x0000000000000001 in ?? ()
No symbol table info available.
#2  0x000000805fe8432c in generic_start_main (main=@0x100214e0: 0x10001894, argc=<optimized out>, ubp_av=0x3fffc8ff9208, auxvec=0x3fffc8ff9318, init=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>, fini=<optimized out>) at ../csu/libc-start.c:258
        self = 0xfffb510
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {2718873438491630834, 551366638688, 2718871816924697574, 0 <repeats 13 times>, 551364525568, 551364529384, 551364524672, 0, 0, -3724539870, 0 <repeats 42 times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x3fffc8ff9100, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 16383}}}
        not_first_call = <optimized out>
#3  0x000000805fe84554 in __libc_start_main (argc=<optimized out>, ubp_av=<optimized out>, ubp_ev=<optimized out>, auxvec=<optimized out>, rtld_fini=<optimized out>, stinfo=<optimized out>, stack_on_entry=<optimized out>) at ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:91
No locals.
#4  0x0000000000000000 in ?? ()
No symbol table info available.
+++++++++++

s390x:
+++++++++++
[root@h4215025 tests]# sysctl -n vm.min_free_kbytes
3982

[root@h4215025 tests]# sysctl -w vm.min_free_kbytes=7800
vm.min_free_kbytes = 7800

[root@h4215025 tests]# sysctl -n vm.min_free_kbytes
7800
+++++++++++
Comment 1 IBM Bug Proxy 2013-07-02 01:31:01 EDT
Created attachment 767618 [details]
s390x var_log_msgs
Comment 2 IBM Bug Proxy 2013-07-02 01:31:15 EDT
Created attachment 767619 [details]
ppc64 var_log_msgs
Comment 3 Bohuslav "Slavek" Kabrda 2013-07-02 01:42:05 EDT
Huh, why is this assigned to Python?
Comment 4 IBM Bug Proxy 2013-07-02 04:11:11 EDT
------- Comment From vaish123@in.ibm.com 2013-07-02 08:05 EDT-------
(In reply to comment #7)
> Huh, why is this assigned to Python?

Sorry, my bad.
This has to be directed to security.
Comment 5 IBM Bug Proxy 2013-10-17 09:01:40 EDT
Created attachment 813314 [details]
dmesg output


------- Comment on attachment From iranna.ankad@in.ibm.com 2013-10-17 12:53 EDT-------


This bug is still reproducible in Fedora20 Alpha.

FYI,
[root@oc4352187340 bugs]# sysctl -n vm.min_free_kbytes
67584
[root@oc4352187340 bugs]# uname -a
Linux oc4352187340.ibm.com 2.6.32-279.19.1.el6.x86_64 #1 SMP Wed Nov 28 19:02:25 CET 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@oc4352187340 bugs]#

Note: attaching fresh dmesg output taken while running flail test
Comment 6 IBM Bug Proxy 2013-10-17 09:10:50 EDT
------- Comment From iranna.ankad@in.ibm.com 2013-10-17 13:02 EDT-------
(In reply to comment #9)
> Created attachment 82823 [details]
> dmesg output
> This bug is still reproducible in Fedora20 Alpha.
> FYI,
> [root@oc4352187340 bugs]# sysctl -n vm.min_free_kbytes
> 67584
> [root@oc4352187340 bugs]# uname -a
> Linux oc4352187340.ibm.com 2.6.32-279.19.1.el6.x86_64 #1 SMP Wed Nov 28
> 19:02:25 CET 2012 x86_64 x86_64 x86_64 GNU/Linux
> [root@oc4352187340 bugs]#

Please ignore o/p of above two commands, I realised that by mistake I ran them on my RH6.3 Thinkpad.
Here is the correct o/p from test system.

[root@ test]# uname -a
Linux  3.11.0-300.fc20.ppc64p7 #1 SMP Thu Sep 5 16:13:50 MST 2013 ppc64 ppc64 ppc64 GNU/Linux
[root@ test]# sysctl -n vm.min_free_kbytes
28930
[root@ test]#
Comment 7 IBM Bug Proxy 2013-10-22 15:11:18 EDT
------- Comment From gusld@br.ibm.com 2013-10-22 19:08 EDT-------
This bug is very confusing. I could not understand what is being asked here. The only call traces I could find in the logs are either flail segfaulting or the kernel complaining it could not complete a request to allocate several terabytes of memory.
There is clearly nothing wrong about the kernel call traces. Regarding the flail segfaults, flail is not part of Fedora, so this should be handled somewhere else other than the Fedora bugzilla.

Can we close this bug?
Comment 8 IBM Bug Proxy 2013-11-16 15:20:59 EST
------- Comment From iranna.ankad@in.ibm.com 2013-11-16 20:11 EDT-------
(In reply to comment #13)
> This bug is very confusing. I could not understand what is being asked here.
> The only call traces I could find in the logs are either flail segfaulting
> or the kernel complaining it could not complete a request to allocate
> several terabytes of memory.
> There is clearly nothing wrong about the kernel call traces. Regarding the
> flail segfaults, flail is not part of Fedora, so this should be handled
> somewhere else other than the Fedora bugzilla.
> Can we close this bug?

Hello Gustavo,
Though I could recreate this bug even in F20 beta, sort of I agree that above call traces give any sense of something wrong happening here.

Even I wonder why vmalloc is trying to allocate tera bytes of memory. Still..I tried to rerun flail test by increasing the allocation capacity of vmalloc to 7TB (70368000M in this case) and also tried setting the vm.min_free_kbytes to 1.6 TB. Nothing ..helped, above errors are still recreating.

Since flail is a system call security fuzzing test case, I would mainly look out for any system crash or kernel oops out of this test, so I am not much worried about above errors.
I am OK to close this for now.   Thanks!

[root@als0153 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.11.6-301.fc20.ppc64p7 root=UUID=be312012-4d7f-442e-8a02-7de8bb0e4323 ro vconsole.keymap=us vconsole.font=latarcyrheb-sun16 LANG=en_US.UTF-8 vmalloc=70368000M
[root@als0153 ~]# sysctl -n vm.min_free_kbytes
16354000
Comment 9 IBM Bug Proxy 2013-11-18 01:20:54 EST
------- Comment From vaish123@in.ibm.com 2013-11-18 06:18 EDT-------
(In reply to comment #13)
> This bug is very confusing. I could not understand what is being asked here.
> The only call traces I could find in the logs are either flail segfaulting
> or the kernel complaining it could not complete a request to allocate
> several terabytes of memory.
> There is clearly nothing wrong about the kernel call traces. Regarding the
> flail segfaults, flail is not part of Fedora, so this should be handled
> somewhere else other than the Fedora bugzilla.
> Can we close this bug?

-- Closing the bug on the IBM side --
Comment 10 Fedora End Of Life 2015-01-09 13:36:34 EST
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 11 Fedora End Of Life 2015-02-17 10:46:46 EST
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Note You need to log in before you can comment on or make changes to this bug.