== Comment: #0 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:20:10 == Description of problem: Am getting below exception & call trace in "var/log/messages" while running vmmstress test case on a power system installed with F20-Beta. Vmmstress basically stresses the memory, check for errors & system health if the memory is stressed to the range of 80-85%. Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362458] CPU: 75 PID: 0 Comm: swapper/75 Not tainted 3.11.6-301.fc20.ppc64p7 #1 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362467] Call Trace: Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362481] [c000001effd5b760] [c000000000014b80] .show_stack+0x130/0x200 (unreliable) Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362502] [c000001effd5b830] [c00000000083ec1c] .dump_stack+0x88/0xb4 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362514] [c000001effd5b8b0] [c0000000001c2800] .warn_alloc_failed+0x110/0x160 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362523] [c000001effd5b960] [c0000000001c67ac] .__alloc_pages_nodemask+0x7ac/0xa00 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362535] [c000001effd5baf0] [c00000000021731c] .alloc_pages_current+0xbc/0x1a0 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362553] [c000001effd5bb90] [d0000000100811ec] .be_post_rx_frags+0x2ac/0x3a0 [be2net] Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362564] [c000001effd5bc70] [d000000010087fc0] .be_poll+0x880/0xa20 [be2net] Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362577] [c000001effd5bdb0] [c0000000006e275c] .net_rx_action+0x1dc/0x310 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362588] [c000001effd5be90] [c00000000009f208] .__do_softirq+0x158/0x330 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362598] [c000001effd5bf90] [c000000000023768] .call_do_softirq+0x14/0x24 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362606] [c0000007e22d3620] [c0000000000109e4] .do_softirq+0xf4/0x130 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362615] [c0000007e22d36b0] [c00000000009f618] .irq_exit+0xc8/0x110 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362623] [c0000007e22d3730] [c00000000001060c] .do_IRQ+0xdc/0x2c0 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362632] [c0000007e22d37e0] [c0000000000023d4] hardware_interrupt_common+0x154/0x180 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362645] --- Exception: 501 at .snooze_loop+0xd8/0x1c0 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362645] LR = .snooze_loop+0x120/0x1c0 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362662] [c0000007e22d3ad0] [c0000007e22d3b80] 0xc0000007e22d3b80 (unreliable) Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362673] [c0000007e22d3b80] [c00000000068d86c] .cpuidle_enter_state+0x6c/0x150 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362682] [c0000007e22d3c30] [c00000000068da48] .cpuidle_idle_call+0xf8/0x310 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362692] [c0000007e22d3ce0] [c00000000006ede8] .pseries_lpar_idle+0x18/0x60 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362701] [c0000007e22d3d50] [c00000000001697c] .arch_cpu_idle+0x6c/0x160 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362712] [c0000007e22d3dd0] [c000000000107300] .cpu_startup_entry+0x1c0/0x2d0 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362722] [c0000007e22d3eb0] [c00000000003f8a0] .start_secondary+0x390/0x450 Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362733] [c0000007e22d3f90] [c00000000000976c] .start_secondary_prolog+0x10/0x14 How reproducible: I tried running only once & got this exception & trace Version-Release number of selected component like OS, packages etc [root@transam2s-lp1 ~]# uname -a Linux transam2s-lp1.xxxx.ibm.com 3.11.6-301.fc20.ppc64p7 #1 SMP Mon Oct 21 18:49:17 MST 2013 ppc64 ppc64 ppc64 GNU/Linux Steps to Reproduce: 1) Stress 85% of memory using vmmstress on a full system installed with F20-Beta. Actual results: Getting above mentioned exception & call trace Expected results: I should not get any such exceptions. Additional info: Attaching varlogmessages & sosreport == Comment: #2 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:50:38 == == Comment: #3 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:52:14 == == Comment: #4 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:53:06 == == Comment: #5 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:53:42 == == Comment: #6 - Onkar N. Mahajan <onmahaja.com> - 2013-11-22 05:12:13 == == Comment: #7 - Anton Blanchard <antonb.com> - 2013-11-22 06:14:06 == You missed the first line of the failure, it is a page allocation issue. Modifying the bug summary to be more accurate. swapper/75: page allocation failure: order:0, mode:0x20 CPU: 75 PID: 0 Comm: swapper/75 Not tainted 3.11.6-301.fc20.ppc64p7 #1 Call Trace: [c000001effd5b760] [c000000000014b80] .show_stack+0x130/0x200 (unreliable) [c000001effd5b830] [c00000000083ec1c] .dump_stack+0x88/0xb4 [c000001effd5b8b0] [c0000000001c2800] .warn_alloc_failed+0x110/0x160 [c000001effd5b960] [c0000000001c67ac] .__alloc_pages_nodemask+0x7ac/0xa00 [c000001effd5baf0] [c00000000021731c] .alloc_pages_current+0xbc/0x1a0 [c000001effd5bb90] [d0000000100811ec] .be_post_rx_frags+0x2ac/0x3a0 [be2net] [c000001effd5bc70] [d000000010087fc0] .be_poll+0x880/0xa20 [be2net] [c000001effd5bdb0] [c0000000006e275c] .net_rx_action+0x1dc/0x310 [c000001effd5be90] [c00000000009f208] .__do_softirq+0x158/0x330 [c000001effd5bf90] [c000000000023768] .call_do_softirq+0x14/0x24 [c0000007e22d3620] [c0000000000109e4] .do_softirq+0xf4/0x130 [c0000007e22d36b0] [c00000000009f618] .irq_exit+0xc8/0x110 [c0000007e22d3730] [c00000000001060c] .do_IRQ+0xdc/0x2c0 [c0000007e22d37e0] [c0000000000023d4] hardware_interrupt_common+0x154/0x180 --- Exception: 501 at .snooze_loop+0xd8/0x1c0 LR = .snooze_loop+0x120/0x1c0 [c0000007e22d3ad0] [c0000007e22d3b80] 0xc0000007e22d3b80 (unreliable) [c0000007e22d3b80] [c00000000068d86c] .cpuidle_enter_state+0x6c/0x150 [c0000007e22d3c30] [c00000000068da48] .cpuidle_idle_call+0xf8/0x310 [c0000007e22d3ce0] [c00000000006ede8] .pseries_lpar_idle+0x18/0x60 [c0000007e22d3d50] [c00000000001697c] .arch_cpu_idle+0x6c/0x160 [c0000007e22d3dd0] [c000000000107300] .cpu_startup_entry+0x1c0/0x2d0 == Comment: #11 - Onkar N. Mahajan <onmahaja.com> - 2013-11-25 23:43:23 == == Comment: #12 - Onkar N. Mahajan <onmahaja.com> - 2013-11-25 23:46:19 == == Comment: #15 - Onkar N. Mahajan <onmahaja.com> - 2013-11-26 00:49:11 == Hi Murilo, Seems like we have an issue with be2net driver. I did a bit of debugging ... # gdb ./vmlinux GNU gdb (GDB) Fedora 7.6.50.20130731-12.fc20 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "ppc64-redhat-linux-gnu". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word". .. Reading symbols from /root/rpmbuild/BUILD/kernel-3.11.fc20/linux-3.11.6-301.fc20.ppc64/vmlinux...done. (gdb) list *(be_post_rx_frags+0x2ac) No symbol "be_post_rx_frags" in current context. (gdb) list *(net_rx_action+0x1dc) 0xc0000000006fdacc is in net_rx_action (net/core/dev.c:4316). 4311 * actually make the ->poll() call. Therefore we avoid 4312 * accidentally calling ->poll() when NAPI is not scheduled. 4313 */ 4314 work = 0; 4315 if (test_bit(NAPI_STATE_SCHED, &n->state)) { 4316 work = n->poll(n, weight); 4317 trace_napi_poll(n); 4318 } 4319 4320 WARN_ON_ONCE(work > weight); (gdb) add-symbol-file drivers/net/ethernet/emulex/benet/ .be2net.ko.cmd .be_main.o.cmd be.h be_cmds.c be_hw.h be_roce.o .be2net.mod.o.cmd .be_roce.o.cmd be2net.ko be_cmds.h be_main.c built-in.o .be2net.o.cmd .built-in.o.cmd be2net.mod.c be_cmds.o be_main.o modules.order .be_cmds.o.cmd Kconfig be2net.mod.o be_ethtool.c be_roce.c .be_ethtool.o.cmd Makefile be2net.o be_ethtool.o be_roce.h (gdb) add-symbol-file drivers/net/ethernet/emulex/benet/be2net.ko 0xd00000000ff00000 -s .data 0xd00000000ff197a0 -s .bss 0xd00000000ff1af50 add symbol table from file "drivers/net/ethernet/emulex/benet/be2net.ko" at .text_addr = 0xd00000000ff00000 .data_addr = 0xd00000000ff197a0 .bss_addr = 0xd00000000ff1af50 (y or n) y Reading symbols from /root/rpmbuild/BUILD/kernel-3.11.fc20/linux-3.11.6-301.fc20.ppc64/drivers/net/ethernet/emulex/benet/be2net.ko...done. (gdb) list *(be_poll+0x880) 0xd00000000ff07fc0 is in be_poll (drivers/net/ethernet/emulex/benet/be_main.c:2231). 2226 2227 if (work_done) { 2228 be_cq_notify(adapter, rx_cq->id, true, work_done); 2229 2230 if (atomic_read(&rxo->q.used) < RX_FRAGS_REFILL_WM) 2231 be_post_rx_frags(rxo, GFP_ATOMIC); 2232 } 2233 2234 return work_done; 2235 } (gdb) list *(be_post_rx_frags+0x2ac) 0xd00000000ff011ec is in be_post_rx_frags (include/linux/gfp.h:334). 329 extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order); 330 331 static inline struct page * 332 alloc_pages(gfp_t gfp_mask, unsigned int order) 333 { 334 return alloc_pages_current(gfp_mask, order); 335 } 336 extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, 337 struct vm_area_struct *vma, unsigned long addr, 338 int node); (gdb) Above , see that in drivers/net/ethernet/emulex/benet/be_main.c:2231 , be_post_rx_frags() is called with GFP_ATOMIC while be_poll() is running in softirq context , shouldn't be_post_rx_frags be GFP_KERNEL flag ? Please suggest ... == Comment: #17 - Murilo Fossa Vicentini <muvic.com> - 2013-12-02 07:00:49 == Hi Onkar, The GFP_ATOMIC flag has a higher chance to failure than the GFP_KERNEL flag since it will not allow to block the memory allocation to try to free up more space to satisfy the request, but the former is more suitable in an interrupt context where it can't have its context switched off and you want the interrupt handlers to finish up as quickly as possible to not keep interrupts blocked for long. == Comment: #18 - Onkar N. Mahajan <onmahaja.com> - 2013-12-03 23:55:38 == (In reply to comment #17) > Hi Onkar, > > The GFP_ATOMIC flag has a higher chance to failure than the GFP_KERNEL flag > since it will not allow to block the memory allocation to try to free up > more space to satisfy the request, but the former is more suitable in an > interrupt context where it can't have its context switched off and you want > the interrupt handlers to finish up as quickly as possible to not keep > interrupts blocked for long. So you mean the driver is doing right thing by allocating memory GFP_ATOMIC , and that it should behave in a way it has ( simply fail , but never block ) here under memory pressure as in this case ? Thx, Onkar == Comment: #19 - Luciano Chavez <chavez.com> - 2013-12-04 20:55:26 == (In reply to comment #18) > (In reply to comment #17) > > Hi Onkar, > > > > The GFP_ATOMIC flag has a higher chance to failure than the GFP_KERNEL flag > > since it will not allow to block the memory allocation to try to free up > > more space to satisfy the request, but the former is more suitable in an > > interrupt context where it can't have its context switched off and you want > > the interrupt handlers to finish up as quickly as possible to not keep > > interrupts blocked for long. > > So you mean the driver is doing right thing by allocating memory GFP_ATOMIC > , and that it should behave in a way it has ( simply fail , but never block > ) here under memory pressure as in this case ? > > Thx, > Onkar Hi Onkar, For an interrupt context allocation, yes. Ideally, the driver should handle the allocation failure in such a way that it ends up re-queuing the work if it can so that it can re-try at a later time. Hopefully by then a task time allocation request caused the mm code to block that request, reclaim pages by shrinking slab, etc. and so the next time a GFP_ATOMIC calls is done, it succeeds.
Created attachment 926658 [details] varlogmessages from the affected machine default comment by bridge
Created attachment 926659 [details] vmmstress1.log default comment by bridge
Created attachment 926660 [details] vmmstress1.out default comment by bridge
Created attachment 926661 [details] vmmstress2.log default comment by bridge
Created attachment 926662 [details] vmmstress2.out default comment by bridge
Created attachment 926663 [details] SOS report default comment by bridge
Created attachment 926664 [details] o/p of lspci -vvv default comment by bridge
Created attachment 926665 [details] o/p of lsmod default comment by bridge
I don't see anything I would consider a bug here. The machine is under memory pressure and the driver can't allocate memory from the atomic pool. Also, 3.11.y is very very old at this point. You should really update your kernel to the latest available.