1129974 – FC20 GFP_ATOMIC page allocation failure in be_post_rx_frags (Emulex Ethernet)

Bug 1129974 - FC20 GFP_ATOMIC page allocation failure in be_post_rx_frags (Emulex Ethernet)

Summary: FC20 GFP_ATOMIC page allocation failure in be_post_rx_frags (Emulex Ethernet)

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	20
Hardware:	ppc64
OS:	All
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2014-08-14 06:11 UTC by IBM Bug Proxy
Modified:	2014-08-14 12:58 UTC (History)
CC List:	7 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2014-08-14 12:58:21 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
varlogmessages from the affected machine (1.38 MB, text/plain) 2014-08-14 06:11 UTC, IBM Bug Proxy	no flags	Details
vmmstress1.log (238 bytes, text/plain) 2014-08-14 06:11 UTC, IBM Bug Proxy	no flags	Details
vmmstress1.out (1.56 MB, text/plain) 2014-08-14 06:11 UTC, IBM Bug Proxy	no flags	Details
vmmstress2.log (19.83 KB, text/plain) 2014-08-14 06:11 UTC, IBM Bug Proxy	no flags	Details
vmmstress2.out (123.38 KB, text/plain) 2014-08-14 06:11 UTC, IBM Bug Proxy	no flags	Details
SOS report (6.36 MB, application/x-xz) 2014-08-14 06:12 UTC, IBM Bug Proxy	no flags	Details
o/p of lspci -vvv (29.21 KB, application/octet-stream) 2014-08-14 06:12 UTC, IBM Bug Proxy	no flags	Details
o/p of lsmod (2.89 KB, application/octet-stream) 2014-08-14 06:12 UTC, IBM Bug Proxy	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
IBM Linux Technology Center	100280	0	None	None	None	Never

Description IBM Bug Proxy 2014-08-14 06:11:33 UTC

Comment 1 IBM Bug Proxy 2014-08-14 06:11:34 UTC

== Comment: #0 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:20:10 ==
Description of problem: Am getting below exception & call trace in "var/log/messages" while running vmmstress test case on a power system installed with F20-Beta. Vmmstress basically stresses the memory, check for errors & system health if the memory is stressed to the range of 80-85%.

Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362458] CPU: 75 PID: 0 Comm: swapper/75 Not tainted 3.11.6-301.fc20.ppc64p7 #1
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362467] Call Trace:
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362481] [c000001effd5b760] [c000000000014b80] .show_stack+0x130/0x200 (unreliable)
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362502] [c000001effd5b830] [c00000000083ec1c] .dump_stack+0x88/0xb4
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362514] [c000001effd5b8b0] [c0000000001c2800] .warn_alloc_failed+0x110/0x160
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362523] [c000001effd5b960] [c0000000001c67ac] .__alloc_pages_nodemask+0x7ac/0xa00
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362535] [c000001effd5baf0] [c00000000021731c] .alloc_pages_current+0xbc/0x1a0
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362553] [c000001effd5bb90] [d0000000100811ec] .be_post_rx_frags+0x2ac/0x3a0 [be2net]
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362564] [c000001effd5bc70] [d000000010087fc0] .be_poll+0x880/0xa20 [be2net]
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362577] [c000001effd5bdb0] [c0000000006e275c] .net_rx_action+0x1dc/0x310
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362588] [c000001effd5be90] [c00000000009f208] .__do_softirq+0x158/0x330
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362598] [c000001effd5bf90] [c000000000023768] .call_do_softirq+0x14/0x24
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362606] [c0000007e22d3620] [c0000000000109e4] .do_softirq+0xf4/0x130
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362615] [c0000007e22d36b0] [c00000000009f618] .irq_exit+0xc8/0x110
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362623] [c0000007e22d3730] [c00000000001060c] .do_IRQ+0xdc/0x2c0
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362632] [c0000007e22d37e0] [c0000000000023d4] hardware_interrupt_common+0x154/0x180
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362645] --- Exception: 501 at .snooze_loop+0xd8/0x1c0
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362645]     LR = .snooze_loop+0x120/0x1c0
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362662] [c0000007e22d3ad0] [c0000007e22d3b80] 0xc0000007e22d3b80 (unreliable)
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362673] [c0000007e22d3b80] [c00000000068d86c] .cpuidle_enter_state+0x6c/0x150
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362682] [c0000007e22d3c30] [c00000000068da48] .cpuidle_idle_call+0xf8/0x310
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362692] [c0000007e22d3ce0] [c00000000006ede8] .pseries_lpar_idle+0x18/0x60
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362701] [c0000007e22d3d50] [c00000000001697c] .arch_cpu_idle+0x6c/0x160
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362712] [c0000007e22d3dd0] [c000000000107300] .cpu_startup_entry+0x1c0/0x2d0
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362722] [c0000007e22d3eb0] [c00000000003f8a0] .start_secondary+0x390/0x450
Nov 21 12:50:06 transam2s-lp1 kernel: [99120.362733] [c0000007e22d3f90] [c00000000000976c] .start_secondary_prolog+0x10/0x14

How reproducible: I tried running only once & got this exception & trace

Version-Release number of selected component like OS, packages etc
[root@transam2s-lp1 ~]# uname -a
Linux transam2s-lp1.xxxx.ibm.com 3.11.6-301.fc20.ppc64p7 #1 SMP Mon Oct 21 18:49:17 MST 2013 ppc64 ppc64 ppc64 GNU/Linux

Steps to Reproduce:

1) Stress 85% of memory using vmmstress on a full system installed with F20-Beta.

Actual results: Getting above mentioned exception & call trace

Expected results: I should not get any such exceptions.

Additional info: Attaching varlogmessages & sosreport

== Comment: #2 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:50:38 ==


== Comment: #3 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:52:14 ==


== Comment: #4 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:53:06 ==


== Comment: #5 - Shubham Goyal <shubgoya.com> - 2013-11-22 04:53:42 ==


== Comment: #6 - Onkar N. Mahajan <onmahaja.com> - 2013-11-22 05:12:13 ==


== Comment: #7 - Anton Blanchard <antonb.com> - 2013-11-22 06:14:06 ==
You missed the first line of the failure, it is a page allocation issue. Modifying the bug summary to be more accurate.

swapper/75: page allocation failure: order:0, mode:0x20
CPU: 75 PID: 0 Comm: swapper/75 Not tainted 3.11.6-301.fc20.ppc64p7 #1
Call Trace:
[c000001effd5b760] [c000000000014b80] .show_stack+0x130/0x200 (unreliable)
[c000001effd5b830] [c00000000083ec1c] .dump_stack+0x88/0xb4
[c000001effd5b8b0] [c0000000001c2800] .warn_alloc_failed+0x110/0x160
[c000001effd5b960] [c0000000001c67ac] .__alloc_pages_nodemask+0x7ac/0xa00
[c000001effd5baf0] [c00000000021731c] .alloc_pages_current+0xbc/0x1a0
[c000001effd5bb90] [d0000000100811ec] .be_post_rx_frags+0x2ac/0x3a0 [be2net]
[c000001effd5bc70] [d000000010087fc0] .be_poll+0x880/0xa20 [be2net]
[c000001effd5bdb0] [c0000000006e275c] .net_rx_action+0x1dc/0x310
[c000001effd5be90] [c00000000009f208] .__do_softirq+0x158/0x330
[c000001effd5bf90] [c000000000023768] .call_do_softirq+0x14/0x24
[c0000007e22d3620] [c0000000000109e4] .do_softirq+0xf4/0x130
[c0000007e22d36b0] [c00000000009f618] .irq_exit+0xc8/0x110
[c0000007e22d3730] [c00000000001060c] .do_IRQ+0xdc/0x2c0
[c0000007e22d37e0] [c0000000000023d4] hardware_interrupt_common+0x154/0x180
--- Exception: 501 at .snooze_loop+0xd8/0x1c0
    LR = .snooze_loop+0x120/0x1c0
[c0000007e22d3ad0] [c0000007e22d3b80] 0xc0000007e22d3b80 (unreliable)
[c0000007e22d3b80] [c00000000068d86c] .cpuidle_enter_state+0x6c/0x150
[c0000007e22d3c30] [c00000000068da48] .cpuidle_idle_call+0xf8/0x310
[c0000007e22d3ce0] [c00000000006ede8] .pseries_lpar_idle+0x18/0x60
[c0000007e22d3d50] [c00000000001697c] .arch_cpu_idle+0x6c/0x160
[c0000007e22d3dd0] [c000000000107300] .cpu_startup_entry+0x1c0/0x2d0

== Comment: #11 - Onkar N. Mahajan <onmahaja.com> - 2013-11-25 23:43:23 ==


== Comment: #12 - Onkar N. Mahajan <onmahaja.com> - 2013-11-25 23:46:19 ==


== Comment: #15 - Onkar N. Mahajan <onmahaja.com> - 2013-11-26 00:49:11 ==
Hi Murilo, 

Seems like we have an issue with be2net driver. I did a bit of debugging ... 

# gdb ./vmlinux 
GNU gdb (GDB) Fedora 7.6.50.20130731-12.fc20
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
..
Reading symbols from /root/rpmbuild/BUILD/kernel-3.11.fc20/linux-3.11.6-301.fc20.ppc64/vmlinux...done.
(gdb) list *(be_post_rx_frags+0x2ac)
No symbol "be_post_rx_frags" in current context.
(gdb) list *(net_rx_action+0x1dc)
0xc0000000006fdacc is in net_rx_action (net/core/dev.c:4316).
4311			 * actually make the ->poll() call.  Therefore we avoid
4312			 * accidentally calling ->poll() when NAPI is not scheduled.
4313			 */
4314			work = 0;
4315			if (test_bit(NAPI_STATE_SCHED, &n->state)) {
4316				work = n->poll(n, weight);
4317				trace_napi_poll(n);
4318			}
4319	
4320			WARN_ON_ONCE(work > weight);
(gdb) add-symbol-file drivers/net/ethernet/emulex/benet/
.be2net.ko.cmd     .be_main.o.cmd     be.h               be_cmds.c          be_hw.h            be_roce.o
.be2net.mod.o.cmd  .be_roce.o.cmd     be2net.ko          be_cmds.h          be_main.c          built-in.o
.be2net.o.cmd      .built-in.o.cmd    be2net.mod.c       be_cmds.o          be_main.o          modules.order
.be_cmds.o.cmd     Kconfig            be2net.mod.o       be_ethtool.c       be_roce.c          
.be_ethtool.o.cmd  Makefile           be2net.o           be_ethtool.o       be_roce.h          
(gdb) add-symbol-file drivers/net/ethernet/emulex/benet/be2net.ko 0xd00000000ff00000 -s .data 0xd00000000ff197a0 -s .bss 0xd00000000ff1af50
add symbol table from file "drivers/net/ethernet/emulex/benet/be2net.ko" at
	.text_addr = 0xd00000000ff00000
	.data_addr = 0xd00000000ff197a0
	.bss_addr = 0xd00000000ff1af50
(y or n) y
Reading symbols from /root/rpmbuild/BUILD/kernel-3.11.fc20/linux-3.11.6-301.fc20.ppc64/drivers/net/ethernet/emulex/benet/be2net.ko...done.
(gdb) list *(be_poll+0x880)
0xd00000000ff07fc0 is in be_poll (drivers/net/ethernet/emulex/benet/be_main.c:2231).
2226	
2227		if (work_done) {
2228			be_cq_notify(adapter, rx_cq->id, true, work_done);
2229	
2230			if (atomic_read(&rxo->q.used) < RX_FRAGS_REFILL_WM)
2231				be_post_rx_frags(rxo, GFP_ATOMIC);
2232		}
2233	
2234		return work_done;
2235	}
(gdb) list *(be_post_rx_frags+0x2ac)
0xd00000000ff011ec is in be_post_rx_frags (include/linux/gfp.h:334).
329	extern struct page *alloc_pages_current(gfp_t gfp_mask, unsigned order);
330	
331	static inline struct page *
332	alloc_pages(gfp_t gfp_mask, unsigned int order)
333	{
334		return alloc_pages_current(gfp_mask, order);
335	}
336	extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
337				struct vm_area_struct *vma, unsigned long addr,
338				int node);
(gdb) 


Above , see that in drivers/net/ethernet/emulex/benet/be_main.c:2231 , be_post_rx_frags()
is called with GFP_ATOMIC while be_poll() is running in softirq context , shouldn't be_post_rx_frags  be 
GFP_KERNEL flag ? 

Please suggest  ...

== Comment: #17 - Murilo Fossa Vicentini <muvic.com> - 2013-12-02 07:00:49 ==
Hi Onkar,

The GFP_ATOMIC flag has a higher chance to failure than the GFP_KERNEL flag since it will not allow to block the memory allocation to try to free up more space to satisfy the request, but the former is more suitable in an interrupt context where it can't have its context switched off and you want the interrupt handlers to finish up as quickly as possible to not keep interrupts blocked for long.

== Comment: #18 - Onkar N. Mahajan <onmahaja.com> - 2013-12-03 23:55:38 ==
(In reply to comment #17)
> Hi Onkar,
> 
> The GFP_ATOMIC flag has a higher chance to failure than the GFP_KERNEL flag
> since it will not allow to block the memory allocation to try to free up
> more space to satisfy the request, but the former is more suitable in an
> interrupt context where it can't have its context switched off and you want
> the interrupt handlers to finish up as quickly as possible to not keep
> interrupts blocked for long.

So you mean the driver is doing right thing by allocating memory GFP_ATOMIC ,  and that it should behave in a way it has ( simply fail , but never block )  here under memory pressure as in this case ?

Thx, 
Onkar

== Comment: #19 - Luciano Chavez <chavez.com> - 2013-12-04 20:55:26 ==
(In reply to comment #18)
> (In reply to comment #17)
> > Hi Onkar,
> > 
> > The GFP_ATOMIC flag has a higher chance to failure than the GFP_KERNEL flag
> > since it will not allow to block the memory allocation to try to free up
> > more space to satisfy the request, but the former is more suitable in an
> > interrupt context where it can't have its context switched off and you want
> > the interrupt handlers to finish up as quickly as possible to not keep
> > interrupts blocked for long.
> 
> So you mean the driver is doing right thing by allocating memory GFP_ATOMIC
> ,  and that it should behave in a way it has ( simply fail , but never block
> )  here under memory pressure as in this case ?
> 
> Thx, 
> Onkar

Hi Onkar,

For an interrupt context allocation, yes. Ideally, the driver should handle the allocation failure in such a way that it ends up re-queuing the work if it can so that it can re-try at a later time. Hopefully by then a task time allocation request caused the mm code to block that request, reclaim pages by shrinking slab, etc. and so the next time a GFP_ATOMIC calls is done, it succeeds.

Comment 2 IBM Bug Proxy 2014-08-14 06:11:42 UTC

Created attachment 926658 [details]
varlogmessages from the affected machine

default comment by bridge

Comment 3 IBM Bug Proxy 2014-08-14 06:11:44 UTC

Created attachment 926659 [details]
vmmstress1.log

default comment by bridge

Comment 4 IBM Bug Proxy 2014-08-14 06:11:49 UTC

Created attachment 926660 [details]
vmmstress1.out

default comment by bridge

Comment 5 IBM Bug Proxy 2014-08-14 06:11:50 UTC

Created attachment 926661 [details]
vmmstress2.log

default comment by bridge

Comment 6 IBM Bug Proxy 2014-08-14 06:11:52 UTC

Created attachment 926662 [details]
vmmstress2.out

default comment by bridge

Comment 7 IBM Bug Proxy 2014-08-14 06:12:05 UTC

Created attachment 926663 [details]
SOS report

default comment by bridge

Comment 8 IBM Bug Proxy 2014-08-14 06:12:08 UTC

Created attachment 926664 [details]
o/p of lspci -vvv

default comment by bridge

Comment 9 IBM Bug Proxy 2014-08-14 06:12:10 UTC

Created attachment 926665 [details]
o/p of lsmod

default comment by bridge

Comment 10 Josh Boyer 2014-08-14 12:58:21 UTC

I don't see anything I would consider a bug here.  The machine is under memory pressure and the driver can't allocate memory from the atomic pool.  Also, 3.11.y is very very old at this point.  You should really update your kernel to the latest available.

Note You need to log in before you can comment on or make changes to this bug.