Bug 626851

Summary:	Server freeze when multiple clients are writing large files
Product:	[Fedora] Fedora	Reporter:	Jonathan Dieter <jonathan>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED NOTABUG	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	medium	Docs Contact:
Priority:	low
Version:	13	CC:	anton, aquini, dougsland, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, toby
Target Milestone:	---
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2010-08-27 16:23:27 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Jonathan Dieter 2010-08-24 14:56:22 UTC

Description of problem:
I'm running a Fedora 13 nfs server with around a hundred clients. I wanted to compare performance with our old CentOS nfs server, so I ran bonnie++ on about 30 of the clients simultaneously onto separate directories on the server (/networld/bonnie/$HOSTNAME where /networld is the nfs directory).

After a time ranging from two minutes to ten minutes, the server locks up hard with nothing on the display (it's not running X, but there's no panic on the screen), the keyboard not responding, and nothing in the logs. The only solution is to reset the server.

Switching kernels doesn't fix anything, neither does switching to nfs3, and neither does changing the number of nfs servers from 64 (my default) back to 8.

Version-Release number of selected component (if applicable):
kernel-2.6.33.6-147.2.4.fc13.x86_64
kernel-2.6.34.4-41.fc13.x86_64
kernel-2.6.35.2-9.fc14.x86_64.rpm

How reproducible:
Always

Steps to Reproduce:
1. Set up nfs server
2. Install bonnie++ on 30 clients
3. Run bonnie++ on nfs directory

Actual results:
Server crashes

Expected results:
Completed bonnie++ runs (after a very long delay, of course)

Additional info:
The server does only have 2GB of RAM, but I should at least be getting an OOM error if it is running out of memory, and running free in a 1s loop doesn't show me even coming close to running out of memory. I've run memdisk on the server and it seems to think the RAM is fine. I also ran the test on a different server, and it froze too.

Honestly, my biggest fear is that we're going to have hard lockups when school starts again and we actually have students logging onto the computers.

Comment 1 Chuck Ebbert 2010-08-25 07:09:05 UTC

Is it really completely locked up? Do the numlock/capslock keys stop lighting their LEDs when you press them? Does it respond to SysRq? (Add "sysrq_always_enabled ignore_loglevel" to the boot options and see if alt-sysrq-p shows any output.)

Also, can you try a Fedora 12 kernel?

Comment 2 Jonathan Dieter 2010-08-25 08:55:46 UTC

Ok, thanks for the boot options, I got something different this time.  It looks like I'm running out of RAM, though I'm not sure why, given that nfs on btrfs on drbd is the only thing running on this server.  I've pasted the first crash from dmesg (don't know why it gave me a few minutes of network access this time before it completely froze).

The server is now completely frozen and num lock and caps lock don't change the light indicators or anything.  The display has been blanked because I wasn't on it for 15 minutes or whatever it is, but alt-sysrq-b doesn't reboot the system.

Let me know if you need the full dmesg output, or if you still want me to try with a F12 kernel.

swapper: page allocation failure. order:0, mode:0x4020
Pid: 0, comm: swapper Not tainted 2.6.34.4-41.fc13.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff810d1424>] __alloc_pages_nodemask+0x5b2/0x635
 [<ffffffff810f82b7>] alloc_pages_current+0x95/0x9e
 [<ffffffff810ff828>] alloc_slab_page+0x1b/0x5b
 [<ffffffff810ff8d7>] new_slab+0x6f/0x1d4
 [<ffffffff81100266>] __slab_alloc+0x220/0x3ae
 [<ffffffff8139dafe>] ? __netdev_alloc_skb+0x34/0x51
 [<ffffffff8110112c>] __kmalloc_node_track_caller+0xb8/0x118
 [<ffffffff8139dafe>] ? __netdev_alloc_skb+0x34/0x51
 [<ffffffff8139d6b8>] __alloc_skb+0x80/0x170
 [<ffffffff8139dafe>] __netdev_alloc_skb+0x34/0x51
 [<ffffffffa0128883>] rtl8169_rx_interrupt+0x223/0x4e0 [r8169]
 [<ffffffffa012b560>] rtl8169_poll+0x3c/0x1a6 [r8169]
 [<ffffffff8106b3ae>] ? sched_clock_cpu+0xc3/0xce
 [<ffffffff813a57af>] net_rx_action+0xaf/0x1ca
 [<ffffffff81010261>] ? sched_clock+0x9/0xd
 [<ffffffff81053245>] __do_softirq+0xe5/0x1a6
 [<ffffffff8109e115>] ? handle_IRQ_event+0x60/0x121
 [<ffffffff8100ab5c>] call_softirq+0x1c/0x30
 [<ffffffff8100c342>] do_softirq+0x46/0x83
 [<ffffffff810530b6>] irq_exit+0x3b/0x7d
 [<ffffffff814525b4>] do_IRQ+0xac/0xc3
 [<ffffffff8144cd13>] ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff810117c5>] ? mwait_idle+0x7a/0x87
 [<ffffffff81011777>] ? mwait_idle+0x2c/0x87
 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
 [<ffffffff81445695>] start_secondary+0x253/0x294
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 178
CPU    1: hi:  186, btch:  31 usd: 197
active_anon:6377 inactive_anon:2136 isolated_anon:0
 active_file:207231 inactive_file:208770 isolated_file:0
 unevictable:0 dirty:4681 writeback:98529 unstable:0
 free:2474 slab_reclaimable:12039 slab_unreclaimable:10757
 mapped:698 shmem:1405 pagetables:790 bounce:0
Node 0 DMA free:7876kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:1308kB inactive_file:6736kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15356kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:20kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 1966 1966 1966
Node 0 DMA32 free:2268kB min:5652kB low:7064kB high:8476kB active_anon:25508kB inactive_anon:8544kB active_file:827616kB inactive_file:828088kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2014096kB mlocked:0kB dirty:18724kB writeback:394116kB mapped:2792kB shmem:5620kB slab_reclaimable:48136kB slab_unreclaimable:43008kB kernel_stack:1728kB pagetables:3160kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 2*4kB 5*8kB 5*16kB 4*32kB 3*64kB 4*128kB 1*256kB 1*512kB 2*1024kB 2*2048kB 0*4096kB = 7872kB
Node 0 DMA32: 455*4kB 12*8kB 0*16kB 3*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2268kB
417348 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
515327 pages RAM
10651 pages reserved
424422 pages shared
84890 pages non-shared
SLUB: Unable to allocate memory on node -1 (gfp=0x20)
  cache: kmalloc-2048, object size: 2048, buffer size: 2048, default order: 3, min order: 0
  node 0: slabs: 1390, objs: 7274, free: 0

Comment 3 Chuck Ebbert 2010-08-25 09:33:11 UTC

Oh, it's an r8169 card. I would recommend trying the boot option "pcie_aspm=off" first, and then try replacing the card if that doesn't help. The driver for that card has had lots of problems in the past.

Comment 4 Jonathan Dieter 2010-08-25 11:37:58 UTC

Swapped the hard drives into a different machine with an e1000e card.  Here's the output from the first crash.  However, the machine hasn't frozen yet.  It just keeps going, while these errors keep showing up on the screen.

swapper: page allocation failure. order:0, mode:0x4020
Pid: 0, comm: swapper Not tainted 2.6.34.4-41.fc13.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff810d1424>] __alloc_pages_nodemask+0x5b2/0x635
 [<ffffffff810f82b7>] alloc_pages_current+0x95/0x9e
 [<ffffffff810ff828>] alloc_slab_page+0x1b/0x5b
 [<ffffffff810ff8d7>] new_slab+0x6f/0x1d4
 [<ffffffff81100266>] __slab_alloc+0x220/0x3ae
 [<ffffffff8139dafe>] ? __netdev_alloc_skb+0x34/0x51
 [<ffffffff8110112c>] __kmalloc_node_track_caller+0xb8/0x118
 [<ffffffff8139dafe>] ? __netdev_alloc_skb+0x34/0x51
 [<ffffffff8139d6b8>] __alloc_skb+0x80/0x170
 [<ffffffff8139dafe>] __netdev_alloc_skb+0x34/0x51
 [<ffffffffa0104e78>] netdev_alloc_skb_ip_align+0x16/0x2c [e1000e]
 [<ffffffffa0104f0a>] e1000_alloc_rx_buffers+0x7c/0x15e [e1000e]
 [<ffffffffa01052be>] e1000_clean_rx_irq+0x260/0x29b [e1000e]
 [<ffffffff813a6a83>] ? dev_kfree_skb_any+0x38/0x3a
 [<ffffffffa010657d>] e1000_clean+0x7a/0x224 [e1000e]
 [<ffffffff813a57af>] net_rx_action+0xaf/0x1ca
 [<ffffffff81053245>] __do_softirq+0xe5/0x1a6
 [<ffffffff8109e115>] ? handle_IRQ_event+0x60/0x121
 [<ffffffff8100ab5c>] call_softirq+0x1c/0x30
 [<ffffffff8100c342>] do_softirq+0x46/0x83
 [<ffffffff810530b6>] irq_exit+0x3b/0x7d
 [<ffffffff814525b4>] do_IRQ+0xac/0xc3
 [<ffffffff8144cd13>] ret_from_intr+0x0/0x11
 <EOI>  [<ffffffff810117c5>] ? mwait_idle+0x7a/0x87
 [<ffffffff81011777>] ? mwait_idle+0x2c/0x87
 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
 [<ffffffff81434e9a>] rest_init+0x7e/0x80
 [<ffffffff81baee6f>] start_kernel+0x413/0x41e
 [<ffffffff81bae2c8>] x86_64_start_reservations+0xb3/0xb7
 [<ffffffff81bae3c4>] x86_64_start_kernel+0xf8/0x107
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd: 105
CPU    1: hi:  186, btch:  31 usd:  92
active_anon:4192 inactive_anon:4226 isolated_anon:0
 active_file:84917 inactive_file:97670 isolated_file:0
 unevictable:0 dirty:5399 writeback:40684 unstable:0
 free:1349 slab_reclaimable:7161 slab_unreclaimable:7232
 mapped:495 shmem:1374 pagetables:776 bounce:0
Node 0 DMA free:3960kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:3816kB inactive_file:8032kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15288kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:68kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 984 984 984
Node 0 DMA32 free:1436kB min:3980kB low:4972kB high:5968kB active_anon:16768kB inactive_anon:16904kB active_file:335852kB inactive_file:382648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1007976kB mlocked:0kB dirty:21596kB writeback:162736kB mapped:1980kB shmem:5496kB slab_reclaimable:28576kB slab_unreclaimable:28908kB kernel_stack:1688kB pagetables:3104kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:96 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 9*8kB 8*16kB 5*32kB 2*64kB 3*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 3960kB
Node 0 DMA32: 165*4kB 7*8kB 5*16kB 0*32kB 3*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1500kB
183950 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
259839 pages RAM
6616 pages reserved
190261 pages shared
67349 pages non-shared
SLUB: Unable to allocate memory on node -1 (gfp=0x20)
  cache: kmalloc-2048, object size: 2048, buffer size: 2048, default order: 3, min order: 0
  node 0: slabs: 561, objs: 5966, free: 0

Comment 5 Jonathan Dieter 2010-08-25 13:08:26 UTC

Just to update this, the server is still up a couple of hours later, but most of the clients been disconnected, which gives bonnie++ stale NFS handles when they reconnect.

I'm assuming the clients are disconnecting because when the server runs out of RAM, it's unable to send some packets.

I can go ahead and throw loads more RAM at this, but I'd like to know where it's going.  Output of free right now:

             total       used       free     shared    buffers     cached
Mem:       1012896    1002664      10232          0       2776     738348
-/+ buffers/cache:     261540     751356
Swap:            0          0          0

Comment 6 Chuck Ebbert 2010-08-25 21:56:46 UTC

So at least the e1000e driver can deal with memory allocation failures.

/proc/meminfo should give you an overview of memory usage, and slabtop should show a breakdown of what is using kernel memory allocations.

Comment 7 Chuck Ebbert 2010-08-27 09:56:31 UTC

Oh wait, you have lots of free memory there. Try increasing /proc/sys/vm/min_free_kbytes to 32000 or more.

Comment 8 Jonathan Dieter 2010-08-27 14:02:50 UTC

Ok, setting /proc/sys/vm/min_free_kbytes to 64000 seems to have fixed the problem.  I'm not getting anything in the logs about being unable to allocate RAM.  Is this one of those "that's a knob you change on servers" things, or is there a real bug here?

I'm am still running into an issue where bonnie++ on several of the clients is unable to open the file it has just created, but there's no error message on the server or client.

I'm assuming this would be best reported as a separate bug, seeing as it probably has nothing to do with memory allocation problems.

Comment 9 Chuck Ebbert 2010-08-27 16:23:27 UTC

(In reply to comment #8)
> Ok, setting /proc/sys/vm/min_free_kbytes to 64000 seems to have fixed the
> problem.  I'm not getting anything in the logs about being unable to allocate
> RAM.  Is this one of those "that's a knob you change on servers" things, or is
> there a real bug here?
> 

Yes, this is one of those things that heavily-loaded servers need to change from the default. I'll make a note to add it to the KernelCommonProblems wiki page.

> I'm am still running into an issue where bonnie++ on several of the clients is
> unable to open the file it has just created, but there's no error message on
> the server or client.
> 
> I'm assuming this would be best reported as a separate bug, seeing as it
> probably has nothing to do with memory allocation problems.

Report a new bug.

Comment 10 Jonathan Dieter 2010-08-27 16:31:30 UTC

Ok, thanks much for helping me track this down.  I really appreciate it.