Bug 882479 - kernel nfsd invoked oom-killer repeatedly until there is substantially nothing running (hanging the machine)
Summary: kernel nfsd invoked oom-killer repeatedly until there is substantially nothin...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-12-01 09:44 UTC by Wendell Baker
Modified: 2013-08-01 02:20 UTC (History)
5 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-08-01 02:20:45 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
/var/log/messages from 2012-11-29 -> 2012-11-30 showing oom-killer destroying the machine (1.18 MB, text/plain)
2012-12-01 09:44 UTC, Wendell Baker
no flags Details

Description Wendell Baker 2012-12-01 09:44:22 UTC
Created attachment 655465 [details]
/var/log/messages from 2012-11-29 -> 2012-11-30 showing oom-killer destroying the machine

Description of problem:

An nfs4 client is talking to an nfs4 server (both F17).  Over time the server executes oom-killer until there are few userland processes running.  The machine is unuseable and must be rebooted.

Version-Release number of selected component (if applicable):

$ rpm -q -a | grep -Ee '(kernel|nfs)' | sort ; uname -a
abrt-addon-kerneloops-2.0.16-1.fc17.i686
kernel-PAE-3.6.1-1.fc17.i686
kernel-PAE-3.6.2-4.fc17.i686
kernel-PAE-3.6.3-1.fc17.i686
libnfsidmap-0.25-3.fc17.i686
libreport-plugin-kerneloops-2.0.16-1.fc17.i686
nfs-utils-1.2.6-3.fc17.i686

Linux drippie 3.6.3-1.fc17.i686.PAE #1 SMP Mon Oct 22 15:48:37 UTC 2012 i686 i686 i386 GNU/Linux

How reproducible:

unclear ... but probably.
Attemping...

Steps to Reproduce:
1. nfs4 client (eggplants) runs an nightly rsync onto an nfs4 server (drippie)
2. the server machine becomes unuseable, ceases to answer nfs queries (hangs)
  
Actual results:

server (drippie) dies slowly

Expected results:

server must save itself, fail out the nfs queries if they are taking too much.

Additional info:

The gear is a Fujitsu Stylistic ST5100-series tablet i686 with 2GB of RAM.  It is running F17.  The HDD that it is serving are 4x USB-connected 6TB LUKS-encrypted ext4.

This is a "backup" server; its job is to slowly accrete work which is built elsewhere in a mirror-type fashion.   The rsync jobs are (probably) big because the filesystems are big, but they are executing off-server.

Context:

syslog from the timespan included nearby starting, fortuitously, from a fresh powercycle at 2012-11-20T17:54   The final oom-killer is at 2012-11-30T05:15

Early in this timespan there is an on-box rsync (see the 3x rsync processes).  But they do not appear later in the story.  Their last appearance is 2012-11-29T20:07, about two hours into the lifespan.


Of note ... the repeated stanzas wherein oom-killer picks off a process and destroys it.  Finally towards the end it kills off systemd-journal.  Manual intervention (powercycle) occurs at Dec  1 00:35:15



Example:

Nov 30 04:42:07 drippie kernel: [38836.908665] nfsd cpuset=/ mems_allowed=0
Nov 30 04:42:07 drippie kernel: [38836.908673] Pid: 1917, comm: nfsd Not tainted 3.6.3-1.fc17.i686.PAE #1
Nov 30 04:42:07 drippie kernel: [38836.908678] Call Trace:
Nov 30 04:42:07 drippie kernel: [38836.908696]  [<c096b1df>] dump_header.isra.11+0x82/0x1a9
Nov 30 04:42:07 drippie kernel: [38836.908709]  [<c067ea89>] ? ___ratelimit+0x89/0x100
Nov 30 04:42:07 drippie kernel: [38836.908720]  [<c050750a>] oom_kill_process+0x17a/0x2d0
Nov 30 04:42:07 drippie kernel: [38836.908731]  [<c0626e5c>] ? security_capable_noaudit+0x1c/0x30
Nov 30 04:42:07 drippie kernel: [38836.908742]  [<c0457793>] ? has_ns_capability_noaudit+0x13/0x20
Nov 30 04:42:07 drippie kernel: [38836.908751]  [<c04577b4>] ? has_capability_noaudit+0x14/0x20
Nov 30 04:42:07 drippie kernel: [38836.908759]  [<c05072ab>] ? oom_badness+0x9b/0xe0
Nov 30 04:42:07 drippie kernel: [38836.908767]  [<c0507a77>] out_of_memory+0x1f7/0x230
Nov 30 04:42:07 drippie kernel: [38836.908777]  [<c050ba96>] __alloc_pages_nodemask+0x716/0x7d0
Nov 30 04:42:07 drippie kernel: [38836.908838]  [<f7fc60c0>] svc_recv+0xe0/0x7c0 [sunrpc]
Nov 30 04:42:07 drippie kernel: [38836.908876]  [<f7fc5c83>] ? svc_xprt_release+0xc3/0x100 [sunrpc]
Nov 30 04:42:07 drippie kernel: [38836.908887]  [<c0972aa3>] ? _raw_spin_unlock_bh+0x13/0x20
Nov 30 04:42:07 drippie kernel: [38836.908898]  [<c047cc80>] ? try_to_wake_up+0x230/0x230
Nov 30 04:42:07 drippie kernel: [38836.908921]  [<f8027094>] nfsd+0x94/0x170 [nfsd]
Nov 30 04:42:07 drippie kernel: [38836.908932]  [<c0476e7e>] ? complete+0x4e/0x60
Nov 30 04:42:07 drippie kernel: [38836.908948]  [<f8027000>] ? 0xf8026fff
Nov 30 04:42:07 drippie kernel: [38836.908957]  [<c046b3e2>] kthread+0x72/0x80
Nov 30 04:42:07 drippie kernel: [38836.908967]  [<c046b370>] ? kthread_freezable_should_stop+0x60/0x60
Nov 30 04:42:07 drippie kernel: [38836.908978]  [<c0979cbe>] kernel_thread_helper+0x6/0x10
Nov 30 04:42:07 drippie kernel: [38836.908982] Mem-Info:
Nov 30 04:42:07 drippie kernel: [38836.908986] DMA per-cpu:
Nov 30 04:42:07 drippie kernel: [38836.908991] CPU    0: hi:    0, btch:   1 usd:   0
Nov 30 04:42:07 drippie kernel: [38836.908996] CPU    1: hi:    0, btch:   1 usd:   0
Nov 30 04:42:07 drippie kernel: [38836.909023] Normal per-cpu:
Nov 30 04:42:07 drippie kernel: [38836.909028] CPU    0: hi:  186, btch:  31 usd:  82
Nov 30 04:42:07 drippie kernel: [38836.909033] CPU    1: hi:  186, btch:  31 usd:  95
Nov 30 04:42:07 drippie kernel: [38836.909037] HighMem per-cpu:
Nov 30 04:42:07 drippie kernel: [38836.909042] CPU    0: hi:  186, btch:  31 usd:  61
Nov 30 04:42:07 drippie kernel: [38836.909047] CPU    1: hi:  186, btch:  31 usd:  43
Nov 30 04:42:07 drippie kernel: [38836.909059] active_anon:590 inactive_anon:5353 isolated_anon:0
Nov 30 04:42:07 drippie kernel: [38836.909059]  active_file:9539 inactive_file:89017 isolated_file:0
Nov 30 04:42:07 drippie kernel: [38836.909059]  unevictable:0 dirty:2 writeback:0 unstable:0
Nov 30 04:42:07 drippie kernel: [38836.909059]  free:202055 slab_reclaimable:182816 slab_unreclaimable:18493
Nov 30 04:42:07 drippie kernel: [38836.909059]  mapped:1230 shmem:1377 pagetables:85 bounce:0
Nov 30 04:42:07 drippie kernel: [38836.909081] DMA free:4228kB min:784kB low:980kB high:1176kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:2004kB slab_unreclaimable:288kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 30 04:42:07 drippie kernel: [38836.909087] lowmem_reserve[]: 0 863 2006 2006
Nov 30 04:42:07 drippie kernel: [38836.909110] Normal free:43856kB min:43912kB low:54888kB high:65868kB active_anon:0kB inactive_anon:0kB active_file:2916kB inactive_file:2944kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:883912kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:729260kB slab_unreclaimable:73684kB kernel_stack:1024kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:9333 all_unreclaimable? yes
Nov 30 04:42:07 drippie kernel: [38836.909116] lowmem_reserve[]: 0 0 9147 9147
Nov 30 04:42:07 drippie kernel: [38836.909138] HighMem free:760136kB min:512kB low:15052kB high:29596kB active_anon:2360kB inactive_anon:21412kB active_file:35240kB inactive_file:353124kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1170820kB mlocked:0kB dirty:8kB writeback:0kB mapped:4916kB shmem:5508kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:340kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov 30 04:42:07 drippie kernel: [38836.909144] lowmem_reserve[]: 0 0 0 0
Nov 30 04:42:07 drippie kernel: [38836.909154] DMA: 25*4kB 10*8kB 7*16kB 7*32kB 8*64kB 7*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 4228kB
Nov 30 04:42:07 drippie kernel: [38836.909179] Normal: 4048*4kB 2690*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 43856kB
Nov 30 04:42:07 drippie kernel: [38836.909203] HighMem: 1966*4kB 2944*8kB 3255*16kB 1989*32kB 838*64kB 232*128kB 11*256kB 9*512kB 4*1024kB 23*2048kB 115*4096kB = 760136kB
Nov 30 04:42:07 drippie kernel: [38836.909228] 99933 total pagecache pages
Nov 30 04:42:07 drippie kernel: [38836.909232] 0 pages in swap cache
Nov 30 04:42:07 drippie kernel: [38836.909237] Swap cache stats: add 0, delete 0, find 0/0
Nov 30 04:42:07 drippie kernel: [38836.909241] Free swap  = 8290300kB
Nov 30 04:42:07 drippie kernel: [38836.909245] Total swap = 8290300kB
Nov 30 04:42:07 drippie kernel: [38836.923769] 521808 pages RAM
Nov 30 04:42:07 drippie kernel: [38836.923770] 295010 pages HighMem
Nov 30 04:42:07 drippie kernel: [38836.923771] 7792 pages reserved
Nov 30 04:42:07 drippie kernel: [38836.923772] 91670 pages shared
Nov 30 04:42:07 drippie kernel: [38836.923774] 221732 pages non-shared
Nov 30 04:42:07 drippie kernel: [38836.923776] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Nov 30 04:42:07 drippie kernel: [38836.923792] [  468]     0   468     1565      812       5        0         -1000 udevd
Nov 30 04:42:07 drippie kernel: [38836.923799] [  704]     0   704     2754      227       4        0         -1000 auditd
Nov 30 04:42:07 drippie kernel: [38836.923804] [  938]     0   938     2465      654       8        0         -1000 sshd
Nov 30 04:42:07 drippie kernel: [38836.923809] [ 1642]     0  1642     1558      750       5        0         -1000 udevd
Nov 30 04:42:07 drippie kernel: [38836.923814] [ 1746]     0  1746     1558      743       5        0         -1000 udevd
Nov 30 04:42:07 drippie kernel: [38836.923823] [ 9775]     0  9775    11935      418      10        0             0 rsyslogd
Nov 30 04:42:07 drippie kernel: [38836.923828] [ 9788]     0  9788     2936      341      16        0             0 systemd-journal
Nov 30 04:42:07 drippie kernel: [38836.923830] Out of memory: Kill process 9775 (rsyslogd) score 0 or sacrifice child
Nov 30 04:42:07 drippie kernel: [38836.923834] Killed process 9775 (rsyslogd) total-vm:47740kB, anon-rss:372kB, file-rss:1300kB

Comment 1 Fedora End Of Life 2013-07-03 23:33:35 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like  to see this bug fixed and are able to reproduce it 
against a later version  of Fedora, you are encouraged  change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 2 Fedora End Of Life 2013-08-01 02:20:55 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.