Bug 1399578 - [compound FOPs]: Memory leak while doing FOPs with brick down
Summary: [compound FOPs]: Memory leak while doing FOPs with brick down
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Krutika Dhananjay
QA Contact:
URL:
Whiteboard:
Depends On: 1398315
Blocks: 1399891
 
Reported: 2016-11-29 10:39 UTC by Krutika Dhananjay
Modified: 2017-03-06 17:37 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1398315
: 1399891 (view as bug list)
Environment:
Last Closed: 2017-03-06 17:37:12 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Krutika Dhananjay 2016-11-29 10:39:48 UTC
+++ This bug was initially created as a clone of Bug #1398315 +++

Description of problem:
======================
I am raising this bug because even after the file is completely written to the brick (with one brick down), the memory is not released, so there is a very high chance of a memory leak. This is seen in both the brick process and the FUSE client.

FUSE client: I checked at 10-minute intervals after the write completed and saw no change in the memory consumed:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2951 root      20   0 86.157g 0.014t      0 S   0.3 93.3  27:39.73 glusterfs

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2951 root      20   0 86.157g 0.014t      0 S   2.0 93.3  27:43.90 glusterfs



The same is seen with the brick process:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1431 root      20   0  608672  24324   4256 S   0.0  0.3   0:01.99 glusterd
 3914 root      20   0 4461344 3.097g   4348 S   0.0 40.5  15:00.45 glusterfsd
 3937 root      20   0  672724  31104   3092 S   0.0  0.4   0:01.82 glusterfs
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1431 root      20   0  608672  24324   4256 S   0.0  0.3   0:02.00 glusterd
 3914 root      20   0 4461344 3.097g   4348 S   0.0 40.5  15:00.45 glusterfsd
 3937 root      20   0  672724  31104   3092 S   0.0  0.4   0:01.84 glusterfs
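
For reference, a minimal sketch of how this kind of periodic sampling could be scripted; the 10-minute interval matches the report, but the process-name pattern is an assumption:

#!/bin/bash
# Sample memory use of the gluster processes every 10 minutes.
while true; do
    date
    # -b batch mode, -n 1 single iteration, -p limit output to the matched PIDs
    top -b -n 1 -p "$(pgrep -d, gluster)" | grep -E 'PID|gluster'
    sleep 600
done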

Version-Release number of selected component (if applicable):
==========
3.8.4-5

Steps to Reproduce:
1. Create a 1x2 (replica 2) volume.
2. Enable compound FOPs and FUSE-mount the volume on a client.
3. Keep track of the memory consumed by both the brick processes and the client process.
4. Create a 10 GB file with dd.
5. After about 5 GB has been written, bring down one brick.

After the file is completely written, note down the memory consumed by the brick and the FUSE client.

Then leave the setup idle and check again after 15 minutes: no memory is freed. (A rough shell sketch of these steps is included below.)
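
A rough shell sketch of the above steps; host names, brick paths, the mount point and file name are placeholders, and cluster.use-compound-fops is my reading of how compound FOPs are enabled on this release line, so verify against your build:

# On a server node: create, start and configure the 1x2 volume.
gluster volume create repvol replica 2 server1:/bricks/b1 server2:/bricks/b2
gluster volume start repvol
gluster volume set repvol cluster.use-compound-fops on

# On the client: mount the volume and start a 10 GB sequential write.
mount -t glusterfs server1:/repvol /mnt/repvol
dd if=/dev/zero of=/mnt/repvol/bigfile bs=1M count=10240 &

# After roughly 5 GB has been written, bring one brick down (run on server2;
# the pkill pattern is an assumption), then let dd finish and watch the RSS
# of glusterfsd and the fuse client with top, as in the output above.
pkill -f '/bricks/b2'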


Note: I would like to track these as two separate issues. However, if the RCA shows that the root cause is the same, we can go ahead and mark one as a duplicate of the other.

--- Additional comment from nchilaka on 2016-11-24 07:48:52 EST ---

and here comes the OOM Kill :)
[Thu Nov 24 18:13:50 2016] glusterfs invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[Thu Nov 24 18:13:50 2016] glusterfs cpuset=/ mems_allowed=0-1
[Thu Nov 24 18:13:50 2016] CPU: 3 PID: 2953 Comm: glusterfs Not tainted 3.10.0-510.el7.x86_64 #1
[Thu Nov 24 18:13:50 2016] Hardware name: Supermicro X9DRW-3LN4F+/X9DRW-3TF+/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 1.0b 05/29/2012
[Thu Nov 24 18:13:50 2016]  ffff880475a93ec0 000000002e7f78c3 ffff88046bb23990 ffffffff81685ccc
[Thu Nov 24 18:13:50 2016]  ffff88046bb23a20 ffffffff81680c77 ffffffff812ae65b ffff880476e27d00
[Thu Nov 24 18:13:50 2016]  ffff880476e27d18 ffffffff00000202 fffeefff00000000 0000000000000001
[Thu Nov 24 18:13:50 2016] Call Trace:
[Thu Nov 24 18:13:50 2016]  [<ffffffff81685ccc>] dump_stack+0x19/0x1b
[Thu Nov 24 18:13:50 2016]  [<ffffffff81680c77>] dump_header+0x8e/0x225
[Thu Nov 24 18:13:50 2016]  [<ffffffff812ae65b>] ? cred_has_capability+0x6b/0x120
[Thu Nov 24 18:13:50 2016]  [<ffffffff8113cb03>] ? delayacct_end+0x33/0xb0
[Thu Nov 24 18:13:50 2016]  [<ffffffff8118460e>] oom_kill_process+0x24e/0x3c0
[Thu Nov 24 18:13:50 2016]  [<ffffffff81184e46>] out_of_memory+0x4b6/0x4f0
[Thu Nov 24 18:13:50 2016]  [<ffffffff81681780>] __alloc_pages_slowpath+0x5d7/0x725
[Thu Nov 24 18:13:50 2016]  [<ffffffff8118af55>] __alloc_pages_nodemask+0x405/0x420
[Thu Nov 24 18:13:50 2016]  [<ffffffff811d209a>] alloc_pages_vma+0x9a/0x150
[Thu Nov 24 18:13:50 2016]  [<ffffffff811c2e8b>] read_swap_cache_async+0xeb/0x160
[Thu Nov 24 18:13:50 2016]  [<ffffffff811c2fa8>] swapin_readahead+0xa8/0x110
[Thu Nov 24 18:13:50 2016]  [<ffffffff811b120c>] handle_mm_fault+0xb1c/0xfe0
[Thu Nov 24 18:13:50 2016]  [<ffffffff81691794>] __do_page_fault+0x154/0x450
[Thu Nov 24 18:13:50 2016]  [<ffffffff81691ac5>] do_page_fault+0x35/0x90
[Thu Nov 24 18:13:50 2016]  [<ffffffff8168dfc0>] ? bstep_iret+0xf/0xf
[Thu Nov 24 18:13:50 2016]  [<ffffffff8168dd88>] page_fault+0x28/0x30
[Thu Nov 24 18:13:50 2016] Mem-Info:
[Thu Nov 24 18:13:50 2016] active_anon:3322839 inactive_anon:510929 isolated_anon:0
 active_file:174 inactive_file:754 isolated_file:0
 unevictable:0 dirty:0 writeback:136 unstable:0
 slab_reclaimable:11575 slab_unreclaimable:22836
 mapped:291 shmem:742 pagetables:45178 bounce:0
 free:32274 free_pcp:30 free_cma:0
[Thu Nov 24 18:13:50 2016] Node 0 DMA free:15848kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15932kB managed:15848kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 1763 7777 7777
[Thu Nov 24 18:13:50 2016] Node 0 DMA32 free:33960kB min:10020kB low:12524kB high:15028kB active_anon:1239140kB inactive_anon:445404kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052108kB managed:1807368kB mlocked:0kB dirty:0kB writeback:0kB mapped:612kB shmem:608kB slab_reclaimable:1464kB slab_unreclaimable:8624kB kernel_stack:336kB pagetables:3624kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 6014 6014
[Thu Nov 24 18:13:50 2016] Node 0 Normal free:34060kB min:34180kB low:42724kB high:51268kB active_anon:4993472kB inactive_anon:713816kB active_file:632kB inactive_file:3244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:6291456kB managed:6158340kB mlocked:0kB dirty:0kB writeback:344kB mapped:456kB shmem:2316kB slab_reclaimable:13080kB slab_unreclaimable:44936kB kernel_stack:3136kB pagetables:50900kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:20840 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 1 Normal free:45228kB min:45820kB low:57272kB high:68728kB active_anon:7058744kB inactive_anon:884496kB active_file:64kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:8388608kB managed:8255248kB mlocked:0kB dirty:0kB writeback:200kB mapped:96kB shmem:44kB slab_reclaimable:31756kB slab_unreclaimable:37784kB kernel_stack:2400kB pagetables:126188kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8515 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15848kB
[Thu Nov 24 18:13:50 2016] Node 0 DMA32: 1013*4kB (UM) 688*8kB (UEM) 266*16kB (UEM) 54*32kB (UEM) 16*64kB (UEM) 6*128kB (EM) 7*256kB (EM) 5*512kB (M) 8*1024kB (UEM) 2*2048kB (M) 0*4096kB = 33972kB
[Thu Nov 24 18:13:50 2016] Node 0 Normal: 206*4kB (UEM) 158*8kB (UEM) 87*16kB (UEM) 59*32kB (UEM) 87*64kB (UEM) 38*128kB (UM) 24*256kB (UE) 6*512kB (UEM) 10*1024kB (M) 0*2048kB 0*4096kB = 35256kB
[Thu Nov 24 18:13:50 2016] Node 1 Normal: 164*4kB (UEM) 114*8kB (UEM) 68*16kB (UEM) 47*32kB (UEM) 17*64kB (UEM) 6*128kB (UM) 11*256kB (UEM) 35*512kB (UM) 19*1024kB (UM) 0*2048kB 0*4096kB = 46208kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] 28778 total pagecache pages
[Thu Nov 24 18:13:50 2016] 27044 pages in swap cache
[Thu Nov 24 18:13:50 2016] Swap cache stats: add 2112210, delete 2085166, find 18057/22414
[Thu Nov 24 18:13:50 2016] Free swap  = 0kB
[Thu Nov 24 18:13:50 2016] Total swap = 8257532kB
[Thu Nov 24 18:13:50 2016] 4187026 pages RAM
[Thu Nov 24 18:13:50 2016] 0 pages HighMem/MovableOnly
[Thu Nov 24 18:13:50 2016] 127825 pages reserved
[Thu Nov 24 18:13:50 2016] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[Thu Nov 24 18:13:50 2016] [  731]     0   731     9204      172      21       49             0 systemd-journal
[Thu Nov 24 18:13:50 2016] [  752]     0   752    67411        0      34      608             0 lvmetad
[Thu Nov 24 18:13:50 2016] [  768]     0   768    11319        1      23      245         -1000 systemd-udevd
[Thu Nov 24 18:13:50 2016] [ 1091]     0  1091    13854       23      28       87         -1000 auditd
[Thu Nov 24 18:13:50 2016] [ 1113]     0  1113     4860       81      14       38             0 irqbalance
[Thu Nov 24 18:13:50 2016] [ 1116]    81  1116     8207       95      17       52          -900 dbus-daemon
[Thu Nov 24 18:13:50 2016] [ 1119]   997  1119    28962       47      26       50             0 chronyd
[Thu Nov 24 18:13:50 2016] [ 1127]   998  1127   132067       81      55     1658             0 polkitd
[Thu Nov 24 18:13:50 2016] [ 1128]     0  1128     6048       43      16       30             0 systemd-logind
[Thu Nov 24 18:13:50 2016] [ 1131]     0  1131    31556       26      19      130             0 crond
[Thu Nov 24 18:13:50 2016] [ 1141]     0  1141    81800      261      82     4781             0 firewalld
[Thu Nov 24 18:13:50 2016] [ 1148]     0  1148    27509        1      10       31             0 agetty
[Thu Nov 24 18:13:50 2016] [ 1150]     0  1150   109534      294      68      345             0 NetworkManager
[Thu Nov 24 18:13:50 2016] [ 1250]     0  1250    28206        1      55     3122             0 dhclient
[Thu Nov 24 18:13:50 2016] [ 1508]     0  1508    54944      164      38      135             0 rsyslogd
[Thu Nov 24 18:13:50 2016] [ 1511]     0  1511   138288       91      89     2576             0 tuned
[Thu Nov 24 18:13:50 2016] [ 1516]     0  1516    28335        1      11       38             0 rhsmcertd
[Thu Nov 24 18:13:50 2016] [ 1538]     0  1538    20617       25      42      189         -1000 sshd
[Thu Nov 24 18:13:50 2016] [ 1552]     0  1552    26971        0       9       24             0 rhnsd
[Thu Nov 24 18:13:50 2016] [ 2331]     0  2331    22244       16      41      239             0 master
[Thu Nov 24 18:13:50 2016] [ 2363]    89  2363    22270       15      44      235             0 pickup
[Thu Nov 24 18:13:50 2016] [ 2365]    89  2365    22287       14      44      236             0 qmgr
[Thu Nov 24 18:13:50 2016] [ 2869]     0  2869    35726       28      71      291             0 sshd
[Thu Nov 24 18:13:50 2016] [ 2873]     0  2873    29316       81      15      492             0 bash
[Thu Nov 24 18:13:50 2016] [ 2951]     0  2951 22585439  3780372   43885  2041242             0 glusterfs
[Thu Nov 24 18:13:50 2016] [ 2969]     0  2969    35726       26      68      291             0 sshd
[Thu Nov 24 18:13:50 2016] [ 2973]     0  2973    28846       72      14       39             0 bash
[Thu Nov 24 18:13:50 2016] [ 2998]     0  2998    31927       68      17       70             0 screen
[Thu Nov 24 18:13:50 2016] [ 2999]     0  2999    38218     4753      32     4734             0 bash
[Thu Nov 24 18:13:50 2016] [ 3674]     0  3674    35726      316      72        0             0 sshd
[Thu Nov 24 18:13:50 2016] [ 3678]     0  3678    28846      109      12        0             0 bash
[Thu Nov 24 18:13:50 2016] [ 3815]     0  3815   130941    18547     177        0             0 yum
[Thu Nov 24 18:13:50 2016] Out of memory: Kill process 2951 (glusterfs) score 929 or sacrifice child
[Thu Nov 24 18:13:50 2016] Killed process 2951 (glusterfs) total-vm:90341756kB, anon-rss:15121488kB, file-rss:0kB, shmem-rss:0kB

Comment 1 Worker Ant 2016-11-29 10:42:29 UTC
REVIEW: http://review.gluster.org/15965 (protocol/server: Fix mem-leaks in compound fops) posted (#1) for review on master by Krutika Dhananjay (kdhananj)

Comment 2 Worker Ant 2016-11-30 14:36:39 UTC
COMMIT: http://review.gluster.org/15965 committed in master by Pranith Kumar Karampuri (pkarampu) 
------
commit 955d9700397fda6ada269fc3077116b7756702a5
Author: Krutika Dhananjay <kdhananj>
Date:   Tue Nov 29 12:56:40 2016 +0530

    protocol/server: Fix mem-leaks in compound fops
    
    * Remove spurious 'return' statement.
    * Free up 'compound_rsp_array_val' as well in the end.
    * Remove multiple refs on this_args->xdata.
    
    Change-Id: I212c6dbe4d81b0381c1323d05fdfcc853886b25b
    BUG: 1399578
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/15965
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jdarcy>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
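
As a side note on verifying a fix like this, gluster statedumps are the usual way to inspect per-translator memory accounting; a minimal sketch, with the volume name as an assumption (the default dump directory is normally /var/run/gluster):

# Brick side: ask glusterd to dump the state of the volume's brick processes.
gluster volume statedump repvol

# Client side: SIGUSR1 makes the fuse client write a statedump as well.
kill -USR1 "$(pgrep -x glusterfs)"

# Compare the memory-accounting sections (size, num_allocs, hot-count)
# before and after the workload.
grep -A4 'gf_common_mt' /var/run/gluster/*.dump.* | less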

Comment 3 Shyamsundar 2017-03-06 17:37:12 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

