+++ This bug was initially created as a clone of Bug #1398315 +++

Description of problem:
======================
I am raising this bug because, even after the file has been completely written to the brick (with one brick down), the memory is not released. Hence there is a very high chance of a memory leak. This is seen in both the brick process and the fuse client.

Fuse client:
I checked at 10-minute intervals after the write was complete and didn't see any change in the memory consumed.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2951 root      20   0 86.157g 0.014t      0 S   0.3 93.3  27:39.73 glusterfs

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2951 root      20   0 86.157g 0.014t      0 S   2.0 93.3  27:43.90 glusterfs

The same is seen with the brick process:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1431 root      20   0  608672  24324   4256 S   0.0  0.3   0:01.99 glusterd
 3914 root      20   0 4461344 3.097g   4348 S   0.0 40.5  15:00.45 glusterfsd
 3937 root      20   0  672724  31104   3092 S   0.0  0.4   0:01.82 glusterfs

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1431 root      20   0  608672  24324   4256 S   0.0  0.3   0:02.00 glusterd
 3914 root      20   0 4461344 3.097g   4348 S   0.0 40.5  15:00.45 glusterfsd
 3937 root      20   0  672724  31104   3092 S   0.0  0.4   0:01.84 glusterfs

Version-Release number of selected component (if applicable):
==========
3.8.4-5

Steps to Reproduce:
1. Create a 1x2 volume.
2. Enable compound fops and fuse mount the volume on a client.
3. Keep track of the memory consumption of both the brick processes and the client process.
4. Create a 10 GB file with dd.
5. After about 5 GB is written, bring down one brick.

Once the file is completely written, note down the memory consumed by the brick and the fuse client.
Then leave the setup idle and check again after 15 minutes. No memory has been freed.

Note: I would like to track them as two different issues.
However on RCA if we find that the root cause is same, then we can go ahead and dup one of them to the other

--- Additional comment from nchilaka on 2016-11-24 07:48:52 EST ---

and here comes the OOM Kill :)

[Thu Nov 24 18:13:50 2016] glusterfs invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[Thu Nov 24 18:13:50 2016] glusterfs cpuset=/ mems_allowed=0-1
[Thu Nov 24 18:13:50 2016] CPU: 3 PID: 2953 Comm: glusterfs Not tainted 3.10.0-510.el7.x86_64 #1
[Thu Nov 24 18:13:50 2016] Hardware name: Supermicro X9DRW-3LN4F+/X9DRW-3TF+/X9DRW-3LN4F+/X9DRW-3TF+, BIOS 1.0b 05/29/2012
[Thu Nov 24 18:13:50 2016] ffff880475a93ec0 000000002e7f78c3 ffff88046bb23990 ffffffff81685ccc
[Thu Nov 24 18:13:50 2016] ffff88046bb23a20 ffffffff81680c77 ffffffff812ae65b ffff880476e27d00
[Thu Nov 24 18:13:50 2016] ffff880476e27d18 ffffffff00000202 fffeefff00000000 0000000000000001
[Thu Nov 24 18:13:50 2016] Call Trace:
[Thu Nov 24 18:13:50 2016] [<ffffffff81685ccc>] dump_stack+0x19/0x1b
[Thu Nov 24 18:13:50 2016] [<ffffffff81680c77>] dump_header+0x8e/0x225
[Thu Nov 24 18:13:50 2016] [<ffffffff812ae65b>] ? cred_has_capability+0x6b/0x120
[Thu Nov 24 18:13:50 2016] [<ffffffff8113cb03>] ? delayacct_end+0x33/0xb0
[Thu Nov 24 18:13:50 2016] [<ffffffff8118460e>] oom_kill_process+0x24e/0x3c0
[Thu Nov 24 18:13:50 2016] [<ffffffff81184e46>] out_of_memory+0x4b6/0x4f0
[Thu Nov 24 18:13:50 2016] [<ffffffff81681780>] __alloc_pages_slowpath+0x5d7/0x725
[Thu Nov 24 18:13:50 2016] [<ffffffff8118af55>] __alloc_pages_nodemask+0x405/0x420
[Thu Nov 24 18:13:50 2016] [<ffffffff811d209a>] alloc_pages_vma+0x9a/0x150
[Thu Nov 24 18:13:50 2016] [<ffffffff811c2e8b>] read_swap_cache_async+0xeb/0x160
[Thu Nov 24 18:13:50 2016] [<ffffffff811c2fa8>] swapin_readahead+0xa8/0x110
[Thu Nov 24 18:13:50 2016] [<ffffffff811b120c>] handle_mm_fault+0xb1c/0xfe0
[Thu Nov 24 18:13:50 2016] [<ffffffff81691794>] __do_page_fault+0x154/0x450
[Thu Nov 24 18:13:50 2016] [<ffffffff81691ac5>] do_page_fault+0x35/0x90
[Thu Nov 24 18:13:50 2016] [<ffffffff8168dfc0>] ? bstep_iret+0xf/0xf
[Thu Nov 24 18:13:50 2016] [<ffffffff8168dd88>] page_fault+0x28/0x30
[Thu Nov 24 18:13:50 2016] Mem-Info:
[Thu Nov 24 18:13:50 2016] active_anon:3322839 inactive_anon:510929 isolated_anon:0 active_file:174 inactive_file:754 isolated_file:0 unevictable:0 dirty:0 writeback:136 unstable:0 slab_reclaimable:11575 slab_unreclaimable:22836 mapped:291 shmem:742 pagetables:45178 bounce:0 free:32274 free_pcp:30 free_cma:0
[Thu Nov 24 18:13:50 2016] Node 0 DMA free:15848kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15932kB managed:15848kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 1763 7777 7777
[Thu Nov 24 18:13:50 2016] Node 0 DMA32 free:33960kB min:10020kB low:12524kB high:15028kB active_anon:1239140kB inactive_anon:445404kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2052108kB managed:1807368kB mlocked:0kB dirty:0kB writeback:0kB mapped:612kB shmem:608kB slab_reclaimable:1464kB slab_unreclaimable:8624kB kernel_stack:336kB pagetables:3624kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 6014 6014
[Thu Nov 24 18:13:50 2016] Node 0 Normal free:34060kB min:34180kB low:42724kB high:51268kB active_anon:4993472kB inactive_anon:713816kB active_file:632kB inactive_file:3244kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:6291456kB managed:6158340kB mlocked:0kB dirty:0kB writeback:344kB mapped:456kB shmem:2316kB slab_reclaimable:13080kB slab_unreclaimable:44936kB kernel_stack:3136kB pagetables:50900kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:20840 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 1 Normal free:45228kB min:45820kB low:57272kB high:68728kB active_anon:7058744kB inactive_anon:884496kB active_file:64kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:8388608kB managed:8255248kB mlocked:0kB dirty:0kB writeback:200kB mapped:96kB shmem:44kB slab_reclaimable:31756kB slab_unreclaimable:37784kB kernel_stack:2400kB pagetables:126188kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:8515 all_unreclaimable? yes
[Thu Nov 24 18:13:50 2016] lowmem_reserve[]: 0 0 0 0
[Thu Nov 24 18:13:50 2016] Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15848kB
[Thu Nov 24 18:13:50 2016] Node 0 DMA32: 1013*4kB (UM) 688*8kB (UEM) 266*16kB (UEM) 54*32kB (UEM) 16*64kB (UEM) 6*128kB (EM) 7*256kB (EM) 5*512kB (M) 8*1024kB (UEM) 2*2048kB (M) 0*4096kB = 33972kB
[Thu Nov 24 18:13:50 2016] Node 0 Normal: 206*4kB (UEM) 158*8kB (UEM) 87*16kB (UEM) 59*32kB (UEM) 87*64kB (UEM) 38*128kB (UM) 24*256kB (UE) 6*512kB (UEM) 10*1024kB (M) 0*2048kB 0*4096kB = 35256kB
[Thu Nov 24 18:13:50 2016] Node 1 Normal: 164*4kB (UEM) 114*8kB (UEM) 68*16kB (UEM) 47*32kB (UEM) 17*64kB (UEM) 6*128kB (UM) 11*256kB (UEM) 35*512kB (UM) 19*1024kB (UM) 0*2048kB 0*4096kB = 46208kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[Thu Nov 24 18:13:50 2016] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[Thu Nov 24 18:13:50 2016] 28778 total pagecache pages
[Thu Nov 24 18:13:50 2016] 27044 pages in swap cache
[Thu Nov 24 18:13:50 2016] Swap cache stats: add 2112210, delete 2085166, find 18057/22414
[Thu Nov 24 18:13:50 2016] Free swap = 0kB
[Thu Nov 24 18:13:50 2016] Total swap = 8257532kB
[Thu Nov 24 18:13:50 2016] 4187026 pages RAM
[Thu Nov 24 18:13:50 2016] 0 pages HighMem/MovableOnly
[Thu Nov 24 18:13:50 2016] 127825 pages reserved
[Thu Nov 24 18:13:50 2016] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Thu Nov 24 18:13:50 2016] [ 731] 0 731 9204 172 21 49 0 systemd-journal
[Thu Nov 24 18:13:50 2016] [ 752] 0 752 67411 0 34 608 0 lvmetad
[Thu Nov 24 18:13:50 2016] [ 768] 0 768 11319 1 23 245 -1000 systemd-udevd
[Thu Nov 24 18:13:50 2016] [ 1091] 0 1091 13854 23 28 87 -1000 auditd
[Thu Nov 24 18:13:50 2016] [ 1113] 0 1113 4860 81 14 38 0 irqbalance
[Thu Nov 24 18:13:50 2016] [ 1116] 81 1116 8207 95 17 52 -900 dbus-daemon
[Thu Nov 24 18:13:50 2016] [ 1119] 997 1119 28962 47 26 50 0 chronyd
[Thu Nov 24 18:13:50 2016] [ 1127] 998 1127 132067 81 55 1658 0 polkitd
[Thu Nov 24 18:13:50 2016] [ 1128] 0 1128 6048 43 16 30 0 systemd-logind
[Thu Nov 24 18:13:50 2016] [ 1131] 0 1131 31556 26 19 130 0 crond
[Thu Nov 24 18:13:50 2016] [ 1141] 0 1141 81800 261 82 4781 0 firewalld
[Thu Nov 24 18:13:50 2016] [ 1148] 0 1148 27509 1 10 31 0 agetty
[Thu Nov 24 18:13:50 2016] [ 1150] 0 1150 109534 294 68 345 0 NetworkManager
[Thu Nov 24 18:13:50 2016] [ 1250] 0 1250 28206 1 55 3122 0 dhclient
[Thu Nov 24 18:13:50 2016] [ 1508] 0 1508 54944 164 38 135 0 rsyslogd
[Thu Nov 24 18:13:50 2016] [ 1511] 0 1511 138288 91 89 2576 0 tuned
[Thu Nov 24 18:13:50 2016] [ 1516] 0 1516 28335 1 11 38 0 rhsmcertd
[Thu Nov 24 18:13:50 2016] [ 1538] 0 1538 20617 25 42 189 -1000 sshd
[Thu Nov 24 18:13:50 2016] [ 1552] 0 1552 26971 0 9 24 0 rhnsd
[Thu Nov 24 18:13:50 2016] [ 2331] 0 2331 22244 16 41 239 0 master
[Thu Nov 24 18:13:50 2016] [ 2363] 89 2363 22270 15 44 235 0 pickup
[Thu Nov 24 18:13:50 2016] [ 2365] 89 2365 22287 14 44 236 0 qmgr
[Thu Nov 24 18:13:50 2016] [ 2869] 0 2869 35726 28 71 291 0 sshd
[Thu Nov 24 18:13:50 2016] [ 2873] 0 2873 29316 81 15 492 0 bash
[Thu Nov 24 18:13:50 2016] [ 2951] 0 2951 22585439 3780372 43885 2041242 0 glusterfs
[Thu Nov 24 18:13:50 2016] [ 2969] 0 2969 35726 26 68 291 0 sshd
[Thu Nov 24 18:13:50 2016] [ 2973] 0 2973 28846 72 14 39 0 bash
[Thu Nov 24 18:13:50 2016] [ 2998] 0 2998 31927 68 17 70 0 screen
[Thu Nov 24 18:13:50 2016] [ 2999] 0 2999 38218 4753 32 4734 0 bash
[Thu Nov 24 18:13:50 2016] [ 3674] 0 3674 35726 316 72 0 0 sshd
[Thu Nov 24 18:13:50 2016] [ 3678] 0 3678 28846 109 12 0 0 bash
[Thu Nov 24 18:13:50 2016] [ 3815] 0 3815 130941 18547 177 0 0 yum
[Thu Nov 24 18:13:50 2016] Out of memory: Kill process 2951 (glusterfs) score 929 or sacrifice child
[Thu Nov 24 18:13:50 2016] Killed process 2951 (glusterfs) total-vm:90341756kB, anon-rss:15121488kB, file-rss:0kB, shmem-rss:0kB
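
The per-process table in the oom-killer dump reports total_vm, rss and swapents as page counts, while the final "Killed process" line reports kB. As a rough cross-check of the figures for PID 2951, the sketch below converts pages to kB. It assumes 4 kB pages (the usual x86_64 base page size; the log does not state it), and the helper names `pages_to_kb` and `kb_to_gib` are made up for this illustration.

```python
# Cross-check the oom-killer figures for PID 2951 (glusterfs).
# Assumption: 4 kB pages (the usual x86_64 base page size); the log itself
# does not state the page size.
PAGE_SIZE_KB = 4


def pages_to_kb(pages: int) -> int:
    """Convert a page count from the oom-killer process table to kB."""
    return pages * PAGE_SIZE_KB


def kb_to_gib(kb: int) -> float:
    """Convert kB (1024-byte units) to GiB."""
    return kb / (1024 * 1024)


# Values copied from the process-table row for PID 2951.
total_vm_kb = pages_to_kb(22585439)
rss_kb = pages_to_kb(3780372)
swap_kb = pages_to_kb(2041242)

# Value copied from the "Total swap" line of the same dump.
total_swap_kb = 8257532

print(f"total-vm: {total_vm_kb} kB ({kb_to_gib(total_vm_kb):.3f} GiB)")
print(f"rss:      {rss_kb} kB ({kb_to_gib(rss_kb):.2f} GiB)")
print(f"swap:     {swap_kb} kB ({swap_kb / total_swap_kb:.1%} of total swap)")
```

Under that assumption the page counts reproduce the total-vm (90341756kB) and anon-rss (15121488kB) values of the "Killed process" line exactly; file-rss and shmem-rss are 0kB, so rss and anon-rss coincide. The total-vm figure is also about 86.157 GiB, the same as the VIRT value top showed for the fuse client earlier. The swapents column accounts for nearly all of the 8257532kB of swap, which fits the "Free swap = 0kB" line.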
REVIEW: http://review.gluster.org/15965 (protocol/server: Fix mem-leaks in compound fops) posted (#1) for review on master by Krutika Dhananjay (kdhananj)
COMMIT: http://review.gluster.org/15965 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 955d9700397fda6ada269fc3077116b7756702a5
Author: Krutika Dhananjay <kdhananj>
Date:   Tue Nov 29 12:56:40 2016 +0530

    protocol/server: Fix mem-leaks in compound fops

    * Remove spurious 'return' statement.
    * Free up 'compound_rsp_array_val' as well in the end.
    * Remove multiple refs on this_args->xdata.

    Change-Id: I212c6dbe4d81b0381c1323d05fdfcc853886b25b
    BUG: 1399578
    Signed-off-by: Krutika Dhananjay <kdhananj>
    Reviewed-on: http://review.gluster.org/15965
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Jeff Darcy <jdarcy>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
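
The last item in the commit message ("Remove multiple refs on this_args->xdata") points at a common reference-counting leak: an object that is ref'd more often than it is unref'd never drops to a count of zero, so it is never freed. The following is only a toy model of that pattern in Python, not Gluster code. `RefCounted`, `ref`, `unref` and `handle_request` are hypothetical names chosen for illustration; the actual change is in the C sources of review 15965.

```python
class RefCounted:
    """Toy reference-counted object; `freed` stands in for releasing memory."""

    def __init__(self) -> None:
        self.refcount = 1  # the creator holds the first reference
        self.freed = False

    def ref(self) -> "RefCounted":
        """Take one more reference."""
        self.refcount += 1
        return self

    def unref(self) -> None:
        """Drop one reference; free the object when the count reaches zero."""
        if self.refcount <= 0:
            raise RuntimeError("unref on an object with no references")
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


def handle_request(refs_taken: int) -> RefCounted:
    """Simulate a handler that refs an object `refs_taken` times but
    releases only one of those references during cleanup."""
    xdata = RefCounted()
    for _ in range(refs_taken):
        xdata.ref()
    xdata.unref()  # handler cleanup drops a single reference
    xdata.unref()  # creator drops its own reference
    return xdata


balanced = handle_request(refs_taken=1)
leaky = handle_request(refs_taken=2)
print("balanced freed:", balanced.freed)
print("leaky freed:", leaky.freed, "remaining refs:", leaky.refcount)
```

With one ref per unref the object is released; with an extra ref, one reference is left over and the object stays allocated for the life of the process. Repeated for every request, that would show up as steadily growing resident memory like that reported above.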
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/