Bug 1302759 - Exceptional memory consumed by glusterfs daemon on client [NEEDINFO]
Status: ASSIGNED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: 3.0
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Assigned To: Ravishankar N
QA Contact: Anoop
Keywords: ZStream
Depends On:
Blocks:
 
Reported: 2016-01-28 09:54 EST by Peter Portante
Modified: 2017-10-20 13:03 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
pportant: needinfo? (vbellur)


Attachments
State dump before drop cache executed (530.79 KB, text/plain), 2016-02-05 23:29 EST, Peter Portante
State dump after drop caches performed (557.37 KB, text/plain), 2016-02-05 23:38 EST, Peter Portante

Description Peter Portante 2016-01-28 09:54:52 EST
We have a 6-node GlusterFS volume which is mounted on two clients, but only used from one.

When running a "find -H . -user root -ls -exec chown -h user:user {} ;" on a subdirectory of the 22 TB volume (7 TB used), the client process started out quite large, around 17 GB, and over the three days it has been running it has grown to about 41.5 GB, and is still growing.
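(As typed at a shell prompt, the -exec terminator must be escaped so that find, not the shell, sees it; a sketch, with user:user standing in for the actual owner:)

  find -H . -user root -ls -exec chown -h user:user {} \;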

Info on the setup:

6 Nodes
  2 Socket Dell R510s
  12 core Intel Westmere CPUs in each socket
  64 GB of memory
  12 x 1 TB disks
  2 x 10Gb links to a large Cisco switch
RHEL 6.6, RHGS 3.6.0.53-1.el6rhs
6 lvm volumes (2 disks per volume) for bricks
Distributed-Replicated, 12x3, 22 TB Volume
Options
  performance.readdir-ahead: on
  performance.io-cache: off
  performance.stat-prefetch: on
  cluster.lookup-unhashed: off
  client.event-threads: 8
  cluster.read-hash-mode: 2
  auto-delete: disable
  snap-max-soft-limit: 90
  snap-max-hard-limit: 256
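For reference, options like these are set per volume with the gluster CLI; a minimal sketch, using a hypothetical volume name "pbench":

  gluster volume set pbench client.event-threads 8   # set one option
  gluster volume info pbench                         # confirm the options in effect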
Comment 2 Peter Portante 2016-01-28 09:56:44 EST
The volume is currently running in a degraded state, where one member of the 6 nodes is no longer participating (problems with the OS install).
Comment 3 Kaleb KEITHLEY 2016-01-28 11:13:13 EST
Upstream tracking: bug 1288857 (mainline), bug 1288922 (release-3.7).
RHGS 3.0 -> release-3.6 ???
Comment 4 Nagaprasad Sathyanarayana 2016-01-29 02:24:54 EST
This seems to be fixed and targeted to be available in RHGS 3.1.2.  Please check https://bugzilla.redhat.com/show_bug.cgi?id=1288921.  Fixed in version would be 3.7.5*


Please confirm if you are okay to get this fix in 3.1.2.
Comment 5 Peter Portante 2016-01-29 07:15:36 EST
Has somebody looked at the environment in which this is failing and determined this is the same bug as those?
Comment 6 Kaleb KEITHLEY 2016-01-29 08:20:39 EST
Please check /var/log/glusterfs/mnt.log on the client for the presence of "kernel notifier loop terminated".

Thanks
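A quick way to check (a sketch; the client log file name is derived from the mount point, so globbing all logs is safest):

  grep -l "kernel notifier loop terminated" /var/log/glusterfs/*.log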
Comment 7 Peter Portante 2016-01-29 09:13:02 EST
I cannot find the file /var/log/glusterfs/mnt.log, and I don't see that string in the glusterfs/pbench.log, or in any other logs from that system.  Sorry.
Comment 8 Kaleb KEITHLEY 2016-02-03 05:11:48 EST
I am not getting the "kernel notifier loop terminated" issue in my 3.6.x setup.
Comment 10 Ravishankar N 2016-02-05 00:43:28 EST
There is an ongoing discussion on gluster-users about memory leaks in FUSE: http://www.gluster.org/pipermail/gluster-users/2016-January/024775.html
Soumya, Kaleb, and Xavier Hernandez have sent some patches upstream. I'll talk to Soumya and update the bug.

Peter, could you provide the IP/login details of the client and servers?

Thanks,
Ravi
Comment 11 Peter Portante 2016-02-05 00:47:27 EST
Sure, let's talk offline about what information you need to gather from the system.
Comment 12 Ravishankar N 2016-02-05 01:48:10 EST
Sure, but I'd also like to have all the info/logs here on the BZ, so that it gives context in case someone else wants to look.

So I spoke to Soumya. Her fixes are in gfapi, and another one is related to upcall, so they wouldn't be relevant here. The FUSE-related fixes that have been merged in master are: http://review.gluster.org/13327 and http://review.gluster.org/13274
There is also a dict leak in DHT fixed recently: http://review.gluster.org/13322

But before that, could you provide the following?

1. When the memory of the client (fuse mount) is high, take a state dump of the process. Also note the memory consumed.

2. Perform:
  # sync
  # echo 3 > /proc/sys/vm/drop_caches

3. Note the memory consumed and take a statedump of the process again (a consolidated sketch of these steps follows at the end of this comment).

Feel free to ping me on #rhs, my nick is 'itisravi'
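A consolidated sketch of steps 1-3 above (the PID is a placeholder; SIGUSR1 statedump behavior is per the admin guide referenced later in this bug):

  pid=<fuse_mount_process_pid>          # e.g. 1706 in the comments below
  grep VmRSS /proc/$pid/status          # memory before
  kill -SIGUSR1 $pid                    # statedump #1
  sync
  echo 3 > /proc/sys/vm/drop_caches
  grep VmRSS /proc/$pid/status          # memory after
  kill -SIGUSR1 $pid                    # statedump #2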
Comment 13 Peter Portante 2016-02-05 12:46:06 EST
# slabtop --sort=c -o
 Active / Total Objects (% used)    : 19486645 / 20288665 (96.0%)
 Active / Total Slabs (% used)      : 397170 / 397170 (100.0%)
 Active / Total Caches (% used)     : 78 / 123 (63.4%)
 Active / Total Size (% used)       : 5943909.85K / 6061758.42K (98.1%)
 Minimum / Average / Maximum Object : 0.01K / 0.30K / 15.88K

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
5343645 5343645 100%    0.75K 127301       42   4073632K fuse_inode             
5696292 5696292 100%    0.19K 135626       42   1085008K dentry                 
 475076  448395  94%    0.57K  16967       28    271472K radix_tree_node        
 179936  172950  96%    1.00K   5623       32    179936K xfs_inode              
5347072 5347072 100%    0.03K  41774      128    167096K kmalloc-32             
1485393  795740  53%    0.10K  38087       39    152348K buffer_head            
1251648 1251648 100%    0.06K  19557       64     78228K kmalloc-64             
   9696    8973  92%    2.00K    606       16     19392K kmalloc-2048           
  27342   11260  41%    0.64K    558       49     17856K proc_inode_cache       
  11470   11470 100%    1.02K    370       31     11840K ext4_inode_cache       
  28896   16671  57%    0.38K    688       42     11008K blkdev_requests
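The same caches can be watched directly over time; a sketch (requires root; cache names taken from the slabtop output above):

  grep -E '^(fuse_inode|dentry|kmalloc-32) ' /proc/slabinfo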
Comment 14 Peter Portante 2016-02-05 12:53:17 EST
# pidstat -r -u -C glusterfs 2 4
Linux 3.10.0-229.el7.x86_64 (perf42.perf.lab.eng.bos.redhat.com)        02/05/2016      _x86_64_        (12 CPU)

05:52:36 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
05:52:38 PM     0      1706    4.98   11.44    0.00   16.42     4  glusterfs

05:52:36 PM   UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
05:52:38 PM     0      1706   8676.62      0.00 53101696 52261664  52.88  glusterfs

05:52:38 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
05:52:40 PM     0      1706    4.50    9.00    0.00   13.50     4  glusterfs

05:52:38 PM   UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
05:52:40 PM     0      1706   7074.00      0.00 53101696 52261664  52.88  glusterfs

05:52:40 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
05:52:42 PM     0      1706  170.50    0.00    0.00  170.50     4  glusterfs

05:52:40 PM   UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
05:52:42 PM     0      1706   6117.00      0.00 53101696 52261660  52.88  glusterfs

05:52:42 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command

05:52:42 PM   UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
05:52:44 PM     0      1706   9855.50      0.00 53101696 52261664  52.88  glusterfs

Average:      UID       PID    %usr %system  %guest    %CPU   CPU  Command
Average:        0      1706   44.94    5.12    0.00   50.06     -  glusterfs

Average:      UID       PID  minflt/s  majflt/s     VSZ    RSS   %MEM  Command
Average:        0      1706   7931.71      0.00 53101696 52261663  52.88  glusterfs
Comment 15 Peter Portante 2016-02-05 12:53:51 EST
# cat /proc/meminfo
MemTotal:       98825916 kB
MemFree:         2955356 kB
MemAvailable:   40674920 kB
Buffers:            9168 kB
Cached:         36311612 kB
SwapCached:           32 kB
Active:         51902544 kB
Inactive:       36867284 kB
Active(anon):   48488272 kB
Inactive(anon):  3961580 kB
Active(file):    3414272 kB
Inactive(file): 32905704 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       4194300 kB
SwapFree:        4188980 kB
Dirty:               116 kB
Writeback:             0 kB
AnonPages:      52449140 kB
Mapped:            40988 kB
Shmem:               804 kB
Slab:            6180372 kB
SReclaimable:    1737436 kB
SUnreclaim:      4442936 kB
KernelStack:        7744 kB
PageTables:       111456 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    53607256 kB
Committed_AS:    1760364 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      320092 kB
VmallocChunk:   34308293628 kB
HardwareCorrupted:     0 kB
AnonHugePages:  27322368 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      145596 kB
DirectMap2M:     6135808 kB
DirectMap1G:    94371840 kB
Comment 16 Peter Portante 2016-02-05 14:44:13 EST
# lsof -p 1706
lsof: WARNING: can't stat() nfs4 file system /pub
      Output information may be incomplete.
COMMAND    PID USER   FD      TYPE             DEVICE  SIZE/OFF      NODE NAME
glusterfs 1706 root  cwd       DIR              253,1      4096       128 /
glusterfs 1706 root  rtd       DIR              253,1      4096       128 /
glusterfs 1706 root  txt       REG              253,1     89320 201460176 /usr/sbin/glusterfsd
glusterfs 1706 root  mem       REG              253,1     83384    607939 /usr/lib64/glusterfs/3.6.0.53/xlator/meta.so
glusterfs 1706 root  mem       REG              253,1    119512  67415717 /usr/lib64/glusterfs/3.6.0.53/xlator/debug/io-stats.so
glusterfs 1706 root  mem       REG              253,1     61336    608029 /usr/lib64/glusterfs/3.6.0.53/xlator/performance/md-cache.so
glusterfs 1706 root  mem       REG              253,1     36504    608031 /usr/lib64/glusterfs/3.6.0.53/xlator/performance/quick-read.so
glusterfs 1706 root  mem       REG              253,1     22864    608033 /usr/lib64/glusterfs/3.6.0.53/xlator/performance/readdir-ahead.so
glusterfs 1706 root  mem       REG              253,1     52192    608032 /usr/lib64/glusterfs/3.6.0.53/xlator/performance/read-ahead.so
glusterfs 1706 root  mem       REG              253,1     57888    608035 /usr/lib64/glusterfs/3.6.0.53/xlator/performance/write-behind.so
glusterfs 1706 root  mem       REG              253,1    381024    607927 /usr/lib64/glusterfs/3.6.0.53/xlator/cluster/dht.so
glusterfs 1706 root  mem       REG              253,1    521544    607925 /usr/lib64/glusterfs/3.6.0.53/xlator/cluster/afr.so
glusterfs 1706 root  mem       REG              253,1    271992  67415720 /usr/lib64/glusterfs/3.6.0.53/xlator/protocol/client.so
glusterfs 1706 root  mem       REG              253,1     37152 201664912 /usr/lib64/libnss_sss.so.2
glusterfs 1706 root  mem       REG              253,1     27512 201351271 /usr/lib64/libnss_dns-2.17.so
glusterfs 1706 root  mem       REG              253,1     58288 201351273 /usr/lib64/libnss_files-2.17.so
glusterfs 1706 root  mem       REG              253,1    153184 201328204 /usr/lib64/liblzma.so.5.0.99
glusterfs 1706 root  mem       REG              253,1    398272 201328279 /usr/lib64/libpcre.so.1.2.0
glusterfs 1706 root  mem       REG              253,1    147096 201328289 /usr/lib64/libselinux.so.1
glusterfs 1706 root  mem       REG              253,1    110808 201351283 /usr/lib64/libresolv-2.17.so
glusterfs 1706 root  mem       REG              253,1     15688 201328475 /usr/lib64/libkeyutils.so.1.5
glusterfs 1706 root  mem       REG              253,1     62720 201481349 /usr/lib64/libkrb5support.so.0.1
glusterfs 1706 root  mem       REG              253,1    202576 201481337 /usr/lib64/libk5crypto.so.3.1
glusterfs 1706 root  mem       REG              253,1     15840 201328198 /usr/lib64/libcom_err.so.2.1
glusterfs 1706 root  mem       REG              253,1    942024 201481347 /usr/lib64/libkrb5.so.3.3
glusterfs 1706 root  mem       REG              253,1    316560 201481333 /usr/lib64/libgssapi_krb5.so.2.2
glusterfs 1706 root  mem       REG              253,1    449808 201338341 /usr/lib64/libssl.so.1.0.1e
glusterfs 1706 root  mem       REG              253,1     82976 201408758 /usr/lib64/glusterfs/3.6.0.53/rpc-transport/socket.so
glusterfs 1706 root  mem       REG              253,1    202168  67415726 /usr/lib64/glusterfs/3.6.0.53/xlator/mount/fuse.so
glusterfs 1706 root  mem       REG              253,1 106065056  67129219 /usr/lib/locale/locale-archive
glusterfs 1706 root  mem       REG              253,1     90632 201328292 /usr/lib64/libz.so.1.2.7
glusterfs 1706 root  mem       REG              253,1   2107760 201327679 /usr/lib64/libc-2.17.so
glusterfs 1706 root  mem       REG              253,1   2013048 201338339 /usr/lib64/libcrypto.so.1.0.1e
glusterfs 1706 root  mem       REG              253,1    141616 201351281 /usr/lib64/libpthread-2.17.so
glusterfs 1706 root  mem       REG              253,1     19512 201328126 /usr/lib64/libdl-2.17.so
glusterfs 1706 root  mem       REG              253,1    100384 201408747 /usr/lib64/libgfxdr.so.0.0.0
glusterfs 1706 root  mem       REG              253,1    112584 201408745 /usr/lib64/libgfrpc.so.0.0.0
glusterfs 1706 root  mem       REG              253,1    673560 201408753 /usr/lib64/libglusterfs.so.0.0.0
glusterfs 1706 root  mem       REG              253,1    164336 201328123 /usr/lib64/ld-2.17.so
glusterfs 1706 root  mem       REG              253,1     26254 134323387 /usr/lib64/gconv/gconv-modules.cache
glusterfs 1706 root    0r      CHR                1,3       0t0      6146 /dev/null
glusterfs 1706 root    1w      CHR                1,3       0t0      6146 /dev/null
glusterfs 1706 root    2w      CHR                1,3       0t0      6146 /dev/null
glusterfs 1706 root    3u  a_inode                0,9         0      5833 [eventpoll]
glusterfs 1706 root    4u     unix 0xffff8817e2eacb00       0t0     27006 socket
glusterfs 1706 root    5u     IPv4           22587659       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1023->gprfs001.sbu.lab.eng.bos.redhat.com:24007 (ESTABLISHED)
glusterfs 1706 root    6r     FIFO                0,8       0t0     23473 pipe
glusterfs 1706 root    7w     FIFO                0,8       0t0     23473 pipe
glusterfs 1706 root    8u      CHR             10,229       0t0     16567 /dev/fuse
glusterfs 1706 root   10u     IPv4              23472       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:958->gprfs010-b-10ge.sbu.lab.eng.bos.redhat.com:49157 (ESTABLISHED)
glusterfs 1706 root   11r      CHR                1,9       0t0      6151 /dev/urandom
glusterfs 1706 root   12u     IPv4              23426       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1004->gprfs002-b-10ge.sbu.lab.eng.bos.redhat.com:49153 (ESTABLISHED)
glusterfs 1706 root   13u     IPv4              23417       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1013->gprfs010-b-10ge.sbu.lab.eng.bos.redhat.com:49152 (ESTABLISHED)
glusterfs 1706 root   14u     IPv4              27015       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1023->gprfs001-b-10ge.sbu.lab.eng.bos.redhat.com:49152 (ESTABLISHED)
glusterfs 1706 root   15u     IPv4              23411       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1019->gprfs009-b-10ge.sbu.lab.eng.bos.redhat.com:49152 (ESTABLISHED)
glusterfs 1706 root   16u     IPv4              23413       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1017->gprfs011-b-10ge.sbu.lab.eng.bos.redhat.com:49152 (ESTABLISHED)
glusterfs 1706 root   17u     IPv4              23428       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1002->gprfs010-b-10ge.sbu.lab.eng.bos.redhat.com:49153 (ESTABLISHED)
glusterfs 1706 root   18u     IPv4              23415       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1015->gprfs002-b-10ge.sbu.lab.eng.bos.redhat.com:49152 (ESTABLISHED)
glusterfs 1706 root   19u     IPv4              23420       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:surf->gprfs001-b-10ge.sbu.lab.eng.bos.redhat.com:49153 (ESTABLISHED)
glusterfs 1706 root   20u     IPv4              23422       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1008->gprfs009-b-10ge.sbu.lab.eng.bos.redhat.com:49153 (ESTABLISHED)
glusterfs 1706 root   21u     IPv4              23424       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:1006->gprfs011-b-10ge.sbu.lab.eng.bos.redhat.com:49153 (ESTABLISHED)
glusterfs 1706 root   22w      REG              253,1 815103913     40555 /var/log/glusterfs/pbench.log
glusterfs 1706 root   23u     IPv4              23439       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:nas->gprfs010-b-10ge.sbu.lab.eng.bos.redhat.com:49154 (ESTABLISHED)
glusterfs 1706 root   24u     IPv4              23431       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:garcon->gprfs001-b-10ge.sbu.lab.eng.bos.redhat.com:49154 (ESTABLISHED)
glusterfs 1706 root   25u     IPv4              23433       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:maitrd->gprfs009-b-10ge.sbu.lab.eng.bos.redhat.com:49154 (ESTABLISHED)
glusterfs 1706 root   26u     IPv4              23435       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:pop3s->gprfs011-b-10ge.sbu.lab.eng.bos.redhat.com:49154 (ESTABLISHED)
glusterfs 1706 root   27u     IPv4              23437       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:imaps->gprfs002-b-10ge.sbu.lab.eng.bos.redhat.com:49154 (ESTABLISHED)
glusterfs 1706 root   28u     IPv4              23450       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:980->gprfs010-b-10ge.sbu.lab.eng.bos.redhat.com:49155 (ESTABLISHED)
glusterfs 1706 root   29u     IPv4              23442       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:988->gprfs001-b-10ge.sbu.lab.eng.bos.redhat.com:49155 (ESTABLISHED)
glusterfs 1706 root   30u     IPv4              23444       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:986->gprfs009-b-10ge.sbu.lab.eng.bos.redhat.com:49155 (ESTABLISHED)
glusterfs 1706 root   31u     IPv4              23446       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:984->gprfs011-b-10ge.sbu.lab.eng.bos.redhat.com:49155 (ESTABLISHED)
glusterfs 1706 root   32u     IPv4              23448       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:982->gprfs002-b-10ge.sbu.lab.eng.bos.redhat.com:49155 (ESTABLISHED)
glusterfs 1706 root   33u     IPv4              23461       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:969->gprfs010-b-10ge.sbu.lab.eng.bos.redhat.com:49156 (ESTABLISHED)
glusterfs 1706 root   34u     IPv4              23453       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:977->gprfs001-b-10ge.sbu.lab.eng.bos.redhat.com:49158 (ESTABLISHED)
glusterfs 1706 root   35u     IPv4              23455       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:975->gprfs009-b-10ge.sbu.lab.eng.bos.redhat.com:49156 (ESTABLISHED)
glusterfs 1706 root   36u     IPv4              23457       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:973->gprfs011-b-10ge.sbu.lab.eng.bos.redhat.com:49156 (ESTABLISHED)
glusterfs 1706 root   37u     IPv4              23459       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:971->gprfs002-b-10ge.sbu.lab.eng.bos.redhat.com:49156 (ESTABLISHED)
glusterfs 1706 root   39u     IPv4              23464       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:966->gprfs001-b-10ge.sbu.lab.eng.bos.redhat.com:49157 (ESTABLISHED)
glusterfs 1706 root   40u     IPv4              23466       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:964->gprfs009-b-10ge.sbu.lab.eng.bos.redhat.com:49157 (ESTABLISHED)
glusterfs 1706 root   41u     IPv4              23468       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:962->gprfs011-b-10ge.sbu.lab.eng.bos.redhat.com:49157 (ESTABLISHED)
glusterfs 1706 root   42u     IPv4              23470       0t0       TCP perf42.perf.lab.eng.bos.redhat.com:960->gprfs002-b-10ge.sbu.lab.eng.bos.redhat.com:49157 (ESTABLISHED)
Comment 17 Peter Portante 2016-02-05 16:02:00 EST
What do you mean by a "statedump" of the process?
Comment 18 Ravishankar N 2016-02-05 20:02:22 EST
(In reply to Peter Portante from comment #17)
> What do you mean by a "statedump" of the process?

You would need to do a `kill -SIGUSR1 <fuse_mount_process_pid>` to get the statedump of that process. `gluster --print-statedumpdir` should give you the location where it gets saved.

Section 14.6 of the admin guide https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3/pdf/Administration_Guide/Red_Hat_Storage-3-Administration_Guide-en-US.pdf has some information on statedumps.
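For example (the PID is illustrative; the dump file name pattern is the usual one for client processes, per the statedump docs):

  gluster --print-statedumpdir          # typically /var/run/gluster
  kill -SIGUSR1 1706                    # writes glusterdump.<pid>.dump.<timestamp> there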
Comment 19 Peter Portante 2016-02-05 21:43:54 EST
From what I can tell, this is a server side state dump, not a state dump of the glusterfs daemon on the client side.  Do I understand this right?

There is no "gluster" command available on our client perf42, where the memory leak is occurring. And there does not appear to be any signal handling for SIGUSR1 in the glusterfs daemon process; when I send the signal and strace the process, nothing happens.

Do I take statedumps for all 6 servers in the cluster?  I am not sure how that addresses the memory leak on the client though.
Comment 20 Ravishankar N 2016-02-05 23:14:47 EST
(In reply to Peter Portante from comment #19)
> From what I can tell, this is a server side state dump, not a state dump of
> the glusterfs daemon on the client side.  Do I understand this right?
> 
The command I gave is for the client. Sending a SIGUSR1 to the client process generates the statedump for it, as mentioned in the admin guide.

> There is no "gluster" command available on our client perf42, where the
> memory leak is occuring.  And there does not appear to be any signal
> handling for the SIGUSR1 applied to the glusterfs daemon process when I send
> the signal, at least when I strace the process, nothing happens.
> 

Is there a /var/run/gluster/ path on the client? If not, create one and try again.
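i.e., something like (a sketch):

  mkdir -p /var/run/gluster
  kill -SIGUSR1 <fuse_mount_process_pid>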
Comment 21 Peter Portante 2016-02-05 23:29 EST
Created attachment 1121563 [details]
State dump before drop cache executed

Thanks for the clarification of the documentation.  I'll file a bug against the documentation to make that easier to understand.
Comment 22 Peter Portante 2016-02-05 23:38 EST
Created attachment 1121564 [details]
State dump after drop caches performed

Ran the drop caches operation and then took a state dump.  No change in the memory usage of the GlusterFS process.
Comment 23 Peter Portante 2016-02-05 23:58:27 EST
Below is a quick capture of an IRC conversation about the state of the system in this BZ:

itisravi
portante: btw, what kind of files are there in the volume?
portante
thanks, this is helpful, and I'll file a doc enhancement bug to get that clearer, because I was confused by the documentation text
itisravi
portante: regular files or VM images..?
portante
regular files
itisravi
sure
itisravi
ok
portante
lots of small files, though there are a few that are 1 or 2 GB
itisravi
portante: the process still consumes 40-odd GB, is it?
portante
I started the command, "find -H . -user root -ls -exec chown -h pbench:pbench {} ;" running about a week ago, it has yet to complete, it is slowly chugging through it
itisravi
right
portante
yes, right now it is reported by top, 49.4GB RES
itisravi
hmm.
portante
all the users of the volume don't see any problems.
portante
the volume is running fine for all intents and purposes
portante
and it is running in a degraded state, no less
portante
One of the peers is reported as: State: Peer Rejected (Connected)
portante
somebody toasted the setup on that host, so I am following the steps in section 8.6.2 of the guide you posted in the BZ to restore that cluster member
itisravi
portante: oh ok, this node in question is hosting one of the replica bricks?
portante
yes, this cluster is a 2x3
itisravi
If the bricks of this node are down, then once they are back, self-heal is going to kick in.
portante
yes, that is what I understand to be the case as well
itisravi
you'll probably notice high cpu usage during the heal.
portante
okay, good to know, it will have the weekend to get a good head start ahead of the coming week
itisravi
right.
itisravi
so the thing is, running a find on the mount triggers lookups and creates inodes for all these files. The number of inodes in memory is only limited by the RAM, which could explain the high mem usage.
itisravi
But the fact that dropping caches is not reducing the mem usage seems to indicate memory leaks.
portante
okay
Comment 24 Ravishankar N 2016-02-09 07:25:59 EST
Notes to self from looking at the client state dumps (Reference: https://github.com/gluster/glusterfs/blob/master/doc/debugging/statedump.md):

1. Mallinfo:
mallinfo_fordblks --> /* Total free space (bytes) */
Before drop_caches: 1538645840
After drop_caches: 1305856192
Not sure how the free space can _reduce_ after a drop_cache.

2. Data structure allocation stats:
No difference before and after.

3. Mempools:
fuse:dentry_t and inode_t have cold_count=0 (hot count ~ 32K) before drop_cache and high pool misses. This is probably due to the high number of lookups triggered by FUSE. After drop_cache, cold_count and hot_count are approx 16K each. Seems some inodes were reclaimed due to drop_caches.

None of other xlator pools have zero cold_count.

4. iobufs:
All arena.x have been moved to purged.x ==> seems a logical effect of drop_caches

5. Call stack and frame:
I see mostly lookup and readdirp. Nothing suspicious here.

6. [mount/fuse] gf_common_mt_inode_ctx memory usage:
Before: num_allocs=461330
After: num_allocs=2007

7. History of operations in FUSE:
Mostly lookups, stats, and readdirps. See a lot of ENOENT errors in fuse_entry_cbk for LOOKUP().

8. Memory accounting highest consumers (num_allocs):
Before:
num_allocs=5703132 -- [mount/fuse.fuse - usage-type 40 memusage], type=gf_common_mt_strdup
num_allocs=5703133 -- [performance/md-cache.pbench-md-cache - usage-type 117 memusage], type=gf_mdc_mt_md_cache_t
num_allocs=694438 -- [mount/fuse.fuse - usage-type 119 memusage], type=gf_fuse_mt_iov_base
num_allocs=6142231 -- [mount/fuse.fuse - usage-type 48 memusage], type=gf_common_mt_mem_pool

After:
num_allocs=695587 -- [mount/fuse.fuse - usage-type 119 memusage], type=gf_fuse_mt_iov_base
num_allocs=194492

It seemed odd that the num_allocs for gf_fuse_mt_iov_base was the same before and after drop_caches while every other data type saw a drastic reduction. But with some code reading, Raghavendra (Du) pointed out that this is a bug in memory accounting. Need to fix that.
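For reference, a rough way to pull the largest num_allocs entries out of a statedump (the file name is illustrative; section and field names follow the statedump format linked above):

  awk '/memusage\]/ {sec=$0} /num_allocs=/ {print $0, sec}' glusterdump.1706.dump.before \
    | sort -t= -k2 -rn | head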
Comment 25 Ravishankar N 2016-02-09 07:40:31 EST
Hi Peter, while I'm not getting anything conclusive about the high memory usage from the statedumps, they do seem to indicate a high volume of lookups and readdirps triggered from FUSE (points 3, 5, 6, 7, and 8 above). The fixes that went in recently upstream, specifically Xavier's patch http://review.gluster.org/#/c/13327/, should help in reducing the leaks.

If I provide a build with the FUSE fixes on top of the latest RHGS-3.1.2 branch, would you be able to test and see if you are still observing the high memory usage? You would need to update both clients and servers.
Comment 26 Peter Portante 2016-02-09 12:23:26 EST
Yes, I'd be happy to try something out.

However, we are running 3.0.4, I believe, so let me get the machines upgraded to RHEL 7.2 and RHGS-3.1.2 first and then we can try that out.

Thanks!
Comment 27 Peter Portante 2016-02-09 22:35:27 EST
I should amend this: we are running on the *client* RHEL 7.1 with:

Installed Packages
glusterfs.x86_64       3.6.0.53-1.el7rhs  @/glusterfs-3.6.0.53-1.el7rhs.x86_64     
glusterfs-api.x86_64   3.6.0.53-1.el7rhs  @/glusterfs-api-3.6.0.53-1.el7rhs.x86_64 
glusterfs-fuse.x86_64  3.6.0.53-1.el7rhs  @/glusterfs-fuse-3.6.0.53-1.el7rhs.x86_64
glusterfs-libs.x86_64  3.6.0.53-1.el7rhs  @/glusterfs-libs-3.6.0.53-1.el7rhs.x86_64

On the servers we are running RHEL 6.6 with RHGS 3.0.4.

So, as long as an RHGS-3.1.2 client can work with an RHGS 3.0.4 cluster, we can do this once the self-heal completes (currently estimated EOD Tuesday, Feb 16th).
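A sketch of the eventual client-side check and upgrade (assumes an RHGS 3.1.2 repo is configured on the client):

  rpm -qa 'glusterfs*'      # current versions
  yum update 'glusterfs*'   # pull the newer client packages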
Comment 28 Ravishankar N 2016-02-10 00:25:34 EST
(In reply to Peter Portante from comment #27)
> I should amend this: we are running on the *client* RHEL 7.1 with:
> 
> Installed Packages
> glusterfs.x86_64       3.6.0.53-1.el7rhs 
> @/glusterfs-3.6.0.53-1.el7rhs.x86_64     
> glusterfs-api.x86_64   3.6.0.53-1.el7rhs 
> @/glusterfs-api-3.6.0.53-1.el7rhs.x86_64 
> glusterfs-fuse.x86_64  3.6.0.53-1.el7rhs 
> @/glusterfs-fuse-3.6.0.53-1.el7rhs.x86_64
> glusterfs-libs.x86_64  3.6.0.53-1.el7rhs 
> @/glusterfs-libs-3.6.0.53-1.el7rhs.x86_64
> 
> On the servers we are running RHEL 6.6 with RHGS 3.0.4.
> 

Oh my! You seem to be running a newer version of the client with an older version of the server. This is not supported in general. Moreover, RHGS-3.0.4 has the older afr-v1 (replication translator) code while glusterfs-3.6.x has the refactored afr-v2 code, which is not backward compatible. :-(


> So as long as an RHGS-3.1.2 client can work with an RHGS 3.0.4 cluster, then
> we can do this once the self-heal completes (estimated EOD Tuesday, Feb
> 16th, right now).
Comment 29 Ravishankar N 2016-02-25 03:46:21 EST
Hi Peter, shall I provide a build with the fixes?
Comment 30 Peter Portante 2016-02-25 06:55:57 EST
I think we need to get this setup to a supported state first.  I'll post here when we have moved this to the supported setup.
Comment 31 Ravishankar N 2016-02-26 02:17:18 EST
Removing the BZ from 3.1.3 for now.
Comment 32 Ravishankar N 2016-06-15 03:24:53 EDT
Hi Peter, Are you experiencing any more memory leak issues in 3.1.3? If not, we could close this BZ.
Comment 33 Peter Portante 2016-06-15 06:29:50 EDT
One of our clients recently grew to about 55.8GB, and at that point we took a state dump, and Vijay was going to analyze it to see if any of it represented a memory leak.
