Bug 1327751

Summary: glusterd memory overcommit
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Dustin Black <dblack>
Component: glusterd
Assignee: Kaushal <kaushal>
Status: CLOSED ERRATA
QA Contact: Byreddy <bsrirama>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: amukherj, annair, asrivast, bkunal, bordas.csaba, bugs, kaushal, rcyriac, rhinduja, rhs-bugs, rnalakka, ryanlee, sankarshan, storage-qa-internal, vbellur
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: RHGS 3.1.3
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.7.9-4
Doc Type: Bug Fix
Doc Text:
When encrypted connections are enabled, a separate thread is used for each connection. These threads were not being cleaned up correctly after the connections ended, which led to a gradual increase in GlusterFS memory consumption over time, especially in the glusterd process. These disconnected threads are now cleaned up once a minute, avoiding the problem with memory usage.
Story Points: ---
Clone Of: 1268125
Environment:
Last Closed: 2016-06-23 05:18:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1268125, 1331289
Bug Blocks: 1311817

Description Dustin Black 2016-04-15 20:20:25 UTC
+++ This bug was initially created as a clone of Bug #1268125 +++

Description of problem:

We were using Gluster 3.3 through 3.5 without issue but needed to add SSL support.  Due to SSL bugs in earlier versions, the quickest path forward was to upgrade the network to 3.7.  This generally appears to be fine, as added files appear where they should, except that it has caused glusterd to vastly overcommit memory on both nodes where it runs.

One node (serverA, the 'master') had 4GB of RAM to work with, and the other (serverB) had 2GB.  Both got up to around 30GB of committed virtual memory in a couple of weeks.  When other processes were stopped on serverA and glusterd was restarted, the overcommit problem appeared to be alleviated and, if it was growing at all, grew a lot slower.  We had to resize serverA and took it offline; while it was offline, glusterd on serverB shot up to 140GB.  Both nodes currently have 2GB of RAM (for other reasons) and, after restarting both daemons, committed memory appears to be growing at around 4GB/day.

Version of GlusterFS package installed:

3.7.4-ubuntu1~trusty1

Location from which the packages are used:

Launchpad PPA

GlusterFS Cluster Information:

    Number of volumes: 2
    Volume Names: backup, other
    Volume on which the particular issue is seen: N/A
    Type of volumes: backup Replicate, other Distribute
    Volume options if available:

Volume Name: backup
Type: Replicate
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: serverA:/glusterfs/brick3/data
Brick2: serverB:/glusterfs/brick3/data
Options Reconfigured:
auth.ssl-allow: [names]
ssl.cipher-list: HIGH:!SSLv2
ssl.certificate-depth: 3
server.ssl: on
client.ssl: on
 
Volume Name: other
Type: Distribute
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: serverA:/glusterfs/brick4/data
Options Reconfigured:
[same as above]

    Client Information
        OS Type: Linux
        Mount type: GlusterFS

How reproducible:

Have not tried (sorry).

Steps to Reproduce:

    1.
    2.
    3.

Actual results:


Expected results:


Logs Information:

I stripped the dates and counted the unique log messages for the glusterd process; the counts are in the first column.  The server setup and SSL connect errors appear together.

  17299  E [socket.c:2863:socket_connect] (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(gf_timer_proc+0xfb) [0x7f57813d366b] -->/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_reconnect+0xb9) [0x7f5781185c59] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/rpc-transport/socket.so(+0x755d) [0x7f577a62255d] ) 0-socket: invalid argument: this->private [Invalid argument]
    400  W [dict.c:1452:dict_get_with_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/xlator/mgmt/glusterd.so(build_shd_graph+0x69) [0x7f577c95ee99] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_str_boolean+0x22) [0x7f57813af6a2] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x19406) [0x7f57813ad406] ) 0-dict: dict OR key (graph-check) is NULL [Invalid argument]
    217  E [socket.c:2388:socket_poller] 0-socket.management: server setup failed
    217  E [socket.c:352:ssl_setup_connection] 0-socket.management: SSL connect error
     16  W [dict.c:1452:dict_get_with_ref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/xlator/mgmt/glusterd.so(build_shd_graph+0x69) [0x7f3c7c558e99] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get_str_boolean+0x22) [0x7f3c80fa96a2] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x19406) [0x7f3c80fa7406] ) 0-dict: dict OR key (graph-check) is NULL [Invalid argument]
     11  E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
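For reference, a summary like the one above can be produced by stripping the leading timestamp fields and counting repeated messages. This is only a sketch; the log path and the two leading timestamp fields are assumptions about this particular setup:

    # Drop the "[date time]" fields, then count identical messages.
    awk '{$1=$2=""; print}' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log \
        | sort | uniq -c | sort -rn | head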

Additional info: 

This is totally unrelated (probably?), but in order to get glusterd to actually start, I had to run ln /usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/xlator/rpc-transport -s /usr/lib/x86_64-linux-gnu/glusterfs/3.7.4/rpc-transport, as the logs were essentially complaining about being unable to find xlator/rpc-transport/socket.so.

--- Additional comment from Atin Mukherjee on 2015-10-05 23:41:53 EDT ---

Thanks for reporting the issue. However, we would need a statedump of the running glusterd instance once the spike is seen, along with cmd_history.log, which records the commands run in the cluster. Otherwise it's almost impossible to analyze what caused the memory leak.

--- Additional comment from  on 2015-10-06 16:20 EDT ---



--- Additional comment from  on 2015-10-06 16:21 EDT ---



--- Additional comment from  on 2015-10-06 16:22:21 EDT ---

I've generated the statedump files and cut out 6336 sections named xlator.features.locks.backup-locks.inode from backup.dump.  I'm hoping the contents of the inode context and the specific names of clients aren't necessary bits of info and have removed them from the dump files.

The cmd_history.log is pretty empty.  The memory starts leaking immediately on start/restart.  We're currently mitigating by restarting the process once a week; the growth rate is quite steady without our having done anything else.  cmd_history does contain successes for updating volume settings (specifically, SSL allows) that our internal management system runs every half hour.  We could probably stand to modify it to issue changes only if the values differ, but that doesn't look to me to be the source of the issue.

--- Additional comment from Atin Mukherjee on 2015-10-07 01:38:33 EDT ---

(In reply to ryanlee from comment #4)
> I've generated the statedump files and cut out 6336 sections named
> xlator.features.locks.backup-locks.inode from backup.dump.  I'm hoping the
> contents of the inode context and the specific names of clients aren't
> necessary bits of info and have removed them from the dump files.
> 
> The cmd_history.log is pretty empty.  The memory starts leaking immediately
> on start/restart.  We're currently mitigating by restarting the process once
> a week, the growth rate is quite steady without having done anything else. 
> cmd_history does contain successes for updating volume settings
> (specifically, SSL allows) that our internal management system runs every
> half hour.  We could probably stand to modify it to issue changes only if
> the values differ, but that doesn't look to me to be the source of the issue.

You would need to take a statedump of the glusterd process, not of the clients; the attachment indicates the statedump is for the clients. Would you be able to provide that?

--- Additional comment from  on 2015-10-07 12:13:10 EDT ---

When I searched for gluster and statedump, I found info for running

% gluster volume statedump backup all
% gluster volume statedump other all

on the server, which is what's provided.  If you need something else, you're going to have to be more precise about how I can make it for you, not just that you need it.

--- Additional comment from Atin Mukherjee on 2015-10-08 00:25:33 EDT ---

You could take a statedump of the glusterd instance with the following command:

kill -SIGUSR1 <pid of glusterd instance>

The statedump will then be generated in /var/run/gluster with a name of the form glusterdump.<pid of glusterd>.<timestamp>

Hope this clarifies the confusion.
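
For example, a minimal sketch of the full sequence (assuming pidof is available; the exact dump filename depends on the pid and on when the signal is handled):

    pid=$(pidof glusterd)                        # pid of the running glusterd instance
    kill -SIGUSR1 "$pid"                         # ask glusterd to write a statedump
    ls -l /var/run/gluster/glusterdump.$pid.*    # dump file(s) to attach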

--- Additional comment from  on 2015-10-08 16:03 EDT ---



--- Additional comment from  on 2015-10-08 16:03 EDT ---



--- Additional comment from  on 2015-10-08 16:04:49 EDT ---

Great, thanks.  Perhaps the new attachments will be of more use.

--- Additional comment from Atin Mukherjee on 2015-10-09 01:38:20 EDT ---

The statedump helps; however, I did ask for the cmd_history.log and glusterd.log files as well. Along with those, could you also provide the output of gluster volume info?

--- Additional comment from  on 2015-10-09 15:03:15 EDT ---

(In reply to Atin Mukherjee from comment #11)

> however I did ask for the ... glusterd.log file as well

No, you didn't.  But I already summarized the errors in it in the original bug report.

> cmd_history.log

I've already mentioned it's effectively empty in comment #4.

> the output of gluster volume info output

I already provided it in the original bug report.

--- Additional comment from  on 2016-01-21 15:03:07 EST ---

Still present in 3.7.6.  I wonder if this is related to bug 1258931?

--- Additional comment from Atin Mukherjee on 2016-01-21 23:10:26 EST ---

(In reply to ryanlee from comment #13)
> Still present in 3.7.6.  I wonder if this is related to bug 1258931?

Not really IMO. There is no memory leak/over commit reported in that bug.

--- Additional comment from  on 2016-01-22 15:34:38 EST ---

Sorry, I should have added a bit more context.  Several months on, the overcommit has spread to all Gluster clients and has started forcing one particularly small node offline due to resource exhaustion.  It maxed out at 1TB on serverA (probably because the system didn't allow it to commit any more).  Our requirement for SSL support was secure access from an offsite Gluster client, but while enabling SSL provided a way in from one angle, its apparent memory-hogging side effects meant that client was nearly always disconnected anyway, so it wasn't going to work.

We went looking for a different solution so we could switch back to non-SSL mode for everything else.  It may not be the same bug, certainly, but I turned off SSL yesterday and everything is back to normal: all of the memory overcommits on the servers and the clients are mercifully gone, and there's no longer a growth pattern in our monitoring graphs.

That is the kind of relationship I meant.  Even if they are not the same bug, both are rooted in issues with SSL mode.  Is it possible that the inability to connect to the self-heal daemon manifests as a constantly growing memory overcommit over the course of months?

Comment 4 Kaushal 2016-04-18 11:31:42 UTC
When using TLS, glusterfs launches a new thread for each connection. I suspect that the threads are not being reaped correctly, causing the memory allocation to balloon, particularly VIRT.

Comment 5 Kaushal 2016-04-25 09:14:09 UTC
Testing has confirmed that the threads launched for TLS connections are not being reaped, leading to the gradual increase in memory used by GlusterD.

This should happen with bricks as well when I/O encryption is used, but it is not as noticeable as in GlusterD, because GlusterD gets a lot of short-lived connections from clients for volfile and portmap requests. This leaves a lot of dead threads in GlusterD waiting to be reaped (pthread_join()ed).

We're figuring out how to correctly solve the problem of reaping these threads, as we don't see a straightforward method right away.

We also have an ongoing discussion on the upstream mailing lists about TLS and threads in GlusterFS [1]. It covers the existing issues we found in the TLS implementation, including the issue of thread reaping.

[1] https://www.gluster.org/pipermail/gluster-devel/2016-April/049116.html
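
As a rough illustration of the buildup described above (a sketch for observation only, not part of the fix), the virtual size and the number of memory mappings of glusterd grow as un-joined connection threads leave their stacks behind:

    pid=$(pidof glusterd)
    while true; do
        grep VmSize /proc/$pid/status    # committed virtual memory
        wc -l < /proc/$pid/maps          # mapping count; orphaned thread stacks linger here
        sleep 60
    done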

Comment 8 Atin Mukherjee 2016-04-29 11:29:13 UTC
Upstream patch http://review.gluster.org/14101 posted for review.

Comment 11 Atin Mukherjee 2016-05-05 05:53:34 UTC
upstream mainline patch : http://review.gluster.org/14101
release-3.7 patch : http://review.gluster.org/14143
Downstream patch : https://code.engineering.redhat.com/gerrit/#/c/73681

Comment 19 Byreddy 2016-05-17 06:23:56 UTC
Verified this bug using the build "glusterfs-3.7.9-4"

After mounting and unmounting the volume 300 times, the memory footprint with and without the fix is shown below.

Without Fix:
-----------
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                   
5257 root      20   0 41.496g 153400   4480 S   0.0  8.1   0:30.08 glusterd.

With Fix:
---------
PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                   
4245 root      20   0 1459040  74808   4752 S   0.0  4.0   0:40.21 glusterd   



With the fix, the memory overcommit is not happening; moving to verified state.
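
For reference, a minimal sketch of the mount/unmount cycling used above; the server name, volume name, and mount point are placeholders for this setup:

    for i in $(seq 1 300); do
        mount -t glusterfs server1:/testvol /mnt/testvol    # FUSE mount of the volume
        umount /mnt/testvol
    done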

Comment 21 Bipin Kunal 2016-05-30 08:45:28 UTC
Changing Status to ON_QA for hotfix testing.

Comment 22 Rahul Hinduja 2016-05-30 09:52:32 UTC
Moving this to the verified state as the issue is fixed in the downstream 3.1.3 build. Please use needinfo flags for tracking the progress of the hotfix.

Comment 23 Bipin Kunal 2016-05-30 10:16:21 UTC
I am not sure who the right candidate for needinfo is. Setting it for Anoop.

@Anoop : Please change it to appropriate person in case it is needed.

Comment 26 Bipin Kunal 2016-06-14 04:34:21 UTC
The customer has agreed to wait a few weeks until we release 3.1.3, so we do not require a hotfix.

Comment 29 errata-xmlrpc 2016-06-23 05:18:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240