Bug 1324684 - glusterfs process *replicate* consumed 75 GB of 96 GB, forcing node into OOM Kill
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-07 01:15 UTC by Peter Portante
Modified: 2016-09-17 12:18 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-15 09:00:32 UTC
Target Upstream Version:


Attachments
First statedump (182.94 KB, text/plain)
2016-04-11 03:44 UTC, Vijay Bellur
Second Statedump (183.93 KB, text/plain)
2016-04-11 03:45 UTC, Vijay Bellur

Description Peter Portante 2016-04-07 01:15:09 UTC
Today we found one of our six gluster nodes in a state where a glusterfs process was consuming most of the system's memory:

USER       PID %CPU %MEM       VSZ      RSS TTY  STAT START    TIME COMMAND
root     31918 24.1 95.4 104567776 78867272 ?    Ssl  Mar15 7606:14 \ /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p \
/var/lib/glusterd/glustershd/run/glustershd.pid -l \
/var/log/glusterfs/glustershd.log -S \
/var/run/9f67821258ca7bb33117f9c9ec46e8d3.socket --xlator-option \
*replicate*.node-uuid=d05444b0-6034-403e-a9f9-59a7a9428d0e
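As a sanity check on the figure in the bug title: the RSS column in the ps output above is reported in KiB, so the value works out to roughly 75 GiB. A minimal sketch of the conversion:

```python
# The ps RSS column is in KiB; convert the value reported above to GiB
# to confirm it matches the ~75 GB figure in the bug title.
rss_kib = 78_867_272          # RSS of PID 31918 from the ps output
rss_gib = rss_kib / 1024 / 1024
print(f"{rss_gib:.1f} GiB")   # prints "75.2 GiB"
```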

At first I tried to stop gluster with "service glusterd stop", but that failed, so I had to kill that process with "kill -TERM 31918" and then kill all the other gluster processes before "service glusterd start" would work properly.

I don't know what caused this.

Info:

Volume Name: pbench
Type: Distributed-Replicate
Volume ID: 688b4f86-9868-4fab-ab0e-7341404c762d
Status: Started
Snap Volume: no
Number of Bricks: 12 x 3 = 36
Transport-type: tcp
Bricks:
Brick1: gprfs001-b-10ge:/brick/pbench0-brick/pbench
Brick2: gprfs009-b-10ge:/brick/pbench0-brick/pbench
Brick3: gprfs011-b-10ge:/brick/pbench0-brick/pbench
Brick4: gprfs002-b-10ge:/brick/pbench0-brick/pbench
Brick5: gprfs010-b-10ge:/brick/pbench0-brick/pbench
Brick6: gprfs012-b-10ge:/brick/pbench0-brick/pbench
Brick7: gprfs001-b-10ge:/brick/pbench1-brick/pbench
Brick8: gprfs009-b-10ge:/brick/pbench1-brick/pbench
Brick9: gprfs011-b-10ge:/brick/pbench1-brick/pbench
Brick10: gprfs002-b-10ge:/brick/pbench1-brick/pbench.1
Brick11: gprfs010-b-10ge:/brick/pbench1-brick/pbench
Brick12: gprfs012-b-10ge:/brick/pbench1-brick/pbench
Brick13: gprfs001-b-10ge:/brick/pbench2-brick/pbench
Brick14: gprfs009-b-10ge:/brick/pbench2-brick/pbench
Brick15: gprfs011-b-10ge:/brick/pbench2-brick/pbench
Brick16: gprfs002-b-10ge:/brick/pbench2-brick/pbench
Brick17: gprfs010-b-10ge:/brick/pbench2-brick/pbench
Brick18: gprfs012-b-10ge:/brick/pbench2-brick/pbench
Brick19: gprfs001-b-10ge:/brick/pbench3-brick/pbench
Brick20: gprfs009-b-10ge:/brick/pbench3-brick/pbench
Brick21: gprfs011-b-10ge:/brick/pbench3-brick/pbench
Brick22: gprfs002-b-10ge:/brick/pbench3-brick/pbench
Brick23: gprfs010-b-10ge:/brick/pbench3-brick/pbench
Brick24: gprfs012-b-10ge:/brick/pbench3-brick/pbench
Brick25: gprfs001-b-10ge:/brick/pbench4.1-brick/pbench
Brick26: gprfs009-b-10ge:/brick/pbench4-brick/pbench
Brick27: gprfs011-b-10ge:/brick/pbench4-brick/pbench
Brick28: gprfs002-b-10ge:/brick/pbench4-brick/pbench
Brick29: gprfs010-b-10ge:/brick/pbench4-brick/pbench
Brick30: gprfs012-b-10ge:/brick/pbench4-brick/pbench
Brick31: gprfs001-b-10ge:/brick/pbench5-brick/pbench
Brick32: gprfs009-b-10ge:/brick/pbench5-brick/pbench
Brick33: gprfs011-b-10ge:/brick/pbench5-brick/pbench
Brick34: gprfs002-b-10ge:/brick/pbench5-brick/pbench
Brick35: gprfs010-b-10ge:/brick/pbench5-brick/pbench
Brick36: gprfs012-b-10ge:/brick/pbench5-brick/pbench
Options Reconfigured:
diagnostics.brick-sys-log-level: CRITICAL
performance.readdir-ahead: on
performance.io-cache: off
performance.stat-prefetch: on
cluster.lookup-unhashed: off
client.event-threads: 8
cluster.read-hash-mode: 2
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

All six nodes in the cluster are 2-socket, 12-core systems with 96 GB of memory and twelve 1 TB disks; the disks are lashed together in pairs to create 6 bricks per host, for a total of 36 bricks.

Comment 2 Peter Portante 2016-04-07 01:16:56 UTC
This is for RHGS 3.0.4, on RHEL 6.6.

Comment 3 Peter Portante 2016-04-07 01:35:13 UTC
This does NOT appear to be related to https://bugzilla.redhat.com/show_bug.cgi?id=1247221, as no find operations were being performed on the local disks.

Comment 4 Atin Mukherjee 2016-04-07 04:58:55 UTC
Changing the component to AFR, as it's the self-heal daemon that is consuming this amount of memory.

Comment 5 Peter Portante 2016-04-07 13:19:29 UTC
After restarting gluster on that host, its memory use has started growing again and is now at 16 GB.

How do I safely restart gluster at this time to avoid the memory growth causing a problem?

Comment 6 Vijay Bellur 2016-04-11 03:44:50 UTC
Created attachment 1145815 [details]
First statedump

Attaching statedump #1

Comment 7 Vijay Bellur 2016-04-11 03:45:39 UTC
Created attachment 1145816 [details]
Second Statedump

Comment 8 Vijay Bellur 2016-04-11 03:49:56 UTC
Note that the memory leaks seem to stem from gf_strdup:

[cluster/replicate.pbench-replicate-0 - usage-type 40 memusage]
type=gf_common_mt_strdup
size=2918102943
num_allocs=32222745
max_size=2918102943


[cluster/replicate.pbench-replicate-4 - usage-type 40 memusage]
type=gf_common_mt_strdup
size=2917406121
num_allocs=32216871
max_size=2917406121

[cluster/replicate.pbench-replicate-8 - usage-type 40 memusage]
type=gf_common_mt_strdup
size=2134711491
num_allocs=9109248
max_size=2134711786

Comment 11 Pranith Kumar K 2016-06-15 09:00:32 UTC
Peter,
     In 3.0 the self-heal daemon used afr-v1, whereas from 3.1 onwards it uses afr-v2, so the code is completely different. Since we are not going to make any more releases on 3.0.x, I am closing this bug for now. Please feel free to re-open this bug, or open a new one, if you face the same issue on 3.1.

Pranith

