Bug 1356960 - OOM Kill on client when heal is in progress on 1*(2+1) arbiter volume
Summary: OOM Kill on client when heal is in progress on 1*(2+1) arbiter volume
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: arbiter
Version: mainline
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1393316 1408217 1408220 1408221
 
Reported: 2016-07-15 11:36 UTC by Karan Sandha
Modified: 2017-03-06 17:20 UTC
CC: 3 users

Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1393316 1408217 1408220 1408221
Environment:
Last Closed: 2017-03-06 17:20:39 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Statedumps (1.12 MB, application/x-gzip)
2016-11-08 13:58 UTC, Karan Sandha

Description Karan Sandha 2016-07-15 11:36:00 UTC
Description of problem:
OOM kill led to unmount of the mount point while healing of the files was in progress.

Version-Release number of selected component (if applicable):
[root@dhcp43-157 guess]# gluster --version
glusterfs 3.9dev built on Jul 11 2016 10:04:54
Repository revision: git://git.gluster.com/glusterfs.git

How reproducible:
Hit once

Brick layout:
BRICK1 - data
BRICK2 - data
BRICK3 - arbiter
Steps to Reproduce:
1. Create a 1*3 volume with arbiter (volname: GUESS).
2. Mount the volume using FUSE at /mnt/guesmount.
3. Create files using dd.sh:
# create up to 1,000,000 files of 10K each, filled from /dev/urandom
for ((i=1; i<=1000000; i++))
do
    dd if=/dev/urandom of=file$i bs=10K count=1
done
4. Once the I/O has started, wait 5 minutes, then kill the brick process of B1.
5. After the script finishes, bring B1 back up using "gluster volume start guess force".
6. Now kill the B3 (arbiter) brick and bring it back up with "gluster volume start guess force" (see the command sketch below).
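
For reference, a minimal command sketch of the steps above; the host names (host1..host3) and brick paths (/bricks/...) are hypothetical and only illustrate the layout described in this report:

# Create and start a 1x(2+1) arbiter volume (hypothetical hosts and brick paths).
gluster volume create guess replica 3 arbiter 1 \
    host1:/bricks/b1 host2:/bricks/b2 host3:/bricks/arb
gluster volume start guess

# FUSE-mount it on the client.
mkdir -p /mnt/guesmount
mount -t glusterfs host1:/guess /mnt/guesmount

# Run dd.sh from /mnt/guesmount, wait ~5 minutes, then on host1 kill the B1
# brick (the glusterfsd process serving /bricks/b1).
kill -KILL "$(pgrep -f 'glusterfsd.*bricks/b1')"

# After dd.sh finishes, restart the killed brick.
gluster volume start guess force

# Kill the arbiter brick B3 on host3 and bring it back the same way.
kill -KILL "$(pgrep -f 'glusterfsd.*bricks/arb')"
gluster volume start guess force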


Observations:
Heals were happening from B2 to B1, but when I killed the B3 brick, the heals started from B1 to B3.

Actual results:
An OOM kill was observed on the client, and the mount point became inaccessible.

Expected results:
No OOM kill should be observed

Additional info:
Logs are kept @ rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Comment 1 Karan Sandha 2016-11-08 13:58:02 UTC
Created attachment 1218528 [details]
Statedumps

Comment 2 Karan Sandha 2016-11-08 14:07:10 UTC
                                    ***UPDATED***
1) Kill one data brick of the 1*(2+1) arbiter volume.
2) Create a file script.sh and echo "abcd" into it; then create 500000 files from the FUSE-mounted client.
3) After the creation is successful, bring the brick back online using "gluster volume start <volname> force".
4) While the heal is in progress on the brick side, access script.sh on the mount using vim.
5) Run "tailf /var/log/messages"; while accessing the file you will see the OOM kill of the mount-point process (see the sketch below).
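
A minimal sketch of steps 3-5, assuming the volume is named guess and mounted at /mnt/guesmount (both assumptions carried over from the original report):

# Bring the killed data brick back online; self-heal of the 500000 files starts.
gluster volume start guess force
gluster volume heal guess info      # confirm the heal is still in progress

# Access the small file from the mount while the heal runs.
vim /mnt/guesmount/script.sh

# Watch for the OOM kill of the glusterfs client process.
tail -f /var/log/messages | grep -i -e 'out of memory' -e 'killed process'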


Statedumps are attached in comment 1.

Comment 3 Pranith Kumar K 2016-11-08 16:24:21 UTC
Based on the statedump, one big leak I see is from dirents that are allocated by DHT but do not seem to be leaked within DHT itself. I think some xlator above DHT is not freeing them. Could you let me know the size of the directory you have?
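
A quick way to check the directory size in question (a sketch; the mount point /mnt/guesmount and brick path /bricks/b1 are the hypothetical ones used earlier in this report):

# Count the entries in the mounted directory without sorting (fast even when huge).
ls -f /mnt/guesmount | wc -l

# Or check the on-disk size of the directory inode directly on a data brick.
stat -c '%s' /bricks/b1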

Comment 4 Karan Sandha 2016-11-08 20:24:55 UTC
Pranith,

I am not creating any directories; I am creating files of 1K each.

Thanks & Regards
Karan Sandha

Comment 6 Raghavendra G 2016-11-23 04:02:42 UTC
As of now there is no upper limit on the number of dentries readdir-ahead can store. It keeps populating its cache till EOD is reached or an error is encountered in readdir from the lower xlators. So, in a scenario where readdirs from the application are infrequent and the directory is huge, all the dentries of the directory end up cached in memory, and that can result in an OOM kill. Please note that this is not a leak, but a bug in readdir-ahead in that it places no upper limit on its cache.
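
A small sketch of how the unbounded cache shows up on the client; the pgrep pattern is a guess at the FUSE client's command line, and the statedump path is the usual default:

# Identify the FUSE client process for the mount.
CLIENT_PID=$(pgrep -f 'glusterfs.*guesmount')

# Trigger a client statedump (written under /var/run/gluster by default); it
# shows the memory accounted to each xlator, including readdir-ahead.
kill -USR1 "$CLIENT_PID"

# Watch the client's resident memory grow while the big directory is read.
while sleep 10; do
    ps -o rss= -o vsz= -p "$CLIENT_PID"
done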

Comment 7 Worker Ant 2016-12-15 05:35:10 UTC
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#2) for review on master by Raghavendra G (rgowdapp)

Comment 8 Worker Ant 2016-12-16 04:40:53 UTC
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#3) for review on master by Raghavendra G (rgowdapp)

Comment 9 Worker Ant 2016-12-19 09:11:28 UTC
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#4) for review on master by Raghavendra G (rgowdapp)

Comment 10 Worker Ant 2016-12-19 09:22:30 UTC
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#5) for review on master by Raghavendra G (rgowdapp)

Comment 11 Worker Ant 2016-12-22 07:00:23 UTC
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#6) for review on master by Raghavendra G (rgowdapp)

Comment 12 Worker Ant 2016-12-22 07:26:20 UTC
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#7) for review on master by Raghavendra G (rgowdapp)

Comment 13 Worker Ant 2016-12-22 11:43:18 UTC
COMMIT: http://review.gluster.org/16137 committed in master by Atin Mukherjee (amukherj) 
------
commit 96fb35624060565e02e946a970b3e777071bde9c
Author: Raghavendra G <rgowdapp>
Date:   Thu Nov 24 14:58:20 2016 +0530

    performance/readdir-ahead: limit cache size
    
    This patch introduces a new option called "rda-cache-limit", which is
    the maximum size the entire readdir-ahead cache can grow to. Since
    readdir-ahead holds a reference to the inode through dentries, this
    patch also accounts for memory stored by various xlators in inode
    contexts.
    
    Change-Id: I84cc0ca812f35e0f9041f8cc71effae53a9e7f99
    BUG: 1356960
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/16137
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Poornima G <pgurusid>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
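
For users picking up the fix, a hedged usage sketch: the commit names the option rda-cache-limit; the "performance." volume-set prefix and the 10MB value shown here are assumptions for illustration only.

# Cap the readdir-ahead cache for the volume (value is only an example).
gluster volume set guess performance.rda-cache-limit 10MB

# readdir-ahead itself can be toggled per volume if needed.
gluster volume set guess performance.readdir-ahead on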

Comment 14 Raghavendra G 2016-12-27 10:44:58 UTC
Need to mark rda-low-wmark/rda-high-wmark as NO_DOC

Comment 15 Worker Ant 2016-12-27 10:47:26 UTC
REVIEW: http://review.gluster.org/16297 (performance/readdir-ahead: mark two options as NO_DOC) posted (#1) for review on master by Raghavendra G (rgowdapp)

Comment 16 Worker Ant 2016-12-27 16:19:35 UTC
COMMIT: http://review.gluster.org/16297 committed in master by Atin Mukherjee (amukherj) 
------
commit ad785fd8ed7460ed5a2ba571a3d509317a627aba
Author: Raghavendra G <rgowdapp>
Date:   Tue Dec 27 16:15:30 2016 +0530

    performance/readdir-ahead: mark two options as NO_DOC
    
    The two options are rda-high-wmark and rda-low-wmark. The impact of
    these two options is yet to be fully understood, so they are not
    advertised, to avoid running into surprises.
    
    Change-Id: Ia537f4cc342011f0f2f3849ad6b938e247e5622d
    BUG: 1356960
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/16297
    Reviewed-by: Atin Mukherjee <amukherj>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>

Comment 17 Shyamsundar 2017-03-06 17:20:39 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

