Description of problem:
An OOM kill led to the unmount of the mount point while healing of files was in progress.

Version-Release number of selected component (if applicable):
[root@dhcp43-157 guess]# gluster --version
glusterfs 3.9dev built on Jul 11 2016 10:04:54
Repository revision: git://git.gluster.com/glusterfs.git

How reproducible:
Hit once

BRICK1 DATA, BRICK2 DATA, BRICK3 ARBITER

Steps to Reproduce:
1. Create a 1*3 volume with arbiter - volname GUESS
2. Mount the volume using FUSE at /mnt/guesmount
3. Create files using dd.sh:
   for((i=1;i<=1000000;i++))
   do
       dd if=/dev/urandom of=file$i bs=10K count=1
   done
4. While the I/O is running, wait 5 minutes, then kill the brick process of B1.
5. After the script is over, bring B1 back up using "gluster volume start guess force".
6. Now kill the B3 brick and bring it up using "gluster volume start guess force".

Observations:
The heals were happening from B2 to B1, but when the B3 brick was killed, the heals started from B1 to B3.

Actual results:
An OOM kill was observed on the client, with the mount point inaccessible.

Expected results:
No OOM kill should be observed.

Additional info:
Logs are kept @ rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>
Created attachment 1218528 [details] Statedumps
***UPDATED***
1) Kill one data brick of the 1*(2+1) arbiter volume.
2) Create a file script.sh and echo "abcd" into script.sh; create 500000 files from the FUSE-mounted client.
3) After the creation is successful, bring the brick back online using start force.
4) While healing is in progress on the brick side, access script.sh on the mount using vim.
5) Run "tailf /var/log/messages". While accessing the file you will see the OOM kill of the mount point.
^^^ Attached the statedumps ^^^ in comment 1
Based on the statedump, one big leak I see is from dirents which are allocated by DHT but do not appear to be leaked within dht itself. I think some xlator above dht is not freeing them. Could you let me know the size of the directory you have?
Pranith, I am not creating any directories; I am only creating files of 1K each.
Thanks & Regards,
Karan Sandha
There is currently no upper limit on the number of dentries readdir-ahead can store. It keeps populating the cache until EOD is reached or an error is encountered in a readdir from the lower xlators. So, in a scenario where readdirs from the application are infrequent and the directory is huge, all the dentries of the directory are cached in memory, and that can result in an OOM kill. Please note that it is not a leak, but a bug in readdir-ahead: it has no upper limit on its cache.
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#2) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#3) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#4) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#5) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#6) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/16137 (performance/readdir-ahead: limit cache size) posted (#7) for review on master by Raghavendra G (rgowdapp)
COMMIT: http://review.gluster.org/16137 committed in master by Atin Mukherjee (amukherj)
------
commit 96fb35624060565e02e946a970b3e777071bde9c
Author: Raghavendra G <rgowdapp>
Date: Thu Nov 24 14:58:20 2016 +0530

    performance/readdir-ahead: limit cache size

    This patch introduces a new option called "rda-cache-limit", which is
    the maximum value the entire readdir-ahead cache can grow into.
    Since readdir-ahead holds a reference to inode through dentries,
    this patch also accounts memory stored by various xlators in inode
    contexts.

    Change-Id: I84cc0ca812f35e0f9041f8cc71effae53a9e7f99
    BUG: 1356960
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/16137
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Poornima G <pgurusid>
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
Need to mark rda-low-wmark/rda-high-wmark as NO_DOC
REVIEW: http://review.gluster.org/16297 (performance/readdir-ahead: mark two options as NO_DOC) posted (#1) for review on master by Raghavendra G (rgowdapp)
COMMIT: http://review.gluster.org/16297 committed in master by Atin Mukherjee (amukherj)
------
commit ad785fd8ed7460ed5a2ba571a3d509317a627aba
Author: Raghavendra G <rgowdapp>
Date: Tue Dec 27 16:15:30 2016 +0530

    performance/readdir-ahead: mark two options as NO_DOC

    The two options are rda-high-wmark and rda-low-wmark. The impact of
    these two options is yet to be fully understood and hence not
    advertising these options to not run into surprises.

    Change-Id: Ia537f4cc342011f0f2f3849ad6b938e247e5622d
    BUG: 1356960
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/16297
    Reviewed-by: Atin Mukherjee <amukherj>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/