Red Hat Bugzilla – Bug 977250
Running dbench results in leaked fds, leading to the OOM killer killing glusterfsd.
Last modified: 2013-07-17 07:23:58 EDT
+++ This bug was initially created as a clone of Bug #976800 +++
Description of problem:
Running dbench on a distributed-replicate volume leaks fds in the brick processes on the server, which eventually causes the OOM killer to kill the brick process (glusterfsd).
Output of dmesg on server:
VFS: file-max limit 188568 reached
Out of memory: Kill process 12235 (glusterfsd) score 215 or sacrifice child
Killed process 12235, UID 0, (glusterfsd) total-vm:3138856kB, anon-rss:466728kB, file-rss:1028kB
glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
glusterfsd cpuset=/ mems_allowed=0
Pid: 12333, comm: glusterfsd Not tainted 2.6.32-358.6.2.el6.x86_64 #1
Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume and FUSE mount it.
2. On the mount point, run "dbench -s -F -S -x --one-byte-write-fix --stat-check 10".
3. Kill dbench after it has run for about 3 minutes.
4. On the server, run:
ls -l /proc/<pid_of_brick>/fd | grep deleted
Actual results: Open/unlinked fds are still visible even though dbench was killed. Also, if dbench is run to completion, some of the bricks get killed by the OOM killer (check with "ps aux | grep glusterfsd"). A scripted version of this check is sketched below.
Expected results: Once dbench is killed, the brick processes should hold no open/unlinked fds.
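A rough scripted version of the reproduction and the fd-leak check above is sketched here. The hostnames (server1, server2), brick paths, volume name (testvol) and mount point are placeholders for whatever your test setup uses; the dbench invocation is the one from step 2.

# On one of the servers: create and start a 2x2 distributed-replicate volume.
gluster volume create testvol replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server1:/bricks/b2 server2:/bricks/b2
gluster volume start testvol

# On the client: FUSE mount the volume and run dbench for ~3 minutes.
mkdir -p /mnt/testvol
mount -t glusterfs server1:/testvol /mnt/testvol
cd /mnt/testvol
dbench -s -F -S -x --one-byte-write-fix --stat-check 10 &
DBENCH_PID=$!
sleep 180
kill $DBENCH_PID

# On each server: count open-but-unlinked fds held by the brick processes.
for pid in $(pgrep glusterfsd); do
    n=$(ls -l /proc/$pid/fd 2>/dev/null | grep -c deleted)
    echo "brick pid $pid: $n deleted fds"
done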
Bisected the leak to the following upstream commit:
* 8909c28 - cluster/afr: fsync() guarantees POST-OP completion
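The bisect can be driven automatically with "git bisect run", using the fd-leak check as the test. A minimal sketch follows, assuming a glusterfs source tree and a wrapper script (check-fd-leak.sh, not part of this report) that rebuilds and installs glusterfs, reruns the dbench scenario above, and exits non-zero when the bricks still hold deleted fds; the known-good revision shown is a placeholder.

# In the glusterfs source tree; the good revision is a placeholder.
git bisect start
git bisect bad HEAD
git bisect good v3.3.0
# git bisect run marks a revision bad whenever the script exits non-zero
# (exit code 125 means "skip this revision").
git bisect run ./check-fd-leak.sh
# When finished, git prints the first bad commit (8909c28 in this case).
git bisect reset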
--- Additional comment from Anand Avati on 2013-06-24 03:03:34 EDT ---
REVIEW: http://review.gluster.org/5248 (cluster/afr: Fix fd/memory leak on fsync) posted (#1) for review on master by Pranith Kumar Karampuri (email@example.com)
Tested the above scenario across multiple runs; no fd leak seen.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.