Bug 977250 - running dbench results in leaked fds leading to OOM killer killing glusterfsd.
Summary: running dbench results in leaked fds leading to OOM killer killing glusterfsd.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: Sudhir D
URL:
Whiteboard:
Depends On: 976800 985388
Blocks:
 
Reported: 2013-06-24 07:19 UTC by Pranith Kumar K
Modified: 2013-07-17 11:23 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.3.0.11rhs-1.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 976800
Environment:
Last Closed: 2013-07-15 21:53:54 UTC
Embargoed:




Links
Red Hat Product Errata RHBA-2013:1064 (Priority: normal, Status: SHIPPED_LIVE)
Summary: Red Hat Storage 2.0 enhancement and bug fix update #5
Last Updated: 2013-07-16 01:51:03 UTC

Description Pranith Kumar K 2013-06-24 07:19:49 UTC
+++ This bug was initially created as a clone of Bug #976800 +++

Description of problem:
Running dbench on a distributed replicate volume leaks fds on the server bricks, eventually causing the OOM killer to kill the brick process (glusterfsd).

Output of dmesg on server:
========================================================================
<snip>
VFS: file-max limit 188568 reached
.
.
.
Out of memory: Kill process 12235 (glusterfsd) score 215 or sacrifice child
Killed process 12235, UID 0, (glusterfsd) total-vm:3138856kB, anon-rss:466728kB, file-rss:1028kB
glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
glusterfsd cpuset=/ mems_allowed=0
Pid: 12333, comm: glusterfsd Not tainted 2.6.32-358.6.2.el6.x86_64 #1

</snip>
========================================================================


How reproducible:
Always

Steps to Reproduce:
1. Create a 2x2 distributed replicate volume and FUSE mount it.
2. On the mount point, run "dbench -s -F -S -x --one-byte-write-fix --stat-check 10"
3. Kill dbench after it has run for about 3 minutes.
4. On the server, check each brick process for leaked fds (a consolidated sketch of all four steps follows):
   ls -l /proc/<pid_of_brick>/fd | grep deleted
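
For convenience, the steps above can be condensed into a rough shell sketch. The volume name, server names, brick paths, and mount point are illustrative assumptions, not taken from this report:

# Assumed 2x2 layout: two servers, two bricks each (names are illustrative).
gluster volume create testvol replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server1:/bricks/b2 server2:/bricks/b2
gluster volume start testvol

# FUSE mount on a client and run the dbench workload from step 2.
mount -t glusterfs server1:/testvol /mnt/testvol
cd /mnt/testvol
dbench -s -F -S -x --one-byte-write-fix --stat-check 10 &
DBENCH_PID=$!

# Kill dbench after roughly three minutes (step 3).
sleep 180
kill $DBENCH_PID

# On each server, look for fds the bricks still hold on unlinked files (step 4).
for pid in $(pgrep -f glusterfsd); do
    echo "brick pid $pid:"
    ls -l /proc/$pid/fd | grep deleted
done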


Actual results:
Open fds on unlinked files are still visible on the bricks even though dbench was killed. If dbench is instead run to completion, some of the bricks get killed by the OOM killer (visible via ps aux | grep glusterfsd). A quick check for both symptoms is sketched below.
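
The following sketch checks both symptoms on a server; the kernel-log grep patterns are assumptions based on the usual OOM-killer wording (and the dmesg snippet above), not output captured from this system:

# Count fds held on unlinked files per brick process.
for pid in $(pgrep -f glusterfsd); do
    printf 'brick %s: %s deleted fds\n' "$pid" \
        "$(ls -l /proc/$pid/fd 2>/dev/null | grep -c deleted)"
done

# See whether the OOM killer has already taken out a brick.
dmesg | grep -iE 'out of memory|oom-killer' | grep -i glusterfsd
ps aux | grep '[g]lusterfsd'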

Expected results:
Once dbench is killed, the brick processes must not hold any open fds on unlinked files.

Additional info:
Bisected the leak to the following commit ID on upstream:
* 8909c28 - cluster/afr: fsync() guarantees POST-OP completion
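
For context, pinning a regression like this to a single commit can be done with git bisect; the revision range and the reproduce.sh wrapper below are hypothetical, sketched under the assumption that the dbench scenario above is scripted as the test:

# In an upstream glusterfs checkout (revision range is illustrative).
git bisect start
git bisect bad HEAD          # a build where the fd leak reproduces
git bisect good v3.3.0       # an assumed known-good baseline
# reproduce.sh (hypothetical) rebuilds, reruns the dbench scenario, and
# exits non-zero if the bricks still hold fds on unlinked files.
git bisect run ./reproduce.sh
git bisect reset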

--- Additional comment from Anand Avati on 2013-06-24 03:03:34 EDT ---

REVIEW: http://review.gluster.org/5248 (cluster/afr: Fix fd/memory leak on fsync) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

Comment 3 Sachidananda Urs 2013-07-04 08:39:33 UTC
Tested the above scenario across multiple runs; no fd leak seen.

Comment 5 errata-xmlrpc 2013-07-15 21:53:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1064.html

