Bug 1527147

Summary: [GSS] [Regression] glusterfs threads consuming swap space until it runs out
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Pan Ousley <pousley>
Component: protocol
Assignee: Csaba Henk <csaba>
Status: CLOSED ERRATA
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: amukherj, bkunal, csaba, jahernan, mchangir, ppai, rcyriac, rhs-bugs, rkavunga, sankarshan, storage-qa-internal
Target Milestone: ---
Keywords: Regression, ZStream
Target Release: RHGS 3.3.1 Async
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.8.4-52.3
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-11 02:46:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Pan Ousley 2017-12-18 16:31:24 UTC

Description of problem:

"We've got two-node distributed-replicated gluster volumes that we've recently updated (in-service) from RHGS 3.2 to RHGS 3.3.1. Current access protocol in use is native glusterfs fuse.

Since the upgrade, glusterfs processes consume much more swap space than previously, and continue to consume swap until each machine runs out entirely -- it takes about 24 hours for the combined glusterfs threads on each node to consume all the available swap space. Initially, we had 4GB of swap, and upped that to 8GB swap with the same results -- a steady increase in swap consumption until it's gone.

Though the increase in swap consumption is continuous, we did notice an increase when our nightly (4a) backup of the <redacted> volume started -- that backup does enumerate the entire filesystem, and it appears to speed up the consumption of swap space."

I collected SAR data and the output of swap-usage.sh and shared both with a kernel SME, who confirmed that the glusterfsd process is repeatedly allocating heap space, growing the VSZ of the userspace daemon. Because heap pages have no backing file, the kernel can only page them out to swap when there is not enough physical memory to hold them, so swap usage keeps growing until swap is exhausted. The daemon is responsible for freeing heap memory when it is no longer required, but it does not appear to be doing so.
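
The contents of swap-usage.sh are not included in this report. As a rough illustration of the kind of per-process data it could provide, the sketch below reads the VmSize, VmRSS and VmSwap counters from /proc/<pid>/status for processes whose name starts with "gluster". The function names and the "gluster" prefix filter are assumptions made for this example, not part of the customer's script.

```python
import os


def parse_status(text):
    """Extract the process name and memory counters (in kB) from the
    contents of a /proc/<pid>/status file."""
    info = {"Name": "", "VmSize": 0, "VmRSS": 0, "VmSwap": 0}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or key not in info:
            continue
        if key == "Name":
            info[key] = value.strip()
        else:
            # Memory lines look like "VmSwap:\t  123456 kB".
            info[key] = int(value.split()[0])
    return info


def gluster_swap_usage(proc_root="/proc", prefix="gluster"):
    """Return (pid, info) pairs for processes whose name starts with
    `prefix` (an assumed filter), sorted by swap usage, largest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                info = parse_status(fh.read())
        except OSError:
            # The process exited between listdir() and open().
            continue
        if info["Name"].startswith(prefix):
            rows.append((int(entry), info))
    rows.sort(key=lambda row: row[1]["VmSwap"], reverse=True)
    return rows


if __name__ == "__main__":
    for pid, info in gluster_swap_usage():
        print(f"{pid:>7} {info['Name']:<12} "
              f"VSZ={info['VmSize']} kB RSS={info['VmRSS']} kB "
              f"swap={info['VmSwap']} kB")
```

Running this at intervals and comparing the snapshots would show whether VmSize and VmSwap for glusterfsd rise steadily, which is the pattern described above.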

Their current workaround is to use scripts to restart glusterd each morning.

Version-Release number of selected component (if applicable): RHGS 3.3.0, RHGS 3.3.1

Additional information: I was researching existing bugs and came across these two:

https://bugzilla.redhat.com/show_bug.cgi?id=1507361
https://bugzilla.redhat.com/show_bug.cgi?id=1495161

Is it possible they are hitting one of these? Their 'application' (various versions of PHP and Apache) appears to use POSIX locking.

Information collected will be listed in my next comment. Please let me know if anything else is needed. Thank you.

Comment 19 errata-xmlrpc 2018-01-11 02:46:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0083