1527147 – [GSS] [Regression] glusterfs threads consuming swap space until it runs out


Summary: [GSS] [Regression] glusterfs threads consuming swap space until it runs out

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	protocol
Sub Component:
Version:	rhgs-3.3
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	RHGS 3.3.1 Async
Assignee:	Csaba Henk
QA Contact:	Nag Pavan Chilakam
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-12-18 16:31 UTC by Pan Ousley
Modified:	2021-03-11 16:42 UTC
CC List:	11 users
Fixed In Version:	glusterfs-3.8.4-52.3
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-01-11 02:46:39 UTC
Embargoed:
Dependent Products:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	3336831	0	None	None	None	2018-01-29 14:47:54 UTC
Red Hat Product Errata	RHBA-2018:0083	0	normal	SHIPPED_LIVE	glusterfs bug fix update	2018-01-11 07:46:21 UTC

Description Pan Ousley 2017-12-18 16:31:24 UTC

Description of problem:

"We've got two-node distributed-replicated gluster volumes that we've recently updated (in-service) from RHGS 3.2 to RHGS 3.3.1. Current access protocol in use is native glusterfs fuse.

Since the upgrade, glusterfs processes consume much more swap space than previously, and continue to consume swap until each machine runs out entirely -- it takes about 24 hours for the combined glusterfs threads on each node to consume all the available swap space. Initially, we had 4GB of swap, and upped that to 8GB swap with the same results -- a steady increase in swap consumption until it's gone.

Though the increase in swap consumption is continuous, we did notice a faster increase when our nightly (4 a.m.) backup of the <redacted> volume started -- that backup enumerates the entire filesystem, and it appears to speed up the consumption of swap space."

I collected SAR data and the output of swap-usage.sh and shared them with a kernel SME, who confirmed that the glusterfsd process is allocating heap space over and over, growing the VSZ of the userspace daemon. Because heap memory has no backing file, the kernel can only page it out to swap once physical memory runs short, so swap usage keeps growing until the swap space is exhausted. The daemon is responsible for freeing heap memory when it is no longer required, but it appears not to be doing that.
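For anyone who wants to watch the same counters per process, here is a minimal sketch. It is not the swap-usage.sh script mentioned above (its contents are not part of this report). It assumes a Linux kernel that exposes VmSize, VmRSS and VmSwap in /proc/<pid>/status, and the function names parse_status and gluster_memory are made up for illustration.

```python
import os
import re

# Counters read from /proc/<pid>/status (values are reported in kB).
FIELDS = ("VmSize", "VmRSS", "VmSwap")


def parse_status(text):
    """Extract the process name and memory counters from status-file text."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        if key == "Name":
            info["Name"] = value
        elif key in FIELDS:
            match = re.match(r"(\d+)\s*kB", value)
            if match:
                info[key] = int(match.group(1))
    return info


def gluster_memory(proc_root="/proc", prefix="gluster"):
    """Return counters for processes whose name starts with `prefix`.

    The "gluster" prefix is an assumption meant to match glusterd,
    glusterfs and glusterfsd. Results are sorted by VmSwap, largest first.
    """
    if not os.path.isdir(proc_root):
        return []  # not a Linux-style /proc; nothing to report
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directory names are PIDs
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                info = parse_status(handle.read())
        except OSError:
            continue  # process exited, or the file is not readable
        if info.get("Name", "").startswith(prefix):
            info["pid"] = int(entry)
            results.append(info)
    return sorted(results, key=lambda item: item.get("VmSwap", 0), reverse=True)


if __name__ == "__main__":
    for item in gluster_memory():
        print(item["pid"], item["Name"],
              "VSZ_kB=%d" % item.get("VmSize", 0),
              "RSS_kB=%d" % item.get("VmRSS", 0),
              "Swap_kB=%d" % item.get("VmSwap", 0))
```

Running this at intervals (for example from cron) and comparing the VmSize and VmSwap columns over time should show whether a given daemon keeps growing, which is the pattern described above.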

Their current workaround is to use scripts to restart glusterd each morning.

Version-Release number of selected component (if applicable): RHGS 3.3.0, RHGS 3.3.1

Additional information: I was researching existing bugs and came across these two:

https://bugzilla.redhat.com/show_bug.cgi?id=1507361
https://bugzilla.redhat.com/show_bug.cgi?id=1495161

Is it possible they are hitting one of these? Their 'application' (various versions of PHP and Apache) seems to use POSIX locking.

Information collected will be listed in my next comment. Please let me know if anything else is needed. Thank you.

Comment 19 errata-xmlrpc 2018-01-11 02:46:39 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0083
