Bug 1527147 - [GSS] [Regression] glusterfs threads consuming swap space until it runs out
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: protocol
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.3.1 Async
Assigned To: Csaba Henk
Keywords: Regression, ZStream
Depends On:
Reported: 2017-12-18 11:31 EST by Pan Ousley
Modified: 2018-01-29 09:47 EST
CC List: 11 users

See Also:
Fixed In Version: glusterfs-3.8.4-52.3
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2018-01-10 21:46:39 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 3336831 | None | None | None | 2018-01-29 09:47 EST
Red Hat Product Errata | RHBA-2018:0083 | normal | SHIPPED_LIVE | glusterfs bug fix update | 2018-01-11 02:46:21 EST

Description Pan Ousley 2017-12-18 11:31:24 EST
Description of problem:

"We've got two-node distributed-replicated gluster volumes that we've recently updated (in-service) from RHGS 3.2 to RHGS 3.3.1. Current access protocol in use is native glusterfs fuse.

Since the upgrade, glusterfs processes consume much more swap space than previously, and continue to consume swap until each machine runs out entirely -- it takes about 24 hours for the combined glusterfs threads on each node to consume all the available swap space. Initially, we had 4GB of swap, and upped that to 8GB swap with the same results -- a steady increase in swap consumption until it's gone.

Though the increase in swap consumption is continuous, we did notice an increase when our nightly (4a) backup of the <redacted> volume started -- that backup does enumerate the entire filesystem, and it appears to speed up the consumption of swap space."
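To put a trend like this in numbers, periodic swap-usage samples can be fitted with a straight line to estimate the leak rate and project when swap will run out. The following is a minimal Python sketch under that assumption; the function names (`swap_growth_rate`, `hours_until_exhausted`) and the sample values are hypothetical illustrations, not tooling or data from this case. A linear fit also smooths over bursts such as the one observed during the nightly backup.

```python
def swap_growth_rate(samples):
    """Least-squares slope, in MB per hour, of (hour, used_mb) samples.

    Hypothetical helper for illustration; samples are assumed to be
    taken from periodic readings of swap usage.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one point in time")
    return num / den


def hours_until_exhausted(samples, total_swap_mb):
    """Project hours until swap is full, assuming the fitted rate holds."""
    rate = swap_growth_rate(samples)
    if rate <= 0:
        # Usage is flat or shrinking: no projected exhaustion.
        return float("inf")
    _, last_used_mb = samples[-1]
    return (total_swap_mb - last_used_mb) / rate
```

For example, with illustrative samples of 0, 2048 and 4096 MB used at hours 0, 6 and 12 on a host with 8192 MB of swap, the fitted rate is about 341 MB/h and `hours_until_exhausted` returns roughly 12 hours.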

I collected sar data and the output of swap-usage.sh and shared them with a kernel SME, who confirmed that the glusterfsd process is repeatedly allocating heap space, growing the VSZ of the userspace daemon. Because the kernel does not have enough physical memory to keep all of those pages resident, and heap memory has no backing file to be paged out to, the pages are pushed to swap until swap is exhausted. The daemon is responsible for freeing heap memory once it is no longer needed, but it does not appear to be doing so.
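The contents of swap-usage.sh are not included in this report. As a rough stand-in, the following minimal Python sketch assumes a Linux kernel that reports a `VmSwap` line in `/proc/<pid>/status`, and sums swapped-out memory for processes whose name starts with a given prefix (here `gluster`, which matches glusterd, glusterfs and glusterfsd). The function names `parse_status` and `swap_by_prefix` are hypothetical. Because threads share their process's address space, the per-process figure already covers all of its threads.

```python
import os
import re

# Fields in /proc/<pid>/status look like "Name:\tglusterfsd" and
# "VmSwap:\t  524288 kB".
_NAME = re.compile(r"^Name:\s*(.*)$", re.MULTILINE)
_VMSWAP = re.compile(r"^VmSwap:\s+(\d+)\s+kB", re.MULTILINE)


def parse_status(text):
    """Return (name, swap_kb) from the contents of a /proc/<pid>/status file."""
    name_m = _NAME.search(text)
    swap_m = _VMSWAP.search(text)
    name = name_m.group(1).strip() if name_m else ""
    # Kernel threads have no VmSwap line; treat them as using no swap.
    swap_kb = int(swap_m.group(1)) if swap_m else 0
    return name, swap_kb


def swap_by_prefix(prefix="gluster", proc="/proc"):
    """Map pid -> (name, swap_kb) for processes whose name starts with prefix."""
    totals = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "status")) as f:
                name, swap_kb = parse_status(f.read())
        except OSError:
            continue  # process exited or is not readable
        if name.startswith(prefix):
            totals[int(entry)] = (name, swap_kb)
    return totals
```

Summing `swap_kb` over the values returned by `swap_by_prefix()`, sampled on a schedule, gives a combined per-node figure that can be compared against the behaviour described above.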

Their current workaround is to use scripts to restart glusterd each morning.

Version-Release number of selected component (if applicable): RHGS 3.3.0, RHGS 3.3.1

Additional information: I was researching existing bugs and came across these two:


Is it possible they are hitting one of these? Their application stack (various versions of PHP and Apache) appears to use POSIX locking.

Information collected will be listed in my next comment. Please let me know if anything else is needed. Thank you.
Comment 19 errata-xmlrpc 2018-01-10 21:46:39 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

