Red Hat Bugzilla – Bug 1527147
[GSS] [Regression] glusterfs threads consuming swap space until it runs out
Last modified: 2018-01-29 09:47:55 EST
Description of problem:
"We've got two-node distributed-replicated gluster volumes that we've recently updated (in-service) from RHGS 3.2 to RHGS 3.3.1. Current access protocol in use is native glusterfs fuse.
Since the upgrade, glusterfs processes consume much more swap space than previously, and continue to consume swap until each machine runs out entirely -- it takes about 24 hours for the combined glusterfs threads on each node to consume all the available swap space. Initially, we had 4GB of swap, and upped that to 8GB swap with the same results -- a steady increase in swap consumption until it's gone.
Though the increase in swap consumption is continuous, we did notice an increase when our nightly (4a) backup of the <redacted> volume started -- that backup does enumerate the entire filesystem, and it appears to speed up the consumption of swap space."
I collected SAR data and the output of swap-usage.sh and shared it with a kernel SME. He confirmed that the glusterfsd process is allocating heap space repeatedly, growing the VSZ of the userspace daemon. Because the kernel does not have enough physical memory to keep every page resident, and heap pages have no backing store file, those pages are pushed out to swap. The daemon is responsible for releasing heap memory once it is no longer required, but it appears not to be doing that.
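A minimal sketch of how the VSZ growth can be observed on an affected node. The PID argument and sample interval are assumptions for illustration; in practice you would pass the glusterfsd PID and a much longer interval.

```shell
#!/bin/sh
# Hypothetical diagnostic sketch: sample the virtual size (VSZ) of a
# process over time to confirm steady heap growth. Pass the glusterfsd
# PID as $1; defaults to this shell's PID for a dry run.
pid=${1:-$$}
for i in 1 2 3; do
    vsz=$(ps -o vsz= -p "$pid")      # VSZ in KiB, from ps(1)
    echo "$(date +%T) pid=$pid vsz_kb=$vsz"
    sleep 1                          # use minutes/hours in practice
done
```

A steadily increasing vsz_kb across samples, matching the SAR swap figures, would corroborate the heap-growth diagnosis.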
Their current workaround is to use scripts to restart glusterd each morning.
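A hedged sketch of what such a workaround could look like as a crontab fragment; the schedule, script path, and use of systemctl are assumptions, not details from the report.

```shell
# Hypothetical crontab fragment: restart glusterd nightly, before
# swap is exhausted (takes ~24h per the report). Path and time are
# illustrative only.
0 3 * * * /usr/local/sbin/restart-glusterd.sh

# restart-glusterd.sh might contain little more than:
#   systemctl restart glusterd
```

This only buys time between restarts; it does not address the underlying leak.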
Version-Release number of selected component (if applicable): RHGS 3.3.0, RHGS 3.3.1
Additional information: I was researching existing bugs and came across these two:
Is it possible they are hitting one of these? Their 'application' (various versions of PHP and Apache) seems to use POSIX locking.
Information collected will be listed in my next comment. Please let me know if anything else is needed. Thank you.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.