Description of problem
On a single machine of a trusted storage pool monitored by RHGSWA, the glusterd
process memory usage grows by about 1.3 GB per day, consuming all available memory in few
On all other nodes, the memory growth was smaller (about 80 MB/day),
which is within the limits of what has already been reported as BZ 1664046.
Version-Release number of selected component
# rpm -qa | grep gluster | sort
# rpm -qa | grep tendrl | sort
How reproducible
I don't know. I haven't seen this before and haven't had a chance to reproduce
it again (as I decided to collect data before retrying).
Steps to Reproduce
1. Install and set up an RHGS cluster on 6 machines, with 2 volumes
(using the standard usmqe configuration).
2. Install RHGSWA on a separate machine and import the trusted storage pool.
3. Mount one volume on a dedicated client machine, and fill it completely
with 10 MB files, then free the space.
4. Let the cluster run for a few days.
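Step 3 can be scripted roughly as follows. This is only a sketch of the fill-and-free workload, not the exact tooling used for this report: the function name `fill_and_free`, the `fill_NNNNNN.bin` file naming, and the mount path in the usage comment are all hypothetical. It assumes the volume is already mounted on the client and treats `ENOSPC` as the signal that the volume is full.

```python
import errno
import os
from typing import Optional


def fill_and_free(mount_dir: str, file_mb: int = 10,
                  max_files: Optional[int] = None) -> int:
    """Write file_mb-sized files until the volume is full, then delete them.

    max_files is an optional safety limit (hypothetical, handy for dry runs).
    Returns the number of files that were written completely.
    """
    chunk = b"\0" * (1024 * 1024)  # 1 MiB of data per write call
    paths = []
    complete = 0
    try:
        while max_files is None or complete < max_files:
            path = os.path.join(mount_dir, f"fill_{len(paths):06d}.bin")
            paths.append(path)
            with open(path, "wb") as f:
                for _ in range(file_mb):
                    f.write(chunk)
                f.flush()
                os.fsync(f.fileno())  # push the data to the volume
            complete += 1
    except OSError as exc:
        # A full volume is the expected way for the loop to end.
        if exc.errno != errno.ENOSPC:
            raise
    finally:
        # Free the space again, including a partially written last file.
        for path in paths:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
    return complete


# Usage on the client (mount point is an assumption):
#   fill_and_free("/mnt/volume_mount")
```

Passing a small `max_files` value allows a dry run against a scratch directory before pointing the function at the real mount.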
Actual results
The memory usage on one node grows by about 1.3 GB per day.
The machine has 7821 MB of memory, and within one day, memory
consumption jumped from 65 % to 82 % (see screenshot from WA
dashboard) => (7821/2**10)*.17 GB/day = 1.3 GB/day
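The estimate above can be reproduced with a few lines of Python. The helper name `growth_gb_per_day` is made up for illustration; the inputs are the values quoted in this report (7821 MB total, 65 % to 82 % within one day).

```python
def growth_gb_per_day(total_mb: float, start_pct: float, end_pct: float,
                      days: float = 1.0) -> float:
    """Convert a change in memory utilization (percent) into GB per day."""
    total_gb = total_mb / 2**10  # MB -> GB, same conversion as in the report
    return total_gb * (end_pct - start_pct) / 100 / days


rate = growth_gb_per_day(7821, 65, 82)
print(f"{rate:.2f} GB/day")  # prints "1.30 GB/day"
```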
Expected results
The memory utilization doesn't grow that rapidly.
Note: the sos report killed all the bricks => memory was freed and I was not
able to create a proper statedump report.
I can't directly confirm that the node was used as the RHGSWA Provisioner Node:
when I tried to find which node is the provisioner, no machine was assigned.
I will try to confirm this indirectly by checking the logs.
That said, it's possible that the problem is triggered by commands executed by
some RHGSWA component. See attached cmd_history.log file.
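To see which commands dominate a cmd_history.log, a quick tally along these lines might help. This is a rough sketch: it assumes each log line has the shape `[<timestamp>]  : <command> : SUCCESS|FAILED ...`, which should be checked against the attached file, and the names `LINE_RE` and `count_commands` are hypothetical.

```python
import re
from collections import Counter

# Assumed line shape (verify against the real file):
#   [<timestamp>]  : <command> : SUCCESS
#   [<timestamp>]  : <command> : FAILED : <error message>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s*:\s*(?P<cmd>.*?)\s*:\s*(?P<status>SUCCESS|FAILED)\b"
)


def count_commands(lines, words=2):
    """Count log entries grouped by the first `words` words of the command."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed shape
        key = " ".join(match.group("cmd").split()[:words])
        counts[key] += 1
    return counts


# Usage (path is an assumption):
#   with open("cmd_history.log") as f:
#       for cmd, n in count_commands(f).most_common(10):
#           print(n, cmd)
```

Grouping by the first two words keeps entries such as "volume status" or "volume get" together regardless of the volume name that follows.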
Other memory leak BZs
At the time of reporting this bug, the following memory leak bugs were open:
* Bug 1651915
* Bug 1664046
Since I'm not sure about the reproducer, I list the bugs here as they could be
related. That said, I noticed enough differences in my case compared to these
already reported bugs that I created a separate bug:
* Compared to BZ 1664046, the memory growth is much faster. I see 1.3 GB/day,
while in BZ 1664046, the rate is about 100 MB/day. Moreover, I see it on
a single node (out of 6) only, while in BZ 1664046, all storage machines are
* Compared to BZ 1651915, I see no "volume status" commands in cmd_history.log
The growth rate also differs, but the change could be caused by differences
Created attachment 1521305
screenshot of RHGSWA host dashboard, with Memory Utilization chart for 7 days
*** Bug 1664046 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.