Description of problem: This RFE is raised to have controls available for gluster processes in terms of resource usage. Sometimes we see a process over-consuming resources. There should be a way in gluster to control that, and this can be done easily using cgroups.
In the past we have seen issues with the self-heal daemon on replicate and EC volumes.
Here are two reported issues:
Expected: resource restriction to be doable from gluster, with some CLI or tool to do that.
We are targeting this for 3.4 to control CPU for selfheald. Can you please test it?
Steps are available in https://bugzilla.redhat.com/show_bug.cgi?id=1478395#c12
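For reference, the manual procedure essentially creates a child cgroup under the glusterd service cgroup, caps the CPU quota, and attaches the daemon's PID. A rough sketch using the cgroup v1 cpu controller follows; the authoritative steps are in the linked comment, and the path and values here are illustrative only (the cpu hierarchy is assumed to mirror the memory one shown later in this bug):

# create a child cgroup for the daemon (replace <pid> with the self-heal daemon PID)
mkdir -p /sys/fs/cgroup/cpu/system.slice/glusterd.service/cgroup_gluster_<pid>
cd /sys/fs/cgroup/cpu/system.slice/glusterd.service/cgroup_gluster_<pid>

# allow at most 25% of one CPU: 25000us of quota per 100000us period
echo 100000 > cpu.cfs_period_us
echo 25000  > cpu.cfs_quota_us

# attach the whole process to the new cgroup
echo <pid> > cgroup.procs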
This RFE was based on a discussion with Alok/Ric/Sankarshan during Ric's Pune visit. The suggestion was to have a tuned-adm profile for this kind of tuning; we should avoid giving users a bunch of manual steps.
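For illustration only, such a profile could wrap the cgroup steps in tuned's script plugin so that users activate a profile instead of running manual steps. The profile name, layout and script name below are hypothetical, not an agreed design:

# hypothetical custom profile wrapping the cgroup steps
mkdir -p /etc/tuned/gluster-resource-control
cat > /etc/tuned/gluster-resource-control/tuned.conf <<'EOF'
[main]
summary=Restrict CPU/memory of gluster daemons via cgroups
include=throughput-performance

[script]
# apply-limits.sh (hypothetical) would contain the cgroup commands from the manual steps
script=apply-limits.sh
EOF

# activate it
tuned-adm profile gluster-resource-control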
upstream patch: https://review.gluster.org/#/c/18404/
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/124875/
Moving to VERIFIED as below,
based on my testing in the comment at https://bugzilla.redhat.com/show_bug.cgi?id=1406363#c12,
and also on the testing below for memory consumption management.
However, I noticed that memory consumption does not go back down to the set limit.
The kernel does record the non-compliance as below, since OOM killing is disabled by default (for which I will raise a new bug):
[root@dhcp37-174 ~]# cat /sys/fs/cgroup/memory/system.slice/glusterd.service/cgroup_gluster_26704/memory.failcnt
So the script is working as expected: once memory consumption crosses the limit, we can notice it as above, which was not previously possible.
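For anyone reproducing this, the relevant counters and the OOM-kill setting can be inspected directly from the cgroup created for the daemon; a sketch, using the same path the script created for pid 26704 in this run (re-enabling the OOM killer is optional and shown only to illustrate the default being discussed):

CG=/sys/fs/cgroup/memory/system.slice/glusterd.service/cgroup_gluster_26704

cat $CG/memory.usage_in_bytes   # current memory charged to the cgroup
cat $CG/memory.failcnt          # increments every time the limit is hit
cat $CG/memory.oom_control      # shows oom_kill_disable and under_oom

# write 0 to re-enable the OOM killer for this cgroup (1 disables it)
echo 0 > $CG/memory.oom_control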
[root@dhcp37-174 ~]# top -n 1 -b|grep gluster
26704 root 20 0 2532892 119024 4936 S 125.0 1.5 1:36.07 glusterfsd
4047 root 20 0 680856 13944 4392 S 0.0 0.2 0:44.89 glusterd
26740 root 20 0 1318488 58020 3220 S 0.0 0.7 0:02.86 glusterfs
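The pid 26704 used with the script below was taken from this top output; it could equally be obtained with pgrep or from the volume status, e.g.:

pgrep -lf glusterfsd      # lists brick process PIDs
gluster volume status     # also shows the PID of each brick and daemon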
[root@dhcp37-174 ~]# cd /usr/share/
[root@dhcp37-174 share]# cd glusterfs/scripts/
[root@dhcp37-174 scripts]# ls
control-cpu-load.sh get-gfid.sh schedule_georep.pyc
control-mem.sh gsync-sync-gfid schedule_georep.pyo
eventsdash.py gsync-upgrade.sh slave-upgrade.sh
eventsdash.pyc post-upgrade-script-for-quota.sh stop-all-gluster-processes.sh
[root@dhcp37-174 scripts]# ./control-mem.sh
Enter Any gluster daemon pid for that you want to control MEMORY.
If you want to continue the script to attach daeomon with new cgroup. Press (y/n)?y
Creating child cgroup directory 'cgroup_gluster_26704 cgroup' for glusterd.service.
Enter Memory value in Mega bytes [100,8000000000000]:
Entered memory limit value is 110.
Setting 115343360 to memory.limit_in_bytes for /sys/fs/cgroup/memory/system.slice/glusterd.service/cgroup_gluster_26704.
Tasks are attached successfully specific to 26704 to cgroup_gluster_26704.
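For completeness, what the script set up can be cross-checked against the raw cgroup files (same path and pid as above); a quick verification sketch:

CG=/sys/fs/cgroup/memory/system.slice/glusterd.service/cgroup_gluster_26704

cat $CG/memory.limit_in_bytes    # should read 115343360 (110 MB)
cat $CG/cgroup.procs             # should list 26704
grep memory /proc/26704/cgroup   # confirms the process is in the child cgroup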
I have updated the doc text. Kindly review and confirm.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.