Bug 1484446 - [GSS] [RFE] Control Gluster process/resource using cgroup through tunables [NEEDINFO]
Summary: [GSS] [RFE] Control Gluster process/resource using cgroup through tunables
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Mohit Agrawal
QA Contact: nchilaka
URL:
Whiteboard:
Depends On:
Blocks: 1397798 RHGS-3.4-GSS-proposed-tracker 1478395 1479335 1496334 1496335 1503132 1503135 1531939
 
Reported: 2017-08-23 15:23 UTC by Bipin Kunal
Modified: 2018-09-17 11:34 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.12.2-2
Doc Type: Bug Fix
Doc Text:
Some gluster daemons, such as glustershd, have higher CPU or memory consumption when there is a large amount of data or entries to be healed. This can lead to excessive resource consumption. You can resolve this by running the control-cpu-load.sh script, which uses control groups to regulate the CPU usage of any gluster daemon; the companion control-mem.sh script does the same for memory.
Clone Of:
Clones: 1531939
Environment:
Last Closed: 2018-09-04 06:35:11 UTC
srmukher: needinfo? (moagrawa)
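For illustration, here is a minimal shell sketch of how a per-daemon CPU cap can be built on control groups. It assumes a cgroup v1 hierarchy and a child cgroup path chosen by the caller under glusterd.service. The function names cpu_quota_us and limit_cpu are hypothetical. This is not the shipped control-cpu-load.sh and may differ from what that script actually does. It only shows the CFS bandwidth mechanism (cpu.cfs_period_us and cpu.cfs_quota_us) that a CPU limit of this kind relies on.

```shell
# Hypothetical sketch, NOT the shipped control-cpu-load.sh.
# Assumptions: cgroup v1 "cpu" controller; the caller passes the path of the
# per-daemon child cgroup (e.g. .../glusterd.service/cgroup_gluster_<pid>).

# Convert a CPU percentage (of one core) into a CFS quota for a given period.
cpu_quota_us() {
    local percent=$1 period_us=$2
    echo $(( percent * period_us / 100 ))
}

# Create <cg_dir>, apply the quota, and move process <pid> into it.
limit_cpu() {
    local pid=$1 percent=$2 cg_dir=$3
    local period_us=100000   # 100 ms, the usual CFS default period
    mkdir -p "$cg_dir" || return 1
    echo "$period_us" > "$cg_dir/cpu.cfs_period_us" || return 1
    cpu_quota_us "$percent" "$period_us" > "$cg_dir/cpu.cfs_quota_us" || return 1
    # In cgroup v1, writing a PID to cgroup.procs moves all of its threads.
    echo "$pid" > "$cg_dir/cgroup.procs" || return 1
}
```

On a real system, as root and with an assumed path, this might be invoked as limit_cpu 26704 50 /sys/fs/cgroup/cpu/system.slice/glusterd.service/cgroup_gluster_26704 to hold PID 26704 to roughly half of one core. Writing the PID to cgroup.procs moves every thread at once; attaching thread IDs one by one through the tasks file is the alternative.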



Links
System                  | ID             | Priority | Status | Summary | Last Updated
Red Hat Product Errata  | RHSA-2018:2607 | None     | None   | None    | 2018-09-04 06:37:32 UTC

Description Bipin Kunal 2017-08-23 15:23:42 UTC
Description of problem: This RFE is being raised in order to have controls available for gluster processes in terms of resource usage. Sometimes we see some processes over-consume resources. There should be a way in gluster to control that, and this can be done easily using cgroups.

In the past we have seen issues with the self-heal daemon on replicate and EC volumes.

Here are two reported issues:

https://bugzilla.redhat.com/show_bug.cgi?id=1406363
https://bugzilla.redhat.com/show_bug.cgi?id=1478395

Expected: resource restriction should be done from gluster, through some CLI or tool.

Comment 3 Mohit Agrawal 2017-08-29 07:04:26 UTC
Hi Rahul,

We are targeting this for 3.4 to control CPU for the self-heal daemon. Can you please test it?
Steps are available in https://bugzilla.redhat.com/show_bug.cgi?id=1478395#c12


Regards
Mohit Agrawal

Comment 4 Bipin Kunal 2017-09-01 07:31:23 UTC
Hi Mohit,

   This RFE was based on a discussion with Alok/Ric/Sankarshan during Ric's Pune visit. The suggestion was to have a tuned-adm profile for this kind of tuning; we should avoid giving them a bunch of manual steps.
     
Thanks,
Bipin Kunal

Comment 5 Atin Mukherjee 2017-10-09 13:43:14 UTC
upstream patch : https://review.gluster.org/#/c/18404/

Comment 8 Atin Mukherjee 2017-12-01 07:59:16 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/124875/

Comment 9 nchilaka 2018-04-04 12:38:51 UTC
Test version: 3.12-2-6
Moving to VERIFIED, based on my testing in the comment at https://bugzilla.redhat.com/show_bug.cgi?id=1406363#c12

I also tested memory consumption management, as below.
I noticed that memory consumption does not go back down to the set limit, because OOM killing is disabled by default (I will raise a new bug for that). However, the kernel does record the non-compliance, as shown below:
[root@dhcp37-174 ~]# cat /sys/fs/cgroup/memory/system.slice/glusterd.service/cgroup_gluster_26704/memory.failcnt 
39970

So the script is working as expected: once memory consumption crosses the limit, we can detect it as shown above, which was not previously possible.
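The memory.failcnt check in this comment can be made repeatable with a small helper. The sketch below uses the hypothetical function name mem_cgroup_report and simply prints the standard cgroup v1 memory files for one cgroup. Per the kernel's cgroup v1 memory controller documentation, memory.failcnt counts how many times usage hit memory.limit_in_bytes, which is the non-compliance signal used here.

```shell
# Hypothetical helper: summarize a cgroup v1 memory cgroup so you can see
# whether the limit set by control-mem.sh is being hit.
mem_cgroup_report() {
    local cg_dir=$1 f
    for f in memory.limit_in_bytes memory.usage_in_bytes memory.failcnt; do
        # Fail loudly if the cgroup (or controller file) does not exist.
        [ -r "$cg_dir/$f" ] || { echo "missing $cg_dir/$f" >&2; return 1; }
        printf '%s=%s\n' "$f" "$(cat "$cg_dir/$f")"
    done
}
```

For example: mem_cgroup_report /sys/fs/cgroup/memory/system.slice/glusterd.service/cgroup_gluster_26704. A memory.failcnt that keeps rising between two runs means the daemon is still pressing against its limit.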



[root@dhcp37-174 ~]# top -n 1 -b|grep gluster
26704 root      20   0 2532892 119024   4936 S 125.0  1.5   1:36.07 glusterfsd
 4047 root      20   0  680856  13944   4392 S   0.0  0.2   0:44.89 glusterd
26740 root      20   0 1318488  58020   3220 S   0.0  0.7   0:02.86 glusterfs
[root@dhcp37-174 ~]# cd /usr/share/
[root@dhcp37-174 share]# cd glusterfs/scripts/
[root@dhcp37-174 scripts]# ls
control-cpu-load.sh    get-gfid.sh                       schedule_georep.pyc
control-mem.sh         gsync-sync-gfid                   schedule_georep.pyo
eventsdash.py          gsync-upgrade.sh                  slave-upgrade.sh
eventsdash.pyc         post-upgrade-script-for-quota.sh  stop-all-gluster-processes.sh
eventsdash.pyo         pre-upgrade-script-for-quota.sh
generate-gfid-file.sh  schedule_georep.py
[root@dhcp37-174 scripts]# ./control-mem.sh

Enter Any gluster daemon pid for that you want to control MEMORY.
26704
If you want to continue the script to attach daeomon with new cgroup. Press (y/n)?y
yes
Creating child cgroup directory 'cgroup_gluster_26704 cgroup' for glusterd.service.
Enter Memory value in Mega bytes [100,8000000000000]:
110
Entered memory limit value is 110.
Setting 115343360 to memory.limit_in_bytes for /sys/fs/cgroup/memory/system.slice/glusterd.service/cgroup_gluster_26704.
Tasks are attached successfully specific to 26704 to cgroup_gluster_26704.
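The session above can be read as three steps: create a child cgroup named cgroup_gluster_<pid> under glusterd.service, write the limit converted from megabytes into memory.limit_in_bytes (110 MB gives 110 * 1024 * 1024 = 115343360 bytes), and attach the daemon. Below is a minimal sketch of those steps. It assumes cgroup v1 and an existing glusterd.service memory cgroup. The functions mb_to_bytes and limit_mem are hypothetical, and this is not the actual control-mem.sh source.

```shell
# Hypothetical sketch, NOT the shipped control-mem.sh. Assumes cgroup v1 and
# that the parent glusterd.service memory cgroup already exists.

# Megabytes to bytes: 110 MB -> 115343360, matching the output above.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

limit_mem() {
    local pid=$1 mb=$2 cg_dir=$3
    # Same range the script's prompt shows: [100, 8000000000000] MB.
    # The regex rejects non-numbers and leading zeros (octal in bash math).
    if ! [[ $mb =~ ^[1-9][0-9]*$ ]] || (( mb < 100 || mb > 8000000000000 )); then
        echo "memory limit must be an integer in [100, 8000000000000] MB" >&2
        return 1
    fi
    mkdir -p "$cg_dir" || return 1
    mb_to_bytes "$mb" > "$cg_dir/memory.limit_in_bytes" || return 1
    # Writing a PID to cgroup.procs moves all of that process's threads.
    echo "$pid" > "$cg_dir/cgroup.procs" || return 1
}
```

As root, limit_mem 26704 110 /sys/fs/cgroup/memory/system.slice/glusterd.service/cgroup_gluster_26704 would apply the same limit as the session above.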

Comment 10 Srijita Mukherjee 2018-09-03 15:33:12 UTC
I have updated the doc text. Kindly review and confirm.

Comment 12 errata-xmlrpc 2018-09-04 06:35:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

