Description of problem: We should have a way to control self-heal for a disperse volume. When there are multiple bricks on a node and that node goes down and comes back up, several bricks start healing at once. This can lead to very high CPU usage and hamper ongoing I/O. We therefore need a mechanism to control heal activity, allowing lazy heal during peak hours and aggressive heal during periods of low load.
on_qa validation: Tested the following:

/usr/share/glusterfs/scripts/control-cpu-load.sh --> able to set a CPU load limit on a glusterfsd/shd/glusterd process. If a process was consuming 100% CPU and I set the limit to 20%, its CPU usage drops and does not cross 20% (plus a very small delta, e.g. 20.5%).

[root@dhcp35-205 scripts]# ./control-cpu-load.sh
Enter gluster daemon pid for which you want to control CPU.
23530
pid 23530 is attached with glusterd.service cgroup.
pid 23530 is not attached with cgroup_gluster_23530.
If you want to continue the script to attach 23530 with new cgroup_gluster_23530 cgroup Press (y/n)?y
yes
Creating child cgroup directory 'cgroup_gluster_23530 cgroup' for glusterd.service.
Enter quota value in range [10,100]: 30
Entered quota value is 30
Setting 30000 to cpu.cfs_quota_us for gluster_cgroup.
Tasks are attached successfully specific to 23530 to cgroup_gluster_23530.

In the above case CPU consumption was capped at a maximum of 30%, hence moving the bug to verified. If I see any specific observations I will raise new bugs, but at a high level this feature works.
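For reference, below is a minimal sketch of the cgroup mechanism the script relies on (cgroup v1 cpu controller). It is not the shipped control-cpu-load.sh; the mount path, cgroup name, and variable names are illustrative assumptions. The key point is that with the default cpu.cfs_period_us of 100000 (100 ms), writing quota*1000 to cpu.cfs_quota_us caps the group at quota percent of one CPU, which matches the "Setting 30000 to cpu.cfs_quota_us" line in the transcript above for a 30% limit.

    #!/bin/bash
    # Hedged sketch, not the actual script: cap a gluster daemon's CPU usage
    # using the cgroup v1 cpu controller assumed to be mounted at /sys/fs/cgroup/cpu.
    PID=$1          # gluster daemon pid to throttle (e.g. glusterd/glusterfsd/shd)
    QUOTA_PCT=$2    # desired CPU cap in percent, e.g. 30

    CG_ROOT=/sys/fs/cgroup/cpu
    CG_DIR="$CG_ROOT/cgroup_gluster_${PID}"   # illustrative child cgroup name

    mkdir -p "$CG_DIR"

    # With cpu.cfs_period_us at its default of 100000 us, a quota of
    # QUOTA_PCT * 1000 us per period limits the group to QUOTA_PCT% CPU.
    echo $(( QUOTA_PCT * 1000 )) > "$CG_DIR/cpu.cfs_quota_us"

    # Move the daemon's threads into the new cgroup so the quota applies to them.
    for tid in $(ls /proc/"$PID"/task); do
        echo "$tid" > "$CG_DIR/tasks"
    done

Usage would be along the lines of "./cap-cpu.sh 23530 30" to reproduce the 30% cap shown in the validation run; removing the cap amounts to writing -1 to cpu.cfs_quota_us or moving the tasks back to the parent cgroup.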
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2607