Bug 1458801

Summary:	Exclude node from a cluster when node is overloaded
Product:	[Community] GlusterFS	Reporter:	Pavel Znamensky <kompastver>
Component:	core	Assignee:	Sanju <srakonde>
Status:	CLOSED UPSTREAM	QA Contact:
Severity:	medium	Docs Contact:
Priority:	low
Version:	mainline	CC:	bugs
Target Milestone:	---	Keywords:	FutureFeature, Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-03-17 03:27:22 UTC	Type:	Bug

Description Pavel Znamensky 2017-06-05 13:47:07 UTC

The cluster hangs when one of the nodes in a replicated cluster is overloaded.
Usually all nodes in the cluster carry the same level of load, but occasionally a node hangs or freezes while still being reachable.
This happens, for instance, when there are problems with the filesystem or HDDs.
In these cases the whole cluster hangs.
We are able to set ping-timeout, but it doesn't help in this case.
Is it possible to set thresholds based on a node's performance and mark nodes as failed or unreachable when they hang or work very slowly?
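
To make the request concrete, here is a minimal sketch of what such a threshold could look like. It is not an existing GlusterFS feature or API: the NodeHealth class, its parameters, and the latency values are all hypothetical and only illustrate the idea of marking a node as degraded once its recent response times stay above a limit.

```python
from collections import deque
from statistics import median


class NodeHealth:
    """Tracks recent response latencies for one node (hypothetical helper)."""

    def __init__(self, threshold_s, window=5):
        self.threshold_s = threshold_s        # latency limit in seconds
        self.samples = deque(maxlen=window)   # keeps only the newest samples

    def record(self, latency_s):
        self.samples.append(latency_s)

    def is_degraded(self):
        # Require a full window so a single slow reply does not evict the node.
        if len(self.samples) < self.samples.maxlen:
            return False
        return median(self.samples) > self.threshold_s


node = NodeHealth(threshold_s=0.5, window=3)
for latency in (0.1, 2.0, 3.0):   # made-up sample latencies, in seconds
    node.record(latency)
print(node.is_degraded())  # True
```

Using the median over a window, rather than a single sample, is one way to avoid excluding a node because of one slow reply.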

Thanks.

Comment 1 Amar Tumballi 2018-09-18 09:07:07 UTC

This is how we want to deal with the problem in container storage. The challenge is the data migration involved: we can 'migrate' the process to a new node, but the data migration itself spikes the load again.

The recommended way is to restrict the CPU available to glusterfs using cgroups, and to focus on fixing some of the lock contention issues that are being identified.
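
As a rough illustration of the cgroups suggestion, the sketch below builds a cgroup v2 cpu.max value and writes it to a cgroup directory. It assumes a cgroup v2 hierarchy; the helper names and the example directory name are hypothetical, and the demo writes to a scratch directory so it runs without root.

```python
import os
import tempfile


def cpu_max_value(cores, period_us=100_000):
    """Return a cgroup v2 cpu.max line allowing `cores` CPUs' worth of time."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    quota_us = int(cores * period_us)
    return f"{quota_us} {period_us}"


def apply_cpu_limit(cgroup_dir, cores, pids=()):
    """Write the CPU limit and move the given PIDs into the cgroup."""
    with open(os.path.join(cgroup_dir, "cpu.max"), "w") as f:
        f.write(cpu_max_value(cores) + "\n")
    for pid in pids:
        # cgroup.procs accepts one PID per write.
        with open(os.path.join(cgroup_dir, "cgroup.procs"), "w") as f:
            f.write(f"{pid}\n")


# Dry run against a scratch directory. A real run would target a directory
# such as /sys/fs/cgroup/glusterfs (hypothetical name) and needs root.
with tempfile.TemporaryDirectory() as scratch:
    apply_cpu_limit(scratch, cores=2)
    with open(os.path.join(scratch, "cpu.max")) as f:
        print(f.read().strip())  # 200000 100000
```

On a real system the PIDs of the glusterfs processes would be passed as pids. Which processes to limit, and by how much, depends on the deployment.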

Comment 2 Worker Ant 2020-03-17 03:27:22 UTC

This bug has been moved to https://github.com/gluster/glusterfs/issues/1109 and will be tracked there from now on. Visit the GitHub issue URL for further details.