Bug 1458801

Summary: Exclude node from a cluster when node is overloaded
Product: [Community] GlusterFS
Reporter: Pavel Znamensky <kompastver>
Component: core
Assignee: Sanju <srakonde>
Status: CLOSED UPSTREAM
Severity: medium
Priority: low
Version: mainline
CC: bugs
Keywords: FutureFeature, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-03-17 03:27:22 UTC
Type: Bug

Description Pavel Znamensky 2017-06-05 13:47:07 UTC
The cluster hangs when one of the nodes in a replicated cluster is overloaded.
Usually all nodes in the cluster carry the same level of load. In rare cases, however, a node can hang or freeze while still being reachable.
This happens, for instance, when there are problems with the filesystem or HDDs.
In these cases the whole cluster hangs.
We are able to set ping-timeout, but it doesn't help in this case.
Is it possible to set thresholds based on a node's performance and mark nodes as failed or unreachable when they hang or respond very slowly?

Thanks.
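
To make the request concrete, here is a minimal sketch of the kind of threshold check being asked for. It is not an existing GlusterFS feature or API: the `NodeHealth` class, its parameters, and the choice of a median over a sliding window are all assumptions made for illustration.

```python
from collections import deque
from statistics import median


class NodeHealth:
    """Tracks recent response latencies for one node (hypothetical helper)."""

    def __init__(self, threshold_s: float, window: int = 5) -> None:
        self.threshold_s = threshold_s          # latency limit in seconds (assumed)
        self.samples = deque(maxlen=window)     # sliding window of latest samples

    def record(self, latency_s: float) -> None:
        """Store the latency of one completed request."""
        self.samples.append(latency_s)

    def is_failed(self) -> bool:
        """Report the node as failed once the median latency exceeds the threshold."""
        # Judge only when the window is full, so a single slow reply
        # does not evict an otherwise healthy node.
        if len(self.samples) < self.samples.maxlen:
            return False
        return median(self.samples) > self.threshold_s
```

A median is used here so that one outlier does not flip the decision; a node that is reachable but consistently slow would still cross the threshold, which is the case ping-timeout does not cover.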

Comment 1 Amar Tumballi 2018-09-18 09:07:07 UTC
This is how we want to handle it in container storage. The challenge is the data migration involved: we can 'migrate' the process to a new node, but the data migration itself spikes the load again.

The recommended way is to restrict the CPU for glusterfs using cgroups, and to focus on fixing some of the lock contention issues that are being identified.
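
As a rough sketch of the cgroups suggestion, the following assumes cgroup v2, where a group's CPU bandwidth is set by writing "<quota> <period>" in microseconds to its `cpu.max` file. The helper names and the cgroup directory are hypothetical; on a real host the write needs root, and the glusterfs process IDs still have to be added to the group's `cgroup.procs` separately.

```python
from pathlib import Path


def cpu_max_value(cpus: float, period_us: int = 100_000) -> str:
    """Build a cgroup v2 cpu.max value that allows `cpus` CPUs' worth of time."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    quota_us = round(cpus * period_us)  # CPU time allowed per period
    return f"{quota_us} {period_us}"


def write_cpu_limit(cgroup_dir: Path, cpus: float) -> None:
    """Write the limit into the cpu.max file of an existing cgroup directory."""
    (cgroup_dir / "cpu.max").write_text(cpu_max_value(cpus) + "\n")
```

For example, `write_cpu_limit(Path("/sys/fs/cgroup/glusterfs"), 1.5)` would cap the (assumed) `glusterfs` group at one and a half CPUs.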

Comment 2 Worker Ant 2020-03-17 03:27:22 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/1109 and will be tracked there from now on. Visit the GitHub issue URL for further details.