Bug 1458801 - Exclude node from a cluster when node is overloaded
Summary: Exclude node from a cluster when node is overloaded
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-06-05 13:47 UTC by Pavel Znamensky
Modified: 2020-03-17 03:27 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-17 03:27:22 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pavel Znamensky 2017-06-05 13:47:07 UTC
The cluster hangs when one of the nodes in a replicated cluster is overloaded.
Usually all nodes in the cluster carry the same level of load. Occasionally, though, a node hangs or freezes while still remaining reachable,
for instance when there are problems with the filesystem or HDDs.
In these cases the whole cluster hangs.
We are able to set ping-timeout, but it doesn't help here because the node is still reachable.
Is it possible to set thresholds based on a node's performance and mark nodes as failed or unreachable when they hang or respond very slowly?

Thanks.
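
To make the request concrete, here is a minimal sketch of the kind of threshold check being asked for. It is not an existing GlusterFS feature: the class name `SlowNodeDetector`, its parameters, and the idea of feeding it per-node response latencies are all hypothetical and only illustrate the idea.

```python
from collections import deque
from statistics import median


class SlowNodeDetector:
    """Hypothetical helper: flags a node whose recent latencies exceed a threshold."""

    def __init__(self, threshold_s: float, window: int = 5):
        self.threshold_s = threshold_s          # latency limit in seconds (assumed unit)
        self.samples = deque(maxlen=window)     # sliding window of recent latencies

    def record(self, latency_s: float) -> None:
        """Store one measured response latency for the node."""
        self.samples.append(latency_s)

    def is_overloaded(self) -> bool:
        # Require a full window so that a single slow response
        # does not get a healthy node excluded.
        if len(self.samples) < self.samples.maxlen:
            return False
        # The median ignores isolated spikes; only sustained slowness trips it.
        return median(self.samples) > self.threshold_s
```

A caller would record one latency per probe and exclude the node only once `is_overloaded()` returns True. Using the median over a window rather than a single sample is a design choice to avoid flapping on short load spikes.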

Comment 1 Amar Tumballi 2018-09-18 09:07:07 UTC
This is the way we want to deal with it in container storage. The challenge is the data migration involved: we can 'migrate' the process to a new node, but the data migration itself spikes the load again.

The recommended way is to restrict the CPU for glusterfs using cgroups, and to focus on fixing some of the lock contention issues that are being identified.
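
As a rough illustration of the cgroups suggestion, the sketch below computes the values one would typically use to cap CPU. It assumes cgroup v2, where the `cpu.max` file takes a "quota period" pair in microseconds, and a systemd-managed service, where `CPUQuota=` takes a percentage of one CPU. The function names and the 1.5-core limit are hypothetical. Which cgroup or unit the glusterfs processes belong to depends on the deployment, so the code does not write anything.

```python
def cpu_max_value(cores: float, period_us: int = 100_000) -> str:
    """Return the string to write into a cgroup v2 cpu.max file.

    cores is the CPU limit expressed in whole-CPU units (assumption: 1.5
    means one and a half CPUs); period_us is the enforcement period.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    quota_us = int(round(cores * period_us))
    return f"{quota_us} {period_us}"


def systemd_cpu_quota(cores: float) -> str:
    """Return the equivalent value for systemd's CPUQuota= setting."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return f"{int(round(cores * 100))}%"


if __name__ == "__main__":
    # Hypothetical limit of 1.5 CPUs for the glusterfs processes.
    print(cpu_max_value(1.5))
    print(systemd_cpu_quota(1.5))
```

Note that a cap like this only bounds CPU usage. It would not by itself address the disk or filesystem stalls described in the report.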

Comment 2 Worker Ant 2020-03-17 03:27:22 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/1109 and will be tracked there from now on. Visit the GitHub issue URL for further details.
