Bug 1436167

Summary: Brick Multiplexing: RFE: Need a backup for brick process with multiplexing to avoid single point of failure
Product: [Community] GlusterFS
Reporter: Nag Pavan Chilakam <nchilaka>
Component: core
Assignee: bugs <bugs>
Status: CLOSED EOL
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.10
CC: bugs, jeff
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-20 18:28:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Nag Pavan Chilakam 2017-03-27 11:37:01 UTC
Description of problem:
=======================
With brick multiplexing we no longer spawn a dedicated process for each brick hosted on a node.
While that is the whole point of brick multiplexing, we also need some form of backup, either a second process or other infrastructure, so that the one brick process is not a single point of failure.
For example, say we have 50 volumes hosted on a Gluster cluster of 2 nodes.
With multiplexing there is a single brick process for all the bricks on each node.
What if that process gets killed? Then all the bricks on that node go down.
Instead, why not have HA, replication, or some other infrastructure so that there is no single point of failure? For instance, instead of 1 process per node, have two processes that are replicas of each other.

That way we get high availability while still bringing the number of processes per node down from 50 to 2 with brick multiplexing.
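For context, a minimal sketch of how the single point of failure shows up on a 3.10 node; cluster.brick-multiplex is the real cluster-wide option, while the hostname node1 and the expected process count are illustrative assumptions:

# Enable brick multiplexing for the whole cluster (off by default in 3.10)
[root@node1 ~]# gluster volume set all cluster.brick-multiplex on

# Once the volumes are (re)started, all bricks on this node attach to a
# single glusterfsd process, so killing that one PID kills every brick here
[root@node1 ~]# pgrep -c glusterfsd            # expect 1 with multiplexing on
[root@node1 ~]# kill -9 $(pgrep glusterfsd)    # all local bricks go offline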

Version-Release number of selected component (if applicable):
[root@dhcp35-192 ~]# rpm -qa|grep gluster
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-debuginfo-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
[root@dhcp35-192 ~]#

Comment 1 Jeff Darcy 2017-03-27 12:10:03 UTC
Replication needs to be across nodes to be meaningful anyway, since node failures (or node-specific network failures) are the most common kind in practice.  We already have a requirement that members of a replica/EC set be on different nodes (though we do allow users to override).  New infrastructure to mirror processes on the same node would add complexity rivaling that of multiplexing itself, and lose significant performance, for very little gain in reliability.
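For illustration, the placement rule and override mentioned above look roughly like this; the volume name, hostnames, and brick paths are placeholders:

# Putting both replicas of a set on the same server is refused by default,
# since the node itself would remain a single point of failure
[root@node1 ~]# gluster volume create demo replica 2 node1:/bricks/b1 node1:/bricks/b2
# ...fails unless the user explicitly overrides with 'force':
[root@node1 ~]# gluster volume create demo replica 2 node1:/bricks/b1 node1:/bricks/b2 force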

Comment 3 Shyamsundar 2018-06-20 18:28:26 UTC
This bug was reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of Gluster or against the mainline
Gluster repository, please request that it be reopened and set the Version field
appropriately.