Bug 1436167 - Brick Multiplexing: RFE: Need a backup for brick process with multiplexing to avoid single point of failure
Summary: Brick Multiplexing: RFE: Need a backup for brick process with multiplexing to avoid single point of failure
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 3.10
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-27 11:37 UTC by Nag Pavan Chilakam
Modified: 2018-06-20 18:28 UTC (History)
2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-20 18:28:26 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nag Pavan Chilakam 2017-03-27 11:37:01 UTC
Description of problem:
=======================
With brick multiplexing, we avoid spawning a dedicated process for each brick hosted on a node.
While that is the whole point of brick multiplexing, we also need a way to have a backup, either in the form of a second process or some other infrastructure, so that one process is not a single point of failure.
For example, say we have 50 volumes hosted on a gluster cluster of 2 nodes.
With multiplexing, there is a single brick process serving all the bricks on each node.
What if that process gets killed? Then all the bricks on that node go down (see the sketch below).
Instead, why not have HA, a replica, or some other infrastructure so that there is no single point of failure? Say, instead of 1 process, I could have two processes which are replicas of each other.

In this way we get high availability and still bring the number of processes down from 50 to 2 using brick multiplexing.
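
To make the single point of failure concrete, here is a minimal sketch of the failure scenario on a 3.10 node with multiplexing enabled (the PID shown is illustrative):

# Enable brick multiplexing cluster-wide
gluster volume set all cluster.brick-multiplex on

# All bricks on this node are now served by a single glusterfsd
pgrep -x glusterfsd          # prints one PID, e.g. 12345

# Killing that one process takes every brick on this node offline
kill -9 12345
gluster volume status        # bricks on this node now show as offline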

Version-Release number of selected component (if applicable):
[root@dhcp35-192 ~]# rpm -qa|grep gluster
glusterfs-libs-3.10.0-1.el7.x86_64
glusterfs-api-3.10.0-1.el7.x86_64
glusterfs-debuginfo-3.10.0-1.el7.x86_64
glusterfs-3.10.0-1.el7.x86_64
glusterfs-fuse-3.10.0-1.el7.x86_64
glusterfs-cli-3.10.0-1.el7.x86_64
glusterfs-rdma-3.10.0-1.el7.x86_64
glusterfs-client-xlators-3.10.0-1.el7.x86_64
glusterfs-server-3.10.0-1.el7.x86_64
[root@dhcp35-192 ~]#

Comment 1 Jeff Darcy 2017-03-27 12:10:03 UTC
Replication needs to be across nodes to be meaningful anyway, since node failures (or node-specific network failures) are the most common kind in practice.  We already have a requirement that members of a replica/EC set be on different nodes (though we do allow users to override).  New infrastructure to mirror processes on the same node would add complexity rivaling that of multiplexing itself, and lose significant performance, for very little gain in reliability.
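
(For reference, the cross-node redundancy described above is what a replica volume already provides; the hostnames and brick paths below are illustrative.)

# Replica 3 across three nodes: losing one node's brick process
# still leaves two live copies on other nodes
gluster volume create repvol replica 3 \
    node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1

# Placing replicas of the same set on one node is flagged as unsafe
# and requires "force" to override
gluster volume create badvol replica 2 \
    node1:/bricks/b1 node1:/bricks/b2 force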

Comment 3 Shyamsundar 2018-06-20 18:28:26 UTC
This bug is reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result, this bug is being closed.

If the bug persists on a maintained version of Gluster or against the mainline
Gluster repository, please request that it be reopened and update the Version
field appropriately.

