Bug 1349782 - [RFE]: gluster volume stop should be handled gracefully when glusterd is down on a few nodes
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd2
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1067307
Blocks:
 
Reported: 2016-06-24 09:19 UTC by Atin Mukherjee
Modified: 2018-11-19 05:31 UTC
CC: 8 users

Fixed In Version:
Clone Of: 1067307
Environment:
Last Closed: 2018-11-19 05:20:31 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:




Links
Github 576 (Last Updated: 2018-11-19 05:31:35 UTC)

Description Atin Mukherjee 2016-06-24 09:19:28 UTC
+++ This bug was initially created as a clone of Bug #1067307 +++

Description of problem:
=======================

Currently, when a gluster volume stop is performed, it reaches out to the glusterd instances running on the nodes, shuts down their glusterfsd processes, and takes the volume offline. But this design has a flaw in the following scenario:

Consider a 4-node cluster forming a 2*2 volume with one brick on each node. The volume is mounted on a client (FUSE) and is accessible; reads and writes from the client succeed. Now stop glusterd on 2 of the nodes (one from each replica pair). Reads and writes from the client still succeed. Now stop the volume from one of the online nodes, which marks the volume as stopped. However, reads and writes from the client continue to succeed, because the glusterfsd processes on the nodes where glusterd was brought down are still online and gluster volume stop did not account for them.

So from the user's point of view the volume is stopped and glusterd is down on a few nodes, yet reads and writes through the FUSE mount still succeed.

gluster volume stop should be handled gracefully. 

One possible solution is to distinguish between "stop" and "stop force", so that the user knows clients may still be able to access the volume:

1. When "gluster volume stop" is issued, if glusterd is down on any node hosting a brick of the volume, fail the volume stop with a proper error message.

2. When "gluster volume stop force" is issued, explicitly inform the user that the volume will be marked stopped but some glusterfsd processes on the down nodes may still be online and continue to serve the mount, and ask whether they wish to continue.

The exact solution is debatable, but the user certainly needs to understand how the volume can remain accessible even though gluster volume info shows it as stopped.
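Purely to illustrate the two proposals above, here is a hypothetical CLI transcript; the volume name testvol is assumed, and the error and prompt texts are invented for this sketch rather than taken from any existing gluster release:

    # Proposal 1: a plain stop fails when glusterd is down on any node hosting
    # a brick of the volume (hypothetical output, not current gluster behaviour).
    gluster volume stop testvol
      volume stop: testvol: failed: glusterd is down on node2 and node4; their
      brick processes cannot be stopped. Use 'stop force' to override.

    # Proposal 2: 'stop force' warns that bricks on unreachable nodes may keep
    # serving existing mounts and asks for confirmation (hypothetical output).
    gluster volume stop testvol force
      Bricks on node2 and node4 cannot be reached; their glusterfsd processes
      may stay online and continue to serve existing mounts.
      Do you want to continue? (y/n) y
      volume stop: testvol: success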

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.4.1.1.snap.feb17.2014git-1.el6.x86_64

Found with a snapshot build, but the issue is generic.


How reproducible:
=================
1/1


Steps to Reproduce:
1. Create and start a 2*2 volume
2. Mount the volume (FUSE)
3. Create some data from the client
4. Bring down glusterd on 2 nodes (node 2 and node 4)
5. Stop the volume from node 1
6. Check volume info; the volume should be reported as stopped
7. Access the volume from the client (see the example commands below)
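
A rough shell sketch of the steps above; the hostnames node1..node4, the volume name testvol, the brick path /bricks/bN, and the mount point /mnt/testvol are assumptions made for illustration, not values from the original report:

    # Step 1: create and start a 2x2 distributed-replicate volume (run on node1).
    # Replica pairs are (node1,node2) and (node3,node4).
    gluster volume create testvol replica 2 \
        node1:/bricks/b1 node2:/bricks/b2 \
        node3:/bricks/b3 node4:/bricks/b4
    gluster volume start testvol

    # Steps 2-3: mount over FUSE on the client and write some data.
    mkdir -p /mnt/testvol
    mount -t glusterfs node1:/testvol /mnt/testvol
    cp -r /etc /mnt/testvol/data

    # Step 4: bring down glusterd on node2 and node4 (run on those nodes).
    systemctl stop glusterd     # on older releases: service glusterd stop

    # Steps 5-6: stop the volume from node1 (answer 'y' at the prompt) and
    # confirm that it is reported as Stopped.
    gluster volume stop testvol
    gluster volume info testvol | grep Status

    # Step 7: the FUSE mount is still readable and writable, because the
    # glusterfsd brick processes on node2 and node4 were never stopped.
    ls /mnt/testvol
    touch /mnt/testvol/still-writable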

Actual results:
===============
The volume is still accessible: the glusterfsd processes were brought down only on node 1 and node 3, while the glusterfsd processes on node 2 and node 4 are still online.
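
One way to confirm this from the shell, reusing the node names and mount point assumed in the sketch above:

    # On node1 and node3, where glusterd handled the stop, no brick process remains:
    ps aux | grep '[g]lusterfsd'    # expected: no output

    # On node2 and node4, where glusterd was down during the stop, the brick
    # process is still running:
    ps aux | grep '[g]lusterfsd'    # expected: one glusterfsd per brick

    # From the client, the mount is still being served by the surviving bricks:
    echo "still served" > /mnt/testvol/proof.txt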


Expected results:
=================

The user needs to be told how the volume can remain accessible even though gluster volume info shows it as stopped.

--- Additional comment from RHEL Product and Program Management on 2014-02-20 03:02:48 EST ---

Since this issue was entered in bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Atin Mukherjee on 2015-12-24 05:16:47 EST ---

With GlusterD 2.0 this problem will go away: since the transaction will be based on a central store, the transaction itself will fail if the glusterd instance of a node hosting any of these bricks is down.

Comment 1 Vijay Bellur 2018-11-19 05:31:35 UTC
Migrated to github:

https://github.com/gluster/glusterfs/issues/576

Please follow the github issue for further updates on this bug.

