Bug 1067307 - [RFE]: gluster volume stop should be handled in a graceful manner when glusterd on a few nodes is down
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1349782
 
Reported: 2014-02-20 07:41 UTC by Rahul Hinduja
Modified: 2016-06-24 09:20 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Cloned to: 1349782
Environment:
Last Closed: 2016-06-24 09:20:50 UTC
Embargoed:



Description Rahul Hinduja 2014-02-20 07:41:18 UTC
Description of problem:
=======================

Currently, when "gluster volume stop" is performed, it relies on glusterd running on each node to shut down that node's glusterfsd (brick) processes and take the volume offline. This design has a flaw in the following scenario:

Consider a 4-node cluster forming a 2x2 volume with one brick on each node. The volume is FUSE-mounted on a client and is accessible; reads and writes from the client succeed. Now stop glusterd on 2 of the nodes (one from each replica pair). Reads and writes from the client still succeed. Now stop the volume from a node whose glusterd is still online, which marks the volume as stopped. However, reads and writes from the client continue to succeed, because the glusterfsd processes on the nodes where glusterd was brought down are still online and "gluster volume stop" did not take them into account.

So, from the user's point of view, the volume is stopped and glusterd on a few nodes is down, yet the FUSE mount can still be read from and written to.

gluster volume stop should be handled gracefully. 

One possible solution is to introduce separate "stop" and "stop force" behaviours so that the user knows clients could still have access (a hypothetical illustration follows the two points below):

1. When "gluster volume stop" is issued and glusterd is down on any node hosting a brick of the volume, fail the volume stop with a proper message.

2. When "gluster volume stop force" is issued, explicitly inform the user that the volume will be marked stopped but some glusterfsd processes on the down nodes may still be online and continue to serve the mount, and ask whether they wish to continue.
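
For illustration only, the proposed behaviour could look roughly like the following; the exact wording of the messages is an assumption, not existing CLI output:

# proposed: a plain stop fails when glusterd is down on a node hosting a brick
gluster volume stop testvol
volume stop: testvol: failed: glusterd is down on node2 and node4, which host bricks of this volume

# proposed: a forced stop warns that bricks on the down nodes may keep serving clients
gluster volume stop testvol force
Bricks on nodes where glusterd is down may remain online and continue to serve mounted clients. Do you wish to continue? (y/n)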

The exact solution is debatable, but the user surely needs to understand how the volume can remain accessible even though "gluster volume info" shows it as stopped.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.4.1.1.snap.feb17.2014git-1.el6.x86_64

Found with a snapshot build, but the issue exists in general.


How reproducible:
=================
1/1


Steps to Reproduce:
1. Create and start a volume (2x2)
2. Mount the volume (FUSE)
3. Create some data from the client
4. Bring down glusterd on 2 nodes (node 2 and node 4)
5. Stop the volume from node 1
6. Check volume info; the volume should show as stopped
7. Access the volume from the client (a command-level sketch of these steps follows below)
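
A command-level sketch of the steps above; the volume name, host names, brick paths, and mount point are examples, not taken from the original setup:

# on node 1: create and start a 2x2 distributed-replicate volume
gluster volume create testvol replica 2 node1:/bricks/b1 node2:/bricks/b1 node3:/bricks/b1 node4:/bricks/b1
gluster volume start testvol

# on the client: FUSE-mount the volume and write some data
mount -t glusterfs node1:/testvol /mnt/testvol
dd if=/dev/zero of=/mnt/testvol/file1 bs=1M count=10

# on node 2 and node 4: stop glusterd
service glusterd stop

# on node 1: stop the volume and check its state
gluster volume stop testvol
gluster volume info testvol

# on the client: the mount is still readable and writable even though the volume shows as Stopped
ls /mnt/testvol
dd if=/dev/zero of=/mnt/testvol/file2 bs=1M count=10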

Actual results:
===============
The volume is still accessible: the glusterfsd processes are down only on node 1 and node 3, while the glusterfsd processes on node 2 and node 4 are still online.
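
One way to confirm this (a hypothetical check; exact output will vary) is to inspect the nodes where glusterd was brought down:

# on node 2 and node 4: glusterd is stopped, but the brick process keeps running
service glusterd status
ps aux | grep glusterfsd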


Expected results:
=================

The user needs to understand how the volume can remain accessible even though "gluster volume info" shows it as stopped; ideally the volume stop should either fail or warn when glusterd is down on nodes hosting bricks, as proposed in the description.

Comment 2 Atin Mukherjee 2015-12-24 10:16:47 UTC
With GlusterD 2.0 this problem will go away: since the transaction will be based on a central store, the transaction itself will fail if the glusterd instance on a node hosting any of the volume's bricks is down.

Comment 3 Atin Mukherjee 2016-06-24 09:20:50 UTC
Based on comment 2, closing this bug.

