Bug 1344625 - Fail the volume delete operation if one of the glusterd instances in the cluster is down
Summary: Fail the volume delete operation if one of the glusterd instances in the cluster is down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.1.3
Assignee: Atin Mukherjee
QA Contact: Byreddy
URL:
Whiteboard:
Depends On: 1344407 1344631 1344634
Blocks: 1311817 1344239
 
Reported: 2016-06-10 08:24 UTC by Atin Mukherjee
Modified: 2016-09-17 16:46 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.7.9-10
Doc Type: Bug Fix
Doc Text:
The 'volume delete' operation succeeded even when an instance of glusterd was down. This meant that when the glusterd instance recovered, it re-synced the deleted volume to the cluster. This update ensures that 'volume delete' operations fail when an instance of glusterd is not available.
Clone Of: 1344407
Environment:
Last Closed: 2016-06-23 05:26:33 UTC
Embargoed:


Attachments: none


Links
System: Red Hat Product Errata RHBA-2016:1240
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Gluster Storage 3.1 Update 3
Last Updated: 2016-06-23 08:51:28 UTC

Description Atin Mukherjee 2016-06-10 08:24:32 UTC
+++ This bug was initially created as a clone of Bug #1344407 +++

Description of problem:

If a volume is deleted while the glusterd instance on one of the nodes in the cluster is down, then once that glusterd comes back up it re-syncs the same volume to all of the nodes. Users are understandably annoyed to see the deleted volume back in the namespace.
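
For illustration, a minimal repro sketch driving the gluster CLI from C. The volume name (testvol), the peer host name (node2), and the use of ssh/systemctl to stop glusterd are assumptions for the example, not details taken from this report:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* Run a shell command and report its exit code. */
static int run(const char *cmd)
{
    int status = system(cmd);
    int rc = WEXITSTATUS(status);
    printf("%-50s -> exit %d\n", cmd, rc);
    return rc;
}

int main(void)
{
    run("ssh node2 systemctl stop glusterd");        /* take one glusterd down */
    sleep(5);

    /* Before the fix, both operations succeed despite the down peer. */
    run("gluster --mode=script volume stop testvol");
    run("gluster --mode=script volume delete testvol");

    /* Once node2's glusterd is back, it still has testvol in its local
     * store and re-syncs it, so the "deleted" volume reappears. */
    run("ssh node2 systemctl start glusterd");
    sleep(5);
    run("gluster volume info testvol");
    return 0;
}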

Version-Release number of selected component (if applicable):
mainline

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

--- Additional comment from Vijay Bellur on 2016-06-09 11:38:08 EDT ---

REVIEW: http://review.gluster.org/14681 (glusterd: fail volume delete if one of the node is down) posted (#2) for review on master by Atin Mukherjee (amukherj)

--- Additional comment from Vijay Bellur on 2016-06-10 03:31:02 EDT ---

COMMIT: http://review.gluster.org/14681 committed in master by Kaushal M (kaushal) 
------
commit 5016cc548d4368b1c180459d6fa8ae012bb21d6e
Author: Atin Mukherjee <amukherj>
Date:   Thu Jun 9 18:22:43 2016 +0530

    glusterd: fail volume delete if one of the node is down
    
    Deleting a volume on a cluster where one of the nodes is down is buggy,
    since once that node comes back the same volume will be re-synced to the
    cluster. Until we bring in the soft-delete feature tracked in
    http://review.gluster.org/12963, this is a safeguard to block the volume
    deletion.
    
    Change-Id: I9c13869c4a7e7a947f88842c6dc6f231c0eeda6c
    BUG: 1344407
    Signed-off-by: Atin Mukherjee <amukherj>
    Reviewed-on: http://review.gluster.org/14681
    Smoke: Gluster Build System <jenkins.com>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Kaushal M <kaushal>
    NetBSD-regression: NetBSD Build System <jenkins.org>
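
As a rough sketch of the guard described in the commit message above (the peer structure, field names, function names, and error message here are illustrative, not glusterd's actual peerinfo API or the actual patch):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative peer record; glusterd's real peerinfo structure differs. */
typedef struct peer {
    const char  *hostname;
    bool         connected;     /* is the remote glusterd reachable? */
    struct peer *next;
} peer_t;

/* Return true only when every peer's glusterd is currently reachable. */
static bool all_peers_connected(const peer_t *peers)
{
    for (const peer_t *p = peers; p != NULL; p = p->next)
        if (!p->connected)
            return false;
    return true;
}

/* Volume-delete pre-validation: reject the operation while any glusterd
 * instance is down, so a recovering node cannot re-sync the deleted
 * volume into the cluster later. */
static int volume_delete_prevalidate(const peer_t *peers,
                                     char *errmsg, size_t len)
{
    if (!all_peers_connected(peers)) {
        snprintf(errmsg, len,
                 "one or more glusterd instances are down; "
                 "volume delete is not allowed");
        return -1;
    }
    return 0;
}

int main(void)
{
    peer_t node2 = { "node2", false, NULL };   /* simulated down peer */
    peer_t node1 = { "node1", true,  &node2 };
    char err[128] = "";

    int ret = volume_delete_prevalidate(&node1, err, sizeof(err));
    printf("prevalidate returned %d%s%s\n", ret, ret ? ": " : "", err);
    return 0;
}

The point is simply that the connectivity check happens before the delete is committed anywhere, so either every node applies the delete or none does.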

Comment 3 Atin Mukherjee 2016-06-10 08:39:53 UTC
Downstream patch https://code.engineering.redhat.com/gerrit/76322 posted for review.

Comment 6 Atin Mukherjee 2016-06-10 16:34:33 UTC
Laura,

this needs your attention and hence I am raising a need_info now :)

~Atin

Comment 8 Atin Mukherjee 2016-06-13 04:22:13 UTC
Laura,

Saying that the node is unavailable would not be technically correct, since that implies the node itself could be down or under maintenance. The issue is specifically about one or more glusterd instances being down. Can you please reword?

~Atin

Comment 9 Atin Mukherjee 2016-06-13 04:29:44 UTC
LGTM :)

Comment 10 Byreddy 2016-06-13 05:24:03 UTC
Verified this bug using the build "glusterfs-3.7.9-10"

The fix is working well: the volume cannot be deleted while peer nodes are down, it can be deleted once the offline nodes come back up, and new volumes can be created afterwards.

Test cases verified for this fix are:
=====================================
1. Stop and delete a volume when one of the nodes is down - Pass
2. Delete the volume after bringing the shut-down node back up - Pass
3. Stop and delete the volume when multiple nodes are down - Pass
4. Bring up one of two offline nodes and delete the volume - Pass
5. Bring up all the offline nodes and delete the volume - Pass
6. Delete the volume when a peer node that is not hosting volume bricks is offline - Pass
7. Stop the volume while all nodes are online, take one node offline, and delete the volume - Pass
8. Stop the volume while one peer node is down, probe a new node, and delete the volume - Pass
9. Create a volume (without starting it), take one node down, and delete the volume - Pass
10. Have multiple volumes, take one peer node down, and delete the volumes - Pass
11. Delete the volume(s) once the offline node(s) come back up - Pass
12. Delete the volume after powering off one of the peer nodes - Pass
13. With one of the nodes down, create a volume and try to delete it - Pass
14. Create a volume, take one node down, and create a new volume using bricks from the online nodes - Pass

Based on the above details, moving the bug to the Verified state.
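
For reference, a minimal sketch of test cases 1 and 2 above, driven through the gluster CLI from C. The volume name (testvol), the peer host name (node2), and the use of ssh/systemctl to take glusterd down are assumptions for the example:

#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/* Run a shell command and return its exit code. */
static int run(const char *cmd)
{
    int status = system(cmd);
    return WEXITSTATUS(status);
}

int main(void)
{
    /* Test case 1: stop the volume, take one glusterd down, try to delete. */
    assert(run("gluster --mode=script volume stop testvol") == 0);
    assert(run("ssh node2 systemctl stop glusterd") == 0);
    sleep(5);   /* let the local glusterd mark the peer as disconnected */

    /* With the fix in place, the delete must be rejected. */
    assert(run("gluster --mode=script volume delete testvol") != 0);

    /* Test case 2: once the peer is back, the delete should succeed. */
    assert(run("ssh node2 systemctl start glusterd") == 0);
    sleep(5);
    assert(run("gluster --mode=script volume delete testvol") == 0);
    return 0;
}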

Comment 12 errata-xmlrpc 2016-06-23 05:26:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240

