Bug 1618221

Summary: If a node disconnects during volume delete, it assumes deleted volume as a freshly created volume when it is back online
Product: Red Hat Gluster Storage Reporter: Atin Mukherjee <amukherj>
Component: glusterd Assignee: Atin Mukherjee <amukherj>
Status: CLOSED ERRATA QA Contact: Bala Konda Reddy M <bmekala>
Severity: urgent Docs Contact:
Priority: high    
Version: rhgs-3.4 CC: amukherj, apaladug, bugs, mchangir, nchilaka, rcyriac, rhs-bugs, rtalur, sanandpa, sankarshan, sheggodu, srakonde, storage-qa-internal, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 3.4.z Batch Update 1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: ocs-dependency-issue
Fixed In Version: glusterfs-3.12.2-24 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1605077
: 1631248 (view as bug list) Environment:
Last Closed: 2018-10-31 08:46:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1605077    
Bug Blocks: 1565940, 1582402, 1589070, 1631248    

Description Atin Mukherjee 2018-08-16 12:42:11 UTC
+++ This bug was initially created as a clone of Bug #1605077 +++

Description of problem:
In a cluster of n nodes, if a node goes down during a volume delete operation, it still has the information about the deleted volume when it comes back online. The node treats this volume as a freshly created volume and displays the volume name when the volume list command is run. None of the remaining nodes in the cluster have any information about this volume.

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
When the disconnected node is back online, the deleted volume's info should be removed from the node. The volume list command should not display the name of the deleted volume.

Additional info:

--- Additional comment from Worker Ant on 2018-07-31 03:29:32 EDT ---

REVIEW: https://review.gluster.org/20592 (glusterd: ignore importing volume which is undergoing a delete operation) posted (#1) for review on master by Atin Mukherjee

--- Additional comment from Worker Ant on 2018-08-16 08:37:20 EDT ---

COMMIT: https://review.gluster.org/20592 committed in master by "Atin Mukherjee" <amukherj@redhat.com> with a commit message- glusterd: ignore importing volume which is undergoing a delete operation

Problem explanation:

Assume a 3-node cluster in which N1 originates a delete operation. If,
while N1's commit phase completes, the glusterd service of either N2 or
N3 gets disconnected from N1 (before completing the commit phase), N1
will end up importing the volume, which is still in flight for a delete
on the other nodes, as a fresh volume, resulting in an incorrect
configuration state.

Fix:

Mark a volume as stage deleted once a volume delete operation passes
its staging phase, and reset this flag during the unlock phase. If the
same volume gets imported to other peers during this intermediate
phase, it shouldn't be considered to be recreated.
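
The flag logic described in the fix can be sketched roughly as follows. This is a toy Python model, not the glusterd implementation (which is written in C): the `Volume` and `Node` classes, the method names, and the `stage_deleted` attribute name are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Volume:
    name: str
    # Illustrative flag: set once a delete passes staging, reset at unlock.
    stage_deleted: bool = False


@dataclass
class Node:
    volumes: Dict[str, Volume] = field(default_factory=dict)

    def stage_delete(self, name: str) -> None:
        """Staging phase passed: mark the volume as undergoing a delete."""
        self.volumes[name].stage_deleted = True

    def commit_delete(self, name: str) -> None:
        """Commit phase: drop the volume from the local configuration."""
        self.volumes.pop(name, None)

    def unlock(self, name: str) -> None:
        """Unlock phase: reset the flag if the volume still exists."""
        vol = self.volumes.get(name)
        if vol is not None:
            vol.stage_deleted = False

    def import_from_peer(self, peer_vol: Volume) -> bool:
        """Import a volume advertised by a reconnecting peer.

        Returns True if the volume was imported as a new local volume.
        A volume that is mid-delete on the peer is ignored.
        """
        if peer_vol.stage_deleted:
            return False
        if peer_vol.name in self.volumes:
            return False
        self.volumes[peer_vol.name] = Volume(peer_vol.name)
        return True


# N1 originates the delete; N2 disconnects after staging, before commit.
n1 = Node({"vol1": Volume("vol1")})
n2 = Node({"vol1": Volume("vol1")})
n1.stage_delete("vol1")
n2.stage_delete("vol1")
n1.commit_delete("vol1")

# On reconnect, N1 sees N2's stale copy but does not recreate it.
imported = n1.import_from_peer(n2.volumes["vol1"])
```

In this model, a volume whose flag is not set would still be imported normally, so ordinary peer synchronisation of genuinely new volumes is unaffected.
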

An automated .t is quite tough to implement with the current infra.

Test Case:

1. Keep creating and deleting volumes in a loop on a 3 node cluster
2. Simulate n/w failure between the peers (ifdown followed by ifup)
3. Check that the output of 'gluster v list | wc -l' is the same across
all 3 nodes during steps 1 & 2.
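
The comparison in step 3 could be automated along these lines. This is a minimal sketch that assumes the output of 'gluster v list' has already been collected from each node as text; the function names and the node names in the example are hypothetical, and counting non-empty lines is only a rough equivalent of 'wc -l'.

```python
from typing import Dict


def volume_count(list_output: str) -> int:
    """Count non-empty lines in the captured volume list output."""
    return sum(1 for line in list_output.splitlines() if line.strip())


def mismatched_counts(outputs: Dict[str, str]) -> Dict[str, int]:
    """Return per-node counts if the nodes disagree, else an empty dict."""
    counts = {node: volume_count(out) for node, out in outputs.items()}
    if len(set(counts.values())) <= 1:
        return {}
    return counts


# Example with made-up output: n3 still lists a volume the others deleted.
sample = {
    "n1": "vol1\nvol2\n",
    "n2": "vol1\nvol2\n",
    "n3": "vol1\nvol2\nvol3\n",
}
result = mismatched_counts(sample)
```

An empty result means all nodes report the same number of volumes; a non-empty result shows which node's count differs.
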

Change-Id: Ifdd5dc39699120258d7fdd42fe2deb9de25c6246
Fixes: bz#1605077
Signed-off-by: Atin Mukherjee <amukherj@redhat.com>

Comment 16 errata-xmlrpc 2018-10-31 08:46:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:3432