Bug 1631248

Summary: If a node disconnects during a volume delete, it treats the deleted volume as a freshly created volume when it is back online
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Sunil Kumar Acharya <sheggodu>
Component: glusterd Assignee: Atin Mukherjee <amukherj>
Status: CLOSED DEFERRED QA Contact: Bala Konda Reddy M <bmekala>
Severity: urgent Docs Contact:
Priority: high    
Version: rhgs-3.4 CC: amukherj, apaladug, bmekala, bugs, hgowtham, mchangir, nberry, nchilaka, rcyriac, rhs-bugs, rtalur, sanandpa, sankarshan, srakonde, srmukher, storage-qa-internal, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: ocs-dependency-issue
Fixed In Version: Doc Type: Bug Fix
Doc Text:
While a volume delete CLI operation is in progress, if one of the peers in the cluster gets disconnected, a sequence of events can occasionally lead to a state where the deleted volume is created again when the peer reconnects, and it then appears in the cluster as a stale volume. With this fix, a volume that is undergoing deletion is ignored during the handshake, so no stale volumes are left behind.
Story Points: ---
Clone Of: 1618221 Environment:
Last Closed: 2018-10-16 04:57:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1605077, 1618221    
Bug Blocks:    

Comment 26 hari gowtham 2018-10-22 10:45:41 UTC
Hi,

From the logs the following observations were made:

At 13:27:38 the glusterd restart happened on node 10.70.35.38:

[2018-10-12 13:27:38.467525] I [MSGID: 100030] [glusterfsd.c:2504:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)

At this point the volumes have to be synced as part of the handshake.
The handshake started around 13:27:45:

[2018-10-12 13:27:45.983464] I [MSGID: 106163] [glusterd-handshake.c:1319:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31302

and continued until 13:27:59.876472 on 10.70.35.38:

[2018-10-12 13:27:59.876472] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 4382f67b-e759-409f-a579-f8c98ededd83


This is also when the gluster volume delete for the volume iceberg-2-1-9-5 was issued:
[2018-10-12 13:27:59.875956]  : v delete iceberg-2-1-9-5 : SUCCESS

These observations show that the delete was issued while the handshake was still in progress.

Since the handshake started before the delete, iceberg-2-1-9-5 was still present in the volume data cross-checked for the handshake, and that data was being synced to the restarted node (10.70.35.38).
The delete was then issued and, because the node had come back up, it removed the entry. After the delete completed successfully and removed the entry, the handshake recreated the entry for iceberg-2-1-9-5. This is why the entry has stage_deleted set to false (it was false before the delete was issued).

To confirm this further, the entry is present only on the node that went through the reboot (10.70.35.38) and not on the other nodes.
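
To illustrate the kind of guard the fix (see the Doc Text above) describes, below is a minimal sketch, not the actual glusterd code: the names (volinfo_sketch_t, stage_deleted, import_friend_volume_sketch) are hypothetical placeholders chosen for illustration. The idea is that a volume whose delete has been staged or is in progress is ignored during the handshake import instead of being recreated on the peer.

/* Hypothetical sketch of a handshake import guard; these are not the
 * real glusterd symbols or data structures. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    char name[256];
    bool stage_deleted;   /* assumed flag: set once 'volume delete' is staged */
} volinfo_sketch_t;

/* Called for each volume received from a peer during the handshake. */
static int
import_friend_volume_sketch(const volinfo_sketch_t *peer_vol)
{
    if (peer_vol->stage_deleted) {
        /* Importing this volume would resurrect it as a stale volume
         * on the reconnecting node, so skip it. */
        printf("skipping %s: delete in progress\n", peer_vol->name);
        return 0;
    }
    /* ... normal import path: create the local entry and store it ... */
    printf("importing %s\n", peer_vol->name);
    return 0;
}

int
main(void)
{
    volinfo_sketch_t deleted = { "iceberg-2-1-9-5", true };
    volinfo_sketch_t kept    = { "example-volume", false };

    import_friend_volume_sketch(&deleted);  /* ignored, no stale volume */
    import_friend_volume_sketch(&kept);     /* imported as usual */
    return 0;
}

With a check of this kind in place, the sequence observed above (the delete completing while the handshake is still importing volumes) would no longer leave a stale iceberg-2-1-9-5 entry on the restarted node.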