Bug 1631248

Summary: If a node disconnects during a volume delete, it treats the deleted volume as a freshly created volume when it is back online
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Sunil Kumar Acharya <sheggodu>
Component: glusterd Assignee: Atin Mukherjee <amukherj>
Status: CLOSED DEFERRED QA Contact: Bala Konda Reddy M <bmekala>
Severity: urgent Docs Contact:
Priority: high    
Version: rhgs-3.4 CC: amukherj, apaladug, bmekala, bugs, hgowtham, mchangir, nberry, nchilaka, rcyriac, rhs-bugs, rtalur, sanandpa, sankarshan, srakonde, srmukher, storage-qa-internal, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: ocs-dependency-issue
Fixed In Version: Doc Type: Bug Fix
Doc Text:
While a volume delete CLI operation is in progress, if one of the peers in the cluster gets disconnected, a sequence of events can occasionally lead to a state where the deleted volume is created again when the peer reconnects, and it then appears in the cluster as a stale volume. With this fix, a volume that is undergoing deletion is ignored during the handshake, so no stale volumes are left behind.
Story Points: ---
Clone Of: 1618221 Environment:
Last Closed: 2018-10-16 04:57:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1605077, 1618221    
Bug Blocks:    

Comment 26 hari gowtham 2018-10-22 10:45:41 UTC
Hi,

From the logs the following observations were made:

At 13:27:38 the glusterd restart happened on node 10.70.35.38:

[2018-10-12 13:27:38.467525] I [MSGID: 100030] [glusterfsd.c:2504:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)

At this point the volumes have to be synced as part of the handshake.
The handshake started around 13:27:45:

[2018-10-12 13:27:45.983464] I [MSGID: 106163] [glusterd-handshake.c:1319:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31302

and continued until 13:27:59.876472 on 10.70.35.38:

[2018-10-12 13:27:59.876472] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 4382f67b-e759-409f-a579-f8c98ededd83


This is also when the gluster volume delete for the volume iceberg-2-1-9-5 was issued:
[2018-10-12 13:27:59.875956]  : v delete iceberg-2-1-9-5 : SUCCESS

These observations show that the delete was issued while the handshake was still in progress.

Since the handshake started before the delete, iceberg-2-1-9-5 was still present in the volume data cross-checked for the handshake, and that data was being synced to the restarted node (10.70.35.38).
The delete was then issued and, because the node had come back up, it removed the entry. After the delete completed successfully and removed the entry, the handshake recreated the entry for iceberg-2-1-9-5. This is why the entry has stage_deleted set to false (it was false before the delete was issued).

To confirm this further, the entry is present only on the node that went through the reboot (10.70.35.38) and not on the other nodes.
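
To illustrate the kind of guard the fix (see the Doc Text above) describes, below is a minimal sketch, not the actual glusterd code: the names (volinfo_sketch_t, stage_deleted, import_friend_volume_sketch) are hypothetical placeholders chosen for illustration. The idea is that a volume whose delete has been staged or is in progress is ignored during the handshake import instead of being recreated on the peer.

/* Hypothetical sketch of a handshake import guard; these are not the
 * real glusterd symbols or data structures. */
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    char name[256];
    bool stage_deleted;   /* assumed flag: set once 'volume delete' is staged */
} volinfo_sketch_t;

/* Called for each volume received from a peer during the handshake. */
static int
import_friend_volume_sketch(const volinfo_sketch_t *peer_vol)
{
    if (peer_vol->stage_deleted) {
        /* Importing this volume would resurrect it as a stale volume
         * on the reconnecting node, so skip it. */
        printf("skipping %s: delete in progress\n", peer_vol->name);
        return 0;
    }
    /* ... normal import path: create the local entry and store it ... */
    printf("importing %s\n", peer_vol->name);
    return 0;
}

int
main(void)
{
    volinfo_sketch_t deleted = { "iceberg-2-1-9-5", true };
    volinfo_sketch_t kept    = { "example-volume", false };

    import_friend_volume_sketch(&deleted);  /* ignored, no stale volume */
    import_friend_volume_sketch(&kept);     /* imported as usual */
    return 0;
}

With a check of this kind in place, the sequence observed above (the delete completing while the handshake is still importing volumes) would no longer leave a stale iceberg-2-1-9-5 entry on the restarted node.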