Bug 1631248 - If a node disconnects during volume delete, it assumes deleted volume as a freshly created volume when it is back online
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Atin Mukherjee
QA Contact: Bala Konda Reddy M
URL:
Whiteboard: ocs-dependency-issue
Depends On: 1605077 1618221
Blocks:
Reported: 2018-09-20 09:59 UTC by Sunil Kumar Acharya
Modified: 2018-10-23 11:18 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When a volume delete CLI operation is in progress and one of the peers in the cluster gets disconnected, a sequence of events can occasionally lead to a state where the deleted volume is re-created when the peer reconnects. This volume then appears in the existing cluster as a stale volume. With this fix, the handshake ignores a volume that is being deleted, so no stale volumes are seen.
Clone Of: 1618221
Environment:
Last Closed: 2018-10-16 04:57:37 UTC
Embargoed:
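A minimal Python sketch of the guard described in the Doc Text above, assuming it works by skipping any volume whose stage_deleted flag is set when a peer's volumes are imported during the handshake. The names Volume, import_friend_volumes, and local_volumes are hypothetical; this is not glusterd's C implementation.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    # Hypothetical stand-in for a volume definition exchanged during the handshake.
    name: str
    stage_deleted: bool = False


def import_friend_volumes(local_volumes: dict, friend_volumes: list) -> list:
    """Import volumes received from a peer, skipping those marked for deletion.

    Returns the names of the volumes that were imported.
    """
    imported = []
    for vol in friend_volumes:
        local = local_volumes.get(vol.name)
        # Assumption: skip the volume if either the incoming copy or the local
        # copy has already had its deletion staged.
        if vol.stage_deleted or (local is not None and local.stage_deleted):
            continue
        local_volumes[vol.name] = vol
        imported.append(vol.name)
    return imported


local = {}
incoming = [Volume("vol-a"), Volume("vol-b", stage_deleted=True)]
print(import_friend_volumes(local, incoming))  # ['vol-a']
print(sorted(local))                           # ['vol-a']
```

The guard only helps if the flag is already set in the data the handshake looks at; a copy captured before the delete was staged still carries stage_deleted as false.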



Comment 26 hari gowtham 2018-10-22 10:45:41 UTC
Hi,

From the logs the following observations were made:

At 13:27:38, glusterd was restarted on node 10.70.35.38.

[2018-10-12 13:27:38.467525] I [MSGID: 100030] [glusterfsd.c:2504:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)

At this point, the volumes have to be synced through the handshake.
The handshake started at around 13:27:45:

[2018-10-12 13:27:45.983464] I [MSGID: 106163] [glusterd-handshake.c:1319:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31302

and continued until 13:27:59.876472 on 10.70.35.38:

[2018-10-12 13:27:59.876472] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 4382f67b-e759-409f-a579-f8c98ededd83


The gluster volume delete for the volume iceberg-2-1-9-5 was issued at:
[2018-10-12 13:27:59.875956]  : v delete iceberg-2-1-9-5 : SUCCESS

From these observations, the delete was issued while the handshake was still in progress.
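The overlap can be checked directly from the timestamps in the log lines quoted above. This Python snippet treats the last handshake log line (the ACC received at 13:27:59.876472) as the end of the handshake, as this comment does.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the glusterd log lines quoted in this comment.
handshake_start = datetime.strptime("2018-10-12 13:27:45.983464", FMT)
handshake_end = datetime.strptime("2018-10-12 13:27:59.876472", FMT)
delete_issued = datetime.strptime("2018-10-12 13:27:59.875956", FMT)

# The delete overlaps the handshake if it falls inside the handshake window.
overlaps = handshake_start <= delete_issued <= handshake_end

# How long before the last handshake log line the delete was logged, in microseconds.
margin_us = (handshake_end - delete_issued) // timedelta(microseconds=1)

print(overlaps)   # True
print(margin_us)  # 516
```

The delete was logged roughly half a millisecond before the final handshake log line, well inside the window.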

Because the handshake started before the delete, iceberg-2-1-9-5 was still present in the file cross-checked during the handshake, and this entry was being synced to the restarted node (10.70.35.38).
The delete was then issued, and because the node had already come up, it deleted the entry on that node as well. After the delete completed successfully and removed the entry, the handshake re-created the entry for iceberg-2-1-9-5. This is why the entry has stage_deleted set to false: it was still false before the delete was issued.
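A toy Python model of this interleaving, not glusterd code. It assumes the handshake works from a copy of the peer's volume data taken when the handshake starts, and that the importer checks only the stage_deleted flag carried in that copy. Volume and run_race are hypothetical names.

```python
import copy
from dataclasses import dataclass


@dataclass
class Volume:
    # Hypothetical volume record; stage_deleted mirrors the flag named in this comment.
    name: str
    stage_deleted: bool = False


def run_race() -> tuple:
    """Replay snapshot -> delete -> import and return (peer volumes, restarted node volumes)."""
    name = "iceberg-2-1-9-5"
    peer_volumes = {name: Volume(name)}
    restarted_node = {name: Volume(name)}

    # 1. The handshake starts and copies the peer's volume data; stage_deleted is still False.
    snapshot = copy.deepcopy(peer_volumes)

    # 2. The delete is staged (flag set) and committed (entry removed) on every node,
    #    including the restarted one, because that node is already up.
    for volumes in (peer_volumes, restarted_node):
        volumes[name].stage_deleted = True
        del volumes[name]

    # 3. The handshake finishes and imports from the old copy. The guard sees the
    #    flag as it was when the copy was taken, so the volume is re-created.
    for vol_name, vol in snapshot.items():
        if vol.stage_deleted:
            continue
        restarted_node[vol_name] = vol

    return peer_volumes, restarted_node


peer, restarted = run_race()
print(list(restarted))                             # ['iceberg-2-1-9-5']
print(restarted["iceberg-2-1-9-5"].stage_deleted)  # False
print(peer)                                        # {}
```

The model ends in the state described here: the stale entry exists only on the restarted node, with stage_deleted false.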

To confirm this further, the entry is present only on the node that went through the reboot (10.70.35.38) and not on the other nodes.
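One way to do that cross-check is to collect the volume list from each node (for example, the output of `gluster volume list` run on each peer) and report any volume that is not present everywhere. The helper below is a hypothetical Python sketch; apart from 10.70.35.38 and iceberg-2-1-9-5, the node and volume names are placeholders.

```python
def find_inconsistent_volumes(volumes_by_node: dict) -> dict:
    """Map each volume missing from at least one node to the sorted nodes that still list it."""
    all_nodes = set(volumes_by_node)
    holders = {}
    for node, volumes in volumes_by_node.items():
        for vol in volumes:
            holders.setdefault(vol, set()).add(node)
    # Keep only volumes that are not listed on every node.
    return {vol: sorted(nodes) for vol, nodes in holders.items() if nodes != all_nodes}


# Hypothetical per-node listings; only 10.70.35.38 and iceberg-2-1-9-5 come from this comment.
listings = {
    "10.70.35.38": {"iceberg-2-1-9-5", "vol-common"},
    "node-b": {"vol-common"},
    "node-c": {"vol-common"},
}
print(find_inconsistent_volumes(listings))  # {'iceberg-2-1-9-5': ['10.70.35.38']}
```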

