Bug 1631248 - If a node disconnects during volume delete, it assumes deleted volume as a freshly created volume when it is back online
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Atin Mukherjee
QA Contact: Bala Konda Reddy M
Whiteboard: ocs-dependency-issue
Depends On: 1605077 1618221
Reported: 2018-09-20 09:59 UTC by Sunil Kumar Acharya
Modified: 2018-10-23 11:18 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When a volume delete CLI operation was in progress and one of the peers in the cluster got disconnected, a sequence of events occasionally led to a state where the deleted volume was re-created when the peer reconnected. This volume then appeared in the cluster as a stale volume. With this fix, a volume that is being deleted is ignored during the handshake, so no stale volumes are seen.
Clone Of: 1618221
Last Closed: 2018-10-16 04:57:37 UTC
Target Upstream Version:


Comment 26 hari gowtham 2018-10-22 10:45:41 UTC

The following observations were made from the logs:

At 13:27:38, glusterd restarted on the node:

[2018-10-12 13:27:38.467525] I [MSGID: 100030] [glusterfsd.c:2504:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.12.2 (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO)

After the restart, the volumes have to be synced through the handshake.
The handshake started around 13:27:45:

[2018-10-12 13:27:45.983464] I [MSGID: 106163] [glusterd-handshake.c:1319:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 31302

and ran until 13:27:59.876472:

[2018-10-12 13:27:59.876472] I [MSGID: 106493] [glusterd-rpc-ops.c:701:__glusterd_friend_update_cbk] 0-management: Received ACC from uuid: 4382f67b-e759-409f-a579-f8c98ededd83

The gluster volume delete for the volume iceberg-2-1-9-5 was issued at:
[2018-10-12 13:27:59.875956]  : v delete iceberg-2-1-9-5 : SUCCESS

From the above observations, the delete was issued while the handshake was still in progress.
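
The overlap can be checked mechanically from the three timestamps quoted above. This is only a minimal sketch: the variable and function names are made up for illustration, and it treats the first and last handshake log lines shown here as the bounds of the handshake window.

```python
from datetime import datetime

# Timestamp format used in the glusterd log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(text):
    """Parse a log timestamp such as '2018-10-12 13:27:45.983464'."""
    return datetime.strptime(text, FMT)


# Values copied from the log excerpts in this comment.
handshake_start = parse_ts("2018-10-12 13:27:45.983464")  # hndsk_versions_ack
handshake_end = parse_ts("2018-10-12 13:27:59.876472")    # friend_update_cbk
delete_logged = parse_ts("2018-10-12 13:27:59.875956")    # v delete ... SUCCESS

# True when the delete entry falls inside the handshake window.
delete_during_handshake = handshake_start <= delete_logged <= handshake_end

# How long before the last handshake message the delete was logged.
gap_before_end = handshake_end - delete_logged

print("delete inside handshake window:", delete_during_handshake)
print("microseconds before handshake end:", gap_before_end.microseconds)
```

With these values the delete entry lands inside the window, a fraction of a millisecond before the last handshake message.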

The handshake started before the delete, so iceberg-2-1-9-5 was still present in the file that is cross-checked for the handshake, and it was being synced to the restarted node.
The delete was then issued and, because the node had come back up, it removed the entry there as well. After the delete completed successfully and removed the entry, the handshake re-created the entry for iceberg-2-1-9-5. This is why the entry has stage_deleted set to false (its value before the delete was issued).
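
The ordering described above can be illustrated with a small toy model. It is not glusterd code: the dictionaries and the helper names (`snapshot_for_handshake`, `delete_volume`, `import_from_handshake`) are hypothetical stand-ins, and only the volume name and the stage_deleted field come from this bug.

```python
import copy

VOLUME = "iceberg-2-1-9-5"


def snapshot_for_handshake(volumes):
    """Hypothetical: copy the volume data a peer sends at handshake start."""
    return copy.deepcopy(volumes)


def delete_volume(volumes, name):
    """Hypothetical: commit a volume delete on one node's store."""
    volumes.pop(name, None)


def import_from_handshake(volumes, snapshot):
    """Hypothetical: create any volume from the snapshot that is missing locally."""
    for name, info in snapshot.items():
        if name not in volumes:
            volumes[name] = dict(info)


# Before the delete, both the peer and the restarted node know the volume.
peer = {VOLUME: {"stage_deleted": False}}
restarted_node = {VOLUME: {"stage_deleted": False}}

# 1. The handshake starts: the peer's data is captured before the delete.
handshake_data = snapshot_for_handshake(peer)

# 2. The delete is committed on every node, including the restarted one.
delete_volume(peer, VOLUME)
delete_volume(restarted_node, VOLUME)

# 3. The handshake finishes and imports the now outdated data.
import_from_handshake(restarted_node, handshake_data)

print("on peer:", VOLUME in peer)
print("on restarted node:", VOLUME in restarted_node)
print("stage_deleted:", restarted_node[VOLUME]["stage_deleted"])
```

In this model the volume ends up only on the restarted node, with stage_deleted still false, which is the state described in this comment.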

As further confirmation, the entry is present only on the node that went through a reboot and not on the other nodes.
