Bug 1802041 - Peer is already being detached from cluster.
Summary: Peer is already being detached from cluster.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 7
Hardware: All
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-02-12 09:14 UTC by akshsy
Modified: 2020-03-04 10:33 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-04 10:33:36 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
glusterd.logs (191.71 KB, text/plain), 2020-02-12 09:14 UTC, akshsy

Description akshsy 2020-02-12 09:14:12 UTC
Created attachment 1662619 [details]
glusterd.logs

Description of problem: Peer is already being detached from cluster.



Version-Release number of selected component (if applicable): 7.1


How reproducible: intermittent 


Steps to Reproduce:
1. Create a gluster volume with 4 bricks across 4 peers.
2. Stop the volume, then remove the brick hosted on one of the peers.
3. Detach that peer (see the command sketch below).
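
A rough command sketch of this sequence on the originator node (volume name, replica count, and brick path are taken from the cmd_history.log below and are only illustrative):

# gluster volume stop clusterfs
# gluster volume remove-brick clusterfs replica 3 192.168.2.171:/mnt/glusterfs/bricks/clusterfs force
# gluster peer detach 192.168.2.171 force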

Actual results:
Peer is already being detached from cluster.



Expected results: Peer detach success.


Gluster_cmd_history.log:

 peer probe 192.168.2.174 : FAILED : Cluster quorum is not met. Changing peers is not allowed in this state
[2020-02-11 14:45:38.194728]  : volume set all cluster.server-quorum-ratio 0% : SUCCESS
[2020-02-11 14:45:38.261141]  : volume stop clusterfs : FAILED : Another transaction is in progress. Please try again after some time.
[2020-02-11 14:45:41.335167]  : volume stop clusterfs : FAILED : Another transaction is in progress. Please try again after some time.
[2020-02-11 14:45:44.398650]  : volume stop clusterfs : FAILED : Another transaction is in progress. Please try again after some time.
[2020-02-11 14:45:47.471528]  : volume stop clusterfs : FAILED : Another transaction is in progress. Please try again after some time.
[2020-02-11 14:45:50.530094]  : volume stop clusterfs : FAILED : Another transaction is in progress. Please try again after some time.
[2020-02-11 14:45:53.601531]  : volume remove-brick clusterfs replica 3 192.168.2.171:/mnt/glusterfs/bricks/clusterfs force : FAILED : Locking failed on 192.168.2.172. Please check log file for details.
[2020-02-11 14:45:55.100656]  : volume remove-brick clusterfs replica 3 192.168.2.171:/mnt/glusterfs/bricks/clusterfs force : SUCCESS
[2020-02-11 14:45:55.453267]  : volume remove-brick clusterfs replica 2 192.168.2.172:/mnt/glusterfs/bricks/clusterfs force : SUCCESS
[2020-02-11 14:45:56.810151]  : volume remove-brick clusterfs replica 1 192.168.2.175:/mnt/glusterfs/bricks/clusterfs force : SUCCESS
[2020-02-11 14:45:57.023880]  : volume set all cluster.server-quorum-ratio 0% : SUCCESS
[2020-02-11 14:45:58.447129]  : volume stop testapp_test_vol : SUCCESS
[2020-02-11 14:45:58.794148]  : volume remove-brick testapp_test_vol replica 3 192.168.2.171:/mnt/glusterfs/bricks/testapp_test_vol force : SUCCESS
[2020-02-11 14:45:59.130836]  : volume remove-brick testapp_test_vol replica 2 192.168.2.172:/mnt/glusterfs/bricks/testapp_test_vol force : SUCCESS
[2020-02-11 14:45:59.508541]  : volume remove-brick testapp_test_vol replica 1 192.168.2.175:/mnt/glusterfs/bricks/testapp_test_vol force : SUCCESS
[2020-02-11 14:45:59.646319]  : peer detach 192.168.2.172 force : SUCCESS
[2020-02-11 14:46:32.431234]  : peer detach 192.168.2.171 force : FAILED : Peer is already being detached from cluster.
Check peer status by running gluster peer status
[2020-02-11 14:46:32.493063]  : peer detach 192.168.2.175 force : SUCCESS
[2020-02-11 14:46:32.698910]  : volume delete clusterfs : FAILED : Volume clusterfs has been started.Volume needs to be stopped before deletion.
[2020-02-11 14:46:35.772386]  : volume delete clusterfs : FAILED : Volume clusterfs has been started.Volume needs to be stopped before deletion.
[2020-02-11 14:46:38.842208]  : volume delete clusterfs : FAILED : Volume clusterfs has been started.Volume needs to be stopped before deletion.
[2020-02-11 14:46:41.915457]  : volume delete clusterfs : FAILED : Volume clusterfs has been started.Volume needs to be stopped before deletion.
[2020-02-11 14:46:44.991937]  : volume delete clusterfs : FAILED : Volume clusterfs has been started.Volume needs to be stopped before deletion.
[2020-02-11 14:46:48.709217]  : volume set all cluster.server-quorum-ratio 0% : SUCCESS
[2020-02-11 14:46:49.955656]  : volume stop clusterfs : SUCCESS
[2020-02-11 14:46:50.161455]  : volume set all cluster.server-quorum-ratio 0% : SUCCESS
[2020-02-11 14:46:50.229301]  : volume stop testapp_test_vol : FAILED : Volume testapp_test_vol is not in the started state
[2020-02-11 14:46:53.302157]  : volume stop testapp_test_vol : FAILED : Volume testapp_test_vol is not in the started state
[2020-02-11 14:46:56.372807]  : volume stop testapp_test_vol : FAILED : Volume testapp_test_vol is not in the started state
[2020-02-11 14:46:59.440233]  : volume stop testapp_test_vol : FAILED : Volume testapp_test_vol is not in the started state
[2020-02-11 14:47:02.509268]  : volume stop testapp_test_vol : FAILED : Volume testapp_test_vol is not in the started state
[2020-02-11 14:47:05.647540]  : peer detach 192.168.2.171 force : FAILED : Peer is already being detached from cluster.
Check peer status by running gluster peer status
[2020-02-11 14:47:05.845251]  : volume delete clusterfs : FAILED : Some of the peers are down
[2020-02-11 14:47:08.914379]  : volume delete clusterfs : FAILED : Some of the peers are down
[2020-02-11 14:47:11.981630]  : volume delete clusterfs : FAILED : Some of the peers are down
[2020-02-11 14:47:15.050954]  : volume delete clusterfs : FAILED : Some of the peers are down
[2020-02-11 14:47:18.123895]  : volume delete clusterfs : FAILED : Some of the peers are down
[2020-02-11 14:47:21.910019]  : volume set all cluster.server-quorum-ratio 0% : SUCCESS
[2020-02-11 14:47:21.973452]  : volume stop clusterfs : FAILED : Volume clusterfs is not in the started state
[2020-02-11 14:47:25.047652]  : volume stop clusterfs : FAILED : Volume clusterfs is not in the started state
[2020-02-11 14:47:28.120029]  : volume stop clusterfs : FAILED : Volume clusterfs is not in the started state
[2020-02-11 14:47:31.191867]  : volume stop clusterfs : FAILED : Volume clusterfs is not in the started state





glusterd.logs is attached to this bug.


Please take a look.

Thanks,
Akshay

Comment 1 Sanju 2020-02-24 07:09:00 UTC
I don't see this happening in my environment. Did you execute peer detach for the same server from two different nodes? That could result in this behavior.

[root@server4 glusterfs]# gluster pe s
Number of Peers: 3

Hostname: server1
Uuid: 23d8606c-7d10-449a-a269-a8ab1a83d4e5
State: Peer in Cluster (Connected)

Hostname: server2
Uuid: 9af9a8b7-3aeb-49db-9343-1f6b5b741616
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: fdf6192d-0faf-418e-aa7e-569d7ad2c598
State: Peer in Cluster (Connected)
[root@server4 glusterfs]# 

[root@server4 glusterfs]# gluster v stop rep4
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: rep4: success
[root@server4 glusterfs]# gluster v remove-brick rep4 replica 3 server3:/tmp/b1 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume.
Do you want to continue? (y/n) y
volume remove-brick commit force: success
[root@server4 glusterfs]# gluster v remove-brick rep4 replica 2 server2:/tmp/b1 force
Remove-brick force will not migrate files from the removed bricks, so they will no longer be available on the volume.
Do you want to continue? (y/n) y
volume remove-brick commit force: success
[root@server4 glusterfs]# gluster pe detach server2
All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: success
[root@server4 glusterfs]# gluster pe detach server3
All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: success
[root@server4 glusterfs]# 

Thanks,
Sanju

Comment 2 akshsy 2020-02-24 07:25:38 UTC
No, I executed it only on the one node where I want to remove the peer.




Thanks,
Akshay

Comment 3 Sanju 2020-02-24 07:50:23 UTC
Reading through the code, I don't see any bug.

We display this error message on peer detach when op_errno is set to the FRIEND_DETACHING error code, in set_probe_error_str():

      case GF_PROBE_FRIEND_DETACHING:
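                /* CLI-facing error string for a peer that is already being detached */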
                snprintf(errstr, len,
                         "Peer is already being "
                         "detached from cluster.\n"
                         "Check peer status by running "
                         "gluster peer status");
                break;

peerinfo->detaching is set to true only in glusterd_deprobe_begin():

        peerinfo->detaching = _gf_true;

In glusterd_deprobe_begin(), before setting peerinfo->detaching to true, we check whether it is already true; if it is, we set op_errno to GF_DEPROBE_FRIEND_DETACHING, which produces the error message above:

   if (peerinfo->detaching) {
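        /* A detach for this peer is already in progress; fail this request with GF_DEPROBE_FRIEND_DETACHING */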
        ret = -1;
        if (op_errno)
            *op_errno = GF_DEPROBE_FRIEND_DETACHING;
        goto out;
    }

I believe no such bug exists in this code path.

You can look at the peer status or get-state output before issuing the peer detach command to check whether the peer is in the connected state. If the peer is in the connected state but you still see the same error, please get back to us with the get-state output.

I would like to close the bug with resolution NOTABUG. Please let me know what you think.

Thanks,
Sanju

Comment 4 akshsy 2020-02-24 08:14:25 UTC
Hi,

When I checked the peer status, the peer was in the disconnected state, so I issued the peer detach command.

We are hitting this issue very often and I am not sure how this is not a bug. I have attached the logs as you requested; please let me know if there is anything else I can provide.

Please help us resolve this issue.


Thanks,
Akshay

Comment 5 Sanju 2020-02-24 10:49:04 UTC
I see that even though the node is in the disconnected state, we are able to detach it successfully.

[root@server4 glusterfs]# gluster pe s
Number of Peers: 3

Hostname: server3
Uuid: 3804f636-c89f-438e-94c0-c99f8e30eda9
State: Peer in Cluster (Disconnected)

Hostname: server1
Uuid: 5732298d-2c2a-4128-b8ad-1970094d4070
State: Peer in Cluster (Connected)

Hostname: server2
Uuid: 399973de-208e-4723-b66d-8119fca75d46
State: Peer in Cluster (Connected)
[root@server4 glusterfs]# gluster pe detach server3
All clients mounted through the peer which is getting detached need to be remounted using one of the other active peers in the trusted storage pool to ensure client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: success
[root@server4 glusterfs]#

Please provide the information below by running the following steps:
1. Run get-state just before running the peer detach command.
2. Run peer detach.
3. Run get-state just after running the peer detach command (rough sketch below).
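
For example, a rough sketch (by default glusterd writes the get-state dump under /var/lib/glusterd/ with a timestamped file name; the odir and file options can redirect it if needed):

# gluster get-state
# gluster peer detach <peer-hostname> force
# gluster get-state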

Share the command history log and the get-state output from the originator, and glusterd.log from all the nodes.

Thanks,
Sanju

Comment 6 Sanju 2020-03-02 11:33:20 UTC
Any update?

Comment 7 Sanju 2020-03-04 10:33:36 UTC
Closing the bug based on comment 3. Please feel free to reopen if you see the issue again, and include all the requested information.

