Bug 1320458 - Peer information is not propagated to all the nodes in the cluster, when the peer is probed with its second interface FQDN/IP
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: mainline
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kaushal
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1314366
Reported: 2016-03-23 09:33 UTC by Kaushal
Modified: 2016-06-16 14:01 UTC (History)

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of: 1314366
Environment:
Last Closed: 2016-06-16 14:01:30 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Kaushal 2016-03-23 09:33:30 UTC
+++ This bug was initially created as a clone of Bug #1314366 +++

Description of problem:
-----------------------
When a gluster node has multiple network interfaces and both are to be used for gluster traffic, the peer must be probed with each of its network identifiers (i.e., each IP or FQDN).

Doing so updates the other names recorded for that peer.
The problem is that the peer's other name is not propagated to all the nodes in the cluster. Any volume-related operation that uses the new hostname or IP then fails with the error "staging failed on the host" on the other hosts, because those nodes are unaware of it.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
3.7.8

How reproducible:
-----------------
Always

Steps to Reproduce:
--------------------
1. Create 3 gluster nodes, each with 2 network interfaces connected to different (isolated) networks
2. Form a gluster cluster with 2 of the nodes by peer probing with one set of IPs (from network1)
3. From node1, probe node2 with its IP from network2
4. Check peer status on both nodes
5. From node1, peer probe node3 with its IP from network1
6. From node1, peer probe node3 with its IP from network2
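
The steps above can be written down as a command list. The following is only a sketch: it builds the `gluster peer probe` and `gluster peer status` command lines for steps 2 to 6 without executing anything. The node labels, the `repro_steps` helper, and the example addresses are assumptions made for illustration and do not come from the report.

```python
from typing import Dict, List, NamedTuple


class Step(NamedTuple):
    run_on: str       # label of the node where the command is executed
    argv: List[str]   # command line as a list of arguments


def repro_steps(node2: Dict[str, str], node3: Dict[str, str]) -> List[Step]:
    """Return the commands for steps 2-6.

    Each dict holds the address of one node on 'net1' and on 'net2'.
    Nothing is executed; the caller decides how to run each command.
    """
    def probe(address: str) -> List[str]:
        return ["gluster", "peer", "probe", address]

    def status() -> List[str]:
        return ["gluster", "peer", "status"]

    return [
        Step("node1", probe(node2["net1"])),  # step 2: form the 2-node cluster
        Step("node1", probe(node2["net2"])),  # step 3: second address of node2
        Step("node1", status()),              # step 4: check on node1
        Step("node2", status()),              # step 4: check on node2
        Step("node1", probe(node3["net1"])),  # step 5: add node3
        Step("node1", probe(node3["net2"])),  # step 6: second address of node3
    ]


if __name__ == "__main__":
    # Placeholder addresses (hypothetical, not taken from the report).
    steps = repro_steps(
        {"net1": "node2.net1.example", "net2": "node2.net2.example"},
        {"net1": "node3.net1.example", "net2": "node3.net2.example"},
    )
    for step in steps:
        print(f"[{step.run_on}] {' '.join(step.argv)}")
```

After step 6, running `gluster peer status` on every node shows whether the second address of node3 reached all of them.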

Actual results:
---------------
Peer status on node2 is not updated with the other name of node3

Expected results:
-----------------
Peer information should be consistent/updated across all the nodes in the cluster
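
One way to check this expectation is to compare the `gluster peer status` output collected from each node. The sketch below assumes the output layout shown in this report (`Hostname:`, `Uuid:`, `State:`, and an optional `Other names:` list ended by a blank line). The helper names `parse_peer_status` and `missing_names` are hypothetical, and the parser expects raw command output without shell prompts or annotations.

```python
from typing import Dict, Set

PeerView = Dict[str, Set[str]]  # peer UUID -> every name listed for that peer


def parse_peer_status(text: str) -> PeerView:
    """Map each peer UUID to its hostname plus any 'Other names' entries."""
    peers: PeerView = {}
    names: Set[str] = set()
    in_other_names = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            in_other_names = False          # a blank line ends a peer entry
        elif line.startswith("Hostname:"):
            names = {line.split(":", 1)[1].strip()}
        elif line.startswith("Uuid:"):
            peers[line.split(":", 1)[1].strip()] = names
        elif line.startswith("Other names:"):
            in_other_names = True
        elif in_other_names:
            names.add(line)                 # one extra name per line
    return peers


def missing_names(views: Dict[str, PeerView]) -> Dict[str, PeerView]:
    """For each node, report names that another node knows for a peer but it lacks."""
    known: PeerView = {}
    for view in views.values():
        for uuid, names in view.items():
            known.setdefault(uuid, set()).update(names)
    gaps: Dict[str, PeerView] = {}
    for node, view in views.items():
        for uuid, names in view.items():
            diff = known[uuid] - names
            if diff:
                gaps.setdefault(node, {})[uuid] = diff
    return gaps


if __name__ == "__main__":
    # Entries for node3 as seen from node1 and node2 after the second probe.
    node1_out = (
        "Hostname: mgmt-node3.lab.eng.blr.redhat.com\n"
        "Uuid: 5b4abfd3-9397-4527-a39e-ee3bc00f5710\n"
        "State: Peer in Cluster (Connected)\n"
        "Other names:\n"
        "data-node3.lab.eng.blr.redhat.com\n"
    )
    node2_out = (
        "Hostname: mgmt-node3.lab.eng.blr.redhat.com\n"
        "Uuid: 5b4abfd3-9397-4527-a39e-ee3bc00f5710\n"
        "State: Peer in Cluster (Connected)\n"
    )
    views = {"node1": parse_peer_status(node1_out),
             "node2": parse_peer_status(node2_out)}
    print(missing_names(views))
```

An empty result from `missing_names` means every node lists the same set of names for each peer it knows about.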

--- Additional comment from SATHEESARAN on 2016-03-03 18:45:17 IST ---

Peer status on 2 nodes
-----------------------
[root@data-node1 ~]# gluster peer status
Number of Peers: 1

Hostname: mgmt-node2.lab.eng.blr.redhat.com
Uuid: 204a51d3-3c2c-4bec-a005-4e974a49aa7e
State: Peer in Cluster (Connected)
Other names:
data-node2.lab.eng.blr.redhat.com
mgmt-node2

[root@data-node2 ~]# gluster peer status
Number of Peers: 1

Hostname: mgmt-node1.lab.eng.blr.redhat.com
Uuid: 5ba71f4c-fe2e-410d-939a-d5fc903a1ec4
State: Peer in Cluster (Connected)
Other names:
data-node1.lab.eng.blr.redhat.com

Peer status on 3 nodes after probing node3 with network1
---------------------------------------------------------
[root@data-node1 ~]# gluster peer status
Number of Peers: 2

Hostname: mgmt-node2.lab.eng.blr.redhat.com
Uuid: 204a51d3-3c2c-4bec-a005-4e974a49aa7e
State: Peer in Cluster (Connected)
Other names:
data-node2.lab.eng.blr.redhat.com
mgmt-node2

Hostname: mgmt-node3.lab.eng.blr.redhat.com
Uuid: 5b4abfd3-9397-4527-a39e-ee3bc00f5710
State: Peer in Cluster (Connected)

[root@data-node2 ~]# gluster peer status
Number of Peers: 2

Hostname: mgmt-node1.lab.eng.blr.redhat.com
Uuid: 5ba71f4c-fe2e-410d-939a-d5fc903a1ec4
State: Peer in Cluster (Connected)
Other names:
data-node1.lab.eng.blr.redhat.com

Hostname: mgmt-node3.lab.eng.blr.redhat.com
Uuid: 5b4abfd3-9397-4527-a39e-ee3bc00f5710
State: Peer in Cluster (Connected)

[root@localhost ~]# gluster peer status
Number of Peers: 2

Hostname: mgmt-node1.lab.eng.blr.redhat.com
Uuid: 5ba71f4c-fe2e-410d-939a-d5fc903a1ec4
State: Peer in Cluster (Connected)
Other names:
data-node1.lab.eng.blr.redhat.com

Hostname: mgmt-node2.lab.eng.blr.redhat.com
Uuid: 204a51d3-3c2c-4bec-a005-4e974a49aa7e
State: Peer in Cluster (Connected)
Other names:
data-node2.lab.eng.blr.redhat.com
mgmt-node2


Peer status on 3 nodes after probing node3 with network2
---------------------------------------------------------
[root@data-node1 ~]# gluster peer probe data-node3.lab.eng.blr.redhat.com
peer probe: success. Host data-node3.lab.eng.blr.redhat.com port 24007 already in peer list

[root@data-node1 ~]# gluster peer status
Number of Peers: 2

Hostname: mgmt-node2.lab.eng.blr.redhat.com
Uuid: 204a51d3-3c2c-4bec-a005-4e974a49aa7e
State: Peer in Cluster (Connected)
Other names:
data-node2.lab.eng.blr.redhat.com
mgmt-node2

Hostname: mgmt-node3.lab.eng.blr.redhat.com
Uuid: 5b4abfd3-9397-4527-a39e-ee3bc00f5710
State: Peer in Cluster (Connected)
Other names:
data-node3.lab.eng.blr.redhat.com  <--- other name updated in node1

[root@data-node2 ~]# gluster pe s
Number of Peers: 2

Hostname: mgmt-node1.lab.eng.blr.redhat.com
Uuid: 5ba71f4c-fe2e-410d-939a-d5fc903a1ec4
State: Peer in Cluster (Connected)
Other names:
data-node1.lab.eng.blr.redhat.com

Hostname: mgmt-node3.lab.eng.blr.redhat.com <---not updated with other name
Uuid: 5b4abfd3-9397-4527-a39e-ee3bc00f5710
State: Peer in Cluster (Connected)

[root@localhost ~]# gluster peer status
Number of Peers: 2

Hostname: mgmt-node1.lab.eng.blr.redhat.com
Uuid: 5ba71f4c-fe2e-410d-939a-d5fc903a1ec4
State: Peer in Cluster (Connected)
Other names:
data-node1.lab.eng.blr.redhat.com

Hostname: mgmt-node2.lab.eng.blr.redhat.com
Uuid: 204a51d3-3c2c-4bec-a005-4e974a49aa7e
State: Peer in Cluster (Connected)
Other names:
data-node2.lab.eng.blr.redhat.com
mgmt-node

[root@data-node1 ~]# gluster volume create testvol data-node3.lab.eng.blr.redhat.com:/rhs/brick1/brc1
volume create: testvol: failed: Staging failed on mgmt-node2.lab.eng.blr.redhat.com. Error: Host data-node3.lab.eng.blr.redhat.com is not in 'Peer in Cluster' state

Error messages in glusterd log in node1 - 
<snip>
[2016-03-03 18:40:38.034436] I [MSGID: 106487] [glusterd-handler.c:1411:__glusterd_handle_cli_list_friends] 0-glusterd: Received cli list req
[2016-03-03 18:45:20.723287] E [MSGID: 106452] [glusterd-utils.c:5735:glusterd_new_brick_validate] 0-management: Host data-node3.lab.eng.blr.redhat.com is not in 'Peer in Cluster' state
[2016-03-03 18:45:20.723323] E [MSGID: 106536] [glusterd-volume-ops.c:1336:glusterd_op_stage_create_volume] 0-management: Host data-node3.lab.eng.blr.redhat.com is not in 'Peer in Cluster' state
[2016-03-03 18:45:20.723338] E [MSGID: 106301] [glusterd-op-sm.c:5241:glusterd_op_ac_stage_op] 0-management: Stage failed on operation 'Volume Create', Status : -1
</snip>
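
When scanning longer glusterd logs for such failures, the lines can be split into fields. The sketch below assumes the line layout visible in the snippet above: timestamp, one-letter level, optional `MSGID`, `file:line:function`, a domain such as `0-management`, then the message. The regular expression and the `parse_log_line` helper are illustrative assumptions, not part of GlusterFS.

```python
import re
from typing import Dict, Optional

# Pattern derived from the log lines quoted in this report.
LOG_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*(?P<msgid>\d+)\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<function>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one log line, or None if it does not match."""
    match = LOG_RE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = (
        "[2016-03-03 18:45:20.723287] E [MSGID: 106452] "
        "[glusterd-utils.c:5735:glusterd_new_brick_validate] 0-management: "
        "Host data-node3.lab.eng.blr.redhat.com is not in 'Peer in Cluster' state"
    )
    fields = parse_log_line(sample)
    if fields and fields["level"] == "E":
        print(fields["function"], "->", fields["message"])
```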

--- Additional comment from Vijay Bellur on 2016-03-23 14:10:18 IST ---

REVIEW: http://review.gluster.org/13817 (glusterd: Add a new event to handle multi-net probes) posted (#1) for review on master by Kaushal M (kaushal)

Comment 1 Vijay Bellur 2016-03-23 09:35:25 UTC
REVIEW: http://review.gluster.org/13817 (glusterd: Add a new event to handle multi-net probes) posted (#2) for review on master by Kaushal M (kaushal)

Comment 2 Vijay Bellur 2016-03-28 05:48:16 UTC
REVIEW: http://review.gluster.org/13817 (glusterd: Add a new event to handle multi-net probes) posted (#3) for review on master by Kaushal M (kaushal)

Comment 3 Mike McCune 2016-03-28 22:51:39 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation. Please see mmccune with any questions.

Comment 4 Vijay Bellur 2016-03-29 04:43:42 UTC
COMMIT: http://review.gluster.org/13817 committed in master by Atin Mukherjee (amukherj) 
------
commit d0cb21b5e3dd90a851e43bcfac9b1b2edf3db9c2
Author: Kaushal M <kaushal>
Date:   Tue Mar 22 16:32:32 2016 +0530

    glusterd: Add a new event to handle multi-net probes
    
    This allows GlusterD to send updates to all other nodes when attaching
    new addresses using multi-net peer probe.
    
    Change-Id: I62846be750ab3721912e7b49656594347ea61723
    BUG: 1320458
    Signed-off-by: Kaushal M <kaushal>
    Reviewed-on: http://review.gluster.org/13817
    Smoke: Gluster Build System <jenkins.com>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.com>
    Reviewed-by: Atin Mukherjee <amukherj>
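
The effect described in the commit message can be illustrated with a toy model. This is not GlusterD code and does not reflect its internals: the `ToyCluster` class, its methods, and the `notify_others` flag are all hypothetical, and the new event is modeled as nothing more than a boolean that decides whether the other nodes are told about the attached address.

```python
from typing import Dict, Iterable, Set


class ToyCluster:
    """Each node keeps its own view: peer id -> set of addresses it knows."""

    def __init__(self, node_ids: Iterable[str]) -> None:
        self.views: Dict[str, Dict[str, Set[str]]] = {n: {} for n in node_ids}

    def add_peer(self, peer_id: str, address: str) -> None:
        """First probe of a peer: every other node learns its first address."""
        for node, view in self.views.items():
            if node != peer_id:
                view.setdefault(peer_id, set()).add(address)

    def attach_address(self, prober: str, peer_id: str, address: str,
                       notify_others: bool) -> None:
        """Probe an already known peer by an additional address."""
        self.views[prober][peer_id].add(address)
        if notify_others:
            # Modeled fix: the remaining nodes receive the update as well.
            for node, view in self.views.items():
                if node not in (prober, peer_id):
                    view[peer_id].add(address)

    def knows(self, node: str, address: str) -> bool:
        """True if `node` has `address` recorded for any of its peers."""
        return any(address in names for names in self.views[node].values())


if __name__ == "__main__":
    cluster = ToyCluster(["node1", "node2", "node3"])
    cluster.add_peer("node3", "mgmt-node3")
    cluster.attach_address("node1", "node3", "data-node3", notify_others=False)
    print("without update:", cluster.knows("node2", "data-node3"))
    cluster.attach_address("node1", "node3", "data-node3", notify_others=True)
    print("with update:", cluster.knows("node2", "data-node3"))
```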

Comment 5 Niels de Vos 2016-06-16 14:01:30 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

