Bug 1394769

Summary: Server Node not able to connect after stopping and starting the network port
Product: Red Hat Gluster Storage Reporter: Karan Sandha <ksandha>
Component: glusterd Assignee: Gaurav Yadav <gyadav>
Status: CLOSED NOTABUG QA Contact: Byreddy <bsrirama>
Severity: high Docs Contact:
Priority: unspecified    
Version: rhgs-3.2 CC: gyadav, ksandha, rhs-bugs, sasundar, storage-qa-internal, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-03 03:53:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Karan Sandha 2016-11-14 12:26:01 UTC
Description of problem:
glusterd is unable to reconnect to its peers after a network port on one of the servers is brought down and then brought back up.

Version-Release number of selected component (if applicable):
[root@dhcp47-143 ~]# gluster --version
glusterfs 3.8.4 built on Oct 24 2016 11:13:47
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.


How reproducible:
2/2
Logs are placed at rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

Steps to Reproduce:
1. Create a gluster cluster of three servers by peer probing them.
2. Bring one network port down on server 3: ifdown <port name> (e.g. eth0)
3. Check gluster peer status on server 1:
[root@dhcp47-141 home]# gluster peer status
Number of Peers: 2

Hostname: dhcp47-143.lab.eng.blr.redhat.com
Uuid: 526679d1-7035-4cdd-8c9f-d27a060d7022
State: Peer in Cluster (Connected)

Hostname: dhcp47-144.lab.eng.blr.redhat.com
Uuid: bd889f19-a6a0-4487-8093-8e427c7297d5
State: Peer in Cluster (Disconnected)

4. Bring the port back up: ifup <port name> (e.g. eth0)
5. Check peer status on all the servers.
Server 1:
[root@dhcp47-141 home]# gluster peer status
Number of Peers: 2

Hostname: dhcp47-143.lab.eng.blr.redhat.com
Uuid: 526679d1-7035-4cdd-8c9f-d27a060d7022
State: Peer in Cluster (Connected)

Hostname: dhcp47-144.lab.eng.blr.redhat.com
Uuid: bd889f19-a6a0-4487-8093-8e427c7297d5
State: Peer in Cluster (Disconnected)

Server 3:
[root@dhcp47-144 home]# gluster peer s
Number of Peers: 2

Hostname: dhcp47-141.lab.eng.blr.redhat.com
Uuid: f5cf48a8-02b0-49de-b881-21f91aeae829
State: Peer in Cluster (Connected)

Hostname: dhcp47-143.lab.eng.blr.redhat.com
Uuid: 526679d1-7035-4cdd-8c9f-d27a060d7022
State: Peer in Cluster (Connected)
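
The manual comparison of peer status across nodes in the steps above could be scripted. The following is a minimal sketch, not part of the original report: it assumes the `gluster peer status` output has already been captured as text (the sample is copied from server 1 above), and the helper names `parse_peer_status` and `disconnected_peers` are hypothetical.

```python
def parse_peer_status(text):
    """Parse captured `gluster peer status` output into a list of peer dicts."""
    peers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            # Each peer entry starts with a Hostname line.
            current = {"hostname": line.split(":", 1)[1].strip()}
            peers.append(current)
        elif line.startswith("Uuid:") and current is not None:
            current["uuid"] = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and current is not None:
            state = line.split(":", 1)[1].strip()
            current["state"] = state
            # "(Disconnected)" does not end with "(Connected)", so this is safe.
            current["connected"] = state.endswith("(Connected)")
    return peers


def disconnected_peers(text):
    """Return hostnames of peers whose state is not reported as Connected."""
    return [p["hostname"] for p in parse_peer_status(text)
            if not p.get("connected", False)]


# Sample copied from the server 1 output in this report.
SAMPLE = """\
Number of Peers: 2

Hostname: dhcp47-143.lab.eng.blr.redhat.com
Uuid: 526679d1-7035-4cdd-8c9f-d27a060d7022
State: Peer in Cluster (Connected)

Hostname: dhcp47-144.lab.eng.blr.redhat.com
Uuid: bd889f19-a6a0-4487-8093-8e427c7297d5
State: Peer in Cluster (Disconnected)
"""

if __name__ == "__main__":
    print(disconnected_peers(SAMPLE))
```

Running the same check on the output from each node makes an asymmetric view (one side Connected, the other Disconnected) easy to spot.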

Actual results:
After the port is brought back up, servers 1 and 2 still show server 3 as Disconnected, while server 3 shows itself as Connected to the other two.

Expected results:
All servers should be connected with each other.

Additional info:

Comment 2 Atin Mukherjee 2016-11-14 15:55:47 UTC
Initially I suspected a friend-sm issue, but the faulty glusterd is unable to connect to the other glusterds because of the following error:

[2016-11-14 15:53:55.710314] E [MSGID: 101075] [common-utils.c:308:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)

Not sure why getaddrinfo is failing here.
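
One way to narrow this down might be to test name resolution for the peer hostnames directly on the affected node, outside of glusterd. The sketch below is an illustration only: it uses Python's `socket.getaddrinfo`, which wraps the same system resolver call named in the log message but is not glusterd's own code path. The port 24007 (glusterd's usual management port) and the helper names `check_name` and `check_peers` are assumptions.

```python
import socket
import sys

GLUSTERD_PORT = 24007  # assumption: default glusterd management port


def check_name(hostname, port=GLUSTERD_PORT):
    """Return (ok, detail) for a getaddrinfo lookup of hostname."""
    try:
        infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Same class of failure as "Name or service not known" in the log.
        return False, f"getaddrinfo failed ({exc})"
    # Each entry's sockaddr starts with the resolved address.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)


def check_peers(hostnames):
    """Resolve every hostname and collect the results by name."""
    return {name: check_name(name) for name in hostnames}


if __name__ == "__main__":
    # Pass the peer hostnames from `gluster peer status` as arguments.
    for name, (ok, detail) in check_peers(sys.argv[1:]).items():
        print(f"{name}: {'OK' if ok else 'FAILED'} {detail}")
```

If the lookups fail only after the ifdown/ifup cycle, that would point at the node's resolver configuration (for example, DNS settings rewritten when the interface comes back) rather than at glusterd itself.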

Comment 3 Byreddy 2016-11-15 04:19:44 UTC
Tried the same scenario 8 times in my setup; it did not reproduce even once. Peer status was shown correctly every time, as expected.

Comment 4 Atin Mukherjee 2016-11-15 04:31:49 UTC
Given comment 3, it would be worth looking at how this setup has been configured and how it differs from Byreddy's setup.

Comment 10 Gaurav Yadav 2017-08-03 03:53:48 UTC
I tried reproducing the issue in my setup with the same scenario over multiple trials. Peer status is shown correctly every time.

As the issue is not reproducible, I am closing this bug.