Bug 1215114

Summary: gluster peer probe hangs
Product: [Community] GlusterFS
Reporter: alex <free.aaa>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED EOL
Severity: medium
Priority: unspecified
Version: 3.5.2
CC: amukherj, bugs, kaushal, smohan
Keywords: Triaged
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2016-06-17 15:57:32 UTC
Type: Bug
Mount Type: fuse
Attachments:
gfs1 and gfs3 debug logging (flags: none)

Description alex 2015-04-24 10:49:37 UTC
Created attachment 1018395: gfs1 and gfs3 debug logging

Initial data:
All nodes share the following:
1) OS distribution (Proxmox v3.3, kernel 2.6.32-32-pve)
2) SELinux disabled
3) default iptables configuration with ALLOW action
4) GlusterFS version
      - ii  glusterfs-client                 3.5.2-1
      - ii  glusterfs-common                 3.5.2-1
      - ii  glusterfs-server                 3.5.2-1
5) correct and working DNS forward and reverse resolution

6) Nodes:
192.168.9.53/gfs3 - already in the cluster
192.168.9.54/gfs4 - already in the cluster
192.168.9.56/gfs6 - already in the cluster

192.168.9.51/gfs1 - want to add to the cluster
192.168.9.52/gfs2 - want to add to the cluster

gfs1 and gfs2 were previously part of another GlusterFS cluster, but I stopped all services on them and removed the /var/lib/glusterd directory.

7) cluster contains several production distributed-replicated volumes
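
Item 5 above (forward and reverse DNS) can be re-checked from each node with a short script. This is only a minimal sketch: the function name `check_dns` and the injectable `forward`/`reverse` parameters are illustrative and not part of any GlusterFS tooling, and the host/IP pairs are the ones listed in item 6.

```python
import socket


def check_dns(hostname, expected_ip,
              forward=socket.gethostbyname,
              reverse=socket.gethostbyaddr):
    """Return a list of problems for one host; an empty list means consistent.

    `forward` and `reverse` default to the system resolver and can be
    replaced with stubs (hypothetical helpers) for offline testing.
    """
    problems = []
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return [f"forward lookup of {hostname} failed: {exc}"]
    if ip != expected_ip:
        problems.append(f"{hostname} resolves to {ip}, expected {expected_ip}")
    try:
        name = reverse(expected_ip)[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"reverse lookup of {expected_ip} failed: {exc}")
        return problems
    # Compare short names so that "gfs1" matches "gfs1.some.domain".
    if name.split(".")[0] != hostname.split(".")[0]:
        problems.append(f"{expected_ip} reverse-resolves to {name}, expected {hostname}")
    return problems


# Host/IP pairs taken from the report.
NODES = {
    "gfs1": "192.168.9.51",
    "gfs2": "192.168.9.52",
    "gfs3": "192.168.9.53",
    "gfs4": "192.168.9.54",
    "gfs6": "192.168.9.56",
}
```

Calling `check_dns(host, ip)` for every entry in `NODES` on each node, and printing any non-empty result, would show whether all nodes agree on the name-to-address mapping.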

Description of problem: 
When I run gluster peer probe gfs1 (or gfs2) from any node in the cluster, the command hangs, and after the timeout the new peer has the status "Probe Sent to Peer":
gfs3#gluster peer status
Number of Peers: 3

Hostname: gfs6
Uuid: 6bd6ee25-e257-4703-b500-330741b90471
State: Peer in Cluster (Connected)

Hostname: gfs4
Uuid: bb1bed20-25bf-43b0-8faa-49f1b5b9ae59
State: Peer in Cluster (Connected)

Hostname: gfs1
Uuid: c5cd8152-c239-474a-977b-9c6b35edd857
State: Probe Sent to Peer (Connected) 
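
To spot peers in this condition on several nodes at once, the output above can be parsed mechanically. The following is a rough sketch that assumes the `Hostname:` / `Uuid:` / `State:` layout shown in this report; the function names are illustrative, and other GlusterFS versions may print additional fields.

```python
def parse_peer_status(text):
    """Parse `gluster peer status` text into a list of dicts."""
    peers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            current = {"hostname": line.split(":", 1)[1].strip()}
            peers.append(current)
        elif current is not None and line.startswith("Uuid:"):
            current["uuid"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
            # "Peer in Cluster (Connected)" -> state text plus connection flag
            if state.endswith(")") and "(" in state:
                state, _, conn = state[:-1].rpartition("(")
                current["connection"] = conn.strip()
            current["state"] = state.strip()
    return peers


def stuck_peers(peers):
    """Hostnames of peers whose state is anything but 'Peer in Cluster'."""
    return [p["hostname"] for p in peers if p.get("state") != "Peer in Cluster"]


# Output copied from the report (run on gfs3).
SAMPLE = """\
Number of Peers: 3

Hostname: gfs6
Uuid: 6bd6ee25-e257-4703-b500-330741b90471
State: Peer in Cluster (Connected)

Hostname: gfs4
Uuid: bb1bed20-25bf-43b0-8faa-49f1b5b9ae59
State: Peer in Cluster (Connected)

Hostname: gfs1
Uuid: c5cd8152-c239-474a-977b-9c6b35edd857
State: Probe Sent to Peer (Connected)
"""

print(stuck_peers(parse_peer_status(SAMPLE)))  # prints ['gfs1']
```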

-- gfs3 added peer gfs1 ------------
gfs3#cat /var/lib/glusterd/peers/c5cd8152-c239-474a-977b-9c6b35edd857
uuid=c5cd8152-c239-474a-977b-9c6b35edd857
state=1
hostname1=gfs1

-- but gfs1 did not add gfs3
gfs1#cat /var/lib/glusterd/peers/192.168.9.53
uuid=00000000-0000-0000-0000-000000000000
state=8
hostname1=192.168.9.53
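
The two peer files above differ in a way that is easy to check in code: the file on gfs1 is named after an IP address and still holds the all-zero UUID. The sketch below assumes only the `key=value` layout shown in this report. The helper names are illustrative, and the numeric `state` values are read as-is without interpreting them.

```python
import uuid


def parse_peer_file(text):
    """Parse the key=value lines of a /var/lib/glusterd/peers/* file."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value
    return fields


def has_nil_uuid(fields):
    """True if the stored peer UUID is the all-zero (nil) UUID."""
    return uuid.UUID(fields["uuid"]) == uuid.UUID(int=0)


# File contents copied from the report.
ON_GFS3 = "uuid=c5cd8152-c239-474a-977b-9c6b35edd857\nstate=1\nhostname1=gfs1\n"
ON_GFS1 = "uuid=00000000-0000-0000-0000-000000000000\nstate=8\nhostname1=192.168.9.53\n"

for label, text in (("gfs3 -> gfs1", ON_GFS3), ("gfs1 -> gfs3", ON_GFS1)):
    fields = parse_peer_file(text)
    print(label, "nil uuid:", has_nil_uuid(fields), "state:", fields["state"])
```

A nil UUID in a peer file means that node has not recorded the identity of the other side, which matches the incomplete handshake described in this report.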


Debug logging shows that gfs3 sent the request, gfs1 sent a response, and gfs3 received the response; after that, gfs3 does nothing until the command times out (see the attachment).
So gfs3 never completes the peer handshake with gfs1.

Comment 1 Niels de Vos 2016-06-17 15:57:32 UTC
This bug is getting closed because version 3.5 is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.