Bug 1451937

Summary: Cannot probe nodes on Ubuntu 16.04 or on CentOS 7.3
Product: [Community] GlusterFS
Reporter: alex <alexei>
Component: glusterd
Assignee: bugs <bugs>
Status: CLOSED EOL
QA Contact:
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 3.10
CC: amukherj, bugs
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-20 18:28:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (flags: none):
- /var/lib/glusterd/cli.log
- /var/lib/glusterd/glusterd.log
- /var/lib/glusterfs/glusterd.log (after attempting to add a peer)

Description alex 2017-05-17 23:23:40 UTC
Created attachment 1279831 [details]
/var/lib/glusterd/cli.log

Description of problem:
Trying to set up a GlusterFS cluster on CentOS 7.3 machines and a separate cluster on Ubuntu 16.04; both fail in the same way when trying to probe a node.

Version-Release number of selected component (if applicable):
so far tried 3.7.*, 3.8.*, 3.10.*

How reproducible:
Trying to get new cluster(s) deployed, but it does not work on either CentOS 7 or Ubuntu 16.04.

Steps to Reproduce:
1. Install glusterfs-server from the yum or apt repository
2. Restart the service with "service glusterd restart" (CentOS) or "service glusterfs-server restart" (Ubuntu)
3. Attempt to probe a peer with "gluster peer probe srv-eu1"

Actual results:
The glusterfs server shuts down immediately, and the CLI reports: "Connection failed. Please check if gluster daemon is operational."

A file is created in /var/lib/glusterd/peers/ with the following content:
==============
> # cat /var/lib/glusterd/peers/srv-eu1
> uuid=00000000-0000-0000-0000-000000000000
> state=0
> hostname1=srv-eu1
==============
Unless this file is deleted from the folder, the service will not restart. After the restart, if I try the same steps, it continues to fail and never succeeds in creating a cluster.
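A minimal sketch of the recovery steps described above (service names per the default CentOS 7 / Ubuntu 16.04 packages):

```shell
# Remove the half-created peer entry (all-zero uuid, state=0) so glusterd can start.
rm -f /var/lib/glusterd/peers/srv-eu1
# Then restart the daemon; pick the line matching your distribution:
service glusterd restart          # CentOS 7
service glusterfs-server restart  # Ubuntu 16.04
```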


Expected results:
expected to see peer added to the cluster and the cluster start up.

Additional info:

All host nodes act as Kubernetes nodes and have been configured using the "kargo" utility.

I have already mapped the local hostname to 127.0.0.1 in the /etc/hosts file. iptables is fully open for communication between our hosts on all ports, and telnet is able to connect on port 24007 without any issues.

Comment 1 alex 2017-05-17 23:24:12 UTC
Created attachment 1279832 [details]
/var/lib/glusterd/glusterd.log

Comment 2 alex 2017-05-17 23:33:47 UTC
Created attachment 1279833 [details]
/var/lib/glusterfs/glusterd.log - after attempting to add a peer

As mentioned in the original ticket, glusterfs-server gets shut down as soon as I try to add a peer.
If I don't erase the peer file /var/lib/glusterd/peers/srv-eu1, then restarting glusterfs-server fails and generates the attached log messages in /var/log/glusterfs/glusterd.log.

Comment 3 Gaurav Yadav 2017-06-05 06:05:51 UTC
Recently we have seen an issue similar to this one, and I expect this is the same issue as https://bugzilla.redhat.com/show_bug.cgi?id=1447523.

In order to validate that, I need your help: could you please send the output of the "sysctl net.ipv4.ip_local_reserved_ports" command?

Comment 4 alex 2017-06-06 18:46:00 UTC
I tried various options; all kept failing with the same issue when probing peers:

[root@comwww8 ~]# sysctl  net.ipv4.ip_local_reserved_ports
net.ipv4.ip_local_reserved_ports = 30000-32767

[root@comwww5 ~]# sysctl  net.ipv4.ip_local_reserved_ports
net.ipv4.ip_local_reserved_ports = 24007-24008,30000-32767,49152-49159
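Whether a given port falls inside the kernel's reserved ranges can be checked with a small helper (a sketch; the function name is illustrative, and the range strings are the sysctl values shown above). Note that the second node reserves 24007-24008, which is exactly glusterd's management port range:

```shell
# reserved_contains PORT "RANGES" -> prints yes/no.
# RANGES is the value of net.ipv4.ip_local_reserved_ports,
# e.g. "24007-24008,30000-32767,49152-49159".
reserved_contains() {
    port=$1
    ranges=$2
    for r in $(echo "$ranges" | tr ',' ' '); do
        lo=${r%-*}
        hi=${r#*-}   # a single port "N" yields lo == hi == N
        if [ "$port" -ge "$lo" ] && [ "$port" -le "$hi" ]; then
            echo yes
            return 0
        fi
    done
    echo no
}

# The second node's setting covers glusterd's management port:
reserved_contains 24007 "24007-24008,30000-32767,49152-49159"   # -> yes
# The first node's setting does not:
reserved_contains 24007 "30000-32767"                           # -> no
```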


I even tried to empty out this property, but still didn't have any luck.

I tried it on at least 15 different CentOS nodes (each node has 2 network interfaces, one for the local and one for the public network) and on about 8 Ubuntu 14.04 and 16.04 nodes.

That thread is very similar, but I guess there's no resolution in that ticket either?

Thank you.

Comment 5 Gaurav Yadav 2017-06-07 05:05:25 UTC
After seeing the output of "sysctl net.ipv4.ip_local_reserved_ports", I am 100% sure this issue is the same as the one I mentioned. It is already fixed in mainline.

Here is the review ID for the bug.
https://review.gluster.org/#/c/17359/

You can try the command below in order to validate:
sysctl net.ipv4.ip_local_reserved_ports=""

Comment 6 alex 2017-06-09 16:49:16 UTC
My apologies, but how do I install mainline on CentOS 7 or Ubuntu 16 with the package manager? Or do I have to build it from source? Are there any instructions I should follow? Currently I have this repo added:

centos-release-gluster310.noarch : Gluster 3.10 (Long Term Stable) packages from the CentOS Storage SIG repository

and I see this as the latest available version, but I don't think it's mainline, is it?

yum info glusterfs-server.x86_64
Loaded plugins: versionlock
Available Packages
Name        : glusterfs-server
Arch        : x86_64
Version     : 3.10.2
Release     : 1.el7
Size        : 1.3 M
Repo        : centos-gluster310/7/x86_64
Summary     : Distributed file-system server
URL         : http://gluster.readthedocs.io/en/latest/
License     : GPLv2 or LGPLv3+
Description : GlusterFS is a distributed file-system capable of scaling to several
            : petabytes. It aggregates various storage bricks over Infiniband RDMA
            : or TCP/IP interconnect into one large parallel network file
            : system. GlusterFS is one of the most sophisticated file systems in
            : terms of features and extensibility.  It borrows a powerful concept
            : called Translators from GNU Hurd kernel. Much of the code in GlusterFS
            : is in user space and easily manageable.
            :
            : This package provides the glusterfs server daemon.

Comment 7 alex 2017-06-09 16:50:31 UTC
I think I found it here:

http://gluster.readthedocs.io/en/latest/Developer-guide/Compiling-RPMS/

Comment 8 alex 2017-06-28 11:03:53 UTC
Somehow I'm still getting the same behavior even when I build from source using the Compiling RPMS link.

Any suggestions are highly appreciated.

Comment 10 Shyamsundar 2018-06-20 18:28:54 UTC
This bug is reported against a version of Gluster that is no longer maintained
(or has been EOL'd). See https://www.gluster.org/release-schedule/ for the
versions currently maintained.

As a result this bug is being closed.

If the bug persists on a maintained version of gluster or against the mainline
gluster repository, request that it be reopened and the Version field be marked
appropriately.