Bug 1065632 - glusterd: gluster peer status failed with the connection failed error even if glusterd is running
Summary: glusterd: gluster peer status failed with the connection failed error even if...
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: pre-release
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-02-15 09:34 UTC by ssamanta
Modified: 2017-06-06 17:36 UTC
CC: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-22 15:40:20 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments: glusterd and CLI log (see description)

Description ssamanta 2014-02-15 09:34:58 UTC
Description of problem:
"gluster peer status" fails with "Connection failed. Please check if gluster daemon is operational" even though glusterd is running on the server nodes.


Version-Release number of selected component (if applicable):
[root@desktop81 yum.repos.d]# rpm -qa | grep glusterfs
glusterfs-server-3.5.0-0.5.beta3.fc20.x86_64
glusterfs-fuse-3.5.0-0.5.beta3.fc20.x86_64
glusterfs-devel-3.5.0-0.5.beta3.fc20.x86_64
glusterfs-3.5.0-0.5.beta3.fc20.x86_64
glusterfs-api-3.5.0-0.5.beta3.fc20.x86_64
glusterfs-cli-3.5.0-0.5.beta3.fc20.x86_64
glusterfs-libs-3.5.0-0.5.beta3.fc20.x86_64
[root@desktop81 yum.repos.d]


How reproducible:

Steps to Reproduce:
1. Set up the first server node, create 2 XFS bricks on it, and mount them.
2. Flush the iptables rules and set SELinux to permissive mode.
3. Start glusterd on the first node.
4. Set up a second server node, create 2 ext4 bricks on it, and mount them.
5. Flush the iptables rules and set SELinux to permissive mode.
6. Start glusterd on the second node.
7. Probe the second node from the first node with "gluster peer probe"; the probe succeeds.
8. Run "gluster peer status" on the second node: it errors out as follows even though glusterd is running on that node (a consolidated sketch follows this transcript).
[root@desktop81 yum.repos.d]# pidof glusterd
2885
[root@desktop81 yum.repos.d]# gluster peer status
Connection failed. Please check if gluster daemon is operational.
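
A consolidated sketch of the repro (assuming the two Fedora nodes from this report, 192.168.0.81 and 192.168.0.8, managed by systemd, with the bricks created and mounted as in steps 1 and 4):

# On each node: flush firewall rules, set SELinux permissive, start glusterd
iptables -F
setenforce 0
systemctl start glusterd

# On the first node: probe the second node (succeeds)
gluster peer probe 192.168.0.8

# On the second node: query peer status (fails as shown above)
gluster peer status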

 
Actual results:
The "gluster peer status" command fails with "Connection failed. Please check if gluster daemon is operational".

Expected results:
The "gluster peer status" command should succeed without any issues. Note: after deleting /var/run/glusterd.socket and restarting glusterd, "gluster peer status" was successful (see the workaround below for more information).

Additional info:

Attached is the glusterd and CLI log for more information.

[root@desktop81 yum.repos.d]# ifconfig -a
em1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.81  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::224:e8ff:fe4a:f41a  prefixlen 64  scopeid 0x20<link>
        ether 00:24:e8:4a:f4:1a  txqueuelen 1000  (Ethernet)
        RX packets 33496  bytes 14094772 (13.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6135  bytes 559520 (546.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 6  bytes 504 (504.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6  bytes 504 (504.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

p4p1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 00:10:18:59:b4:c7  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 17  
[root@desktop81 yum.repos.d]# service glustered start
Redirecting to /bin/systemctl start  glustered.service
Failed to issue method call: Unit glustered.service failed to load: No such file or directory.
[root@desktop81 yum.repos.d]# service glusterd start
Redirecting to /bin/systemctl start  glusterd.service

[root@desktop81 yum.repos.d]# iptables -F

[root@desktop81 yum.repos.d]# iptables -F
[root@desktop81 yum.repos.d]# mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=2010604k,nr_inodes=502651,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
/dev/mapper/fedora_desktop81-root on / type ext4 (rw,relatime,seclabel,data=ordered)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=35,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
tmpfs on /tmp type tmpfs (rw,seclabel)
/dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered)
/dev/mapper/fedora_desktop81-brick1 on /brick1 type ext4 (rw,relatime,seclabel,data=ordered)
/dev/mapper/fedora_desktop81-brick2 on /brick2 type ext4 (rw,relatime,seclabel,data=ordered)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=1000,group_id=1000)
[root@desktop81 yum.repos.d]# gluster volume create replica 2 192.168.0.8:/brick1
Connection failed. Please check if gluster daemon is operational.

[root@desktop81 yum.repos.d]# gluster peer probe 192.168.0.8
Connection failed. Please check if gluster daemon is operational.
[root@desktop81 yum.repos.d]# pidof glusterd
2885
[root@desktop81 yum.repos.d]# less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log 
[root@desktop81 yum.repos.d]# getenforce
Permissive
[root@desktop81 yum.repos.d]# 
[root@desktop81 yum.repos.d]# 
[root@desktop81 yum.repos.d]# less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log 
[root@desktop81 yum.repos.d]# ls -l /var/run/gluster
gluster/         glusterd.pid     glusterd.socket  
[root@desktop81 yum.repos.d]# ls -l /var/run/glusterd.socket 
srwxr-xr-x. 1 root root 0 Feb 15 13:18 /var/run/glusterd.socket
[root@desktop81 yum.repos.d]# rm -rf /var/run/glusterd.socket
[root@desktop81 yum.repos.d]# service glusterd start
Redirecting to /bin/systemctl start  glusterd.service
[root@desktop81 yum.repos.d]# pidof glusterd
2885
[root@desktop81 yum.repos.d]# gluster peer status
Connection failed. Please check if gluster daemon is operational.

[root@desktop81 yum.repos.d]# netstat -nlp | grep glust
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      2885/glusterd       
[root@desktop81 yum.repos.d]# netstat -lp | grep glust
tcp        0      0 0.0.0.0:24007           0.0.0.0:*               LISTEN      2885/glusterd       
[root@desktop81 yum.repos.d]#

[root@desktop81 yum.repos.d]# /usr/sbin/gluster volume info
Connection failed. Please check if gluster daemon is operational.
[root@desktop81 yum.repos.d]# 
[root@desktop81 yum.repos.d]# !ps
ps aux | grep glusterd
root      2885  0.0  0.3 346744 13212 ?        Ssl  13:25   0:00 /usr/sbin/glusterd -p /run/glusterd.pid
root      3209  0.0  0.0 112664   972 pts/0    S+   13:38   0:00 grep --color=auto glusterd
[root@desktop81 yum.repos.d]# ls -l /proc/2885/fd
total 0
lr-x------. 1 root root 64 Feb 15 13:39 0 -> /dev/null
l-wx------. 1 root root 64 Feb 15 13:39 1 -> /dev/null
lrwx------. 1 root root 64 Feb 15 13:39 10 -> socket:[33884]
l-wx------. 1 root root 64 Feb 15 13:39 2 -> /dev/null
lrwx------. 1 root root 64 Feb 15 13:39 3 -> anon_inode:[eventpoll]
l-wx------. 1 root root 64 Feb 15 13:39 4 -> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
lrwx------. 1 root root 64 Feb 15 13:39 5 -> /run/glusterd.pid
lrwx------. 1 root root 64 Feb 15 13:39 6 -> socket:[33874]
lr-x------. 1 root root 64 Feb 15 13:39 7 -> /dev/urandom
l-wx------. 1 root root 64 Feb 15 13:39 8 -> /var/log/glusterfs/.cmd_log_history
lrwx------. 1 root root 64 Feb 15 13:39 9 -> socket:[33836]
[root@desktop81 yum.repos.d]# less /var/log/glusterfs/cli.log 
[root@desktop81 yum.repos.d]# ls -l /var/run/gluster
gluster/         glusterd.pid     glusterd.socket  
[root@desktop81 yum.repos.d]# ls -l /var/run/glusterd.socket 
srwxr-xr-x. 1 root root 0 Feb 15 13:18 /var/run/glusterd.socket
[root@desktop81 yum.repos.d]# lsof /var/run/glusterd.socket 
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
      Output information may be incomplete.
[root@desktop81 yum.repos.d]#
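
The evidence above points at the unix socket: glusterd is accepting TCP connections on port 24007, yet the CLI fails through /var/run/glusterd.socket. Two hedged checks (assuming a netcat build with unix-socket support, e.g. "nc -U" from openbsd-netcat or ncat; --remote-host is the gluster CLI option that forces the TCP transport):

# Probe the unix socket directly; "Connection refused" here would mean the
# socket file is stale even though glusterd itself is running
nc -U /var/run/glusterd.socket </dev/null

# Bypass the unix socket and talk to glusterd over TCP port 24007 instead
gluster --remote-host=127.0.0.1 peer status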

Work around
=============
[root@desktop81 yum.repos.d]# service glusterd restart
Redirecting to /bin/systemctl restart  glusterd.service
[root@desktop81 yum.repos.d]# gluster peer status
Number of Peers: 1

Hostname: 192.168.0.8
Uuid: 8bc2bf6a-5b1a-4e41-a0a1-90f89de293b8
State: Peer in Cluster (Connected)
[root@desktop81 yum.repos.d]#
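
Combined with the socket deletion noted under "Expected results", the full workaround as a sketch (the same sequence Michael confirms in comment 7 below):

systemctl stop glusterd
rm -f /var/run/glusterd.socket    # remove the stale unix socket
systemctl start glusterd
gluster peer status               # succeeds once the socket is recreated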

Comment 3 Richard 2014-02-16 07:33:06 UTC
I've had this happen also. I've gone back to glusterfs-3.5.0-0.4.beta2 as the peer probe function works in that release.

Comment 4 Gilles Dubreuil 2014-07-16 02:39:05 UTC
Had same issue.

Using glusterfs-server-3.5.1-1.el7.x86_64.

After disabling SELinux (setenforce 0) it works fine.

I hope it helps.
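
To make the permissive mode survive a reboot, a sketch assuming the stock /etc/selinux/config layout:

setenforce 0                      # runtime only; lost on reboot
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config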

Comment 5 Richard 2014-07-16 07:54:25 UTC
I found that the use of Quotas was the cause of my problems. 

Are you able to check with quotas enabled?

i.e. gluster volume quota my_volume enable

See this bug report for my build process:
https://bugzilla.redhat.com/show_bug.cgi?id=1113460

Thanks,

Rich

Comment 6 Richard 2014-07-29 12:46:15 UTC
Still broken in glusterfs-351beta2-epel.repo
Thanks,
Rich

Comment 7 Michael J. Chudobiak 2015-03-06 13:12:03 UTC
Same problem; deleting /var/run/glusterd.socket and restarting fixed it.


[root@karsh brick1]# gluster peer probe xena
Connection failed. Please check if gluster daemon is operational.

[root@karsh brick1]# systemctl stop glusterd

[root@karsh brick1]# rm /var/run/glusterd.socket
rm: remove socket ‘/var/run/glusterd.socket’? y

[root@karsh brick1]# systemctl start glusterd

[root@karsh brick1]# gluster peer probe xena
peer probe: success. 

[root@karsh brick1]# rpm -qa | grep gluster
glusterfs-api-3.5.3-1.fc20.x86_64
glusterfs-server-3.5.3-1.fc20.x86_64
glusterfs-libs-3.5.3-1.fc20.x86_64
glusterfs-fuse-3.5.3-1.fc20.x86_64
glusterfs-cli-3.5.3-1.fc20.x86_64
glusterfs-3.5.3-1.fc20.x86_64

Comment 10 Kaleb KEITHLEY 2015-10-22 15:40:20 UTC
pre-release version is ambiguous and about to be removed as a choice.

If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.

Comment 11 Cesar 2017-06-06 17:36:18 UTC
(In reply to Michael J. Chudobiak from comment #7)
> Same problem; deleting /var/run/glusterd.socket and restarting fixed it.
> [...]


Worked for me!

Thanks a lot

