Bug 1309215 - Mgmt-path-SSL-enabled-cluster ends in disconnected state after multiple 'socket poller: error in polling loop' errors
Status: NEW
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: core
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Assigned To: Mohit Agrawal
QA Contact: storage-qa-internal@redhat.com
Keywords: ZStream
Depends On:
Blocks:
Reported: 2016-02-17 04:02 EST by Sweta Anandpara
Modified: 2017-11-08 13:10 EST

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None

Description Sweta Anandpara 2016-02-17 04:02:26 EST
Description of problem:
I had a 2-node cluster and enabled SSL on the management path by following the steps in https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/ch09s03.html. I then ran the tiering automation suite, which consists of about 30 test cases. After the first 6 test cases ran successfully, 'gluster pool list' showed one of the nodes as disconnected, and every subsequent test case failed as a result. Peer probe also fails.

Multiple socket_poller errors are seen in the logs:

[2016-02-16 16:51:01.564991] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245
[2016-02-16 16:51:01.566302] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:03.543447] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-217
[2016-02-16 16:51:03.549040] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:26.210562] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.285161] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.289947] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:52:27.766917] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
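
For triage, the excerpt above can be summarized with a small script. The following is a minimal Python sketch, not part of GlusterFS: the function name `polling_errors_by_peer` and the regular expression are hypothetical, and attributing each 'error in polling loop' to the most recently logged peer CN is only a heuristic, since adjacent log lines may come from different connections.

```python
import re
from collections import Counter

# Hypothetical pattern for the glusterd log lines quoted in this report:
# [timestamp] LEVEL [optional MSGID] [file:line:function] name: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: \d+\] )?"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<name>\S+): (?P<msg>.*)$"
)

CN_PREFIX = "peer CN = "


def polling_errors_by_peer(lines):
    """Count 'error in polling loop' errors per most recently seen peer CN.

    Heuristic only: the error line itself does not name the peer.
    """
    counts = Counter()
    last_cn = None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip blank or non-matching lines
        msg = m.group("msg")
        if msg.startswith(CN_PREFIX):
            last_cn = msg[len(CN_PREFIX):]
        elif m.group("level") == "E" and "error in polling loop" in msg:
            counts[last_cn or "<unknown>"] += 1
    return counts


# Log lines copied from the excerpt above.
SAMPLE_LOG = """
[2016-02-16 16:51:01.564991] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245
[2016-02-16 16:51:01.566302] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:03.543447] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-217
[2016-02-16 16:51:03.549040] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:26.210562] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.285161] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.289947] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:52:27.766917] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
"""

if __name__ == "__main__":
    for peer, n in polling_errors_by_peer(SAMPLE_LOG.splitlines()).items():
        print(peer, n)
```

Run against a full glusterd log, a tally like this may help show whether the errors cluster around one peer or are spread across all SSL connections.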


Version-Release number of selected component (if applicable):
glusterfs-3.7.5-19.el7rhgs.x86_64

How reproducible: 2/2 (both attempts)

Additional info:

[root@dhcp42-245 ~]# rpm -qa | grep gluster
glusterfs-libs-3.7.5-19.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
glusterfs-3.7.5-19.el7rhgs.x86_64
glusterfs-api-3.7.5-19.el7rhgs.x86_64
glusterfs-fuse-3.7.5-19.el7rhgs.x86_64
glusterfs-rdma-3.7.5-19.el7rhgs.x86_64
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-19.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-client-xlators-3.7.5-19.el7rhgs.x86_64
glusterfs-cli-3.7.5-19.el7rhgs.x86_64
glusterfs-server-3.7.5-19.el7rhgs.x86_64
[root@dhcp42-245 ~]# 
[root@dhcp42-245 ~]# 
[root@dhcp42-245 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.42.217
Uuid: 1c9025bb-9a31-445d-909d-9f8a866c7934
State: Peer in Cluster (Connected)
[root@dhcp42-245 ~]# 
[root@dhcp42-245 ~]# cd /etc/ssl
[root@dhcp42-245 ssl]# ll
total 12
lrwxrwxrwx. 1 root root   16 Feb 16 15:35 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 16:29 glusterfs.ca
-rw-r--r--. 1 root root 1675 Feb 16 16:16 glusterfs.key
-rw-r--r--. 1 root root 1099 Feb 16 16:17 glusterfs.pem
[root@dhcp42-245 ssl]# 
[root@dhcp42-245 ssl]# ll /var/lib/glusterd/secure-access 
-rw-r--r--. 1 root root 0 Feb 16 16:21 /var/lib/glusterd/secure-access
[root@dhcp42-245 ssl]# 
[root@dhcp42-245 ssl]# 
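
An automation suite could detect the disconnected state early by parsing the `gluster peer status` output shown above. This is a rough Python sketch under stated assumptions: the helper names `parse_peer_status` and `disconnected_peers` are hypothetical, and the only state string taken from this report is "Peer in Cluster (Connected)"; the "(Disconnected)" variant used in the usage note is assumed rather than quoted from the logs here.

```python
def parse_peer_status(text):
    """Parse 'gluster peer status' text into a list of peer dicts.

    Assumes the Hostname / Uuid / State layout shown in this report.
    """
    peers = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            current = {"hostname": line.split(":", 1)[1].strip()}
            peers.append(current)
        elif current is not None and line.startswith("Uuid:"):
            current["uuid"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
            current["state"] = state
            # Treat only an explicit "(Connected)" suffix as connected.
            current["connected"] = state.endswith("(Connected)")
    return peers


def disconnected_peers(text):
    """Return hostnames of peers not reported as connected."""
    return [
        p["hostname"]
        for p in parse_peer_status(text)
        if not p.get("connected", False)
    ]


# Output copied from the transcript above.
SAMPLE_STATUS = """Number of Peers: 1

Hostname: 10.70.42.217
Uuid: 1c9025bb-9a31-445d-909d-9f8a866c7934
State: Peer in Cluster (Connected)
"""

if __name__ == "__main__":
    print(disconnected_peers(SAMPLE_STATUS))
```

Calling `disconnected_peers` between test cases would let a run stop at the first test after which a peer drops, instead of failing every subsequent case.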



[root@dhcp42-217 ~]# 
[root@dhcp42-217 ~]# cd /etc/ssl
[root@dhcp42-217 ssl]# ll
total 12
lrwxrwxrwx. 1 root root   16 Feb 16 15:36 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 16:29 glusterfs.ca
-rw-r--r--. 1 root root 1675 Feb 16 16:16 glusterfs.key
-rw-r--r--. 1 root root 1099 Feb 16 16:17 glusterfs.pem
[root@dhcp42-217 ssl]# 
[root@dhcp42-217 ssl]# ll /var/lib/glusterd/secure-access 
-rw-r--r--. 1 root root 0 Feb 16 16:21 /var/lib/glusterd/secure-access
[root@dhcp42-217 ssl]# 



[root@client81 mnt]# ll /etc/ssl/
total 12
lrwxrwxrwx. 1 root root   16 Dec 14 17:49 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 21:33 glusterfs.ca
-rw-r--r--. 1 root root 1679 Feb 16 21:31 glusterfs.key
-rw-r--r--. 1 root root 1090 Feb 16 21:32 glusterfs.pem
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# ll /var/lib/glusterd/secure-access 
-rw-r--r--. 1 root root 0 Feb 16 19:55 /var/lib/glusterd/secure-access
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# ll
total 32
drwxr-xr-x. 4 root root 32768 Feb 17 14:26 glusterfs
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# df -k glusterfs/
Filesystem            1K-blocks    Used Available Use% Mounted on
10.70.42.245:/testvol 104857600 2433792 102423808   3% /mnt/glusterfs
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# 
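
The file listings above can be checked the same way on every node. The sketch below is illustrative Python, assuming only the paths that appear in this report (`glusterfs.pem`, `glusterfs.key`, `glusterfs.ca` under `/etc/ssl`, and the `secure-access` marker under `/var/lib/glusterd`); the function name `check_mgmt_ssl_layout` is hypothetical. It also flags a group- or world-readable private key, as seen in the listings (`-rw-r--r--`). That is a general hygiene observation and is not claimed to be related to this bug.

```python
import os
import stat

# File names taken from the listings in this report.
REQUIRED_SSL_FILES = ("glusterfs.pem", "glusterfs.key", "glusterfs.ca")


def check_mgmt_ssl_layout(ssl_dir="/etc/ssl", glusterd_dir="/var/lib/glusterd"):
    """Return a list of human-readable findings; empty means nothing flagged."""
    findings = []

    # Certificate, key and CA bundle must all be present.
    for name in REQUIRED_SSL_FILES:
        path = os.path.join(ssl_dir, name)
        if not os.path.isfile(path):
            findings.append(f"missing: {path}")

    # Flag a private key that group or other can read.
    key_path = os.path.join(ssl_dir, "glusterfs.key")
    if os.path.isfile(key_path):
        mode = stat.S_IMODE(os.stat(key_path).st_mode)
        if mode & (stat.S_IRGRP | stat.S_IROTH):
            findings.append(
                f"private key readable by group/other: {key_path} (mode {mode:o})"
            )

    # The marker file that turns on management-path encryption.
    marker = os.path.join(glusterd_dir, "secure-access")
    if not os.path.exists(marker):
        findings.append(f"missing: {marker}")

    return findings


if __name__ == "__main__":
    for finding in check_mgmt_ssl_layout():
        print(finding)
```

The directories are parameters so the check can be pointed at a copy of a node's configuration, for example one extracted from an sosreport.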


[2016-02-16 16:51:01.337169] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245
[2016-02-16 16:51:01.398896] I [MSGID: 106143] [glusterd-pmap.c:229:pmap_registry_bind] 0-pmap: adding brick /bricks/brick0/test on port 49152
[2016-02-16 16:51:01.400743] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-02-16 16:51:01.400972] I [socket.c:3931:socket_init] 0-management: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.401087] E [socket.c:4009:socket_init] 0-management: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.415991] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2016-02-16 16:51:01.416087] I [socket.c:3931:socket_init] 0-snapd: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.416183] E [socket.c:4009:socket_init] 0-snapd: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.416844] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2016-02-16 16:51:01.416922] I [socket.c:3931:socket_init] 0-nfs: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.416999] E [socket.c:4009:socket_init] 0-nfs: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
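
The excerpt above repeats the same "failed to open /etc/ssl/dhparam.pem" error for several connections. A short Python sketch like the following could list which files glusterd failed to open and for which log names (the part after the `0-` prefix). The pattern and the function name `missing_files_by_subsystem` are hypothetical and only match the line shape quoted in this report.

```python
import re
from collections import defaultdict

# Hypothetical pattern for error lines of the form:
# [...] E [file:line:func] 0-<name>: failed to open <path>, ...
OPEN_FAIL_RE = re.compile(
    r"\] E \[[^\]]+\] \d+-(?P<name>[^:\s]+): failed to open (?P<path>[^,\s]+)"
)


def missing_files_by_subsystem(lines):
    """Map each path that could not be opened to the log names reporting it."""
    result = defaultdict(list)
    for line in lines:
        m = OPEN_FAIL_RE.search(line)
        if not m:
            continue
        names = result[m.group("path")]
        if m.group("name") not in names:
            names.append(m.group("name"))  # keep first-seen order, no repeats
    return dict(result)


# Error lines copied from the excerpt above.
SAMPLE_LOG = """
[2016-02-16 16:51:01.400972] I [socket.c:3931:socket_init] 0-management: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.401087] E [socket.c:4009:socket_init] 0-management: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.416183] E [socket.c:4009:socket_init] 0-snapd: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.416999] E [socket.c:4009:socket_init] 0-nfs: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
"""

if __name__ == "__main__":
    for path, names in missing_files_by_subsystem(SAMPLE_LOG.splitlines()).items():
        print(path, names)
```

Per the log message itself, a missing `dhparam.pem` only disables DH ciphers; whether that has any bearing on the disconnects is not established in this report.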


The log files will be updated in http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/
Comment 2 Kaushal 2016-02-23 03:56:52 EST
I suspect this is similar to the other RPC connection issues we're seeing in GlusterD. I'll go through the logs and update if I find anything different.
