Bug 1309215 - Mgmt-path-SSL-enabled-cluster ends in disconnected state after multiple 'socket poller: error in polling loop' errors
Summary: Mgmt-path-SSL-enabled-cluster ends in disconnected state after multiple 'socket poller: error in polling loop' errors
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: core
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Mohit Agrawal
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-02-17 09:02 UTC by Sweta Anandpara
Modified: 2018-02-07 04:26 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-07 04:26:27 UTC
Embargoed:



Description Sweta Anandpara 2016-02-17 09:02:26 UTC
Description of problem:
Had a 2-node cluster with SSL enabled on the management path, following the steps in https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/ch09s03.html. Ran the tiering automation suite, which consists of 30-odd test cases. After the first 6 test cases ran successfully, 'gluster pool list' showed one of the nodes as disconnected, and every subsequent test case failed. Peer probe also fails.
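
A check for the disconnected state could be scripted along the following lines. This is only a sketch: it parses text in the 'gluster peer status' layout captured under "Additional info" in this report (Hostname / Uuid / State lines) and assumes a disconnected peer is reported as "(Disconnected)" in the State line. The function names parse_peer_status and disconnected_peers are hypothetical and not part of any gluster tooling.

```python
import re


def parse_peer_status(text):
    """Parse 'gluster peer status' style text into a list of peer dicts.

    Assumes the Hostname / Uuid / State layout shown in this report.
    """
    peers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Hostname:"):
            # A Hostname line starts a new peer record.
            if current is not None:
                peers.append(current)
            current = {"hostname": line.split(":", 1)[1].strip()}
        elif line.startswith("Uuid:") and current is not None:
            current["uuid"] = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and current is not None:
            state = line.split(":", 1)[1].strip()
            current["state"] = state
            # The connection status is the trailing parenthesised word.
            m = re.search(r"\(([^)]*)\)\s*$", state)
            current["connected"] = bool(m) and m.group(1) == "Connected"
    if current is not None:
        peers.append(current)
    return peers


def disconnected_peers(text):
    """Return hostnames of peers whose State is not '(Connected)'."""
    return [p["hostname"] for p in parse_peer_status(text)
            if not p.get("connected", False)]
```

Running disconnected_peers on the captured command output after each test case would show at which point the peer drops out of the cluster.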

Multiple socket_poller errors are seen in the logs:

[2016-02-16 16:51:01.564991] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245
[2016-02-16 16:51:01.566302] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:03.543447] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-217
[2016-02-16 16:51:03.549040] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:26.210562] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.285161] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.289947] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:52:27.766917] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
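
To see which peers the polling-loop errors cluster around, the log can be summarised with a small script. The sketch below assumes the glusterd log line layout seen in the excerpt above ([timestamp] LEVEL [file:line:function] domain: message, with an optional [MSGID: n] field). It attributes each 'error in polling loop' line to the most recently logged peer CN, which is a heuristic and not something the log itself states. The function name poller_errors_by_peer is hypothetical.

```python
import re
from collections import Counter

# Log line layout assumed from the excerpt in this report:
# [timestamp] LEVEL [optional MSGID] [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID:\s*\d+\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+(?P<domain>\S+):\s+(?P<msg>.*)$"
)

CN_PREFIX = "peer CN = "


def poller_errors_by_peer(lines):
    """Count 'error in polling loop' errors per most recently seen peer CN.

    Heuristic: an error is attributed to the last 'peer CN = ...' line
    that precedes it; errors seen before any CN line count as 'unknown'.
    """
    counts = Counter()
    last_cn = None
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        msg = m.group("msg")
        if msg.startswith(CN_PREFIX):
            last_cn = msg[len(CN_PREFIX):].strip()
        elif m.group("level") == "E" and "error in polling loop" in msg:
            counts[last_cn or "unknown"] += 1
    return counts
```

Applied to the eight lines above, this attributes one error each to dhcp42-245 and dhcp42-217 and two to client81, so the errors are not limited to a single peer.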


Version-Release number of selected component (if applicable):
glusterfs-3.7.5-19.el7rhgs.x86_64

How reproducible: 2:2

Additional info:

[root@dhcp42-245 ~]# rpm -qa | grep gluster
glusterfs-libs-3.7.5-19.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
glusterfs-3.7.5-19.el7rhgs.x86_64
glusterfs-api-3.7.5-19.el7rhgs.x86_64
glusterfs-fuse-3.7.5-19.el7rhgs.x86_64
glusterfs-rdma-3.7.5-19.el7rhgs.x86_64
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-19.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-client-xlators-3.7.5-19.el7rhgs.x86_64
glusterfs-cli-3.7.5-19.el7rhgs.x86_64
glusterfs-server-3.7.5-19.el7rhgs.x86_64
[root@dhcp42-245 ~]# 
[root@dhcp42-245 ~]# 
[root@dhcp42-245 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.42.217
Uuid: 1c9025bb-9a31-445d-909d-9f8a866c7934
State: Peer in Cluster (Connected)
[root@dhcp42-245 ~]# 
[root@dhcp42-245 ~]# cd /etc/ssl
[root@dhcp42-245 ssl]# ll
total 12
lrwxrwxrwx. 1 root root   16 Feb 16 15:35 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 16:29 glusterfs.ca
-rw-r--r--. 1 root root 1675 Feb 16 16:16 glusterfs.key
-rw-r--r--. 1 root root 1099 Feb 16 16:17 glusterfs.pem
[root@dhcp42-245 ssl]# 
[root@dhcp42-245 ssl]# ll /var/lib/glusterd/secure-access 
-rw-r--r--. 1 root root 0 Feb 16 16:21 /var/lib/glusterd/secure-access
[root@dhcp42-245 ssl]# 
[root@dhcp42-245 ssl]# 



[root@dhcp42-217 ~]# 
[root@dhcp42-217 ~]# cd /etc/ssl
[root@dhcp42-217 ssl]# ll
total 12
lrwxrwxrwx. 1 root root   16 Feb 16 15:36 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 16:29 glusterfs.ca
-rw-r--r--. 1 root root 1675 Feb 16 16:16 glusterfs.key
-rw-r--r--. 1 root root 1099 Feb 16 16:17 glusterfs.pem
[root@dhcp42-217 ssl]# 
[root@dhcp42-217 ssl]# ll /var/lib/glusterd/secure-access 
-rw-r--r--. 1 root root 0 Feb 16 16:21 /var/lib/glusterd/secure-access
[root@dhcp42-217 ssl]# 



[root@client81 mnt]# ll /etc/ssl/
total 12
lrwxrwxrwx. 1 root root   16 Dec 14 17:49 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 21:33 glusterfs.ca
-rw-r--r--. 1 root root 1679 Feb 16 21:31 glusterfs.key
-rw-r--r--. 1 root root 1090 Feb 16 21:32 glusterfs.pem
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# ll /var/lib/glusterd/secure-access 
-rw-r--r--. 1 root root 0 Feb 16 19:55 /var/lib/glusterd/secure-access
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# ll
total 32
drwxr-xr-x. 4 root root 32768 Feb 17 14:26 glusterfs
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# df -k glusterfs/
Filesystem            1K-blocks    Used Available Use% Mounted on
10.70.42.245:/testvol 104857600 2433792 102423808   3% /mnt/glusterfs
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# 


[2016-02-16 16:51:01.337169] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245
[2016-02-16 16:51:01.398896] I [MSGID: 106143] [glusterd-pmap.c:229:pmap_registry_bind] 0-pmap: adding brick /bricks/brick0/test on port 49152
[2016-02-16 16:51:01.400743] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-02-16 16:51:01.400972] I [socket.c:3931:socket_init] 0-management: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.401087] E [socket.c:4009:socket_init] 0-management: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.415991] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2016-02-16 16:51:01.416087] I [socket.c:3931:socket_init] 0-snapd: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.416183] E [socket.c:4009:socket_init] 0-snapd: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.416844] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2016-02-16 16:51:01.416922] I [socket.c:3931:socket_init] 0-nfs: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.416999] E [socket.c:4009:socket_init] 0-nfs: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
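
The excerpt above also logs, at E level, that /etc/ssl/dhparam.pem could not be opened and that DH ciphers are disabled. Whether this is related to the disconnects is not established in this report. A sketch for listing which log domains report it, assuming the message wording shown above (the function name dh_param_failures is hypothetical):

```python
import re

# Message wording taken from the log excerpt in this report.
DH_FAILURE = re.compile(
    r"\]\s+(?P<domain>\S+): failed to open (?P<path>\S+), "
    r"DH ciphers are disabled"
)


def dh_param_failures(lines):
    """Map each missing DH parameter file to the log domains reporting it.

    Domains are listed once each, in order of first appearance.
    """
    failures = {}
    for line in lines:
        m = DH_FAILURE.search(line)
        if m:
            domains = failures.setdefault(m.group("path"), [])
            if m.group("domain") not in domains:
                domains.append(m.group("domain"))
    return failures
```

For the lines above, this reports /etc/ssl/dhparam.pem for the 0-management, 0-snapd and 0-nfs domains, which matches the /etc/ssl listings earlier in this report where no dhparam.pem is present.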


The log files will be uploaded to http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

Comment 2 Kaushal 2016-02-23 08:56:52 UTC
I suspect this is similar to the other RPC connection issues we're seeing in GlusterD. I'll go through the logs and update this bug if I find anything different.

Comment 6 Amar Tumballi 2018-02-07 04:26:27 UTC
This bug does not reproduce in the latest version of the product (RHGS-3.3.1+).

If it is still relevant and can be reproduced, feel free to reopen it.

