Bug 1309215

Summary: Mgmt-path-SSL-enabled-cluster ends in disconnected state after multiple 'socket poller: error in polling loop' errors
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: Sweta Anandpara <sanandpa>
Component: core Assignee: Mohit Agrawal <moagrawa>
Status: CLOSED WORKSFORME QA Contact: storage-qa-internal <storage-qa-internal>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: amukherj, rhs-bugs, storage-qa-internal, ueberall, vbellur
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-07 04:26:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sweta Anandpara 2016-02-17 09:02:26 UTC
Description of problem:
Set up a 2-node cluster and enabled SSL on the management path by following the steps in https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/ch09s03.html. Ran the tiering automation suite, which consists of about 30 test cases. After the first 6 test cases ran successfully, 'gluster pool list' showed one of the nodes as disconnected, which caused every subsequent test case to fail. Peer probe also fails.

Multiple socket_poller errors are seen in the logs.

[2016-02-16 16:51:01.564991] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245
[2016-02-16 16:51:01.566302] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:03.543447] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-217
[2016-02-16 16:51:03.549040] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:26.210562] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.285161] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.289947] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:52:27.766917] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
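To see how the polling-loop errors line up with the SSL handshakes, the excerpt above can be tallied with a short script. This is only a sketch: the regular expression is inferred from the log lines shown here, the function name summarize_poller_errors is hypothetical, and attributing each error to the most recently logged peer CN is a heuristic assumption, not something the logs guarantee.

```python
import re
from collections import Counter

# Layout inferred from the excerpt above:
# [timestamp] LEVEL [file:line:function] domain: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<domain>\S+): (?P<msg>.*)$"
)

CN_PREFIX = "peer CN = "


def summarize_poller_errors(lines):
    """Count 'error in polling loop' entries per peer CN.

    Assumption: each error belongs to the last peer CN logged before it.
    Errors seen before any CN line are counted under the key None.
    """
    errors_by_peer = Counter()
    last_peer = None
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # skip lines that do not follow the inferred layout
        func, msg = m.group("func"), m.group("msg")
        if func == "ssl_setup_connection" and msg.startswith(CN_PREFIX):
            last_peer = msg[len(CN_PREFIX):]
        elif func == "socket_poller" and "error in polling loop" in msg:
            errors_by_peer[last_peer] += 1
    return errors_by_peer
```

Passing an open glusterd log file (or any iterable of lines) to summarize_poller_errors returns a Counter keyed by peer CN, which makes it easier to tell whether the errors cluster around one peer or are spread across all of them.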


Version-Release number of selected component (if applicable):
glusterfs-3.7.5-19.el7rhgs.x86_64

How reproducible: 2:2

Additional info:

[root@dhcp42-245 ~]# rpm -qa | grep gluster
glusterfs-libs-3.7.5-19.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
glusterfs-3.7.5-19.el7rhgs.x86_64
glusterfs-api-3.7.5-19.el7rhgs.x86_64
glusterfs-fuse-3.7.5-19.el7rhgs.x86_64
glusterfs-rdma-3.7.5-19.el7rhgs.x86_64
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-19.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-client-xlators-3.7.5-19.el7rhgs.x86_64
glusterfs-cli-3.7.5-19.el7rhgs.x86_64
glusterfs-server-3.7.5-19.el7rhgs.x86_64
[root@dhcp42-245 ~]# 
[root@dhcp42-245 ~]# 
[root@dhcp42-245 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.42.217
Uuid: 1c9025bb-9a31-445d-909d-9f8a866c7934
State: Peer in Cluster (Connected)
[root@dhcp42-245 ~]# 
[root@dhcp42-245 ~]# cd /etc/ssl
[root@dhcp42-245 ssl]# ll
total 12
lrwxrwxrwx. 1 root root   16 Feb 16 15:35 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 16:29 glusterfs.ca
-rw-r--r--. 1 root root 1675 Feb 16 16:16 glusterfs.key
-rw-r--r--. 1 root root 1099 Feb 16 16:17 glusterfs.pem
[root@dhcp42-245 ssl]# 
[root@dhcp42-245 ssl]# ll /var/lib/glusterd/secure-access 
-rw-r--r--. 1 root root 0 Feb 16 16:21 /var/lib/glusterd/secure-access
[root@dhcp42-245 ssl]# 
[root@dhcp42-245 ssl]# 
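Since the failure shows up as a peer dropping out of the connected state, the 'gluster peer status' output earlier in this session can be checked mechanically between test cases. The following is a minimal sketch, assuming the output keeps the Hostname/Uuid/State layout shown above and that a healthy peer's State line ends in "(Connected)". Both function names are hypothetical.

```python
def parse_peer_status(text):
    """Parse 'gluster peer status' output into a list of per-peer dicts.

    Assumes each peer block starts with a 'Hostname:' line followed by
    'Uuid:' and 'State:' lines, as in the sample output in this report.
    """
    peers = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and shell prompts carry no key/value pair
        key, value = key.strip(), value.strip()
        if key == "Hostname":
            current = {"hostname": value}
            peers.append(current)
        elif current is not None and key in ("Uuid", "State"):
            current[key.lower()] = value
    return peers


def disconnected_peers(peers):
    """Return hostnames whose state does not end in '(Connected)'."""
    return [
        p["hostname"]
        for p in peers
        if not p.get("state", "").endswith("(Connected)")
    ]
```

Running disconnected_peers(parse_peer_status(output)) after each test case would pinpoint the first test after which a node drops out, rather than discovering it from later failures.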



[root@dhcp42-217 ~]# 
[root@dhcp42-217 ~]# cd /etc/ssl
[root@dhcp42-217 ssl]# ll
total 12
lrwxrwxrwx. 1 root root   16 Feb 16 15:36 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 16:29 glusterfs.ca
-rw-r--r--. 1 root root 1675 Feb 16 16:16 glusterfs.key
-rw-r--r--. 1 root root 1099 Feb 16 16:17 glusterfs.pem
[root@dhcp42-217 ssl]# 
[root@dhcp42-217 ssl]# ll /var/lib/glusterd/secure-access 
-rw-r--r--. 1 root root 0 Feb 16 16:21 /var/lib/glusterd/secure-access
[root@dhcp42-217 ssl]# 



[root@client81 mnt]# ll /etc/ssl/
total 12
lrwxrwxrwx. 1 root root   16 Dec 14 17:49 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 21:33 glusterfs.ca
-rw-r--r--. 1 root root 1679 Feb 16 21:31 glusterfs.key
-rw-r--r--. 1 root root 1090 Feb 16 21:32 glusterfs.pem
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# ll /var/lib/glusterd/secure-access 
-rw-r--r--. 1 root root 0 Feb 16 19:55 /var/lib/glusterd/secure-access
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# ll
total 32
drwxr-xr-x. 4 root root 32768 Feb 17 14:26 glusterfs
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# df -k glusterfs/
Filesystem            1K-blocks    Used Available Use% Mounted on
10.70.42.245:/testvol 104857600 2433792 102423808   3% /mnt/glusterfs
[root@client81 mnt]# 
[root@client81 mnt]# 
[root@client81 mnt]# 
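The listings above show the same four files on both servers and on the client. A quick way to confirm that on every node is a small presence check like the one below. It is a sketch under assumptions: the path list is taken only from the listings in this report, the function name missing_ssl_files is hypothetical, and it checks existence only, not certificate validity or permissions.

```python
from pathlib import Path

# Paths taken from the directory listings in this report,
# written relative to the filesystem root so a test root can be substituted.
EXPECTED_FILES = (
    "etc/ssl/glusterfs.pem",
    "etc/ssl/glusterfs.key",
    "etc/ssl/glusterfs.ca",
    "var/lib/glusterd/secure-access",
)


def missing_ssl_files(root="/"):
    """Return the expected paths that are not regular files under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_FILES if not (base / rel).is_file()]
```

An empty list from missing_ssl_files() on each node means the files listed in this report are all present there. The root parameter exists only so the check can be pointed at a copied or mounted tree.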


[2016-02-16 16:51:01.337169] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245
[2016-02-16 16:51:01.398896] I [MSGID: 106143] [glusterd-pmap.c:229:pmap_registry_bind] 0-pmap: adding brick /bricks/brick0/test on port 49152
[2016-02-16 16:51:01.400743] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-02-16 16:51:01.400972] I [socket.c:3931:socket_init] 0-management: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.401087] E [socket.c:4009:socket_init] 0-management: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.415991] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2016-02-16 16:51:01.416087] I [socket.c:3931:socket_init] 0-snapd: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.416183] E [socket.c:4009:socket_init] 0-snapd: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.416844] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2016-02-16 16:51:01.416922] I [socket.c:3931:socket_init] 0-nfs: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.416999] E [socket.c:4009:socket_init] 0-nfs: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
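The excerpt above repeats one error for several connections, which is easier to see once the E-level entries are grouped by message. The sketch below does that. The regular expression is inferred from these log lines, including the optional [MSGID: ...] field, and the function name errors_by_message is hypothetical.

```python
import re
from collections import defaultdict

# Layout inferred from the excerpt above; the MSGID field is optional:
# [timestamp] LEVEL [MSGID: n] [file:line:function] domain: message
LOG_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"(?:\[MSGID: (?P<msgid>\d+)\] )?"
    r"\[(?P<src>[^\]]+)\] (?P<domain>\S+): (?P<msg>.*)$"
)


def errors_by_message(lines):
    """Map each distinct E-level message to the sorted domains that logged it."""
    found = defaultdict(set)
    for raw in lines:
        m = LOG_RE.match(raw.strip())
        if m and m.group("level") == "E":
            found[m.group("msg")].add(m.group("domain"))
    return {msg: sorted(domains) for msg, domains in found.items()}
```

Applied to the lines above, this collapses the three dhparam.pem entries into a single message logged by the management, snapd, and nfs connections, which separates that recurring message from the polling-loop errors reported earlier.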


The log files will be updated in http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/

Comment 2 Kaushal 2016-02-23 08:56:52 UTC
I suspect this is similar to the other RPC connection issues we're seeing in GlusterD. I'll go through the logs and update if I find anything different.

Comment 6 Amar Tumballi 2018-02-07 04:26:27 UTC
We have noticed that this bug is not reproducible in the latest version of the product (RHGS-3.3.1+).

If the bug is still relevant and can be reproduced, feel free to reopen it.