Description of problem:

Had a 2 node cluster. Enabled SSL on the management path by following the steps in https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Administration_Guide/ch09s03.html. Ran the tiering automation suite, consisting of 30 odd test cases. After a successful run of the first 6 test cases, 'gluster pool list' shows one of the nodes as disconnected, resulting in failure of every subsequent test case. Peer probe fails. Multiple socket_poller errors are seen in the logs.

[2016-02-16 16:51:01.564991] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245
[2016-02-16 16:51:01.566302] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:03.543447] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-217
[2016-02-16 16:51:03.549040] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:51:26.210562] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.285161] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = client81
[2016-02-16 16:51:26.289947] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop
[2016-02-16 16:52:27.766917] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-19.el7rhgs.x86_64

How reproducible:
2:2

Additional info:

[root@dhcp42-245 ~]# rpm -qa | grep gluster
glusterfs-libs-3.7.5-19.el7rhgs.x86_64
python-gluster-3.7.5-19.el7rhgs.noarch
glusterfs-3.7.5-19.el7rhgs.x86_64
glusterfs-api-3.7.5-19.el7rhgs.x86_64
glusterfs-fuse-3.7.5-19.el7rhgs.x86_64
glusterfs-rdma-3.7.5-19.el7rhgs.x86_64
gluster-nagios-addons-0.2.5-1.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-19.el7rhgs.x86_64
vdsm-gluster-4.16.30-1.3.el7rhgs.noarch
gluster-nagios-common-0.2.3-1.el7rhgs.noarch
glusterfs-client-xlators-3.7.5-19.el7rhgs.x86_64
glusterfs-cli-3.7.5-19.el7rhgs.x86_64
glusterfs-server-3.7.5-19.el7rhgs.x86_64
[root@dhcp42-245 ~]#
[root@dhcp42-245 ~]#
[root@dhcp42-245 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.42.217
Uuid: 1c9025bb-9a31-445d-909d-9f8a866c7934
State: Peer in Cluster (Connected)
[root@dhcp42-245 ~]#
[root@dhcp42-245 ~]# cd /etc/ssl
[root@dhcp42-245 ssl]# ll
total 12
lrwxrwxrwx. 1 root root   16 Feb 16 15:35 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 16:29 glusterfs.ca
-rw-r--r--. 1 root root 1675 Feb 16 16:16 glusterfs.key
-rw-r--r--. 1 root root 1099 Feb 16 16:17 glusterfs.pem
[root@dhcp42-245 ssl]#
[root@dhcp42-245 ssl]# ll /var/lib/glusterd/secure-access
-rw-r--r--. 1 root root 0 Feb 16 16:21 /var/lib/glusterd/secure-access
[root@dhcp42-245 ssl]#
[root@dhcp42-245 ssl]#

[root@dhcp42-217 ~]#
[root@dhcp42-217 ~]# cd /etc/ssl
[root@dhcp42-217 ssl]# ll
total 12
lrwxrwxrwx. 1 root root   16 Feb 16 15:36 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 16:29 glusterfs.ca
-rw-r--r--. 1 root root 1675 Feb 16 16:16 glusterfs.key
-rw-r--r--. 1 root root 1099 Feb 16 16:17 glusterfs.pem
[root@dhcp42-217 ssl]#
[root@dhcp42-217 ssl]# ll /var/lib/glusterd/secure-access
-rw-r--r--. 1 root root 0 Feb 16 16:21 /var/lib/glusterd/secure-access
[root@dhcp42-217 ssl]#

[root@client81 mnt]# ll /etc/ssl/
total 12
lrwxrwxrwx. 1 root root   16 Dec 14 17:49 certs -> ../pki/tls/certs
-rw-r--r--. 1 root root 3288 Feb 16 21:33 glusterfs.ca
-rw-r--r--. 1 root root 1679 Feb 16 21:31 glusterfs.key
-rw-r--r--. 1 root root 1090 Feb 16 21:32 glusterfs.pem
[root@client81 mnt]#
[root@client81 mnt]#
[root@client81 mnt]# ll /var/lib/glusterd/secure-access
-rw-r--r--. 1 root root 0 Feb 16 19:55 /var/lib/glusterd/secure-access
[root@client81 mnt]#
[root@client81 mnt]#
[root@client81 mnt]# ll
total 32
drwxr-xr-x. 4 root root 32768 Feb 17 14:26 glusterfs
[root@client81 mnt]#
[root@client81 mnt]#
[root@client81 mnt]# df -k glusterfs/
Filesystem            1K-blocks    Used Available Use% Mounted on
10.70.42.245:/testvol 104857600 2433792 102423808   3% /mnt/glusterfs
[root@client81 mnt]#
[root@client81 mnt]#
[root@client81 mnt]#

[2016-02-16 16:51:01.337169] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245
[2016-02-16 16:51:01.398896] I [MSGID: 106143] [glusterd-pmap.c:229:pmap_registry_bind] 0-pmap: adding brick /bricks/brick0/test on port 49152
[2016-02-16 16:51:01.400743] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2016-02-16 16:51:01.400972] I [socket.c:3931:socket_init] 0-management: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.401087] E [socket.c:4009:socket_init] 0-management: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.415991] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2016-02-16 16:51:01.416087] I [socket.c:3931:socket_init] 0-snapd: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.416183] E [socket.c:4009:socket_init] 0-snapd: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2016-02-16 16:51:01.416844] I [rpc-clnt.c:986:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2016-02-16 16:51:01.416922] I [socket.c:3931:socket_init] 0-nfs: SSL support for glusterd is ENABLED
[2016-02-16 16:51:01.416999] E [socket.c:4009:socket_init] 0-nfs: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled

The log files will be updated in http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/<bugnumber>/
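
For triage, the error entries in the excerpts above can be tallied per function and log domain. The following is a minimal sketch only: the line layout is inferred from the excerpts in this report rather than from any documented log format, and the names LOG_LINE, parse_line, tally_errors and SAMPLE are hypothetical helpers, not part of glusterfs.

```python
import re
from collections import Counter

# Layout assumed from the excerpts in this report (not a documented format):
# [timestamp] LEVEL [optional MSGID] [file:line:function] domain: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]\s+"
    r"(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<domain>\S+):\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def tally_errors(lines):
    """Count level-E entries, keyed by (function, domain)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec["level"] == "E":
            counts[(rec["func"], rec["domain"])] += 1
    return counts


# Sample lines copied from the excerpts in this report.
SAMPLE = [
    "[2016-02-16 16:51:01.564991] I [socket.c:347:ssl_setup_connection] 0-socket.management: peer CN = dhcp42-245",
    "[2016-02-16 16:51:01.566302] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop",
    "[2016-02-16 16:51:03.549040] E [socket.c:2501:socket_poller] 0-socket.management: error in polling loop",
    "[2016-02-16 16:51:01.401087] E [socket.c:4009:socket_init] 0-management: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled",
    "[2016-02-16 16:51:01.398896] I [MSGID: 106143] [glusterd-pmap.c:229:pmap_registry_bind] 0-pmap: adding brick /bricks/brick0/test on port 49152",
]

for (func, domain), count in sorted(tally_errors(SAMPLE).items()):
    print(f"{func} ({domain}): {count}")
```

Feeding the lines of a full glusterd log file to tally_errors instead of SAMPLE would show whether the socket_poller errors cluster around particular peers or time windows. Lines that do not match the assumed layout are skipped.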
I suspect this is similar to the other RPC connection issues we're seeing in GlusterD. I'll go through the logs and update if I find anything different.
We have noticed that the bug is not reproduced in the latest version of the product (RHGS-3.3.1+). If the bug is still relevant and is being reproduced, feel free to reopen the bug.