Bug 1218167 - [GlusterFS 3.6.3]: Brick crashed after setting up SSL/TLS in I/O access path with error: "E [socket.c:2495:socket_poller] 0-tcp.gluster-native-volume-3G-1-server: error in polling loop"
Summary: [GlusterFS 3.6.3]: Brick crashed after setting up SSL/TLS in I/O access path ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 3.6.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Jeff Darcy
QA Contact:
URL:
Whiteboard:
Depends On: 1211643
Blocks: glusterfs-3.6.4 1222908
 
Reported: 2015-05-04 10:50 UTC by ssamanta
Modified: 2016-02-04 15:27 UTC
4 users

Fixed In Version: glusterfs-v3.6.4
Clone Of:
: 1222908 (view as bug list)
Environment:
Last Closed: 2016-02-04 15:27:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description ssamanta 2015-05-04 10:50:36 UTC
Description of problem:
A brick crashed on a node with the following error in the brick log:
"E [socket.c:2495:socket_poller] 0-tcp.gluster-native-volume-3G-1-server: error in polling loop"

Version-Release number of selected component (if applicable):
GlusterFS 3.6.3

[root@gqas006 ssl]# rpm -qa | grep gluster
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_hadoop-0.1-122.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_gluster_selfheal-0.1-6.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_bigtop-0.2.1-24.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_user_mapred_job-0.1-4.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_file_dir_permissions-0.1-9.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_home_dir_listing-0.1-5.noarch
glusterfs-libs-3.6.3-1.fc20.x86_64
glusterfs-geo-replication-3.6.3-1.fc20.x86_64
glusterfs-resource-agents-3.5.3-1.fc20.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_default_block_size-0.1-4.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_multiuser_support-0.1-4.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_multiple_volumes-0.1-18.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hive-0.1-12.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_gridmix3-0.1-2.noarch
glusterfs-devel-3.6.3-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_common-0.2-119.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_gluster-0.2-78.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-glusterd_tests-0.2-1.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop-0.1-7.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_special_char_in_path-0.1-2.noarch
glusterfs-debuginfo-3.6.2-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_dfsio_io_exception-0.1-8.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_ldap-0.1-6.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hadoop_hcfs_fileappend-0.1-5.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_missing_dirs_create-0.1-4.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_sqoop-0.1-2.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hadoop_hcfs_quota-0.1-6.noarch
glusterfs-3.6.3-1.fc20.x86_64
glusterfs-cli-3.6.3-1.fc20.x86_64
glusterfs-rdma-3.6.3-1.fc20.x86_64
glusterfs-hadoop-2.1.2-2.fc20.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hadoop_hcfs_testcli-0.2-7.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_dfsio-0.1-2.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_multifilewc_null_pointer_exception-0.1-6.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_gluster_quota_selfheal-0.2-11.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_append_to_file-0.1-6.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hbase-0.1-4.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_shim_access_error_messages-0.1-6.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_hadoop_mapreduce-0.1-6.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_mahout-0.1-6.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_erroneous_multivolume_filepaths-0.1-4.noarch
glusterfs-fuse-3.6.3-1.fc20.x86_64
glusterfs-server-3.6.3-1.fc20.x86_64
glusterfs-hadoop-javadoc-2.1.2-2.fc20.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_groovy_sync-0.1-24.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_rhs_georep-0.1-3.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_setting_working_directory-0.1-2.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_junit_shim-0.1-13.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-setup_hadoop_security-0.0.1-11.noarch
glusterfs-extra-xlators-3.6.3-1.fc20.x86_64
glusterfs-hadoop-distribution-glusterfs-hadoop-test_brick_sorted_order_of_filenames-0.1-2.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_fs_counters-0.1-11.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_generate_gridmix2_data-0.1-3.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_selinux_persistently_disabled-0.1-2.noarch
glusterfs-hadoop-distribution-glusterfs-hadoop-test_bigtop_pig-0.1-9.noarch
glusterfs-api-3.6.3-1.fc20.x86_64
glusterfs-api-devel-3.6.3-1.fc20.x86_64
[root@gqas006 ssl]# 

How reproducible:
I am not certain what caused the crash. I will update with more details if I can reproduce it again.

Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume and start it.
2. Create a private key and a public certificate for each server and client node.
3. Concatenate the certificates into a CA file, copy it to all server and client nodes, and set the volume options needed for SSL/TLS to work, following:
https://github.com/gluster/glusterfs/blob/master/doc/admin-guide/en-US/markdown/admin_ssl.md
4. Mount the volume from the client.
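As a sketch of steps 2-3 (file names and the CN/subject value here are illustrative; the linked admin guide is the authoritative procedure), the per-node certificate setup looks roughly like:

```shell
# Run in a scratch directory; GlusterFS actually expects the files at
# /etc/ssl/glusterfs.key, /etc/ssl/glusterfs.pem and /etc/ssl/glusterfs.ca.
openssl genrsa -out glusterfs.key 2048

# The CN becomes the SSL "subject" that auth.ssl-allow matches against.
openssl req -new -x509 -key glusterfs.key \
    -subj "/CN=gluster-node1" -days 365 -out glusterfs.pem

# Step 3: concatenate every node's certificate into one CA file and copy
# it to all server and client nodes (only the local cert is shown here).
cat glusterfs.pem > glusterfs.ca

# Volume options that enable TLS on the I/O path (require a live
# cluster, so shown as comments only):
#   gluster volume set <volname> client.ssl on
#   gluster volume set <volname> server.ssl on
```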

Actual results:
Bricks crash on some of the nodes.

Expected results:
Bricks should not crash.

Additional info:

[root@remote-gluster-server ~]# gluster volume status
Status of volume: gluster-native-volume-1G-1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.16.156.12:/rhs/brick1/newvol7			49159	Y	22418
Brick 10.16.156.15:/rhs/brick1/newvol7			49159	Y	6059
Brick 10.16.156.24:/rhs/brick1/newvol7			49170	Y	24581
Brick 10.16.156.24:/rhs/brick2/newvol7			49171	Y	24605
NFS Server on localhost					2049	Y	24043
Self-heal Daemon on localhost				N/A	Y	24050
NFS Server on gqas006.sbu.lab.eng.bos.redhat.com	2049	Y	7212
Self-heal Daemon on gqas006.sbu.lab.eng.bos.redhat.com	N/A	Y	7219
NFS Server on gqas009.sbu.lab.eng.bos.redhat.com	2049	Y	26026
Self-heal Daemon on gqas009.sbu.lab.eng.bos.redhat.com	N/A	Y	26033
 
Task Status of Volume gluster-native-volume-1G-1
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: gluster-native-volume-3G-1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.16.156.12:/rhs/brick1/newvol8			49160	Y	13626
Brick 10.16.156.15:/rhs/brick1/newvol8			N/A	N	31104  ---> Brick crashed
Brick 10.16.156.24:/rhs/brick1/newvol8			N/A	N	14854
Brick 10.16.156.24:/rhs/brick2/newvol8			49173	Y	14865
NFS Server on localhost					2049	Y	24043
Self-heal Daemon on localhost				N/A	Y	24050
NFS Server on gqas006.sbu.lab.eng.bos.redhat.com	2049	Y	7212
Self-heal Daemon on gqas006.sbu.lab.eng.bos.redhat.com	N/A	Y	7219
NFS Server on gqas009.sbu.lab.eng.bos.redhat.com	2049	Y	26026
Self-heal Daemon on gqas009.sbu.lab.eng.bos.redhat.com	N/A	Y	26033
 
Task Status of Volume gluster-native-volume-3G-1
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@remote-gluster-server ~]# yum info openssl
Installed Packages
Name        : openssl
Arch        : x86_64
Epoch       : 1
Version     : 1.0.1e
Release     : 42.fc20
Size        : 1.5 M
Repo        : installed
From repo   : fedora-updates
Summary     : Utilities from the general purpose cryptography library with TLS implementation
URL         : http://www.openssl.org/
License     : OpenSSL
Description : The OpenSSL toolkit provides support for secure communications between
            : machines. OpenSSL includes a certificate management tool and shared
            : libraries which provide various cryptographic algorithms and
            : protocols.

[root@remote-gluster-server ~]# 


[2015-04-29 09:32:46.921692] E [socket.c:2495:socket_poller] 0-tcp.gluster-native-volume-3G-1-server: error in polling loop
[2015-04-29 09:32:47.927424] E [socket.c:2495:socket_poller] 0-tcp.gluster-native-volume-3G-1-server: error in polling loop
[2015-04-29 09:32:49.084098] E [socket.c:2495:socket_poller] 0-tcp.gluster-native-volume-3G-1-server: error in polling loop
[2015-04-29 09:32:49.242428] E [socket.c:2495:socket_poller] 0-tcp.gluster-native-volume-3G-1-server: error in polling loop
[2015-04-29 09:32:50.089756] E [socket.c:2495:socket_poller] 0-tcp.gluster-native-volume-3G-1-server: error in polling loop
[2015-04-29 09:32:50.250215] E [socket.c:2495:socket_poller] 0-tcp.gluster-native-volume-3G-1-server: error in polling loop
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-04-29 09:32:51
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.3
pending frames:
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-04-29 09:32:51
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.3
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f2022cac362]
/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7f2022cc385d]
/lib64/libc.so.6(+0x358f0)[0x7f2021cc68f0]
/lib64/libcrypto.so.10(sk_value+0x19)[0x7f20221323f9]
/lib64/libcrypto.so.10(+0x10126b)[0x7f202215026b]
/lib64/libcrypto.so.10(ASN1_item_ex_i2d+0x163)[0x7f2022154f03]
/lib64/libcrypto.so.10(+0x1061ff)[0x7f20221551ff]
/lib64/libcrypto.so.10(X509_NAME_cmp+0x5a)[0x7f202216963a]
/lib64/libcrypto.so.10(X509_check_issued+0x28)[0x7f202217b628]
/lib64/libcrypto.so.10(+0x11b8a5)[0x7f202216a8a5]
/lib64/libcrypto.so.10(X509_verify_cert+0xb4)[0x7f202216bfa4]
/lib64/libssl.so.10(ssl3_output_cert_chain+0x1a8)[0x7f2013bacb68]
/lib64/libssl.so.10(ssl3_send_server_certificate+0x35)[0x7f2013ba03d5]
/lib64/libssl.so.10(ssl3_accept+0xd1d)[0x7f2013ba184d]
/usr/lib64/glusterfs/3.6.3/rpc-transport/socket.so(+0x478a)[0x7f2013def78a]
/usr/lib64/glusterfs/3.6.3/rpc-transport/socket.so(+0x5e50)[0x7f2013df0e50]
/usr/lib64/glusterfs/3.6.3/rpc-transport/socket.so(+0xb159)[0x7f2013df6159]
/lib64/libpthread.so.0(+0x7ee5)[0x7f202243eee5]
/lib64/libc.so.6(clone+0x6d)[0x7f2021d85d1d]
---------
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb2)[0x7f2022cac362]
/lib64/libglusterfs.so.0(gf_print_trace+0x32d)[0x7f2022cc385d]
/lib64/libc.so.6(+0x358f0)[0x7f2021cc68f0]
sos-repots: http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/sosreport-gqas006.sbu.lab.eng.bos.redhat.com-20150504043010.tar.xz

Comment 3 Jeff Darcy 2015-05-05 14:49:42 UTC
I was unable to reproduce this (in 100 tries) on a Fedora 21 system with the 3.6.3-1 packages from download.gluster.org and OpenSSL 1.0.1j.  I notice that you were using OpenSSL 1.0.1e.  Before I downgrade my test system, or build a new one, can we please verify that 1.0.1e was the correct OpenSSL version to have on your test system?
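(For anyone checking their own setup: the OpenSSL version in use on a node can be confirmed with the standard commands below; the `rpm -q` form applies on Fedora.)

```shell
# Report the OpenSSL version actually in use on this node:
openssl version
# On Fedora, the exact installed package build:
#   rpm -q openssl
```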

Comment 4 Jeff Darcy 2015-05-05 15:00:45 UTC
Also, how exactly were the certificates generated?  What SSL "subject" did you use?

Comment 5 Jeff Darcy 2015-05-05 15:34:28 UTC
It's possible that this is a manifestation of a multi-threading issue, which tends to show up in X509_verify_cert.  See http://review.gluster.org/#/c/10075/ for details.  That would explain the non-deterministic appearance of the bug.  Perhaps we need to backport that patch to 3.6?
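Assuming the backport works like the mainline fix (registering OpenSSL's pre-1.1.0 locking callbacks), one quick way to see whether an installed build already carries it is to look for the lock-registration symbols in the socket transport; the path below is taken from the backtrace above:

```shell
# Check whether the installed socket transport references OpenSSL's
# lock-registration symbols (added by the multi-threading fix). No
# match suggests the build predates the patch.
nm -D /usr/lib64/glusterfs/3.6.3/rpc-transport/socket.so 2>/dev/null \
    | grep -i 'CRYPTO_set.*callback' \
    || echo "no locking callbacks referenced"
```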

Comment 6 Anand Avati 2015-05-05 20:59:47 UTC
REVIEW: http://review.gluster.org/10591 (socket: use OpenSSL multi-threading interfaces) posted (#1) for review on release-3.6 by Jeff Darcy (jdarcy)

Comment 7 ssamanta 2015-05-06 04:56:14 UTC
After talking with Kaushal earlier, I learned that the OpenSSL version to be used is 1.0.1e. The I/O data access path from a single client (without enabling management SSL/TLS) was working fine with GlusterFS 3.6.2. Do we need to use OpenSSL 1.0.1j? I am using Fedora 20.

Installed Packages
Name        : openssl
Arch        : x86_64
Epoch       : 1
Version     : 1.0.1e
Release     : 42.fc20
Size        : 1.5 M
Repo        : installed
From repo   : fedora-updates
Summary     : Utilities from the general purpose cryptography library with TLS implementation
URL         : http://www.openssl.org/
License     : OpenSSL
Description : The OpenSSL toolkit provides support for secure communications between
            : machines. OpenSSL includes a certificate management tool and shared
            : libraries which provide various cryptographic algorithms and
            : protocols.

[root@remote-gluster-server ~]#

Comment 8 Anand Avati 2015-05-06 18:03:27 UTC
REVIEW: http://review.gluster.org/10591 (socket: use OpenSSL multi-threading interfaces) posted (#2) for review on release-3.6 by Jeff Darcy (jdarcy)

Comment 9 ssamanta 2015-05-11 06:50:28 UTC
This issue is now seen more frequently, so I am marking it as a blocker.

Comment 10 Jeff Darcy 2015-05-11 15:03:28 UTC
I think the OpenSSL version is a red herring.  At the time I asked, I was still pretty much in the dark and trying to gather information; I hadn't yet realized that the symptom here closely matches that which http://review.gluster.org/#/c/10075/ had fixed in later versions.  I've posted http://review.gluster.org/10591 as a 3.6 backport, and http://review.gluster.org/10617 so that it can pass regression tests (nothing will on 3.6 because of changes to the test machines).  They both *have* passed regression tests, and merely await review/merging.

Comment 13 Kaushal 2016-02-04 15:27:21 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-v3.6.4, please open a new bug report.

glusterfs-v3.6.4 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/gluster-users/2015-July/022826.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

