
Bug 877903

Summary: Crash while running SSL unit test
Product: [Community] GlusterFS Reporter: Vijay Bellur <vbellur>
Component: transport Assignee: Jeff Darcy <jdarcy>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: unspecified    
Version: mainline CC: gluster-bugs, jdarcy, manu
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:37:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Backtrace none

Description Vijay Bellur 2012-11-19 08:28:08 UTC
Created attachment 647582
Backtrace

Description of problem:

The following crash was seen when patch http://review.gluster.org/#change,4118 was applied on top of the current git HEAD (cfe51eb7ff5d5d61c1cf9ad1588c7a3e8250736b).

Version-Release number of selected component (if applicable):


How reproducible:

Run run_tests.sh on the combination specified above.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

[Switching to thread 3 (Thread 0x7f4e8e358700 (LWP 30831))]#0  0x00007f4e93a5bb1b in pthread_once () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f4e93a5bb1b in pthread_once () from /lib64/libpthread.so.0
#1  0x00007f4e93420194 in backtrace () from /lib64/libc.so.6
#2  0x00007f4e940ca87e in gf_print_trace (signum=11, ctx=0xfa2010) at /var/lib/jenkins/jobs/regression/workspace/libglusterfs/src/common-utils.c:468
#3  0x0000000000407b15 in glusterfsd_print_trace (signum=11) at /var/lib/jenkins/jobs/regression/workspace/glusterfsd/src/glusterfsd.c:1595
#4  <signal handler called>
#5  0x00007f4e940ce414 in gf_timer_call_after (ctx=0xfa2010, delta=..., callbk=0x7f4e93e978b0 <rpc_clnt_reconnect>, data=0x101a370) at /var/lib/jenkins/jobs/regression/workspace/libglusterfs/src/timer.c:70
#6  0x00007f4e93e979fc in rpc_clnt_reconnect (trans_ptr=0x101a370) at /var/lib/jenkins/jobs/regression/workspace/rpc/rpc-lib/src/rpc-clnt.c:430
#7  0x00007f4e940ce7b0 in gf_timer_proc (ctx=0xfa2010) at /var/lib/jenkins/jobs/regression/workspace/libglusterfs/src/timer.c:168
#8  0x00007f4e93a56851 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f4e9340a11d in clone () from /lib64/libc.so.6
(gdb) f 5
#5  0x00007f4e940ce414 in gf_timer_call_after (ctx=0xfa2010, delta=..., callbk=0x7f4e93e978b0 <rpc_clnt_reconnect>, data=0x101a370) at /var/lib/jenkins/jobs/regression/workspace/libglusterfs/src/timer.c:70
70	                event->next->prev = event;
(gdb) p event
$1 = (gf_timer_t *) 0x7f4e880008e0
(gdb) p event->next
$2 = (struct _gf_timer *) 0x0

======================================================
(gdb) t 2
[Switching to thread 2 (Thread 0x7f4e875fe700 (LWP 30841))]#0  0x00007f4e93a5bb1b in pthread_once () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f4e93a5bb1b in pthread_once () from /lib64/libpthread.so.0
#1  0x00007f4e93420194 in backtrace () from /lib64/libc.so.6
#2  0x00007f4e93391ffb in __libc_message () from /lib64/libc.so.6
#3  0x00007f4e93397916 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f4e9339b6cf in _int_malloc () from /lib64/libc.so.6
#5  0x00007f4e9339c141 in malloc () from /lib64/libc.so.6
#6  0x00007f4e94332c72 in local_strdup () from /lib64/ld-linux-x86-64.so.2
#7  0x00007f4e94336634 in _dl_map_object () from /lib64/ld-linux-x86-64.so.2
#8  0x00007f4e943409b4 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9  0x00007f4e9433c196 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#10 0x00007f4e9434046a in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00007f4e93447b40 in do_dlopen () from /lib64/libc.so.6
#12 0x00007f4e9433c196 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x00007f4e93447c97 in __libc_dlopen_mode () from /lib64/libc.so.6
#14 0x00007f4e93420095 in init () from /lib64/libc.so.6
#15 0x00007f4e93a5bb23 in pthread_once () from /lib64/libpthread.so.0
#16 0x00007f4e93420194 in backtrace () from /lib64/libc.so.6
#17 0x00007f4e940ca87e in gf_print_trace (signum=11, ctx=0xfa2010) at /var/lib/jenkins/jobs/regression/workspace/libglusterfs/src/common-utils.c:468
#18 0x0000000000407b15 in glusterfsd_print_trace (signum=11) at /var/lib/jenkins/jobs/regression/workspace/glusterfsd/src/glusterfsd.c:1595
#19 <signal handler called>
#20 0x00007f4e933a5d18 in __memset_sse2 () from /lib64/libc.so.6
#21 0x00007f4e93762a5c in BUF_MEM_free () from /usr/lib64/libcrypto.so.10
#22 0x00007f4e8f93d4c8 in SSL_clear () from /usr/lib64/libssl.so.10
#23 0x00007f4e8fb652bc in __socket_disconnect (this=0x101a370) at /var/lib/jenkins/jobs/regression/workspace/rpc/rpc-transport/socket/src/socket.c:500
#24 0x00007f4e8fb6b616 in socket_poller (ctx=0x101a370) at /var/lib/jenkins/jobs/regression/workspace/rpc/rpc-transport/socket/src/socket.c:2235
#25 0x00007f4e93a56851 in start_thread () from /lib64/libpthread.so.0
#26 0x00007f4e9340a11d in clone () from /lib64/libc.so.6
(gdb) f 23
#23 0x00007f4e8fb652bc in __socket_disconnect (this=0x101a370) at /var/lib/jenkins/jobs/regression/workspace/rpc/rpc-transport/socket/src/socket.c:500
500				SSL_clear(priv->ssl_ssl);
(gdb) p priv->ssl_ssl
$3 = (SSL *) 0x7f4e80000970
(gdb) 


Log file attached.

Comment 1 Jeff Darcy 2012-11-19 17:19:34 UTC
This might be the same thing that Emmanuel Dreyfus reported on NetBSD a while ago.  Unfortunately, I'm still having trouble getting it to happen on my machines - probably because it's some sort of race that only happens on slow hardware.  I'll keep trying.

Comment 2 Vijay Bellur 2012-11-19 17:29:50 UTC
(In reply to comment #1)
> This might be the same thing that Emmanuel Dreyfus reported on NetBSD a
> while ago.  Unfortunately, I'm still having trouble getting it to happen on
> my machines - probably because it's some sort of race that only happens on
> slow hardware.  I'll keep trying.

FWIW, the crash was seen on a VM. Not sure if the slowness there is inducing this race.

Comment 3 Jeff Darcy 2012-11-19 19:11:15 UTC
OK, I can reproduce this by making ssl_setup_connection fail (e.g. by trying to make an SSL connection to a non-SSL server).  What happens is that ssl_setup_connection frees priv->ssl_ssl, but then the caller (in this case socket_poller) frees it again.  Now I need to figure out why Krishnan's change causes ssl_setup_connection to fail.
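
For illustration only, here is a minimal sketch of the ownership pattern that avoids this kind of double free. Everything in it is hypothetical: conn_priv_t, release_handle, setup_connection and disconnect_conn are stand-ins for priv, ssl_setup_connection and the disconnect path, the handle is a plain malloc'd buffer rather than an SSL object, and this is not necessarily how the eventual fix was written. The idea is that whichever path releases the handle also sets the pointer to NULL, so a later cleanup by the caller becomes a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the transport's private data. */
typedef struct {
    char *ssl_handle;   /* plays the role of priv->ssl_ssl */
} conn_priv_t;

static int free_count = 0;  /* counts real releases so the demo can check them */

/* Single release point: frees the handle once and clears the pointer. */
static void release_handle(conn_priv_t *priv)
{
    assert(priv != NULL);
    if (priv->ssl_handle == NULL)
        return;                 /* already released, nothing to do */
    free(priv->ssl_handle);
    priv->ssl_handle = NULL;    /* later cleanup paths now see NULL */
    free_count++;
}

/* Setup path: on failure it cleans up what it allocated. */
static int setup_connection(conn_priv_t *priv, int should_fail)
{
    priv->ssl_handle = malloc(64);
    if (priv->ssl_handle == NULL)
        return -1;
    if (should_fail) {
        release_handle(priv);
        return -1;
    }
    return 0;
}

/* Caller's cleanup path: safe to run even after a failed setup. */
static void disconnect_conn(conn_priv_t *priv)
{
    release_handle(priv);
}

/* Simulates a failed setup followed by the caller's own cleanup.
 * Returns how many times the handle was actually freed. */
int demo_failed_setup(void)
{
    conn_priv_t priv = { NULL };
    free_count = 0;
    if (setup_connection(&priv, 1) == 0)
        return -1;              /* unexpected: setup was meant to fail */
    disconnect_conn(&priv);     /* would be the second free without the NULL reset */
    return free_count;
}
```

Calling demo_failed_setup() exercises both cleanup paths; without the NULL reset in release_handle, the second call would free the same buffer again, which is the pattern described above.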

Comment 4 Vijay Bellur 2012-11-23 10:10:55 UTC
CHANGE: http://review.gluster.org/4208 (socket: fix double-free when ssl_setup_connection fails) merged in master by Vijay Bellur (vbellur)

Comment 5 Vijay Bellur 2012-12-11 09:29:07 UTC
Verified with 3.4.0qa4