Bug 1244187

Summary: glusterd, rebalance and gnfs process crashed on SSL setup running BVT and rebalance tests
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: M S Vishwanath Bhat <vbhat>
Component: core
Assignee: Vijay Bellur <vbellur>
Status: CLOSED WORKSFORME
QA Contact: Rahul Hinduja <rhinduja>
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: akhakhar, amukherj, annair, asrivast, kaushal, mzywusko, nbalacha, nlevinki, pkarampu, rcyriac, rgowdapp, rhs-bugs, sankarshan, smohan, storage-qa-internal, vbellur
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: ssl
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-07 04:26:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 3 M S Vishwanath Bhat 2015-07-17 13:22:37 UTC
Information extracted from core

(gdb) bt
#0  rpc_clnt_reconnect (conn_ptr=0x7f4f200026e0) at rpc-clnt.c:409
#1  0x00007f4f506ff743 in gf_timer_proc (ctx=0x7f4f52030010) at timer.c:184
#2  0x00007f4f4f7c8a51 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f4f4f13296d in clone () from /lib64/libc.so.6
(gdb) f 1
#1  0x00007f4f506ff743 in gf_timer_proc (ctx=0x7f4f52030010) at timer.c:184
184                                     event->callbk (event->data);
(gdb) p event
$1 = <value optimized out>
(gdb) p *event
value has been optimized out
(gdb) info thr
  16 Thread 0x7f4ef6bfd700 (LWP 6729)  0x00007f4f4f7cfe9d in fsync () from /lib64/libpthread.so.0
  15 Thread 0x7f4f40d5e700 (LWP 4420)  0x00007f4f4f129143 in poll () from /lib64/libc.so.6
  14 Thread 0x7f4f50b8b740 (LWP 4148)  0x00007f4f4f7c92ad in pthread_join () from /lib64/libpthread.so.0
  13 Thread 0x7f4ef75fe700 (LWP 6728)  0x00007f4f4f129143 in poll () from /lib64/libc.so.6
  12 Thread 0x7f4ef4dfa700 (LWP 6895)  0x00007f4f4f129143 in poll () from /lib64/libc.so.6
  11 Thread 0x7f4ee61fc700 (LWP 13271)  0x00007f4f4f129143 in poll () from /lib64/libc.so.6
  10 Thread 0x7f4ecf1e4700 (LWP 16393)  0x00007f4f4f129143 in poll () from /lib64/libc.so.6
  9 Thread 0x7f4f0cdfa700 (LWP 6089)  0x00007f4f4f129143 in poll () from /lib64/libc.so.6
  8 Thread 0x7f4f45e1a700 (LWP 4152)  0x00007f4f4f7cca0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7f4f33fff700 (LWP 4421)  0x00007f4f4f129143 in poll () from /lib64/libc.so.6
  6 Thread 0x7f4f00dfa700 (LWP 6538)  0x00007f4f4f129143 in poll () from /lib64/libc.so.6
  5 Thread 0x7f4f4681b700 (LWP 4151)  0x00007f4f4f7cca0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7f4f4721c700 (LWP 4150)  0x00007f4f4f7d0535 in sigwait () from /lib64/libpthread.so.0
  3 Thread 0x7f4f41b73700 (LWP 4319)  0x00007f4f4f132f63 in epoll_wait () from /lib64/libc.so.6
  2 Thread 0x7f4f42574700 (LWP 4318)  0x00007f4f4f7cc63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1 Thread 0x7f4f47c1d700 (LWP 4149)  rpc_clnt_reconnect (conn_ptr=0x7f4f200026e0) at rpc-clnt.c:409




(gdb) ptype (struct rpc_clnt)
A syntax error in expression, near `'.
(gdb) ptype struct rpc_clnt
type = struct rpc_clnt {
    pthread_mutex_t lock;
    rpc_clnt_notify_t notifyfn;
    rpc_clnt_connection_t conn;
    void *mydata;
    uint64_t xid;
    struct list_head programs;
    struct mem_pool *reqpool;
    struct mem_pool *saved_frames_pool;
    glusterfs_ctx_t *ctx;
    int refcount;
    int auth_null;
    char disabled;
}


(gdb) p event->fired
value has been optimized out
(gdb) f 0
#0  rpc_clnt_reconnect (conn_ptr=0x7f4f200026e0) at rpc-clnt.c:409
409                             gf_timer_call_cancel (clnt->ctx,
(gdb) l
404                     if (!trans) {
405                             pthread_mutex_unlock (&conn->lock);
406                             return;
407                     }
408                     if (conn->reconnect)
409                             gf_timer_call_cancel (clnt->ctx,
410                                                   conn->reconnect);
411                     conn->reconnect = 0;
412
413                     if ((conn->connected == 0) && !clnt->disabled) {
(gdb) p *conn_ptr
Attempt to dereference a generic pointer.
(gdb) p clnt
$3 = (struct rpc_clnt *) 0x6c6f766c61
(gdb) p *clnt
Cannot access memory at address 0x6c6f766c61
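
One detail in the session above is worth a closer look: the value of clnt, 0x6c6f766c61, uses only five bytes, and every one of them is a printable ASCII character. That pattern usually suggests the pointer was read from memory that held string data at the time, for example a structure that had already been freed and reused, though the core alone does not prove this. The helper below is a small sketch for checking such values by hand; pointer_as_text is a hypothetical name and is not part of glusterfs or gdb.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Render a 64-bit value as text, lowest byte first, which is the order
 * the bytes occupy in memory on x86_64 (little-endian).
 * Non-printable bytes are shown as '.', and trailing zero bytes are
 * dropped. `out` must have room for 8 characters plus the terminator.
 */
static void pointer_as_text(uint64_t value, char out[9])
{
    size_t len = 0;

    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xff);

        out[i] = isprint(byte) ? (char)byte : '.';
        if (byte != 0)
            len = i + 1;    /* remember the last non-zero byte */
    }
    out[len] = '\0';
}
```

Calling pointer_as_text(0x6c6f766c61ULL, buf) fills buf with the string "alvol", which looks like a fragment of a longer name rather than a heap address.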

Comment 15 Kaushal 2015-10-01 06:06:26 UTC
The crashes observed above had two different causes.

The SSL/encryption cause has been fixed as part of the fix for bug 1243722.

The other crash was caused by a race in the timer code. The probability of hitting this race, which has always existed (AFAIK), was increased by the timer patch mentioned in the comments above. Reverting the patch only reduces the chance of hitting the race; it does not eliminate it.

This is not a GlusterD bug or an SSL bug. I'm removing the assignment of this bug to the a-team.
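
The comment above does not spell out the race. As a generic illustration only, assume it is between the timer thread invoking event->callbk (frame #1 in the backtrace) and another thread cancelling the same event and releasing whatever event->data points to. One common way to close that kind of window is to let exactly one side "claim" the event under a lock. Everything named demo_* below is hypothetical; this is not the glusterfs timer code, and it is not the fix that was applied upstream.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical timer event; NOT the layout of the glusterfs timer struct. */
typedef struct demo_event {
    pthread_mutex_t lock;
    bool claimed;               /* set by whichever side arrives first */
    void (*callbk)(void *);
    void *data;
} demo_event_t;

static void demo_event_init(demo_event_t *ev, void (*callbk)(void *),
                            void *data)
{
    pthread_mutex_init(&ev->lock, NULL);
    ev->claimed = false;
    ev->callbk = callbk;
    ev->data = data;
}

/* Returns true for the first caller only, so the event has one owner. */
static bool demo_event_claim(demo_event_t *ev)
{
    pthread_mutex_lock(&ev->lock);
    bool won = !ev->claimed;
    ev->claimed = true;
    pthread_mutex_unlock(&ev->lock);
    return won;
}

/* Timer-thread side: run the callback only if nobody cancelled first. */
static bool demo_timer_fire(demo_event_t *ev)
{
    if (!demo_event_claim(ev))
        return false;
    ev->callbk(ev->data);
    return true;
}

/*
 * Canceller side. true: the callback will never run, so the caller may
 * release ev->data. false: the callback has run or is running, so the
 * caller must not release ev->data here.
 */
static bool demo_timer_cancel(demo_event_t *ev)
{
    return demo_event_claim(ev);
}

/* Example callback used to observe whether an event fired. */
static void demo_count(void *data)
{
    ++*(int *)data;
}
```

The point of the design is that the canceller learns from the return value whether it won. Code that frees the callback data unconditionally after a cancel would still be exposed to a callback that is already in flight.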

Comment 18 Amar Tumballi 2018-02-07 04:26:20 UTC
We have noticed that the bug does not reproduce in the latest version of the product (RHGS-3.3.1+).

If the bug is still relevant and can still be reproduced, feel free to reopen it.