Bug 1194605

Summary: DHT + epoll : client crashed
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rachana Patel <racpatel>
Component: distribute
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED ERRATA
QA Contact: amainkar
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: achauras, annair, asrivast, bturner, nbalacha, nsathyan, rcyriac, shmohan, ssaha
Target Milestone: ---
Keywords: Regression, ZStream
Target Release: RHGS 3.0.4
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.6.0.48-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1195120
Environment:
Last Closed: 2015-03-26 06:36:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1182947, 1195120

Description Rachana Patel 2015-02-20 10:56:59 UTC
Description of problem:
=======================
On a DHT volume, the client crashed while directories were being created in parallel (mkdir) and files were being created inside them.


Version-Release number of selected component (if applicable):
=============================================================
3.6.0.45-1.el6rhs.x86_64

How reproducible:
=================
Haven't tried


Steps to Reproduce:
===================
1. Create a DHT volume with 10 bricks.
2. Mount it on 2 clients - 4 FUSE and 4 NFS mounts.
3. Start creating directories from multiple mounts, and create files inside those directories.
4. Keep changing the epoll thread value.
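
The I/O part of the steps above (parallel directory and file creation from several mounts) can be approximated with the sketch below. It is not the script used for this report, which was not attached; the function names, the directory and file naming scheme, and the worker counts are assumptions chosen for illustration. It also does not cover step 4, because the report does not say how the epoll thread value was changed. In the example run, a temporary local directory stands in for the FUSE/NFS mount points.

```python
import os
import tempfile
import threading


def create_tree(root, worker_id, n_dirs=5, n_files=3):
    """Create n_dirs directories under root and n_files small files in each.

    Returns the number of files this worker created.
    """
    created = 0
    for d in range(n_dirs):
        # Directory names are shared by all workers on purpose, so several
        # workers race on mkdir of the same path (hypothetical naming scheme).
        dir_path = os.path.join(root, f"dir{d}")
        os.makedirs(dir_path, exist_ok=True)
        for f in range(n_files):
            # File names are unique per worker, so every create is a new file.
            file_path = os.path.join(dir_path, f"w{worker_id}_file{f}")
            with open(file_path, "w") as fh:
                fh.write("x")
            created += 1
    return created


def run_workload(mount_points, workers_per_mount=4, n_dirs=5, n_files=3):
    """Run create_tree concurrently on every mount point.

    Returns the total number of files created by all workers.
    """
    counts = []
    lock = threading.Lock()

    def worker(root, wid):
        n = create_tree(root, wid, n_dirs, n_files)
        with lock:
            counts.append(n)

    threads = []
    wid = 0
    for root in mount_points:
        for _ in range(workers_per_mount):
            threads.append(threading.Thread(target=worker, args=(root, wid)))
            wid += 1
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)


if __name__ == "__main__":
    # Assumption: a temporary directory replaces the real mount points here.
    with tempfile.TemporaryDirectory() as tmp:
        print(run_workload([tmp]))
```

To drive a real setup, the list passed to run_workload would contain the mount directories on each client, and the epoll thread value would be changed separately while the workload runs.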


Actual results:
===============
After some time, the client process crashed on 2 client machines.

Expected results:
=================



Additional info:
=================

log:-
[2015-02-19 15:15:01.756215] I [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 22
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-02-19 15:15:59
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.45
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3dcac207b6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3dcac3b3cf]
/lib64/libc.so.6[0x3dc94326a0]
/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_discover_complete+0x3a)[0x7fe9f8f7d98a]
/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_discover_cbk+0x273)[0x7fe9f8f85733]
/usr/lib64/glusterfs/3.6.0.45/xlator/protocol/client.so(client3_3_lookup_cbk+0x647)[0x7fe9f91c9687]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x3dca80e785]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x142)[0x3dca80fc12]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x3dca80b3e8]
/usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0x9301)[0x7fe9fa42d301]
/usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0xad1d)[0x7fe9fa42ed1d]
/usr/lib64/libglusterfs.so.0[0x3dcac77d1c]
/lib64/libpthread.so.0[0x3dc98079d1]
/lib64/libc.so.6(clone+0x6d)[0x3dc94e89dd]



core:-
(gdb) bt
#0  0x00007fe9f8f7d98a in dht_discover_complete ()
   from /usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so
#1  0x00007fe9f8f85733 in dht_discover_cbk ()
   from /usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so
#2  0x00007fe9f91c9687 in client3_3_lookup_cbk ()
   from /usr/lib64/glusterfs/3.6.0.45/xlator/protocol/client.so
#3  0x0000003dca80e785 in rpc_clnt_handle_reply ()
   from /usr/lib64/libgfrpc.so.0
#4  0x0000003dca80fc12 in rpc_clnt_notify ()
   from /usr/lib64/libgfrpc.so.0
#5  0x0000003dca80b3e8 in rpc_transport_notify ()
   from /usr/lib64/libgfrpc.so.0
#6  0x00007fe9fa42d301 in ?? ()
   from /usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so
#7  0x00007fe9fa42ed1d in ?? ()
   from /usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so
#8  0x0000003dcac77d1c in ?? () from /usr/lib64/libglusterfs.so.0
#9  0x0000003dc98079d1 in start_thread ()
   from /lib64/libpthread.so.0
#10 0x0000003dc94e89dd in clone () from /lib64/libc.so.6

Comment 2 Sayan Saha 2015-02-20 16:54:27 UTC
Changing the epoll thread value continuously and randomly does not seem to be a real-world scenario. The epoll thread value is expected to be mostly static. While we should investigate the crash, it is not a blocker.

Comment 3 Raghavendra G 2015-02-25 06:48:14 UTC
https://code.engineering.redhat.com/gerrit/42685

Comment 6 Amit Chaurasia 2015-03-05 13:32:36 UTC
Tested this on the following build:

[root@dht-rhs-19 nfs_mnt1]# rpm -qa  |grep -i gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-api-3.6.0.48-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.48-1.el6rhs.x86_64
glusterfs-3.6.0.48-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
glusterfs-fuse-3.6.0.48-1.el6rhs.x86_64
glusterfs-server-3.6.0.48-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.48-1.el6rhs.x86_64
glusterfs-debuginfo-3.6.0.48-1.el6rhs.x86_64
glusterfs-libs-3.6.0.48-1.el6rhs.x86_64
glusterfs-cli-3.6.0.48-1.el6rhs.x86_64
[root@dht-rhs-19 nfs_mnt1]# 


Started a recursive directory creation and rename operation and, while it was in progress, kept changing the epoll server and client thread values.

No crash seen.

Marking the bug verified.

Comment 8 errata-xmlrpc 2015-03-26 06:36:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html