Description of problem:
=======================
On a DHT volume, the client crashed while parallel mkdir and file creation
operations were running inside it.

Version-Release number of selected component (if applicable):
=============================================================
3.6.0.45-1.el6rhs.x86_64

How reproducible:
=================
Haven't tried

Steps to Reproduce:
===================
1. Create a DHT volume with 10 bricks.
2. Mount it on 2 clients: 4 FUSE and 4 NFS mounts.
3. Start creating directories from multiple mounts, and create files inside them.
4. Keep changing the epoll thread value.

Actual results:
===============
After some time, the client process crashed on both client machines.

Expected results:
=================
The client process should not crash.

Additional info:
================
log:
[2015-02-19 15:15:01.756215] I [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Exited thread with index 22
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2015-02-19 15:15:59
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.45
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3dcac207b6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3dcac3b3cf]
/lib64/libc.so.6[0x3dc94326a0]
/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_discover_complete+0x3a)[0x7fe9f8f7d98a]
/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_discover_cbk+0x273)[0x7fe9f8f85733]
/usr/lib64/glusterfs/3.6.0.45/xlator/protocol/client.so(client3_3_lookup_cbk+0x647)[0x7fe9f91c9687]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x3dca80e785]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x142)[0x3dca80fc12]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x3dca80b3e8]
/usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0x9301)[0x7fe9fa42d301]
/usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0xad1d)[0x7fe9fa42ed1d]
/usr/lib64/libglusterfs.so.0[0x3dcac77d1c]
/lib64/libpthread.so.0[0x3dc98079d1]
/lib64/libc.so.6(clone+0x6d)[0x3dc94e89dd]

core:
(gdb) bt
#0  0x00007fe9f8f7d98a in dht_discover_complete () from /usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so
#1  0x00007fe9f8f85733 in dht_discover_cbk () from /usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so
#2  0x00007fe9f91c9687 in client3_3_lookup_cbk () from /usr/lib64/glusterfs/3.6.0.45/xlator/protocol/client.so
#3  0x0000003dca80e785 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#4  0x0000003dca80fc12 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#5  0x0000003dca80b3e8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#6  0x00007fe9fa42d301 in ?? () from /usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so
#7  0x00007fe9fa42ed1d in ?? () from /usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so
#8  0x0000003dcac77d1c in ?? () from /usr/lib64/libglusterfs.so.0
#9  0x0000003dc98079d1 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003dc94e89dd in clone () from /lib64/libc.so.6
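The reproduction steps above can be sketched as a shell session. All volume, host, brick, and mount-point names below are placeholders, and the sketch assumes the `client.event-threads`/`server.event-threads` volume options are what "epoll thread value" refers to; it is an illustration of the procedure, not a verified reproducer:

```shell
# Create and start a 10-brick distribute (DHT) volume -- server and brick
# paths are hypothetical.
gluster volume create distvol server{1..10}:/bricks/brick1 force
gluster volume start distvol

# On each of the 2 client machines, set up FUSE and NFS mounts
# (4 of each kind across the clients).
mount -t glusterfs server1:/distvol /mnt/fuse1
mount -t nfs -o vers=3 server1:/distvol /mnt/nfs1

# Drive parallel directory creation with file creation inside, one
# background loop per mount.
for m in /mnt/fuse1 /mnt/nfs1; do
    ( for i in $(seq 1 10000); do
          mkdir -p "$m/dir.$i" && touch "$m/dir.$i/file.$i"
      done ) &
done

# Meanwhile, keep changing the epoll thread count at random.
while true; do
    n=$(( (RANDOM % 8) + 1 ))
    gluster volume set distvol client.event-threads "$n"
    gluster volume set distvol server.event-threads "$n"
    sleep 5
done
```

The option changes take effect on live connections, which is what exercises the epoll worker threads (note the "Exited thread with index 22" message just before the crash).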
Changing the epoll thread value continuously and at random does not seem to be a real-world scenario; the value is expected to stay mostly static. While the crash should be investigated, this is not a blocker.
https://code.engineering.redhat.com/gerrit/42685
Tested this on the following build:

[root@dht-rhs-19 nfs_mnt1]# rpm -qa |grep -i gluster
gluster-nagios-common-0.1.4-1.el6rhs.noarch
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-api-3.6.0.48-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.48-1.el6rhs.x86_64
glusterfs-3.6.0.48-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
glusterfs-fuse-3.6.0.48-1.el6rhs.x86_64
glusterfs-server-3.6.0.48-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.48-1.el6rhs.x86_64
glusterfs-debuginfo-3.6.0.48-1.el6rhs.x86_64
glusterfs-libs-3.6.0.48-1.el6rhs.x86_64
glusterfs-cli-3.6.0.48-1.el6rhs.x86_64
[root@dht-rhs-19 nfs_mnt1]#

Started a recursive directory creation and rename operation and, while it was in progress, kept changing the epoll server and client thread values. No crash seen. Marking the bug verified.
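The verification workload can be sketched as follows. Volume and mount names are placeholders, and the rename pass is what distinguishes it from the original reproducer; as above, the epoll thread values are assumed to be the `server.event-threads`/`client.event-threads` volume options:

```shell
# Recursive directory creation followed by renames, run from a mount.
mkdir -p /mnt/fuse1/load
( for i in $(seq 1 5000); do
      mkdir -p "/mnt/fuse1/load/d.$i/sub"
      mv "/mnt/fuse1/load/d.$i" "/mnt/fuse1/load/renamed.$i"
  done ) &

# While the workload runs, flip both event-thread values back and forth.
for n in 2 8 3 7 4 6; do
    gluster volume set distvol server.event-threads "$n"
    gluster volume set distvol client.event-threads "$n"
    sleep 30
done
wait
```

With the fixed build (3.6.0.48), no client crash is expected under this sequence.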
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0682.html