1399147 – [Eventing]: GlusterFS brick process crashed after add-brick and 'rebalance start' operation

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	eventsapi
Sub Component:
Version:	rhgs-3.2
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Bug Updates Notification Mailing List
QA Contact:	Sweta Anandpara
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2016-11-28 12:25 UTC by Prasad Desala
Modified:	2018-11-08 12:44 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-11-08 12:44:42 UTC
Embargoed:
Dependent Products:

Description Prasad Desala 2016-11-28 12:25:47 UTC

Description of problem:
=======================
One of the GlusterFS brick processes crashed after an add-brick followed by a 'rebalance start' operation.

Below is the generated backtrace:

(gdb) bt
#0  0x00007fbd412cf817 in gf_event (event=event@entry=EVENT_CLIENT_CONNECT, fmt=fmt@entry=0x7fbd30599880 "client_uid=%s;client_identifier=%s;server_identifier=%s;brick_path=%s")
    at events.c:71
#1  0x00007fbd30593074 in server_setvolume (req=0x7fbd2b6a90ac) at server-handshake.c:695
#2  0x00007fbd4101e765 in rpcsvc_handle_rpc_call (svc=0x7fbd2c035590, trans=trans@entry=0x7fbd2c64f880, msg=<optimized out>) at rpcsvc.c:695
#3  0x00007fbd4101e94b in rpcsvc_notify (trans=0x7fbd2c64f880, mydata=<optimized out>, event=<optimized out>, data=<optimized out>) at rpcsvc.c:789
#4  0x00007fbd41020883 in rpc_transport_notify (this=this@entry=0x7fbd2c64f880, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fbd2c2b4a00) at rpc-transport.c:537
#5  0x00007fbd35b12eb4 in socket_event_poll_in (this=this@entry=0x7fbd2c64f880) at socket.c:2267
#6  0x00007fbd35b15365 in socket_event_handler (fd=<optimized out>, idx=13, data=0x7fbd2c64f880, poll_in=1, poll_out=0, poll_err=0) at socket.c:2397
#7  0x00007fbd412b43d0 in event_dispatch_epoll_handler (event=0x7fbd34269e80, event_pool=0x7fbd42122f00) at event-epoll.c:571
#8  event_dispatch_epoll_worker (data=0x7fbd421760c0) at event-epoll.c:674
#9  0x00007fbd400bbdc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007fbd3fa0073d in clone () from /lib64/libc.so.6



#0  0x00007fbd412cf817 in gf_event (event=event@entry=EVENT_CLIENT_CONNECT, 
    fmt=fmt@entry=0x7fbd30599880 "client_uid=%s;client_identifier=%s;server_identifier=%s;brick_path=%s") at events.c:71
71	                host = inet_ntoa (*(struct in_addr *)(host_data->h_addr));
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-26.el7.x86_64 libacl-2.2.51-12.el7.x86_64 libaio-0.3.109-13.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 openssl-libs-1.0.1e-60.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 sqlite-3.7.17-8.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) ptype host_data
type = struct hostent {
    char *h_name;
    char **h_aliases;
    int h_addrtype;
    int h_length;
    char **h_addr_list;
} *
(gdb) 
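
The gdb session above does not print the value of host_data, so the following is only an assumption: if the lookup that produced host_data failed and returned NULL, the dereference at events.c:71 would fault as seen in frame #0. A minimal sketch of a guarded conversion follows. hostent_to_ipv4 is a hypothetical helper, not GlusterFS code; it reads h_addr_list[0] (which the glibc h_addr macro expands to) and uses inet_ntop in place of inet_ntoa so the result lands in a caller-supplied buffer.

```c
#include <stddef.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not part of GlusterFS): write the dotted-quad form of
 * the first address in a hostent into buf.
 * Returns 0 on success, -1 if the entry is NULL, empty, or not IPv4. */
static int
hostent_to_ipv4 (const struct hostent *host_data, char *buf, size_t len)
{
        struct in_addr addr;

        /* A failed resolver lookup yields a NULL hostent; check it before
         * touching any member. */
        if (!host_data || !host_data->h_addr_list ||
            !host_data->h_addr_list[0])
                return -1;

        /* Only treat the bytes as an in_addr when the record says so. */
        if (host_data->h_addrtype != AF_INET ||
            host_data->h_length != (int) sizeof (addr))
                return -1;

        /* memcpy avoids casting a char pointer to struct in_addr *. */
        memcpy (&addr, host_data->h_addr_list[0], sizeof (addr));

        return inet_ntop (AF_INET, &addr, buf, (socklen_t) len) ? 0 : -1;
}
```

A caller would pass the lookup result straight in and skip emitting the event (or fall back to a default host) when the helper returns -1, instead of dereferencing the pointer unconditionally.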


Brick logs:
===========
[2016-11-28 11:35:25.607044] I [MSGID: 115029] [server-handshake.c:693:server_setvolume] 0-distrep-server: accepted client from dhcp42-7.lab.eng.blr.redhat.com-21736-2016/11/28-11:35:20:733849-distrep-client-1-0-0 (version: 3.8.4)
pending frames:
frame : type(1) op(NULL)
frame : type(0) op(0)
frame : type(0) op(13)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2016-11-28 11:35:25
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7fbd4125abd2]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7fbd41264664]
/lib64/libc.so.6(+0x35250)[0x7fbd3f93e250]
/lib64/libglusterfs.so.0(gf_event+0x137)[0x7fbd412cf817]
/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x31074)[0x7fbd30593074]
/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325)[0x7fbd4101e765]
/lib64/libgfrpc.so.0(rpcsvc_notify+0x10b)[0x7fbd4101e94b]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fbd41020883]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x6eb4)[0x7fbd35b12eb4]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x9365)[0x7fbd35b15365]
/lib64/libglusterfs.so.0(+0x833d0)[0x7fbd412b43d0]
/lib64/libpthread.so.0(+0x7dc5)[0x7fbd400bbdc5]
/lib64/libc.so.6(clone+0x6d)[0x7fbd3fa0073d]
---------

Version-Release number of selected component (if applicable):
3.8.4-5.el7rhgs.x86_64

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1) Create a distributed-replicate volume and start it.
2) FUSE mount the volume on multiple clients.
3) From clients, start creating files and do continuous lookups.
4) Add a few bricks and start a rebalance operation.

Crash is seen.

Actual results:
===============
One of the brick processes crashed.

Expected results:
=================
There should not be any crashes.
