Bug 1037597 - glusterd crash after performing rebalance and volume delete and create operations
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Nithya Balachandran
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1286155
 
Reported: 2013-12-03 12:53 UTC by senaik
Modified: 2015-11-27 12:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1286155 (view as bug list)
Environment:
Last Closed: 2015-11-27 12:09:43 UTC
Target Upstream Version:



Description senaik 2013-12-03 12:53:43 UTC
Description of problem:
=======================
glusterd crashed after performing rebalance followed by volume delete and create operations


Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.44.1u2rhs


How reproducible:
================
Faced it once so far

Steps to Reproduce:
====================
1. Create a distributed volume and start it.

2. Fuse mount the volume and create some files.

3. Add bricks and perform fix-layout multiple times, checking rebalance status (with a couple of glusterd restarts before checking rebalance status).

4. Start rebalance and check status.

5. Perform rebalance start force and check status.

6. Stop the volume and delete it.

7. Create another volume and mount it.

8. Add 2 more bricks and check rebalance status. Output below: no nodes were listed.

gluster v rebalance vol1 status
Node  Rebalanced-files  size  scanned failures skipped  status  run time in secs
----  ----------------  ----  ------- -------- -------  ------  ----------------
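The reproduction steps above can be sketched as a CLI session. This is a hedged sketch: node names, brick paths, and file counts are illustrative, not the exact values used in the original test, and it requires a running Gluster trusted storage pool.

```shell
# Steps 1-2: create and start a distributed volume, then mount and populate it
gluster volume create vol1 node1:/rhs/brick1 node2:/rhs/brick1
gluster volume start vol1
mount -t glusterfs node1:/vol1 /mnt/vol1
for i in $(seq 1 100); do echo data > /mnt/vol1/file$i; done

# Steps 3-5: add bricks, run fix-layout and rebalance, checking status
gluster volume add-brick vol1 node1:/rhs/brick2
gluster volume rebalance vol1 fix-layout start
gluster volume rebalance vol1 status
service glusterd restart               # restarted a couple of times in the test
gluster volume rebalance vol1 start
gluster volume rebalance vol1 start force

# Steps 6-8: delete the volume, recreate it, add bricks, check rebalance status
gluster volume stop vol1
gluster volume delete vol1
gluster volume create vol1 node1:/rhs/brick3 node2:/rhs/brick3
gluster volume start vol1
gluster volume add-brick vol1 node1:/rhs/brick4 node2:/rhs/brick4
gluster volume rebalance vol1 status   # empty node list here; glusterd crashed shortly after
```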

--------part of glusterd log --------------------

[2013-12-03 12:29:26.906589] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 12:29:29.907000] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 12:29:29.907066] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 12:29:32.907509] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2013-12-03 12:29:32.907611] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
pending frames:

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-12-03 12:29:35
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.44.1u2rhs
/lib64/libc.so.6[0x30d3c32960]
/usr/lib64/glusterfs/3.4.0.44.1u2rhs/xlator/mgmt/glusterd.so(__glusterd_defrag_notify+0x1d0)[0x7f6a194fd550]
/usr/lib64/glusterfs/3.4.0.44.1u2rhs/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x60)[0x7f6a194ad830]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x109)[0x7f6a1ccdf2e9]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f6a1ccdab78]
/usr/lib64/glusterfs/3.4.0.44.1u2rhs/rpc-transport/socket.so(+0x557c)[0x7f6a17d5057c]
/usr/lib64/glusterfs/3.4.0.44.1u2rhs/rpc-transport/socket.so(+0xa5b8)[0x7f6a17d555b8]
/usr/lib64/libglusterfs.so.0(+0x62327)[0x7f6a1cf4a327]
/usr/sbin/glusterd(main+0x6c7)[0x4069d7]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x30d3c1ecdd]
/usr/sbin/glusterd[0x404619]

--------------------------------------------------------------------
Actual results:


Expected results:


Additional info:

Comment 3 senaik 2013-12-26 13:52:33 UTC
glusterd crashed while stopping and deleting volumes:

Steps:
======
While upgrading rpms (from glusterfs 3.4.0.44.1u2rhs to glusterfs-3.4.0.52rhs), the volumes were still in the started state, so we tried to stop and delete the volumes from the other nodes; glusterd crashed on one of the nodes.

(gdb) bt
#0  __glusterd_defrag_notify (rpc=0x15b2970, mydata=0x14fdc20, event=RPC_CLNT_CONNECT, data=<value optimized out>) at glusterd-rebalance.c:119
#1  0x00007f81df8ac830 in glusterd_big_locked_notify (rpc=0x15b2970, mydata=0x14fdc20, event=RPC_CLNT_CONNECT, data=0x0, 
    notify_fn=0x7f81df8fc380 <__glusterd_defrag_notify>) at glusterd-handler.c:66
#2  0x0000003701a0f2e9 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x15b29a0, event=<value optimized out>, data=<value optimized out>)
    at rpc-clnt.c:937
#3  0x0000003701a0ab78 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>)
    at rpc-transport.c:512
#4  0x00007f81de14f57c in socket_connect_finish (this=0x15a90e0) at socket.c:2192
#5  0x00007f81de1545b8 in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x15a90e0, poll_in=0, poll_out=4, 
    poll_err=16) at socket.c:2222
#6  0x0000003701662327 in event_dispatch_epoll_handler (event_pool=0x14d8ee0) at event-epoll.c:384
#7  event_dispatch_epoll (event_pool=0x14d8ee0) at event-epoll.c:445
#8  0x00000000004069d7 in main (argc=2, argv=0x7fffcceffaa8) at glusterfsd.c:2048

Comment 4 senaik 2013-12-26 13:57:54 UTC
Core file and glusterd logs for comment 3:

http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1037597_26dec/

Comment 6 Susant Kumar Palai 2015-11-27 12:09:43 UTC
Cloning this to 3.1, to be fixed in a future release.

