Bug 801787 - Crash during rebalance
Summary: Crash during rebalance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 817967
 
Reported: 2012-03-09 13:53 UTC by shylesh
Modified: 2015-12-01 16:45 UTC
CC List: 2 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 18:03:00 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 3.3.0qa41
Embargoed:



Description shylesh 2012-03-09 13:53:11 UTC
Description of problem:
glusterfs crashed during a rebalance operation

Version-Release number of selected component (if applicable):
Mainline

How reproducible:


Steps to Reproduce:
1. Create a distribute volume with a few bricks.
2. Fill it with some data.
3. Add a brick to the volume and initiate a rebalance.
4. While the rebalance is in progress, perform some I/O on the mount point.
5. At the same time, restart glusterd (a CLI sketch of these steps follows below).
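
A minimal shell sketch of these steps, assuming a single-host setup with illustrative names (volume "dist", bricks under /bricks, mount at /mnt/dist); the exact invocations on the tested build may differ:

# Hypothetical reproduction script; all names are illustrative.
HOST=$(hostname)

# Steps 1-2: create and start a distribute volume, then add some data.
gluster volume create dist ${HOST}:/bricks/b1 ${HOST}:/bricks/b2
gluster volume start dist
mount -t glusterfs ${HOST}:/dist /mnt/dist
cp -r /usr/src/linux-3.2.1 /mnt/dist/

# Step 3: add a brick and start the rebalance.
gluster volume add-brick dist ${HOST}:/bricks/b3
gluster volume rebalance dist start

# Step 4: generate I/O on the mount point while the rebalance runs.
dd if=/dev/zero of=/mnt/dist/iofile bs=1M count=512 &

# Step 5: restart glusterd at the same time (the trigger for the crash).
service glusterd restart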
  
Actual results:
glusterfs crashed

Expected results:


Additional info:

(gdb) p ctx->active
$1 = (glusterfs_graph_t *) 0x0
(gdb) p *ctx->active
Cannot access memory at address 0x0



==============================================================================
(gdb) bt
#0  0x000000000040a708 in glusterfs_handle_defrag (req=0x137ed6c) at glusterfsd-mgmt.c:765
#1  0x000000000040b1fb in glusterfs_handle_rpc_msg (req=0x137ed6c) at glusterfsd-mgmt.c:983
#2  0x00007f88243260b5 in rpcsvc_handle_rpc_call (svc=0x137ebf0, trans=0x13937f0, msg=0x1393660) at rpcsvc.c:514
#3  0x00007f8824326458 in rpcsvc_notify (trans=0x13937f0, mydata=0x137ebf0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1393660)
    at rpcsvc.c:610
#4  0x00007f882432be10 in rpc_transport_notify (this=0x13937f0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1393660)
    at rpc-transport.c:498
#5  0x00007f882103d27c in socket_event_poll_in (this=0x13937f0) at socket.c:1686
#6  0x00007f882103d800 in socket_event_handler (fd=8, idx=2, data=0x13937f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#7  0x00007f8824586080 in event_dispatch_epoll_handler (event_pool=0x1379d90, events=0x1392a40, i=0) at event.c:794
#8  0x00007f88245862a3 in event_dispatch_epoll (event_pool=0x1379d90) at event.c:856
#9  0x00007f882458662e in event_dispatch (event_pool=0x1379d90) at event.c:956
#10 0x0000000000407d6d in main (argc=21, argv=0x7fffa1bb5518) at glusterfsd.c:1611
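
The gdb output above explains frame #0: ctx->active, the active volume graph, is still NULL when the defrag request arrives, as can happen when glusterd restarts and re-sends the request before graph initialization completes. Below is a self-contained sketch of the kind of NULL guard the fix in comment 2 implies; the types are stand-ins, since the real definitions live in the glusterfs headers, and the actual patch may differ:

#include <stdio.h>

/* Stand-ins for the structures named in the backtrace; the real
 * definitions live in the glusterfs headers. */
typedef struct { int id; } glusterfs_graph_t;
typedef struct { glusterfs_graph_t *active; } glusterfs_ctx_t;

/* Sketch: reject the defrag request if the volume graph is not yet
 * initialized, instead of dereferencing a NULL ctx->active. */
static int
handle_defrag (glusterfs_ctx_t *ctx)
{
        if (!ctx || !ctx->active) {
                fprintf (stderr, "volume graph not initialized, "
                         "rejecting defrag request\n");
                return -1;
        }
        printf ("running defrag on graph %d\n", ctx->active->id);
        return 0;
}

int
main (void)
{
        glusterfs_ctx_t ctx = { 0 };   /* ctx.active == NULL, matching the gdb session */
        handle_defrag (&ctx);          /* safely rejected instead of SIGSEGV */
        return 0;
}

Compiled and run, this prints the rejection message rather than crashing, which is the behavior the verified build exhibits in comment 3.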

======================================================================
[2012-03-09 08:33:25.164547] W [client.c:2011:client_rpc_notify] 0-dist-client-2: Registering a grace timer
[2012-03-09 08:33:25.164561] I [client.c:2024:client_rpc_notify] 0-dist-client-2: disconnected
[2012-03-09 08:33:24.170727] I [dht-rebalance.c:852:dht_migrate_file] 0-dist-dht: completed migration of /linux-3.2.1/arch/arm/include/asm/hardware/ssp.h from subvolume dist-client-2 to dist-client-5
[2012-03-09 08:33:25.164570] W [dht-common.c:4476:dht_notify] 0-dist-dht: Received CHILD_DOWN. Exiting
ad.so.0() [0x39674077e1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x40741e]))) 0-: received signum (15), shutting down
 3git
[2012-03-09 08:33:43.404701] W [socket.c:419:__socket_keepalive] 0-socket: failed to set keep idle on socket 8
t supported
pending frames:
===============================================================================

Comment 1 Amar Tumballi 2012-03-12 09:46:54 UTC
Please update these bugs with respect to 3.3.0qa27; this needs to be worked on as per the target milestone set.

Comment 2 Anand Avati 2012-03-12 14:59:29 UTC
CHANGE: http://review.gluster.com/2924 (glusterfsd: handle a case of NULL dereference during rebalance) merged in master by Vijay Bellur (vijay)

Comment 3 shylesh 2012-05-17 13:25:12 UTC
No crash occurs upon restarting glusterd.

