Description of problem:
glusterfs crashed during a rebalance operation.

Version-Release number of selected component (if applicable):
Mainline

How reproducible:

Steps to Reproduce:
1. Create a distribute volume with a few bricks.
2. Fill it with some data.
3. Add a brick to the volume (add-brick) and start a rebalance.
4. While the rebalance is running, perform some I/O on the mount point.
5. At the same time, restart glusterd.

Actual results:
glusterfs crashed.

Expected results:

Additional info:
(gdb) p ctx->active
$1 = (glusterfs_graph_t *) 0x0
(gdb) p *ctx->active
Cannot access memory at address 0x0
==============================================================================
(gdb) bt
#0  0x000000000040a708 in glusterfs_handle_defrag (req=0x137ed6c) at glusterfsd-mgmt.c:765
#1  0x000000000040b1fb in glusterfs_handle_rpc_msg (req=0x137ed6c) at glusterfsd-mgmt.c:983
#2  0x00007f88243260b5 in rpcsvc_handle_rpc_call (svc=0x137ebf0, trans=0x13937f0, msg=0x1393660) at rpcsvc.c:514
#3  0x00007f8824326458 in rpcsvc_notify (trans=0x13937f0, mydata=0x137ebf0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1393660) at rpcsvc.c:610
#4  0x00007f882432be10 in rpc_transport_notify (this=0x13937f0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1393660) at rpc-transport.c:498
#5  0x00007f882103d27c in socket_event_poll_in (this=0x13937f0) at socket.c:1686
#6  0x00007f882103d800 in socket_event_handler (fd=8, idx=2, data=0x13937f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:1801
#7  0x00007f8824586080 in event_dispatch_epoll_handler (event_pool=0x1379d90, events=0x1392a40, i=0) at event.c:794
#8  0x00007f88245862a3 in event_dispatch_epoll (event_pool=0x1379d90) at event.c:856
#9  0x00007f882458662e in event_dispatch (event_pool=0x1379d90) at event.c:956
#10 0x0000000000407d6d in main (argc=21, argv=0x7fffa1bb5518) at glusterfsd.c:1611
======================================================================
[2012-03-09 08:33:25.164547] W [client.c:2011:client_rpc_notify] 0-dist-client-2: Registering a grace timer
[2012-03-09 08:33:25.164561] I [client.c:2024:client_rpc_notify] 0-dist-client-2: disconnected
[2012-03-09 08:33:24.170727] I [dht-rebalance.c:852:dht_migrate_file] 0-dist-dht: completed migration of /linux-3.2.1/arch/arm/include/asm/hardware/ssp.h from subvolume dist-client-2 to dist-client-5
[2012-03-09 08:33:25.164561] I [client.c:2024:client_rpc_notify] 0-dist-client-2: disconnected
[2012-03-09 08:33:25.164570] W [dht-common.c:4476:dht_notify] 0-dist-dht: Received CHILD_DOWN. Exiting ad.so.0() [0x39674077e1] (-->/usr/local/sbin/glusterfs(glusterfs_sigwaiter+0xfc) [0x40741e]))) 0-: received signum (15), shutting down 3git
[2012-03-09 08:33:43.404701] W [socket.c:419:__socket_keepalive] 0-socket: failed to set keep idle on socket 8 t supported
pending frames:
===============================================================================
Please update these bugs with respect to 3.3.0qa27; they need to be worked on according to the target milestone that has been set.
CHANGE: http://review.gluster.com/2924 (glusterfsd: handle a case of NULL dereference during rebalance) merged in master by Vijay Bellur (vijay)
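The gdb session in the report shows that `ctx->active` was NULL inside `glusterfs_handle_defrag` when the defrag request arrived, presumably because the graph was not (or no longer) active around the glusterd restart. The following is a minimal sketch of the general guard pattern only, not the merged patch: the types `sketch_graph_t` and `sketch_ctx_t` and the function `sketch_handle_defrag` are hypothetical stand-ins, not glusterfs APIs. The idea is to check the pointer and fail the request instead of dereferencing it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a glusterfs graph. */
typedef struct sketch_graph {
    void *top; /* top-most translator of the graph (unused here) */
} sketch_graph_t;

/* Hypothetical, simplified stand-in for the process context. */
typedef struct sketch_ctx {
    sketch_graph_t *active; /* may be NULL while no graph is active */
} sketch_ctx_t;

/*
 * Hypothetical defrag handler.
 * Returns 0 if the request can be processed, -1 if there is no
 * active graph, so the caller can send an error reply instead of crashing.
 */
static int sketch_handle_defrag(sketch_ctx_t *ctx)
{
    if (ctx == NULL || ctx->active == NULL) {
        fprintf(stderr, "no active graph, rejecting defrag request\n");
        return -1;
    }

    /* Past the guard, dereferencing ctx->active is safe. */
    sketch_graph_t *graph = ctx->active;
    (void)graph->top; /* placeholder for the real defrag work */
    return 0;
}
```

With a context whose `active` points at a graph the sketch returns 0; with `active` set to NULL (the state captured in the gdb output) it logs a message and returns -1 rather than faulting.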
No crash happens upon restarting glusterd