Created a volume with tcp,rdma volume type. mounted via knfs in another client machine. nfs mountpoint was hanging for sometime. So with the intention of restarting the volume with TRACE log level I issued volume stop command. glusterd crashed with following back trace. Core was generated by `glusterd'. Program terminated with signal 6, Aborted. #0 0x0000003283a30265 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x0000003283a30265 in raise () from /lib64/libc.so.6 #1 0x0000003283a31d10 in abort () from /lib64/libc.so.6 #2 0x0000003283a296e6 in __assert_fail () from /lib64/libc.so.6 #3 0x00002b056efe48ba in __gf_free (free_ptr=0x16ca4a90) at ../../../libglusterfs/src/mem-pool.c:273 #4 0x00002b056efcb41a in gf_timer_call_cancel (ctx=0x1515f010, event=0x16ca4a90) at ../../../libglusterfs/src/timer.c:131 #5 0x00002aaaaaae7a8e in glusterd_op_brick_disconnect (data=0x16ca4a30) at ../../../../../xlators/mgmt/glusterd/src/glusterd-op-sm.c:1872 #6 0x00002aaaaaadf33b in glusterd_brick_rpc_notify (rpc=0x16bdcab0, mydata=0x191a3020, event=RPC_CLNT_DISCONNECT, data=0x0) at ../../../../../xlators/mgmt/glusterd/src/glusterd-handler.c:2565 #7 0x00002b056f23b510 in rpc_clnt_notify (trans=0x16bdcce0, mydata=0x16bdcae0, event=RPC_TRANSPORT_DISCONNECT, data=0x16bdcce0) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:879 #8 0x00002b056f2379f3 in rpc_transport_notify (this=0x16bdcce0, event=RPC_TRANSPORT_DISCONNECT, data=0x16bdcce0) at ../../../../rpc/rpc-lib/src/rpc-transport.c:498 #9 0x00002aaaaae07006 in socket_event_poll_err (this=0x16bdcce0) at ../../../../../rpc/rpc-transport/socket/src/socket.c:694 #10 0x00002aaaaae0b4b8 in socket_event_handler (fd=21, idx=10, data=0x16bdcce0, poll_in=1, poll_out=0, poll_err=16) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1797 #11 0x00002b056efe3988 in event_dispatch_epoll_handler (event_pool=0x151617a0, events=0x167cc070, i=0) at ../../../libglusterfs/src/event.c:794 #12 0x00002b056efe3b8d in event_dispatch_epoll (event_pool=0x151617a0) at ../../../libglusterfs/src/event.c:856 #13 0x00002b056efe3ee7 in event_dispatch (event_pool=0x151617a0) at ../../../libglusterfs/src/event.c:956 #14 0x000000000040784d in main (argc=1, argv=0x7fff989efb38) at ../../../glusterfsd/src/glusterfsd.c:1592 (gdb) f 3 #3 0x00002b056efe48ba in __gf_free (free_ptr=0x16ca4a90) at ../../../libglusterfs/src/mem-pool.c:273 273 GF_ASSERT (0); (gdb) f 5 #5 0x00002aaaaaae7a8e in glusterd_op_brick_disconnect (data=0x16ca4a30) at ../../../../../xlators/mgmt/glusterd/src/glusterd-op-sm.c:1872 1872 gf_timer_call_cancel (THIS->ctx, brickinfo->timer); I saw following erros in glusterd logs before it crashed. [2011-10-05 04:00:20.53453] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:00:20.53491] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:00:20.150984] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:999) after handshake is complete [2011-10-05 04:00:20.162281] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1000) after handshake is complete [2011-10-05 04:00:20.179142] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1001) after handshake is complete [2011-10-05 04:00:20.190451] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1002) after handshake is complete [2011-10-05 04:00:20.196186] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1003) after handshake is complete [2011-10-05 04:00:20.201945] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1004) after handshake is complete [2011-10-05 04:00:20.212742] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1005) after handshake is complete [2011-10-05 04:00:20.218558] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:998) after handshake is complete [2011-10-05 04:01:21.137325] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:993) after handshake is complete [2011-10-05 04:01:21.322276] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:992) after handshake is complete [2011-10-05 04:02:26.583109] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1015) after handshake is complete [2011-10-05 04:02:26.670842] E [rdma.c:4447:rdma_event_handler] 0-rpc-transport/rdma: rdma.management: pollin received on tcp socket (peer: 10.1.10.21:1014) after handshake is complete [2011-10-05 04:03:39.191342] I [glusterd-volume-ops.c:305:glusterd_handle_cli_stop_volume] 0-glusterd: Received stop vol reqfor volume hosdu [2011-10-05 04:03:39.218057] I [glusterd-utils.c:258:glusterd_lock] 0-glusterd: Cluster lock held by 2040c7b1-4fba-4435-b47a-683ce18e9c42 [2011-10-05 04:03:39.218093] I [glusterd-handler.c:438:glusterd_op_txn_begin] 0-management: Acquired local lock [2011-10-05 04:03:39.294191] I [glusterd-op-sm.c:1611:glusterd_op_ac_send_stage_op] 0-glusterd: Sent op req to 0 peers [2011-10-05 04:03:39.404534] I [glusterd-pmap.c:282:pmap_registry_remove] 0-pmap: removing brick /tmp/brick8 on port 24023 [2011-10-05 04:03:39.543404] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.543453] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.543509] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.543550] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.543594] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.543635] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.543675] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.543714] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.723750] I [glusterd-pmap.c:282:pmap_registry_remove] 0-pmap: removing brick /tmp/brick4 on port 24015 [2011-10-05 04:03:39.723800] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.723843] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.723882] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.723922] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.743246] I [glusterd-pmap.c:282:pmap_registry_remove] 0-pmap: removing brick /tmp/brick3 on port 24013 [2011-10-05 04:03:39.743294] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.743334] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.743374] I [glusterd-utils.c:214:glusterd_is_local_addr] 0-glusterd: 10.1.10.21 is local [2011-10-05 04:03:39.863767] W [socket.c:204:__socket_rwv] 0-socket.management: readv failed (Connection reset by peer) [2011-10-05 04:03:39.863897] W [socket.c:1510:__socket_proto_state_machine] 0-management: reading from socket failed. Error (Transport endpoint is not connected), peer (/tmp/8789d7b89b2809e8991e5ae1fe789aaf.socket) [2011-10-05 04:03:39.976645] E [socket.c:2110:socket_connect] 0-management: connection attempt failed (No such file or directory) I have archived the core file and glusterd log file.
CHANGE: http://review.gluster.com/575 (Earlier we waited for brick disconnect or 5s whichever is smaller, before we) merged in master by Vijay Bellur (vijay)
CHANGE: http://review.gluster.com/626 (Earlier we would wait until brick disconnect or 5s whichever was earlier,) merged in release-3.2 by Vijay Bellur (vijay)