I have a Gluster setup in replicate mode.

[root@jr4-2 glusterd]# gluster volume info

Volume Name: gluster-fs1
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: jr4-1-ib:/data/gluster/brick-md2
Brick2: jr4-2-ib:/data/gluster/brick-md2

We are seeing the server crash continuously. Each crash in turn breaks the client connections, which fail with:

Transport endpoint is not connected

The issue goes away once the glusterd service is restarted on the server nodes.

The logs in /var/log/glusterfs/nfs.log are given below:

[2011-09-14 16:44:56.700384] E [rdma.c:4479:rdma_event_handler] 0-rpc-transport/rdma: gluster-fs1-client-1: pollin received on tcp socket (peer: 172.31.100.228:24009) after handshake is complete
[2011-09-14 16:44:56.700616] I [client.c:1605:client_rpc_notify] 0-gluster-fs1-client-1: disconnected
[2011-09-14 16:45:07.202041] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-09-14 16:45:09.413960] E [rdma.c:4479:rdma_event_handler] 0-rpc-transport/rdma: gluster-fs1-client-0: pollin received on tcp socket (peer: 172.31.100.227:24009) after handshake is complete
[2011-09-14 16:45:09.414286] I [client.c:1605:client_rpc_notify] 0-gluster-fs1-client-0: disconnected
[2011-09-14 16:45:09.414315] E [afr-common.c:2584:afr_notify] 0-gluster-fs1-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-09-14 16:45:10.204430] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-09-14 16:45:13.206879] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-09-14 16:45:16.209385] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)

In /var/log/messages I found a line similar to this:

Sep 14 16:55:02 jr4-1 kernel: [5131784.955627] possible SYN flooding on port 24009. Sending cookies.

Does this have any connection with the crash? Please check.
Hi Dheeraj,

Can you please get a gdb backtrace from the coredump of glusterfsd (server) for us?

# gdb -c <core-file> glusterfsd
gdb> thr apply all bt full

Can you also get us the log files of the server which crashed?

regards,
Raghavendra.
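
If it is easier to attach a file than to paste from an interactive session, the same backtrace can probably be captured non-interactively. The following is only a sketch: the function name collect_backtrace and all three paths are hypothetical placeholders, and it assumes gdb is installed on the server and that the binary passed in is the same glusterfsd build that produced the core.

```shell
#!/bin/sh
# Sketch: dump full backtraces of all threads from a core file without
# an interactive gdb session. The function name and every path passed
# to it are placeholders (assumptions), not values from this report.
collect_backtrace() {
    core_file=$1   # path to the core dump
    binary=$2      # path to the glusterfsd binary that produced it
    out_file=$3    # file that will receive the backtrace

    if [ ! -r "$core_file" ]; then
        echo "core file not readable: $core_file" >&2
        return 1
    fi
    if [ ! -x "$binary" ]; then
        echo "binary not found or not executable: $binary" >&2
        return 1
    fi

    # -batch makes gdb exit after running the commands given with -ex;
    # "thread apply all bt full" is the long form of "thr apply all bt full".
    gdb -batch -ex "thread apply all bt full" "$binary" "$core_file" \
        > "$out_file" 2>&1
}
```

It would be called as, for example, collect_backtrace <core-file> <path-to-glusterfsd> backtrace.txt, and the resulting backtrace.txt can then be attached to this report together with the server logs.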
Will bump up the priority once we take up the RDMA tasks.
No updates in the last year; more information was needed.