Bug 852584 - Gluster getting crashed randomly
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rdma
Version: 2.0
Hardware: x86_64 Linux
Priority: high
Severity: medium
Assigned To: Raghavendra G
QA Contact: Sudhir D
Depends On: GLUSTER-3551
Reported: 2012-08-28 21:31 EDT by Vidya Sakar
Modified: 2013-03-03 21:07 EST

Doc Type: Bug Fix
Clone Of: GLUSTER-3551
Last Closed: 2012-10-11 05:59:50 EDT


Description Vidya Sakar 2012-08-28 21:31:13 EDT
+++ This bug was initially created as a clone of Bug #765283 +++

I have a Gluster setup in replicate mode.

[root@jr4-2 glusterd]# gluster volume info 

Volume Name: gluster-fs1
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: jr4-1-ib:/data/gluster/brick-md2
Brick2: jr4-2-ib:/data/gluster/brick-md2


The server crashes continuously. The crashes in turn break the client connections, which fail with: Transport endpoint is not connected

The issue goes away once the glusterd service is restarted on the server nodes.
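
To tell whether a brick is accepting connections again after such a restart, a plain TCP connect probe may be enough. The following is a minimal sketch, not part of the original report: it assumes the brick listens on TCP port 24009, as the nfs.log excerpt in this report suggests, and the function name `port_accepts_connections` is hypothetical.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connect to host:port succeeds, else False."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False
```

For example, `port_accepts_connections("jr4-1-ib", 24009)` would probe the first brick host. A successful connect only shows that something is listening on the port; it does not prove the RDMA transport itself is healthy.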

The logs from /var/log/glusterfs/nfs.log are given below:

[2011-09-14 16:44:56.700384] E [rdma.c:4479:rdma_event_handler] 0-rpc-transport/rdma: gluster-fs1-client-1: pollin received on tcp socket (peer: 172.31.100.228:24009) after handshake is complete
[2011-09-14 16:44:56.700616] I [client.c:1605:client_rpc_notify] 0-gluster-fs1-client-1: disconnected
[2011-09-14 16:45:07.202041] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-09-14 16:45:09.413960] E [rdma.c:4479:rdma_event_handler] 0-rpc-transport/rdma: gluster-fs1-client-0: pollin received on tcp socket (peer: 172.31.100.227:24009) after handshake is complete
[2011-09-14 16:45:09.414286] I [client.c:1605:client_rpc_notify] 0-gluster-fs1-client-0: disconnected
[2011-09-14 16:45:09.414315] E [afr-common.c:2584:afr_notify] 0-gluster-fs1-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2011-09-14 16:45:10.204430] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-09-14 16:45:13.206879] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-09-14 16:45:16.209385] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
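
When a log like this runs to thousands of lines, a short summary of which functions report errors and which peers refuse connections can make the failure pattern easier to see. The sketch below assumes the line layout shown in the excerpt above (`[timestamp] LEVEL [file:line:function] message`); the names `LOG_LINE`, `REFUSED`, and `summarize` are illustrative and not part of GlusterFS.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<msg>.*)$"
)
REFUSED = re.compile(r"tcp connect to (?P<peer>\S+) failed \(Connection refused\)")


def summarize(lines):
    """Count error-level lines per reporting function and refused connects per peer."""
    errors_by_func = Counter()
    refused_by_peer = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match is None or match.group("level") != "E":
            continue  # skip unparsable lines and non-error levels
        errors_by_func[match.group("func")] += 1
        refused = REFUSED.search(match.group("msg"))
        if refused:
            refused_by_peer[refused.group("peer")] += 1
    return errors_by_func, refused_by_peer
```

It could be run as `summarize(open("/var/log/glusterfs/nfs.log"))`. Lines that were wrapped mid-message in a pasted copy would need to be rejoined first, since the pattern expects one log entry per line.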


In /var/log/messages I found a line like this:

Sep 14 16:55:02 jr4-1 kernel: [5131784.955627] possible SYN flooding on port 24009. Sending cookies.

Does this have any connection with the crash?

Please check.

--- Additional comment from raghavendra@gluster.com on 2011-09-14 21:07:15 EDT ---

Hi Dheeraj,

Can you please get a gdb backtrace from the coredump of glusterfsd (server) for us?

# gdb -c <core-file> glusterfsd

(gdb) thread apply all bt full

Can you also send us the log files of the server that crashed?
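
The same backtrace can be captured without an interactive session, which makes it easier to attach to the bug. This is a rough sketch that assumes gdb is installed and that its standard `-batch` and `-ex` options are used; the helper names and the output path are hypothetical.

```python
import subprocess


def gdb_backtrace_command(binary, core_file):
    """Build a gdb command line that dumps full backtraces of all threads."""
    # -batch: run the given commands, then exit without prompting
    # -ex:    execute one gdb command after loading the binary and core
    return ["gdb", "-batch", "-ex", "thread apply all bt full", binary, core_file]


def collect_backtrace(binary, core_file, out_path):
    """Run gdb and write its output to out_path; return gdb's exit code."""
    cmd = gdb_backtrace_command(binary, core_file)
    with open(out_path, "w") as out:
        result = subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

A call such as `collect_backtrace("glusterfsd", "<core-file>", "backtrace.txt")` would leave the output in `backtrace.txt`. The backtrace is far more readable when debug symbols for glusterfsd are available on the machine.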

regards,
Raghavendra.

--- Additional comment from amarts@redhat.com on 2012-02-27 22:26:23 EST ---

Will bump up the priority once we take up RDMA tasks.
Comment 2 Amar Tumballi 2012-10-11 05:59:50 EDT
Upstream bug closed as insufficient data.
