Bug 765226 (GLUSTER-3494)

Summary: Gluster Disconnection
Product: [Community] GlusterFS
Reporter: Ramana Kumar Kasaraneni <crlindiadc>
Component: rdma
Assignee: Raghavendra G <raghavendra>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Version: 3.2.1
CC: amarts, gluster-bugs, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
URL: Problem using Gluster Quotas
Doc Type: Bug Fix
Mount Type: fuse

Description Ramana Kumar Kasaraneni 2011-08-30 08:23:47 UTC
Hi,

We have a GlusterFS setup, version 3.1.4, installed and configured in replica mode.

The gluster volume info output for the setup looks like this:

# gluster volume info

Volume Name: gluster-fs1
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: jr4-1-ib:/data/gluster/brick-md2
Brick2: jr4-2-ib:/data/gluster/brick-md2


We are having a strange problem with Gluster disconnections. We see the following errors on the server side:

[2011-08-30 07:41:02.868432] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 07:41:05.870965] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 07:41:05.872927] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 07:41:08.875478] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 07:41:08.877490] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 07:41:11.880046] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 07:41:11.882048] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 07:41:14.884589] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 07:41:14.886616] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 07:41:17.889162] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 07:41:17.891141] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 07:41:20.893646] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 07:41:20.895656] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 07:41:23.898240] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 07:41:23.900252] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)


The brick logs show the following error messages:

[2011-08-29 18:16:00.308719] E [rpcsvc.c:1554:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x13460x, Program: GlusterFS-3.1.0, ProgVers: 310, Proc: 34) to rpc-transport (rdma.gluster-fs1-server)
[2011-08-29 18:16:00.308737] E [server.c:137:server_submit_reply] 0-: Reply submission failed
[2011-08-29 18:16:00.308751] I [server-helpers.c:756:server_connection_destroy] 0-gluster-fs1-server: destroyed connection of n1710-1749-2011/08/26-15:43:13:560803-gluster-fs1-client-0
[2011-08-29 18:16:00.308773] E [rpcsvc.c:1554:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x15372x, Program: GlusterFS-3.1.0, ProgVers: 310, Proc: 34) to rpc-transport (rdma.gluster-fs1-server)
[2011-08-29 18:16:00.308800] E [server.c:137:server_submit_reply] 0-: Reply submission failed
[2011-08-29 18:16:00.308819] I [server-helpers.c:756:server_connection_destroy] 0-gluster-fs1-server: destroyed connection of n1711-1788-2011/08/26-15:43:38:361566-gluster-fs1-client-0
[2011-08-29 18:16:00.309051] E [rpcsvc.c:1554:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1463x, Program: GlusterFS-3.1.0, ProgVers: 310, Proc: 34) to rpc-transport (rdma.gluster-fs1-server)
[2011-08-29 18:16:00.309070] E [server.c:137:server_submit_reply] 0-: Reply submission failed
[2011-08-29 18:16:00.309143] I [server-helpers.c:756:server_connection_destroy] 0-gluster-fs1-server: destroyed connection of n1722-1765-2011/08/26-21:13:31:834472-gluster-fs1-client-0
[2011-08-29 18:16:00.310517] E [rpcsvc.c:1554:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1599091x, Program: GlusterFS-3.1.0, ProgVers: 310, Proc: 34) to rpc-transport (rdma.gluster-fs1-server)
[2011-08-29 18:16:00.310539] E [rpc-transport.c:976:rpc_transport_ref] 0-rpc_transport: invalid argument: this
[2011-08-29 18:16:00.310543] E [server.c:137:server_submit_reply] 0-: Reply submission failed
[2011-08-29 18:16:00.310564] E [rpc-transport.c:996:rpc_transport_unref] 0-rpc_transport: invalid argument: this
[2011-08-29 18:16:00.310607] E [rpc-transport.c:976:rpc_transport_ref] 0-rpc_transport: invalid argument: this


On the client side, we see the following error messages:

[2011-08-30 10:15:52.149071] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 10:15:53.152158] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 10:15:54.888495] W [fuse-bridge.c:413:fuse_attr_cbk] 0-glusterfs-fuse: 9817037: LOOKUP() / => -1 (Transport endpoint is not connected)
[2011-08-30 10:15:55.155257] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 10:15:56.158282] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 10:15:58.161525] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)
[2011-08-30 10:15:59.164618] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-1: tcp connect to 172.31.100.228:24009 failed (Connection refused)
[2011-08-30 10:16:01.167819] E [rdma.c:4428:tcp_connect_finish] 0-gluster-fs1-client-0: tcp connect to 172.31.100.227:24009 failed (Connection refused)


Currently the only option we see is to restart the Gluster services on the brick nodes, after which the clients reconnect to GlusterFS automatically.

Could you please suggest what the reason for this might be?
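
One way to narrow this down, offered as a sketch under assumptions and not as something taken from the thread: "Connection refused" normally means nothing is accepting connections on the target port, so probing the brick endpoints from a client shows whether the brick processes are listening at all. The addresses and port are copied from the log lines above; the probe helper, its return strings, and the timeout value are illustrative names and choices.

```python
import errno
import socket

# Endpoints copied from the log lines above; adjust for your own setup.
BRICKS = [("172.31.100.227", 24009), ("172.31.100.228", 24009)]


def probe(host, port, timeout=3.0):
    """Return 'open', 'refused', 'timeout', or 'error: <reason>' for a TCP connect."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # The host is reachable but nothing is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all, which points at the network or a firewall.
        return "timeout"
    except OSError as exc:
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()
```

Running probe(host, port) for each entry in BRICKS from a client, before and after restarting the services on the brick nodes, would show whether the listener on port 24009 disappears while the problem is occurring.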

Comment 1 Ramana Kumar Kasaraneni 2011-09-12 04:41:58 UTC
Hi,

We are having a strange problem while using quotas in GlusterFS. We are using Gluster 3.2 and have configured it on the storage nodes.

# rpm -qa |grep glust
glusterfs-fuse-3.2.0-1
glusterfs-core-3.2.0-1
glusterfs-rdma-3.2.0-1

The GlusterFS configuration is as follows:

[root@gs01 ~]# gluster volume info

Volume Name: testfs01
Type: Distribute
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: gs02-ib:/data/gluster/brick-2
Brick2: gs01-ib:/data/gluster/brick-1
Options Reconfigured:
features.quota: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.limit-usage: /testgluster/dheeraj:5GB

We have configured the Gluster quota at the directory level, but the quota list does not show the usage for the quota-enabled directory.

# gluster volume quota testfs01 list
        path              limit_set          size
---------------------------------------------------------------------------
/testgluster/dheeraj        5GB

We tried copying more than 5 GB of data into the specified folder, and the writes succeeded despite the limit.

Please provide some input on this.
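
The listing above shows a limit but an empty size column. As a minimal sketch for spotting such entries, assuming the three-column layout shown above (header row, dashed rule, then rows of path, limit, and optional size) and paths without spaces, the output could be parsed like this. The function names parse_quota_list and missing_usage are illustrative and not part of Gluster.

```python
def parse_quota_list(text):
    """Parse quota list output into dicts with path, limit, and size keys.

    Assumes the layout shown above: a header row, a dashed rule,
    then whitespace-separated rows of 'path limit [size]'.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator.
        if not fields or fields[0] == "path" or set(line.strip()) == {"-"}:
            continue
        entries.append({
            "path": fields[0],
            "limit": fields[1] if len(fields) > 1 else None,
            "size": fields[2] if len(fields) > 2 else None,
        })
    return entries


def missing_usage(entries):
    """Return the paths whose size column is empty."""
    return [e["path"] for e in entries if e["size"] is None]


# Sample copied from the listing above.
SAMPLE = """\
        path              limit_set          size
---------------------------------------------------------------------------
/testgluster/dheeraj        5GB
"""

print(missing_usage(parse_quota_list(SAMPLE)))
```

For the sample above, the only entry is reported as missing its usage figure, matching what the listing shows.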

On the client nodes, we have mounted the Gluster file system:

[root@n0 testgluster]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/cciss/c0d0p5      39G   12G   25G  32% /
tmpfs                 7.9G  380K  7.9G   1% /dev/shm
/dev/cciss/c0d0p1     281M   44M  223M  17% /boot
/dev/cciss/c0d0p2      21G   14G  6.6G  67% /var
10.1.32.33@o2ib0:/lfs1
                      135G  112G   17G  88% /lustre
10.1.60.1@o2ib0:/lfs01
                       68G   22G   43G  35% /testlustre
/etc/glusterfs/testfs01.vol
                      135G   28G  101G  22% /testgluster


We want to implement quotas on the Gluster file system and request your help with this.

Regards,
Ramana Kasaraneni.

Comment 2 Amar Tumballi 2011-09-28 04:03:45 UTC
Hi Ramana,

Are you still seeing the issues with the RDMA transport? Please let us know the latest status of this issue so that we can take it forward.