764924 – (GLUSTER-3192) Instability and High Server Memory Usage When Using RDMA Transport

Summary: Instability and High Server Memory Usage When Using RDMA Transport

Keywords:
Status:	CLOSED DEFERRED
Alias:	GLUSTER-3192
Product:	GlusterFS
Classification:	Community
Component:	rdma
Sub Component:
Version:	3.2.1
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	---
Assignee:	Raghavendra G
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	849121 858448 952693 962431

Reported:	2011-07-19 14:14 UTC by jpenney
Modified:	2014-12-14 19:40 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Clones:	849121
Environment:
Last Closed:	2014-12-14 19:40:32 UTC
Regression:	---
Mount Type:	fuse
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:

Attachments
Log Files (66.42 KB, text/plain) 2011-07-19 11:14 UTC, jpenney

Description Amar Tumballi 2011-07-19 11:18:50 UTC

Hi, I see that the version here is '3.2.1'. Is upgrading an option? GlusterFS's support for RDMA became more complete with version 3.2.2, which was released last week. Please check the behavior with the new version and let us know.

Comment 1 jpenney 2011-07-19 11:21:23 UTC

(In reply to comment #1)
> Hi, I see that the version here is '3.2.1', Is upgrading an option? GlusterFS's
> support to RDMA became more complete with version 3.2.2, which got released
> last week. Please check the behavior with new version and do let us know.

The 3.2.2 upgrade did not change this behavior.

BTW, 3.2.2 is not a choice in the Bugzilla menu.

Comment 2 Amar Tumballi 2011-07-19 11:49:03 UTC

> 
> The 3.2.2 upgrade did not change this behavior.

Will take a look at this.

> 
> BTW, 3.2.2 is not a choice in the Bugzilla menu.

Added it to the Versions now.

Comment 3 jpenney 2011-07-19 14:14:14 UTC

=====
Symptoms
=====
- Error from test script (included below): "Could not open write: out.node02.15: Invalid argument"
- Client write failure when multiple clients are reading and writing while
 using the RDMA transport.
- High memory usage on one Gluster server, in this case node06. Output is from the 'top' command:
 - node06: 14850 root      16   0 23.1g  17g 1956 S 120.9 56.6   8:38.78 glusterfsd
 - node05: 12633 root      16   0  418m 157m 1852 S  0.0  0.5   2:56.02 glusterfsd 
 - node04: 21066 root      15   0  355m 151m 1852 S  0.0  0.6   1:07.71 glusterfsd 
- Temporary workaround: use IPoIB instead of RDMA.
- It may take 10-15 minutes for the first failure to occur.
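
For anyone trying to reproduce this, the memory growth listed above can be watched with a small filter over 'top' batch output. This is only a sketch, not part of the original report: the function name flag_high_res and the 4096 MiB limit are made up for illustration, and it assumes the default 'top' column layout shown above (PID in column 1, RES in column 6, COMMAND in column 12).

```shell
#!/bin/sh
# flag_high_res LIMIT_MIB (hypothetical helper, not from the original report)
# Reads top(1) process lines on stdin and prints "PID RES_in_MiB" for every
# glusterfsd process whose resident memory exceeds LIMIT_MIB.
# Assumes the default top column order: RES is field 6, COMMAND is field 12.
flag_high_res() {
    awk -v limit="$1" '
        # Convert a top RES value ("17g", "157m", or plain KiB) to MiB.
        function to_mib(v) {
            if (v ~ /[gG]$/) return (v + 0) * 1024
            if (v ~ /[mM]$/) return v + 0
            return (v + 0) / 1024
        }
        $12 == "glusterfsd" {
            mib = to_mib($6)
            if (mib > limit) printf "%s %d\n", $1, mib
        }'
}

# Example use on a server (not executed here):
#   top -b -n 1 | flag_high_res 4096
```

Fed the node06 and node05 lines above with a limit of 4096, it would report only the node06 process.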

=====
Version Information
=====
- CentOS 5.6 kernel 2.6.18-238.9.1.el5
- OFED 1.5.3.1
- Gluster 3.2.1 RPMs 
- Ext3 filesystem

=====
Roles of nodes
=====

node04, node05, node06 - Gluster servers. 
node01, node02, node03 - Clients. Mount node04:/gluster-vol01 on /gluster and run the test script in /gluster/test

=====
Gluster Volume Info.
=====
Volume Name: gluster-vol01
Type: Distribute
Status: Started
Number of Bricks: 3
Transport-type: rdma
Bricks:
Brick1: node04:/gluster-raw-storage
Brick2: node05:/gluster-raw-storage
Brick3: node06:/gluster-raw-storage

=====
Gluster peer status.
=====
Number of Peers: 2

Hostname: node06
Uuid: 2c5f66b3-ddc8-4811-bd45-f12c60a22891
State: Peer in Cluster (Connected)

Hostname: node05
Uuid: 00b8d063-8d74-4ffe-9a44-c50e46eca78c
State: Peer in Cluster (Connected)


=====
Simple test script. Run in /gluster/test. The files read.1-4 are 2 GB files
created with dd if=/dev/zero of=read.4 bs=1024k count=2048
=====
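#!/usr/bin/perl
# Endless loop: slurp one of the 2 GB files read.1-4, then write a small
# per-host output file. Runs until an open() fails.
$| = 0;        # leave STDOUT buffered (Perl's default)
$/ = undef;    # slurp mode: <FILE> returns the whole file in one read

$hostname = $ENV{HOSTNAME};
my $i = 1;     # index of the file to read, cycles through 1..4
my $x = 1;     # output file counter, never reset
while ($i) {
	if ($i >= 5) { $i = 1 }
	print "Read: read.$i\n";
	open(FILE, "read.$i") || die "Could not open read.$i: $!\n";
	my $string = <FILE>;    # pulls the entire 2 GB file into memory
	close(FILE);
	# This is the open() that fails with "Invalid argument" over RDMA.
	open(OUT, ">out.$hostname.$x") || die "Could not open write: out.$hostname.$x: $!\n";
	print OUT "This was read.$i\n";
	close(OUT);
	$i++;
	$x++;
}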
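
The four input files that the script reads can be created with a loop around the dd command quoted above. This is a sketch only: the helper name make_read_files and its arguments are hypothetical, with the size defaulting to the 2048 MiB used in this report.

```shell
#!/bin/sh
# make_read_files DIR [SIZE_MIB] (hypothetical helper, not from the original report)
# Creates read.1 .. read.4 in DIR, each SIZE_MIB mebibytes of zeros
# (default 2048, i.e. the 2 GB files described in the report).
make_read_files() {
    dir=$1
    mib=${2:-2048}
    for n in 1 2 3 4; do
        dd if=/dev/zero of="$dir/read.$n" bs=1024k count="$mib" 2>/dev/null || return 1
    done
}

# Example use on a client (not executed here):
#   make_read_files /gluster/test
```

Passing a small size as the second argument is handy for a quick dry run before generating the full 8 GB of input.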

Comment 4 jpenney 2011-08-25 12:46:29 UTC

Has there been any progress on this report? Have you been able to reproduce the problem?

Our workaround is to use IPoIB and the TCP transport, which works as expected.

Comment 5 Amar Tumballi 2012-02-27 10:35:45 UTC

This is a priority for the immediate future (before the 3.3.0 GA release). We will bump the priority up once we take up the RDMA-related tasks.

Comment 6 Niels de Vos 2014-11-27 14:54:38 UTC

The version that this bug was reported against no longer receives updates from the Gluster Community. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will be closed automatically.
