Bug 849121 - Instability and High Server Memory Usage When Using RDMA Transport
Summary: Instability and High Server Memory Usage When Using RDMA Transport
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rdma
Version: 2.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On: GLUSTER-3192
Blocks: 858448
 
Reported: 2012-08-17 11:39 UTC by Vidya Sakar
Modified: 2015-02-13 10:12 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: GLUSTER-3192
Cloned To: 858448
Environment:
Last Closed: 2015-02-13 10:12:31 UTC
Embargoed:



Description Vidya Sakar 2012-08-17 11:39:24 UTC
+++ This bug was initially created as a clone of Bug #764924 +++

Hi, I see that the version here is '3.2.1'. Is upgrading an option? GlusterFS's support for RDMA became more complete with version 3.2.2, which was released last week. Please check the behavior with the new version and let us know.

--- Additional comment from jpenney on 2011-07-19 07:21:23 EDT ---

(In reply to comment #1)
> Hi, I see that the version here is '3.2.1'. Is upgrading an option?
> GlusterFS's support for RDMA became more complete with version 3.2.2, which
> was released last week. Please check the behavior with the new version and
> let us know.

The 3.2.2 upgrade did not change this behavior.

BTW, 3.2.2 is not a choice in the Bugzilla menu.

--- Additional comment from amarts on 2011-07-19 07:49:03 EDT ---


> 
> The 3.2.2 upgrade did not change this behavior.

Will take a look at this.

> 
> BTW, 3.2.2 is not a choice in the Bugzilla menu.

Added it to the Versions now.

--- Additional comment from jpenney on 2011-07-19 10:14:14 EDT ---

=====
Symptoms
=====
- Error from test script (included below): "Could not open write: out.node02.15: Invalid argument"
- Client write failure when multiple clients are reading and writing while using the RDMA transport.
- High memory usage on one Gluster server, node06 in this case (a monitoring sketch follows this list). Output is from the 'top' command:
 - node06: 14850 root      16   0 23.1g  17g 1956 S 120.9 56.6   8:38.78 glusterfsd
 - node05: 12633 root      16   0  418m 157m 1852 S  0.0  0.5   2:56.02 glusterfsd
 - node04: 21066 root      15   0  355m 151m 1852 S  0.0  0.6   1:07.71 glusterfsd
- Temporary workaround: use IPoIB instead of RDMA.
- The first failure may take 10-15 minutes to appear.
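
=====
Server memory monitor (a minimal sketch, not part of the original report;
assumes standard pgrep and /proc/<pid>/status on the CentOS 5 servers listed
below)
=====
#!/bin/sh
# Log glusterfsd resident memory once a minute on each server so the growth
# seen on node06 can be tracked over time.
while true; do
    for pid in $(pgrep glusterfsd); do
        echo "$(date) pid $pid $(grep VmRSS /proc/$pid/status 2>/dev/null)"
    done
    sleep 60
done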

=====
Version Information
=====
- CentOS 5.6 kernel 2.6.18-238.9.1.el5
- OFED 1.5.3.1
- Gluster 3.2.1 RPMs 
- Ext3 filesystem

=====
Roles of nodes
=====

node04, node05, node06 - Gluster servers. 
node01, node02, node03 - Clients. Mount node04:/gluster-vol01 on /gluster and run the test script in /gluster/test
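
=====
Client mount (sketch, not from the original report; 'mount -t glusterfs' is
the standard invocation, and the rdma transport is taken from the volume's
own volfile)
=====
mount -t glusterfs node04:/gluster-vol01 /gluster
cd /gluster/test    # then run the test script shown further below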

=====
Gluster Volume Info.
=====
Volume Name: gluster-vol01
Type: Distribute
Status: Started
Number of Bricks: 3
Transport-type: rdma
Bricks:
Brick1: node04:/gluster-raw-storage
Brick2: node05:/gluster-raw-storage
Brick3: node06:/gluster-raw-storage
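
=====
Volume creation (sketch; a create command of roughly this shape would produce
the volume above, though the exact 3.2-era syntax is assumed rather than
taken from the report)
=====
gluster volume create gluster-vol01 transport rdma \
    node04:/gluster-raw-storage \
    node05:/gluster-raw-storage \
    node06:/gluster-raw-storage
gluster volume start gluster-vol01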

=====
Gluster peer status.
=====
Number of Peers: 2

Hostname: node06
Uuid: 2c5f66b3-ddc8-4811-bd45-f12c60a22891
State: Peer in Cluster (Connected)

Hostname: node05
Uuid: 00b8d063-8d74-4ffe-9a44-c50e46eca78c
State: Peer in Cluster (Connected)


=====
Simple test script. Run in /gluster/test. The files read.1-4 are 2 GB files
created with dd if=/dev/zero of=read.4 bs=1024k count=2048 (setup loop
expanded after the script).
=====
#!/usr/bin/perl
use strict;
use warnings;

$| = 0;        # autoflush off (the default)
$/ = undef;    # slurp mode: read each input file in one go

my $hostname = $ENV{HOSTNAME};
my $i = 1;     # which read.N file to slurp next (cycles through 1-4)
my $x = 1;     # sequence number for the output files
while (1) {
	if ($i >= 5) { $i = 1 }
	print "Read: read.$i\n";
	open(my $in, '<', "read.$i") || die "Could not open read.$i: $!\n";
	my $string = <$in>;
	close($in);
	open(my $out, '>', "out.$hostname.$x") || die "Could not open write: out.$hostname.$x: $!\n";
	print $out "This was read.$i\n";
	close($out);
	$i++;
	$x++;
}
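
=====
One-time setup (the dd command from the heading above expanded into a loop;
a sketch, not part of the original report)
=====
cd /gluster/test
for i in 1 2 3 4; do
    dd if=/dev/zero of=read.$i bs=1024k count=2048
done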

--- Additional comment from jpenney on 2011-08-25 08:46:29 EDT ---

Has there been any progress on this report? Have you been able to replicate the problem?

Our workaround is to use IPoIB and the TCP transport, which does work as expected (sketched below).
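
=====
Workaround sketch (not from the original report; assumes the transport type
of an existing 3.2 volume cannot simply be changed in place, so the volume is
recreated with transport tcp, and that the hostnames resolve to the IPoIB
addresses)
=====
gluster volume stop gluster-vol01
gluster volume delete gluster-vol01
gluster volume create gluster-vol01 transport tcp \
    node04:/gluster-raw-storage \
    node05:/gluster-raw-storage \
    node06:/gluster-raw-storage
gluster volume start gluster-vol01
mount -t glusterfs node04:/gluster-vol01 /gluster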

--- Additional comment from amarts on 2012-02-27 05:35:45 EST ---

The 3.3.0 GA release is the priority for the immediate future. We will bump this bug's priority up once we take up the RDMA-related tasks.

