Bug 764924 (GLUSTER-3192)

Summary: Instability and High Server Memory Usage When Using RDMA Transport
Product: [Community] GlusterFS
Reporter: jpenney
Component: rdma
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED DEFERRED
QA Contact:
Severity: medium
Docs Contact:
Priority: low
Version: 3.2.1
CC: bugs, gluster-bugs, jdarcy, rwheeler, vbellur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 849121 (view as bug list)
Environment:
Last Closed: 2014-12-14 14:40:32 EST
Type: ---
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Bug Depends On:
Bug Blocks: 849121, 858448, 952693, 962431
Attachments: Log Files (no flags)

Description Amar Tumballi 2011-07-19 07:18:50 EDT
Hi, I see that the version here is '3.2.1'. Is upgrading an option? GlusterFS's RDMA support became more complete with version 3.2.2, which was released last week. Please check the behavior with the new version and let us know.
Comment 1 jpenney 2011-07-19 07:21:23 EDT
(In reply to comment #0)
> Hi, I see that the version here is '3.2.1'. Is upgrading an option? GlusterFS's
> RDMA support became more complete with version 3.2.2, which was released
> last week. Please check the behavior with the new version and let us know.

The 3.2.2 upgrade did not change this behavior.

BTW, 3.2.2 is not a choice in the Bugzilla menu.
Comment 2 Amar Tumballi 2011-07-19 07:49:03 EDT
> 
> The 3.2.2 upgrade did not change this behavior.

Will take a look at this.

> 
> BTW, 3.2.2 is not a choice in the Bugzilla menu.

Added it to the Versions now.
Comment 3 jpenney 2011-07-19 10:14:14 EDT
=====
Symptoms
=====
- Error from test script (included below): "Could not open write: out.node02.15: Invalid argument"
- Client writes fail when multiple clients are reading and writing concurrently over the RDMA transport.
- High memory usage on one Gluster server, in this case node06. Output is from the 'top' command:
 - node06: 14850 root      16   0 23.1g  17g 1956 S 120.9 56.6   8:38.78 glusterfsd
 - node05: 12633 root      16   0  418m 157m 1852 S  0.0  0.5   2:56.02 glusterfsd
 - node04: 21066 root      15   0  355m 151m 1852 S  0.0  0.6   1:07.71 glusterfsd
- Temporary workaround: use IPoIB instead of RDMA.
- The first failure may take 10-15 minutes to appear.

=====
Version Information
=====
- CentOS 5.6 kernel 2.6.18-238.9.1.el5
- OFED 1.5.3.1
- Gluster 3.2.1 RPMs 
- Ext3 filesystem

=====
Roles of nodes
=====

node04, node05, node06 - Gluster servers. 
node01, node02, node03 - Clients. Mount node04:/gluster-vol01 on /gluster and run the test script in /gluster/test

=====
Gluster Volume Info.
=====
Volume Name: gluster-vol01
Type: Distribute
Status: Started
Number of Bricks: 3
Transport-type: rdma
Bricks:
Brick1: node04:/gluster-raw-storage
Brick2: node05:/gluster-raw-storage
Brick3: node06:/gluster-raw-storage
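
For reference, a volume with the layout shown above would typically be set up with commands along these lines (hostnames and brick paths taken from the volume info; exact syntax on a 3.2-era release may differ, so treat this as a sketch rather than the reporter's exact steps):

```shell
# Create a 3-brick distributed volume using the RDMA transport.
gluster volume create gluster-vol01 transport rdma \
    node04:/gluster-raw-storage \
    node05:/gluster-raw-storage \
    node06:/gluster-raw-storage

gluster volume start gluster-vol01

# On each client, mount via FUSE over RDMA:
mount -t glusterfs -o transport=rdma node04:/gluster-vol01 /gluster
```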

=====
Gluster peer status.
=====
Number of Peers: 2

Hostname: node06
Uuid: 2c5f66b3-ddc8-4811-bd45-f12c60a22891
State: Peer in Cluster (Connected)

Hostname: node05
Uuid: 00b8d063-8d74-4ffe-9a44-c50e46eca78c
State: Peer in Cluster (Connected)


=====
Simple test script. Run in /gluster/test. The files read.1-4 are 2 GB files
created with dd if=/dev/zero of=read.4 bs=1024k count=2048
=====
#!/usr/bin/perl
# Repro loop: repeatedly slurp one of the 2 GB read.1-4 files, then
# write a small marker file out.<hostname>.<n>. Over the RDMA
# transport the open-for-write eventually fails with "Invalid argument".
$| = 0;        # leave output buffered
$/ = undef;    # slurp mode: <FILE> reads the whole file at once

$hostname = $ENV{HOSTNAME};
my $i = 1;     # index of the read.N file to slurp (cycles 1..4)
my $x = 1;     # sequence number for the output files
while ($i) {
	if ($i >= 5) { $i = 1 }
	print "Read: read.$i\n";
	open(FILE, "read.$i") || die "Could not open read.$i: $!\n";
	my $string = <FILE>;
	close(FILE);
	open(OUT, ">out.$hostname.$x") || die "Could not open write: out.$hostname.$x: $!\n";
	print OUT "This was read.$i\n";
	close(OUT);
	$i++;
	$x++;
}
Comment 4 jpenney 2011-08-25 08:46:29 EDT
Has there been any progress on this report? Have you been able to reproduce the problem?

Our workaround is to use IPoIB and the TCP transport, which works as expected.
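
For anyone hitting the same issue, the workaround amounts to using the TCP transport and letting the traffic ride the IPoIB interface; a sketch, assuming the server hostnames resolve to their IPoIB addresses (the volume must be recreated to change its transport on this release):

```shell
# Same bricks as before, but TCP transport instead of RDMA:
gluster volume create gluster-vol01 transport tcp \
    node04:/gluster-raw-storage \
    node05:/gluster-raw-storage \
    node06:/gluster-raw-storage
gluster volume start gluster-vol01

# Mount on the clients; traffic goes over IPoIB as long as
# node04 resolves to its InfiniBand (ib0) address:
mount -t glusterfs node04:/gluster-vol01 /gluster
```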
Comment 5 Amar Tumballi 2012-02-27 05:35:45 EST
This is a priority for the immediate future (before the 3.3.0 GA release). We will bump the priority up once we take up the RDMA-related tasks.
Comment 6 Niels de Vos 2014-11-27 09:54:38 EST
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5, or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will be closed automatically.