Hi, I see that the version here is '3.2.1'. Is upgrading an option? GlusterFS's support for RDMA became more complete with version 3.2.2, which was released last week. Please check the behavior with the new version and let us know.
(In reply to comment #1)
> Hi, I see that the version here is '3.2.1'. Is upgrading an option? GlusterFS's
> support for RDMA became more complete with version 3.2.2, which was released
> last week. Please check the behavior with the new version and let us know.

The 3.2.2 upgrade did not change this behavior.

BTW, 3.2.2 is not a choice in the Bugzilla menu.
> The 3.2.2 upgrade did not change this behavior.

Will take a look at this.

> BTW, 3.2.2 is not a choice in the Bugzilla menu.

Added it to the Versions list now.
===== Symptoms =====
- Error from the test script (included below): "Could not open write: out.node02.15: Invalid argument"
- Client write failures when multiple clients are reading and writing over the RDMA transport.
- High memory usage on one Gluster server, in this case node06. Output is from the 'top' command:
    node06: 14850 root 16 0 23.1g 17g 1956 S 120.9 56.6 8:38.78 glusterfsd
    node05: 12633 root 16 0 418m 157m 1852 S 0.0 0.5 2:56.02 glusterfsd
    node04: 21066 root 15 0 355m 151m 1852 S 0.0 0.6 1:07.71 glusterfsd
- Temporary workaround: use IPoIB instead of RDMA.
- It may take 10-15 minutes for the first failure to appear.

===== Version Information =====
- CentOS 5.6, kernel 2.6.18-238.9.1.el5
- OFED 1.5.3.1
- Gluster 3.2.1 RPMs
- Ext3 filesystem

===== Roles of Nodes =====
- node04, node05, node06 - Gluster servers.
- node01, node02, node03 - Clients. Each mounts node04:/gluster-vol01 on /gluster and runs the test script in /gluster/test.

===== Gluster Volume Info =====
Volume Name: gluster-vol01
Type: Distribute
Status: Started
Number of Bricks: 3
Transport-type: rdma
Bricks:
Brick1: node04:/gluster-raw-storage
Brick2: node05:/gluster-raw-storage
Brick3: node06:/gluster-raw-storage

===== Gluster Peer Status =====
Number of Peers: 2

Hostname: node06
Uuid: 2c5f66b3-ddc8-4811-bd45-f12c60a22891
State: Peer in Cluster (Connected)

Hostname: node05
Uuid: 00b8d063-8d74-4ffe-9a44-c50e46eca78c
State: Peer in Cluster (Connected)

===== Simple Test Script =====
Run in /gluster/test. The files read.1 through read.4 are 2 GB files, each created with a command like:

dd if=/dev/zero of=read.4 bs=1024k count=2048

#!/usr/bin/perl
# Repeatedly read the four 2 GB input files and write a small output
# file after each read, until something fails.
$| = 0;
$/ = undef;    # slurp mode: read each file in one shot

$hostname = $ENV{HOSTNAME};
my $i = 1;     # index of the read.N file to read next (cycles 1..4)
my $x = 1;     # counter for the output file names

while ($i) {
    if ($i >= 5) { $i = 1 }

    print "Read: read.$i\n";
    open(FILE, "read.$i") || die "Could not open read.$i: $!\n";
    my $string = <FILE>;
    close(FILE);

    open(OUT, ">out.$hostname.$x")
        || die "Could not open write: out.$hostname.$x: $!\n";
    print OUT "This was read.$i\n";
    close(OUT);

    $i++;
    $x++;
}
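For reference, the volume was set up roughly as follows. This is a sketch reconstructed from the volume and peer info above, not the exact command history; the commands were run on node04.

gluster peer probe node05
gluster peer probe node06
gluster volume create gluster-vol01 transport rdma \
    node04:/gluster-raw-storage \
    node05:/gluster-raw-storage \
    node06:/gluster-raw-storage
gluster volume start gluster-vol01

Each client then mounts the volume with:

mount -t glusterfs node04:/gluster-vol01 /gluster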
Has there been any progress on this report? Have you been able to replicate the problem? Our workaround is to use IPoIB and the TCP transport, which does work as expected.
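For anyone else hitting this: since the transport type is fixed when the volume is created, our workaround amounts to rebuilding the volume on the (default) tcp transport and pointing everything at the IPoIB addresses. A sketch, assuming hostnames like node04-ib that resolve to the IPoIB interfaces (those -ib names are hypothetical):

gluster volume stop gluster-vol01
gluster volume delete gluster-vol01
gluster volume create gluster-vol01 transport tcp \
    node04-ib:/gluster-raw-storage \
    node05-ib:/gluster-raw-storage \
    node06-ib:/gluster-raw-storage
gluster volume start gluster-vol01
mount -t glusterfs node04-ib:/gluster-vol01 /gluster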
This is a priority for the immediate future (before the 3.3.0 GA release). Will bump the priority up once we take up the RDMA-related tasks.
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5, or 3.6) release and update the version, or close this bug. If there has been no update before 9 December 2014, this bug will get automatically closed.