Created attachment 597910 [details]
output of gluster volume profile testrdma info

Description of problem:
Create a volume using the rdma transport. I'm using Mellanox Technologies MT26428 IB QDR cards. Using native verbs I can barely get 1/3 the speed of the underlying disks. If I use IPoIB, I get the full speed of the underlying disks (and very little CPU usage from glusterfsd).

Version-Release number of selected component (if applicable):
3.3.0; the same problem occurs on 3.2.5.

How reproducible:
Always

Steps to Reproduce:
1. Set up an rdma-only volume.
2. Mount it to a directory (FUSE); see the example commands below.
3. dd if=/dev/zero bs=1M of=/path/to/mount/test.out

Actual results:
Poor performance and high CPU usage.

Expected results:
Extremely fast performance with nominal CPU usage.

Additional info:
I will attach IPoIB performance output once I have reconfigured IB. (I turned IPoIB off to make sure it was not affecting the native verbs.)
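For anyone trying to reproduce this, the steps above correspond to roughly the following commands. Server names, brick paths, the volume name, and the dd count are placeholders rather than my exact values, and depending on the version either the transport=rdma mount option or a volname.rdma suffix selects the rdma volfile:

# on a server: create and start an rdma-only volume
gluster volume create testrdma transport rdma server1:/bricks/brick1 server2:/bricks/brick1
gluster volume start testrdma

# on the client: mount via the native (FUSE) client over rdma
mkdir -p /mnt/testrdma
mount -t glusterfs -o transport=rdma server1:/testrdma /mnt/testrdma

# write test (count is illustrative; I let dd run without one)
dd if=/dev/zero of=/mnt/testrdma/test.out bs=1M count=10000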
I have a very similar setup, actually, but I do not experience the performance issues you've described. Same IB cards: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev a0), installed in an Ubuntu 12.04 file server. I am not using the drivers provided by Mellanox; instead I am using the Ubuntu infiniband PPA and the original kernel modules.

I've run some tests, and each connected client can easily do about 700-800 MB/s using the iozone benchmark, and around 1 GB/s with 4 concurrent dd threads from /dev/zero. The server load stays reasonably low, though I don't remember the figures.
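For comparison purposes, the 4-thread dd test was along these lines (the mount point and file size are illustrative, not my exact values):

for i in 1 2 3 4; do
  dd if=/dev/zero of=/mnt/gluster/ddtest.$i bs=1M count=10000 &
done
wait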
I was just wondering if any progress had been made on this in the past few months. We were testing with ConnectX-3 cards (Mellanox Technologies MT27500 Family [ConnectX-3]), and we were also getting much poorer performance using RDMA. We were mostly testing using dd on a VM that utilized libgfapi (since the NFS server and FUSE client seemed to be bottlenecks themselves), and got the following speeds on a replicated volume:

Local:
[root@node2 glusterfs]# time dd if=/dev/zero of=/mnt/raid/local_test.img bs=1M count=200000 oflag=direct conv=fdatasync
200000+0 records in
200000+0 records out
209715200000 bytes (210 GB) copied, 377.337 s, 556 MB/s

TCP/IP, 10Gbit ethernet:
livecd ~ # time dd if=/dev/zero of=/dev/vdb bs=1M count=200000 oflag=direct conv=fdatasync
200000+0 records in
200000+0 records out
209715200000 bytes (210 GB) copied, 405.462 s, 517 MB/s

Infiniband (RDMA volume type):
livecd ~ # time dd if=/dev/zero of=/dev/vdb bs=1M count=200000 oflag=direct conv=fdatasync
200000+0 records in
200000+0 records out
209715200000 bytes (210 GB) copied, 664.39 s, 316 MB/s

IPoIB, Infiniband (TCP volume type):
livecd ~ # dd if=/dev/zero of=/dev/vdb bs=1M count=200000 oflag=direct conv=fdatasync
200000+0 records in
200000+0 records out
209715200000 bytes (210 GB) copied, 408.181 s, 514 MB/s

We also just tried SRP to a RAM disk to make sure it wasn't just the transport (using SRPT/SCST on the server, and the SRP module on the client):

[root@node1 ~]# dd if=/dev/zero of=/dev/sde bs=1M count=13836 oflag=direct conv=fdatasync
13836+0 records in
13836+0 records out
14508097536 bytes (15 GB) copied, 8.91368 s, 1.6 GB/s
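For context on the libgfapi setup: the VM disks were attached through QEMU's gluster block driver, roughly like the lines below. The server, volume name, and image path are illustrative, and the transport prefix in the URI is what switched us between the TCP and RDMA tests (assuming I have the QEMU gluster URI scheme right):

# TCP transport
qemu-system-x86_64 ... -drive file=gluster://node1/testvol/vm.img,format=raw,cache=none

# RDMA transport
qemu-system-x86_64 ... -drive file=gluster+rdma://node1/testvol/vm.img,format=raw,cache=none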
The version that this bug has been reported against does not get any updates from the Gluster Community anymore. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug. If there has been no update before 9 December 2014, this bug will get automatically closed.