Bug 1598769 - Brick is not operational when using RDMA transport
Summary: Brick is not operational when using RDMA transport
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: rdma
Version: 4.1
Hardware: Unspecified
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-06 12:21 UTC by angelosching
Modified: 2019-07-15 06:18 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-07-15 06:18:28 UTC
Regression: ---
Mount Type: fuse
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description angelosching 2018-07-06 12:21:25 UTC
Description of problem:
Brick is not operational when using RDMA transport.
The volume transport could not be set to "tcp,rdma".
"gluster volume status" shows N/A for the TCP port, RDMA port, and PID of every brick when the transport is rdma.

Version-Release number of selected component (if applicable):
glusterfs-4.1.1-1.el7.x86_64
glusterfs-api-4.1.1-1.el7.x86_64
glusterfs-cli-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-rdma-4.1.1-1.el7.x86_64
glusterfs-server-4.1.1-1.el7.x86_64

How reproducible:
After upgrading from GlusterFS 3.12

Steps to Reproduce:
gluster> volume set lumiere config.transport tcp,rdma
volume set: failed: Commit failed on localhost. Please check the log file for more details.
gluster> volume set lumiere config.transport rdma
volume set: success
gluster> volume start lumiere
volume start: lumiere: success
gluster> volume status lumiere
Status of volume: lumiere
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick lumi124i.vhpc.clustertech.com:/data/g
lusterfs/lumiere/brick0/brick               N/A       N/A        N       N/A  
Brick lumi125i.vhpc.clustertech.com:/data/g
lusterfs/lumiere/brick1/brick               N/A       N/A        N       N/A  
 
Task Status of Volume lumiere
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 74462743-aab9-4f00-8873-812b4534ba23
Status               : completed           
 
gluster> volume stop lumiere
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: lumiere: success
gluster> volume set lumiere config.transport tcp
volume set: success
gluster> volume start lumiere
volume start: lumiere: success
gluster> volume status lumiere
Status of volume: lumiere
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick lumi124i.vhpc.clustertech.com:/data/g
lusterfs/lumiere/brick0/brick               49152     0          Y       6458 
Brick lumi125i.vhpc.clustertech.com:/data/g
lusterfs/lumiere/brick1/brick               49152     0          Y       3919 
 
Task Status of Volume lumiere
------------------------------------------------------------------------------
Task                 : Rebalance           
ID                   : 74462743-aab9-4f00-8873-812b4534ba23
Status               : completed

Actual results:
The volume transport could not be set to tcp,rdma.
Bricks are not listening when the volume is configured to use the rdma transport.

Expected results:
Bricks would be operational when using the rdma transport.
The volume could be configured to use tcp,rdma as the transport.
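
Not part of the original report, but the "Commit failed on localhost" error and the brick start-up failure are normally detailed in the glusterd and brick logs; a sketch of where to look, assuming the default log locations and brick-log naming:

$ grep -iE 'rdma|transport' /var/log/glusterfs/glusterd.log | tail                # glusterd side of the failed "volume set"
$ tail -n 50 /var/log/glusterfs/bricks/data-glusterfs-lumiere-brick0-brick.log    # brick that shows N/A in "volume status"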

Comment 1 angelosching 2018-07-06 12:26:09 UTC
Output of "volume info" while the volume is working with the tcp transport:

Volume Name: lumiere
Type: Distribute
Volume ID: 9575e0f3-affb-4d29-bc1f-12beb7b0fa82
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: lumi124i.vhpc.clustertech.com:/data/glusterfs/lumiere/brick0/brick
Brick2: lumi125i.vhpc.clustertech.com:/data/glusterfs/lumiere/brick1/brick
Options Reconfigured:
cluster.weighted-rebalance: on
nfs.disable: on
client.event-threads: 8
server.event-threads: 8
cluster.lookup-optimize: on
performance.flush-behind: on
performance.write-behind-window-size: 1GB
performance.readdir-ahead: on
performance.parallel-readdir: on
features.cache-invalidation: off
performance.cache-invalidation: on
performance.md-cache-timeout: 600
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
network.inode-lru-limit: 50000
config.transport: tcp
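
Side note, not from the report: once bricks do come up over RDMA, a FUSE client is typically pointed at the rdma transport either through the mount option or the ".rdma" volume-name suffix; a sketch using the hostnames above:

$ mount -t glusterfs -o transport=rdma lumi124i.vhpc.clustertech.com:/lumiere /mnt/lumiere   # mount-option form
$ mount -t glusterfs lumi124i.vhpc.clustertech.com:/lumiere.rdma /mnt/lumiere                # volume-name suffix form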

Comment 2 angelosching 2018-07-09 02:31:15 UTC
Also tested on a freshly installed CentOS 7.5 minimal host:
$ systemctl disable --now firewalld
$ yum groupinstall "infiniband support"
$ modprobe ib_ipoib
$ nmcli con add con-name ib0 ifname ib0 type infiniband ipv4.method manual ipv4.address 10.1.4.77/24
$ yum -y install centos-release-gluster41
$ yum -y install glusterfs-server glusterfs-rdma
$ systemctl enable --now glusterd
$ gluster

gluster> pool list
UUID					Hostname 	State
51baf4f1-9229-4e2a-8c77-60cf256db33e	localhost	Connected 

gluster> volume create rdma-test 10.1.4.77:/home/brick
volume create: rdma-test: success: please start the volume to access data
gluster> volume info
 
Volume Name: rdma-test
Type: Distribute
Volume ID: ff0bd79c-4935-4f97-b52f-64d6bb3342c1
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.1.4.77:/home/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
gluster> volume set rdma-test config.transport tcp,rdma
volume set: failed: Commit failed on localhost. Please check the log file for more details.
gluster> volume set rdma-test config.transport rdma
volume set: success
gluster> volume start rdma-test
volume start: rdma-test: success
gluster> volume status
Status of volume: rdma-test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.4.77:/home/brick                 N/A       N/A        N       N/A  
 
Task Status of Volume rdma-test
------------------------------------------------------------------------------
There are no active volume tasks
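
A sketch of the usual RDMA-stack sanity checks on such a host, assuming the libibverbs-utils and infiniband-diags packages from the "infiniband support" group are installed:

$ lsmod | grep -E 'rdma_cm|ib_ipoib'     # RDMA connection manager and IPoIB modules loaded?
$ ibstat                                 # HCA present and port state Active?
$ ibv_devinfo | grep -E 'hca_id|state'   # verbs-level view of the same adapter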

Comment 3 Amar Tumballi 2019-06-14 10:08:29 UTC
Thanks for the report, but we are not able to look into RDMA issues actively, and are seriously considering dropping it from active support.

More on this @ https://lists.gluster.org/pipermail/gluster-devel/2018-July/054990.html


> ‘RDMA’ transport support:
> 
> Gluster started supporting RDMA while ib-verbs was still new, and the very high-end infrastructure of that time used Infiniband. Engineers worked
> with Mellanox and got the technology into GlusterFS for faster data migration and data copy. While current-day kernels achieve very good speed with
> the IPoIB module itself, there is no more bandwidth among the experts in this area to maintain the feature, so we recommend migrating your volume to
> a TCP (IP-based) network.
> 
> If you are successfully using the RDMA transport, do get in touch with us so we can prioritize the migration plan for your volume. The plan is to
> work on this after the release, so by version 6.0 we will have cleaner transport code that only needs to support one type.
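
(The migration to TCP that the quoted plan recommends is essentially what the reporter already did above; condensed for reference, with the volume name from the report:)

$ gluster volume stop lumiere
$ gluster volume set lumiere config.transport tcp
$ gluster volume start lumiere
$ gluster volume status lumiere   # bricks should now show a TCP port and a PID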

Comment 4 Amar Tumballi 2019-07-15 06:18:28 UTC
RDMA will not be supported from glusterfs-8.0 onwards; hence, marking this bug as WONTFIX/EOL.

(ref: https://review.gluster.org/23033)

