Description of problem:

Brick is not operational when using RDMA transport.
Volume transport could not be set to "tcp,rdma".
"gluster volume status" shows N/A for the TCP port, RDMA port and Pid when the transport is rdma.

Version-Release number of selected component (if applicable):

glusterfs-4.1.1-1.el7.x86_64
glusterfs-api-4.1.1-1.el7.x86_64
glusterfs-cli-4.1.1-1.el7.x86_64
glusterfs-client-xlators-4.1.1-1.el7.x86_64
glusterfs-fuse-4.1.1-1.el7.x86_64
glusterfs-libs-4.1.1-1.el7.x86_64
glusterfs-rdma-4.1.1-1.el7.x86_64
glusterfs-server-4.1.1-1.el7.x86_64

How reproducible:

After upgrading from GlusterFS 3.12.

Steps to Reproduce:

gluster> volume set lumiere config.transport tcp,rdma
volume set: failed: Commit failed on localhost. Please check the log file for more details.

gluster> volume set lumiere config.transport rdma
volume set: success

gluster> volume start lumiere
volume start: lumiere: success

gluster> volume status lumiere
Status of volume: lumiere
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick lumi124i.vhpc.clustertech.com:/data/g
lusterfs/lumiere/brick0/brick               N/A       N/A        N       N/A
Brick lumi125i.vhpc.clustertech.com:/data/g
lusterfs/lumiere/brick1/brick               N/A       N/A        N       N/A

Task Status of Volume lumiere
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 74462743-aab9-4f00-8873-812b4534ba23
Status               : completed

gluster> volume stop lumiere
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: lumiere: success

gluster> volume set lumiere config.transport tcp
volume set: success

gluster> volume start lumiere
volume start: lumiere: success

gluster> volume status lumiere
Status of volume: lumiere
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick lumi124i.vhpc.clustertech.com:/data/g
lusterfs/lumiere/brick0/brick               49152     0          Y       6458
Brick lumi125i.vhpc.clustertech.com:/data/g
lusterfs/lumiere/brick1/brick               49152     0          Y       3919

Task Status of Volume lumiere
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 74462743-aab9-4f00-8873-812b4534ba23
Status               : completed

Actual results:

Volume transport could not be set to tcp,rdma.
Brick is not listening when configured to use the rdma transport.

Expected results:

Brick would be operational when using the rdma transport.
Volume could be configured to use tcp,rdma as the transport.
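As a starting point for triage, a minimal diagnostic sketch of checks that may help narrow this down (assumptions: the default EL7 GlusterFS log locations, the brick log file name being derived from the brick0 path, and libibverbs-utils / infiniband-diags being installed for the verbs-level checks; adjust names to your setup):

$ grep -iE 'rdma|error' /var/log/glusterfs/glusterd.log | tail -n 20
$ grep -iE 'rdma|error' /var/log/glusterfs/bricks/data-glusterfs-lumiere-brick0-brick.log | tail -n 20

# Confirm the verbs stack itself sees an active port (these tools are not part of GlusterFS):
$ ibv_devinfo
$ ibstat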
Output of volume info when it is working with tcp transport:

Volume Name: lumiere
Type: Distribute
Volume ID: 9575e0f3-affb-4d29-bc1f-12beb7b0fa82
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: lumi124i.vhpc.clustertech.com:/data/glusterfs/lumiere/brick0/brick
Brick2: lumi125i.vhpc.clustertech.com:/data/glusterfs/lumiere/brick1/brick
Options Reconfigured:
cluster.weighted-rebalance: on
nfs.disable: on
client.event-threads: 8
server.event-threads: 8
cluster.lookup-optimize: on
performance.flush-behind: on
performance.write-behind-window-size: 1GB
performance.readdir-ahead: on
performance.parallel-readdir: on
features.cache-invalidation: off
performance.cache-invalidation: on
performance.md-cache-timeout: 600
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
network.inode-lru-limit: 50000
config.transport: tcp
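For reference, the effective value of a single option can also be queried directly instead of being read from the full volume info; a minimal sketch, assuming the "volume get" CLI is available in this build:

gluster> volume get lumiere config.transport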
Also tested on a freshly installed CentOS 7.5 minimal host:

$ systemctl disable --now firewalld
$ yum groupinstall "infiniband support"
$ modprobe ib_ipoib
$ nmcli con add con-name ib0 ifname ib0 type infiniband ipv4.method manual ipv4.address 10.1.4.77/24
$ yum -y install centos-release-gluster41
$ yum -y install glusterfs-server glusterfs-rdma
$ systemctl enable --now glusterd
$ gluster

gluster> pool list
UUID                                    Hostname        State
51baf4f1-9229-4e2a-8c77-60cf256db33e    localhost       Connected

gluster> volume create rdma-test 10.1.4.77:/home/brick
volume create: rdma-test: success: please start the volume to access data

gluster> volume info
Volume Name: rdma-test
Type: Distribute
Volume ID: ff0bd79c-4935-4f97-b52f-64d6bb3342c1
Status: Created
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.1.4.77:/home/brick
Options Reconfigured:
transport.address-family: inet
nfs.disable: on

gluster> volume set rdma-test config.transport tcp,rdma
volume set: failed: Commit failed on localhost. Please check the log file for more details.

gluster> volume set rdma-test config.transport rdma
volume set: success

gluster> volume start rdma-test
volume start: rdma-test: success

gluster> volume status
Status of volume: rdma-test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.1.4.77:/home/brick                 N/A       N/A        N       N/A

Task Status of Volume rdma-test
------------------------------------------------------------------------------
There are no active volume tasks
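For completeness, the transport can also be specified at volume-creation time rather than changed afterwards with config.transport; a minimal sketch using a hypothetical second volume and brick path (not verified to behave any differently with respect to this bug):

gluster> volume create rdma-test2 transport rdma 10.1.4.77:/home/brick2
gluster> volume start rdma-test2
gluster> volume status rdma-test2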
Thanks for the report, but we are not able to look into the RDMA section actively, and are seriously considering dropping it from active support. More on this @ https://lists.gluster.org/pipermail/gluster-devel/2018-July/054990.html

> ‘RDMA’ transport support:
>
> Gluster started supporting RDMA while ib-verbs was still new, and very
> high-end infra around that time were using Infiniband. Engineers did work
> with Mellanox, and got the technology into GlusterFS for better data
> migration, data copy. While current day kernels support very good speed
> with IPoIB module itself, and there are no more bandwidth for experts in
> these area to maintain the feature, we recommend migrating over to TCP
> (IP based) network for your volume.
>
> If you are successfully using RDMA transport, do get in touch with us to
> prioritize the migration plan for your volume. Plan is to work on this
> after the release, so by version 6.0, we will have a cleaner transport
> code, which just needs to support one type.
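For volumes already set to rdma, the recommended migration amounts to switching the transport back to tcp and letting traffic run over the IPoIB address, as in the steps already shown above; a minimal sketch (volume name, hostname and mount point are placeholders taken from this report):

gluster> volume stop lumiere
gluster> volume set lumiere config.transport tcp
gluster> volume start lumiere

# clients then mount over IP (IPoIB or Ethernet) as with any tcp volume:
$ mount -t glusterfs lumi124i.vhpc.clustertech.com:/lumiere /mnt/lumiere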
RDMA will not be supported from glusterfs-8.0 onwards, and hence this bug is being marked WONTFIX/EOL. (ref: https://review.gluster.org/23033)