Hi, I'm using a three-node cluster. Every node has a two-port InfiniBand HCA, and all ports are connected to a single unmanaged IB switch. I've managed to bond the two ports at the TCP layer, both to double the throughput and for failover.

As far as I know, OpenMPI, which also uses ib-verbs, can use the two ports of the HCA in the same way as TCP bonding: it automatically recognizes that the two ports are in the same IB fabric and uses them bonded together in the RDMA stack, so the MPI throughput is doubled natively as soon as two ports are available.

A nice enhancement for GlusterFS would be an option that lets the user use a two-port HCA bonded at the RDMA level. At the moment the user can only choose between physical port one or two of the HCA; there is no way to use both ports simultaneously. It would be nice if GlusterFS (like OpenMPI) recognized all active ports, checked whether they are in the same IB fabric, and then used both ports with selectable load-balancing (round robin etc.) or failover policies. Because GlusterFS also uses ib-verbs, now called the rdma transport, I thought it should be possible to use two ports bonded in the RDMA stack.

regards,
Oliver
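P.S. To make the request concrete, here is a rough sketch of the kind of client volfile I have in mind. The multi-port options are purely hypothetical (they do not exist today), and even the name of the current single-port option is my assumption:

volume client-rdma
  type protocol/client
  option transport-type rdma
  option remote-host server1
  option remote-subvolume brick1
  # today (assumed option name): pin the transport to one physical HCA port
  option transport.rdma.port 1
  # requested (hypothetical): use every active port found in the same fabric
  # option transport.rdma.multi-port on
  # option transport.rdma.port-policy round-robin   # or: failover
end-volume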
Hi Oliver, we can work on this feature only after the 3.3.x release cycle.
The 3.3.0 GA release is the priority for the immediate future. Will bump the priority of this feature up once we take up the RDMA-related tasks.
Has this feature been added in 3.3.0?
Why do you need bonding? If you have 40-Gbps (QDR) or 56-Gbps IB infrastructure, surely you don't need it. Also, there are other ways to employ multiple network ports besides bonding: if you are not using native GlusterFS, you could use one port for non-Gluster traffic and one port for Gluster traffic on a separate network. I heard Gluster is now integrated with librdmacm; does librdmacm support bonding? For that matter, does IPoIB support bonding?

If you have 10-Gbps Ethernet ports, you can use "balance-alb" NIC bonding with jumbo frames (MTU=9000), which will get you up to 1.5 GB/s of network throughput for a single Gluster client (I know, it should be 2 GB/s, but it's a lot better than with one port). That is a considerable performance gain without using RDMA at all.
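For reference, the balance-alb setup I mean looks roughly like this on a RHEL-style system (the interface names eth2/eth3, the bond name, and the address are just placeholders; adjust for your distro):

/etc/sysconfig/network-scripts/ifcfg-bond0:
    DEVICE=bond0
    TYPE=Bond
    BONDING_OPTS="mode=balance-alb miimon=100"
    MTU=9000
    IPADDR=192.168.10.11
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes

/etc/sysconfig/network-scripts/ifcfg-eth2 (same again for eth3):
    DEVICE=eth2
    MASTER=bond0
    SLAVE=yes
    MTU=9000
    BOOTPROTO=none
    ONBOOT=yes

Then restart the network service and check /proc/net/bonding/bond0 to confirm both slaves are active.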
Since there have been no further comments and the original post is 3 years old, I am closing this.