Description of problem:

Trying to mount any of the first four GlusterFS volumes (in a five RDMA volume setup) is failing over NFS, even though the volumes are definitely mountable using the native client.

# mount gluster1-2:/test1 /foo1
mount.nfs: mounting gluster1-2:/test1 failed, reason given by server: No such file or directory

# mount gluster1-2:/test2 /foo2
mount.nfs: mounting gluster1-2:/test2 failed, reason given by server: No such file or directory

# mount gluster1-2:/test3 /foo3
mount.nfs: mounting gluster1-2:/test3 failed, reason given by server: No such file or directory

# mount gluster1-2:/test4 /foo4
mount.nfs: mounting gluster1-2:/test4 failed, reason given by server: No such file or directory

# mount gluster1-2:/test5 /foo5

# showmount -e gluster1-2
Export list for gluster1-2:
/test5 *

# df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/vg_jcpc-lv_root    50G  1.8G   45G   4% /
tmpfs                         3.7G     0  3.7G   0% /dev/shm
/dev/sda1                     485M   53M  407M  12% /boot
/dev/mapper/vg_jcpc-lv_home    53G  219M   51G   1% /home
gluster1-2:/test5             176G   65M  176G   1% /foo5

Interestingly, only the last created volume shows up in "showmount -e", which is probably related.

According to "gluster volume status", the NFS servers are online for all of the volumes:

# gluster volume status
Status of volume: test4
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick gluster1-2:/export/brick1/test4           49163   Y       1895
Brick gluster2-2:/export/brick1/test4           49160   Y       1821
Brick gluster1-2:/export/brick2/test4           49164   Y       1899
Brick gluster2-2:/export/brick2/test4           49161   Y       1827
NFS Server on localhost                         2049    Y       1960
Self-heal Daemon on localhost                   N/A     Y       1964
NFS Server on gluster2-2                        2049    Y       1855
Self-heal Daemon on gluster2-2                  N/A     Y       1861

There are no active volume tasks

Status of volume: test3
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick gluster1-2:/export/brick2/test3           49162   Y       1909
Brick gluster2-2:/export/brick2/test3           49159   Y       1837
NFS Server on localhost                         2049    Y       1960
Self-heal Daemon on localhost                   N/A     Y       1964
NFS Server on gluster2-2                        2049    Y       1855
Self-heal Daemon on gluster2-2                  N/A     Y       1861

There are no active volume tasks

Status of volume: test1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick gluster1-2:/export/brick1/test1           49160   Y       1913
NFS Server on localhost                         2049    Y       1960
NFS Server on gluster2-2                        2049    Y       1855

There are no active volume tasks

Status of volume: test5
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick gluster1-2:/export/brick1/test5           49165   Y       1922
Brick gluster2-2:/export/brick1/test5           49162   Y       1850
NFS Server on localhost                         2049    Y       1960
NFS Server on gluster2-2                        2049    Y       1855

There are no active volume tasks

Status of volume: test2
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick gluster1-2:/export/brick2/test2           49161   Y       1927
Brick gluster2-2:/export/brick2/test2           49158   Y       1841
NFS Server on localhost                         2049    Y       1960
NFS Server on gluster2-2                        2049    Y       1855

There are no active volume tasks

# gluster volume info

Volume Name: test1
Type: Distribute
Volume ID: 63017415-d946-473e-8aa4-8746e5265f9c
Status: Started
Number of Bricks: 1
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick1/test1

Volume Name: test4
Type: Distributed-Replicate
Volume ID: 384780ee-306c-419e-a6d6-d58abbd24a58
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick1/test4
Brick2: gluster2-2:/export/brick1/test4
Brick3: gluster1-2:/export/brick2/test4
Brick4: gluster2-2:/export/brick2/test4

Volume Name: test3
Type: Replicate
Volume ID: a3b5e22c-28c9-4963-84fd-c192f4b9261b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick2/test3
Brick2: gluster2-2:/export/brick2/test3

Volume Name: test2
Type: Distribute
Volume ID: 694f1cbf-e1e5-42e6-9f07-605d409ff95f
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick2/test2
Brick2: gluster2-2:/export/brick2/test2

Volume Name: test5
Type: Stripe
Volume ID: e6e78330-3bc8-445c-b500-93fb15ebdf6d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick1/test5
Brick2: gluster2-2:/export/brick1/test5

Version-Release number of selected component (if applicable):

glusterfs-3.4.0-0.6.beta3.el6.x86_64
glusterfs-api-3.4.0-0.6.beta3.el6.x86_64
glusterfs-debuginfo-3.4.0-0.6.beta3.el6.x86_64
glusterfs-devel-3.4.0-0.6.beta3.el6.x86_64
glusterfs-fuse-3.4.0-0.6.beta3.el6.x86_64
glusterfs-rdma-3.4.0-0.6.beta3.el6.x86_64
glusterfs-server-3.4.0-0.6.beta3.el6.x86_64

How reproducible:

Every time, even after rebooting all servers in the environment.

Steps to Reproduce:
1. With a 3 node setup (2 gluster nodes, one "client" node for mounting), create all of the volumes as per the Gluster 3.4.0 beta3 RDMA test day instructions, adding "transport rdma" to every volume creation command so that every volume uses RDMA (a sketch of the creation commands is included at the end of this report):
   http://www.gluster.org/community/documentation/index.php/3.4.0_Beta_1_Tests
2. Attempt to mount all of the volumes as per Test 4b (NFS mounts). The problem occurs here.

Additional info:

Attaching sosreports for all 3 nodes, and a tarball of /var/lib/glusterd/ for both storage nodes. The sosreports include the Gluster /var/log/glusterfs/ directory.
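For reference, the volume creation commands in step 1 were along these lines. This is a sketch reconstructed from the brick paths and volume types shown above, not a verbatim transcript, and the option ordering on the test day page may differ slightly:

# gluster volume create test1 transport rdma gluster1-2:/export/brick1/test1
# gluster volume create test2 transport rdma gluster1-2:/export/brick2/test2 gluster2-2:/export/brick2/test2
# gluster volume create test3 replica 2 transport rdma gluster1-2:/export/brick2/test3 gluster2-2:/export/brick2/test3
# gluster volume create test4 replica 2 transport rdma gluster1-2:/export/brick1/test4 gluster2-2:/export/brick1/test4 gluster1-2:/export/brick2/test4 gluster2-2:/export/brick2/test4
# gluster volume create test5 stripe 2 transport rdma gluster1-2:/export/brick1/test5 gluster2-2:/export/brick1/test5

Each volume was then started with "gluster volume start <volname>" before mounting.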
Forgot to mention, this is all on RHEL 6.4 x86_64.
Created attachment 765404 [details] /var/lib/glusterd/ for first gluster storage node
Created attachment 765405 [details] /var/lib/glusterd/ for second gluster storage node
Created attachment 765406 [details]
Sosreport from client node

Generated using "sosreport -e infiniband"

Created attachment 765407 [details]
Sosreport from first storage node

Generated using "sosreport -e infiniband"

Created attachment 765408 [details]
Sosreport from second storage node

Generated using "sosreport -e infiniband"
I set up two servers and two clients with IB (RHEL 6.5, Mellanox drivers) and community glusterfs-3.5.1. I provisioned six transport rdma dht volumes on the servers. I'm able to mount all six volumes using both NFS (-o transport=tcp) and the native client, and write to the volumes on all mount points.
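For anyone retrying this, the mounts would look roughly like the following. "server1" and "testvol" are placeholder names rather than the actual hosts and volumes used here, and the NFS options shown are the usual ones for Gluster's built-in NFSv3 server, not a transcript of the exact commands run:

# mount -t nfs -o vers=3,proto=tcp server1:/testvol /mnt/nfs
# mount -t glusterfs server1:/testvol /mnt/fuse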
Justin, is this still an issue for you? It seems that Kaleb could not reproduce the issue. Can this be CLOSED/WORKSFORME or similar?
Nah, close it. It was a long time ago, and the code has changed a lot since. If it happens again when I later test this (after setting up my RDMA stuff again), then I'll make a new BZ.