Description of problem:
Attempting to mount a Distributed-Replicate volume using RDMA transport hangs, using upstream GlusterFS 3.4.0 Beta 3 on RHEL 6.4. (This is a late entry for the 3.4.0 Beta 3 RDMA "Test Day".)

Tried several times (using Control-C to cancel after a few minutes each time):

# mount -t glusterfs gluster1-2:test4 /foo4
^C
# mount -t glusterfs gluster1-2:test4 /foo4
^C
# mount -t glusterfs gluster1-2:test4 /foo4
^C
# ps -ef|grep -i gluster
root 1808 1 0 05:21 ? 00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid
root 1874 1 0 05:21 ? 00:00:00 /usr/sbin/glusterfs --volfile-id=test4 --volfile-server=gluster1-2 /foo4
root 1882 1 0 05:21 pts/0 00:00:00 /bin/sh /sbin/mount.glusterfs gluster1-2:test4 /foo4 -o rw
root 1888 1 0 05:22 pts/0 00:00:00 /bin/sh /sbin/mount.glusterfs gluster1-2:test4 /foo4 -o rw
root 1904 1 0 05:22 pts/0 00:00:00 /bin/sh /sbin/mount.glusterfs gluster1-2:test4 /foo4 -o rw
#

The Gluster log for the mount point (attached) seems to say it is having trouble connecting to one of the subvolumes, although none of the other volumes on the same servers are having issues. The problem seems to affect only the Distributed-Replicate volume, which uses the same backend physical storage as the other volumes (just different directories).

# gluster volume info

Volume Name: test4
Type: Distributed-Replicate
Volume ID: 384780ee-306c-419e-a6d6-d58abbd24a58
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick1/test4
Brick2: gluster2-2:/export/brick1/test4
Brick3: gluster1-2:/export/brick2/test4
Brick4: gluster2-2:/export/brick2/test4

Volume Name: test3
Type: Replicate
Volume ID: a3b5e22c-28c9-4963-84fd-c192f4b9261b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick2/test3
Brick2: gluster2-2:/export/brick2/test3

Volume Name: test1
Type: Distribute
Volume ID: 63017415-d946-473e-8aa4-8746e5265f9c
Status: Started
Number of Bricks: 1
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick1/test1

Volume Name: test5
Type: Stripe
Volume ID: e6e78330-3bc8-445c-b500-93fb15ebdf6d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick1/test5
Brick2: gluster2-2:/export/brick1/test5

Volume Name: test2
Type: Distribute
Volume ID: 694f1cbf-e1e5-42e6-9f07-605d409ff95f
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick2/test2
Brick2: gluster2-2:/export/brick2/test2

Version-Release number of selected component (if applicable):
glusterfs-3.4.0-0.6.beta3.el6.x86_64
glusterfs-api-3.4.0-0.6.beta3.el6.x86_64
glusterfs-debuginfo-3.4.0-0.6.beta3.el6.x86_64
glusterfs-devel-3.4.0-0.6.beta3.el6.x86_64
glusterfs-fuse-3.4.0-0.6.beta3.el6.x86_64
glusterfs-rdma-3.4.0-0.6.beta3.el6.x86_64
glusterfs-server-3.4.0-0.6.beta3.el6.x86_64

How reproducible:
Every time, even after several reboots of every server in the environment. :(

Steps to Reproduce:
1. With a 3-node setup (2 Gluster storage nodes, one "client" node for mounting), create all of the volumes as per the Gluster 3.4.0 Beta 3 RDMA test day instructions, adding "transport rdma" to every volume creation command so RDMA is used (an example create command is sketched after these steps):
http://www.gluster.org/community/documentation/index.php/3.4.0_Beta_1_Tests
2. Attempt to mount all of the volumes as per Test 4a (native client mounting). The problem occurs here: mounting hangs on the "test4" (Distributed-Replicate) volume.
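For reference, a create command for a volume shaped like test4 would look something like the following. This is a sketch reconstructed from the brick list above, not copied from the test day page, so the exact invocation there may differ:

# gluster volume create test4 replica 2 transport rdma \
    gluster1-2:/export/brick1/test4 gluster2-2:/export/brick1/test4 \
    gluster1-2:/export/brick2/test4 gluster2-2:/export/brick2/test4
# gluster volume start test4

With "replica 2" and four bricks, Gluster pairs consecutive bricks into replica sets, giving the 2 x 2 Distributed-Replicate layout shown in the volume info above.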
Additional info:
Tarballs of the /var/log/glusterfs/ directory from all three nodes are attached. The logs are clean (wiped just before recreating this issue), so they should be relevant. Also attached are sosreports for all three nodes, generated using "sosreport -e infiniband".
Created attachment 765385 [details]
Client node gluster log directory

The "foo4.log" file is the log for the hanging mount point.
Created attachment 765386 [details]
First storage node gluster log directory
Created attachment 765387 [details]
Second storage node gluster log directory
Created attachment 765389 [details]
Sosreport from client node

Generated using "sosreport -e infiniband"
Created attachment 765390 [details]
Sosreport from first storage node

Generated using "sosreport -e infiniband"
Created attachment 765391 [details]
Sosreport from second storage node

Generated using "sosreport -e infiniband"
Interestingly, "ps -ef|grep -i glusterfsd" shows two glusterfsd processes for the test4 volume (gluster1-2 hosts two of its four bricks). Probably relevant to the problem.

# ps -ef|grep -i glusterfsd
root 1895 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test4.gluster1-2.export-brick1-test4 -p /var/lib/glusterd/vols/test4/run/gluster1-2-export-brick1-test4.pid -S /var/run/0e97d6ffbb276bbdda66eefdfa0177a3.socket --brick-name /export/brick1/test4 -l /var/log/glusterfs/bricks/export-brick1-test4.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49163 --xlator-option test4-server.listen-port=49163
root 1899 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test4.gluster1-2.export-brick2-test4 -p /var/lib/glusterd/vols/test4/run/gluster1-2-export-brick2-test4.pid -S /var/run/bfabdcacf2d8f16138631e941242b7c3.socket --brick-name /export/brick2/test4 -l /var/log/glusterfs/bricks/export-brick2-test4.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49164 --xlator-option test4-server.listen-port=49164
root 1909 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test3.gluster1-2.export-brick2-test3 -p /var/lib/glusterd/vols/test3/run/gluster1-2-export-brick2-test3.pid -S /var/run/e351ec752b247414834521b8ee755418.socket --brick-name /export/brick2/test3 -l /var/log/glusterfs/bricks/export-brick2-test3.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49162 --xlator-option test3-server.listen-port=49162
root 1913 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test1.gluster1-2.export-brick1-test1 -p /var/lib/glusterd/vols/test1/run/gluster1-2-export-brick1-test1.pid -S /var/run/265f49a2e421e2c1c52658c1215db1e8.socket --brick-name /export/brick1/test1 -l /var/log/glusterfs/bricks/export-brick1-test1.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49160 --xlator-option test1-server.listen-port=49160
root 1922 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test5.gluster1-2.export-brick1-test5 -p /var/lib/glusterd/vols/test5/run/gluster1-2-export-brick1-test5.pid -S /var/run/1a66008adc837acb4b95ef45312a69f1.socket --brick-name /export/brick1/test5 -l /var/log/glusterfs/bricks/export-brick1-test5.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49165 --xlator-option test5-server.listen-port=49165
root 1927 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test2.gluster1-2.export-brick2-test2 -p /var/lib/glusterd/vols/test2/run/gluster1-2-export-brick2-test2.pid -S /var/run/4f0257c2b42a1968d918e757d0ad9779.socket --brick-name /export/brick2/test2 -l /var/log/glusterfs/bricks/export-brick2-test2.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49161 --xlator-option test2-server.listen-port=49161
#
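For anyone else reproducing this, the brick daemons and their ports can be cross-checked against glusterd's view with the volume status command (sketch only; output was not captured from these nodes):

# gluster volume status test4

That should list each brick of test4 with its port, online state, and PID; on gluster1-2 both /export/brick1/test4 and /export/brick2/test4 should appear, matching the two glusterfsd processes above.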
Created attachment 765399 [details]
/var/lib/glusterd/ from gluster storage node 1
Created attachment 765400 [details]
/var/lib/glusterd/ from gluster storage node 2
The "pre-release" version is ambiguous and is about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.