Description of problem:
Attempting to mount a Distributed-Replicate volume using RDMA transport hangs, using upstream GlusterFS 3.4.0 Beta 3 on RHEL 6.4. (This is a late entry for the 3.4.0 Beta 3 RDMA "Test Day".)

Tried several times (using Control-C to cancel after a few minutes each time):

# mount -t glusterfs gluster1-2:test4 /foo4
^C
# mount -t glusterfs gluster1-2:test4 /foo4
^C
# mount -t glusterfs gluster1-2:test4 /foo4
^C
# ps -ef|grep -i gluster
root 1808 1 0 05:21 ? 00:00:00 /usr/sbin/glusterd -p /var/run/glusterd.pid
root 1874 1 0 05:21 ? 00:00:00 /usr/sbin/glusterfs --volfile-id=test4 --volfile-server=gluster1-2 /foo4
root 1882 1 0 05:21 pts/0 00:00:00 /bin/sh /sbin/mount.glusterfs gluster1-2:test4 /foo4 -o rw
root 1888 1 0 05:22 pts/0 00:00:00 /bin/sh /sbin/mount.glusterfs gluster1-2:test4 /foo4 -o rw
root 1904 1 0 05:22 pts/0 00:00:00 /bin/sh /sbin/mount.glusterfs gluster1-2:test4 /foo4 -o rw
#

The Gluster log for the mount point (attached) seems to say it is having trouble connecting to one of the subvolumes, although none of the other volumes on the same servers are having issues. The problem seems to affect only the Distributed-Replicate volume, which uses the same backend physical storage as the other volumes (just different directories).

# gluster volume info

Volume Name: test4
Type: Distributed-Replicate
Volume ID: 384780ee-306c-419e-a6d6-d58abbd24a58
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick1/test4
Brick2: gluster2-2:/export/brick1/test4
Brick3: gluster1-2:/export/brick2/test4
Brick4: gluster2-2:/export/brick2/test4

Volume Name: test3
Type: Replicate
Volume ID: a3b5e22c-28c9-4963-84fd-c192f4b9261b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick2/test3
Brick2: gluster2-2:/export/brick2/test3

Volume Name: test1
Type: Distribute
Volume ID: 63017415-d946-473e-8aa4-8746e5265f9c
Status: Started
Number of Bricks: 1
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick1/test1

Volume Name: test5
Type: Stripe
Volume ID: e6e78330-3bc8-445c-b500-93fb15ebdf6d
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick1/test5
Brick2: gluster2-2:/export/brick1/test5

Volume Name: test2
Type: Distribute
Volume ID: 694f1cbf-e1e5-42e6-9f07-605d409ff95f
Status: Started
Number of Bricks: 2
Transport-type: rdma
Bricks:
Brick1: gluster1-2:/export/brick2/test2
Brick2: gluster2-2:/export/brick2/test2

Version-Release number of selected component (if applicable):
glusterfs-3.4.0-0.6.beta3.el6.x86_64
glusterfs-api-3.4.0-0.6.beta3.el6.x86_64
glusterfs-debuginfo-3.4.0-0.6.beta3.el6.x86_64
glusterfs-devel-3.4.0-0.6.beta3.el6.x86_64
glusterfs-fuse-3.4.0-0.6.beta3.el6.x86_64
glusterfs-rdma-3.4.0-0.6.beta3.el6.x86_64
glusterfs-server-3.4.0-0.6.beta3.el6.x86_64

How reproducible:
Every time, even after several reboots of every server in the environment. :(

Steps to Reproduce:
1. With a 3-node setup (2 Gluster storage nodes, one "client" node for mounting), create all of the volumes as per the Gluster 3.4.0 Beta 3 RDMA test day instructions, adding "transport rdma" to every volume creation command so RDMA is used (an example create command is sketched after these steps):
http://www.gluster.org/community/documentation/index.php/3.4.0_Beta_1_Tests
2. Attempt to mount all of the volumes as per Test 4a (native client mounting). The problem occurs here: mounting hangs on the "test4" (Distributed-Replicate) volume.
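For reference, a create command for a volume shaped like test4 would look something like the following. This is a sketch reconstructed from the brick list above, not copied from the test day page, so the exact invocation there may differ:

# gluster volume create test4 replica 2 transport rdma \
    gluster1-2:/export/brick1/test4 gluster2-2:/export/brick1/test4 \
    gluster1-2:/export/brick2/test4 gluster2-2:/export/brick2/test4
# gluster volume start test4

With "replica 2" and four bricks, Gluster pairs consecutive bricks into replica sets, giving the 2 x 2 Distributed-Replicate layout shown in the volume info above.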
Additional info:
Tarballs of the /var/log/glusterfs/ directory from all three nodes are attached. The logs are clean (wiped just before recreating this issue), so they should be relevant. Also attached are sosreports for all three nodes, generated using "sosreport -e infiniband".
Created attachment 765385 [details]
Client node gluster log directory

The "foo4.log" file is the log for the hanging mount point.
Created attachment 765386 [details]
First storage node gluster log directory
Created attachment 765387 [details]
Second storage node gluster log directory
Created attachment 765389 [details]
Sosreport from client node

Generated using "sosreport -e infiniband"
Created attachment 765390 [details]
Sosreport from first storage node

Generated using "sosreport -e infiniband"
Created attachment 765391 [details]
Sosreport from second storage node

Generated using "sosreport -e infiniband"
Interestingly, "ps -ef|grep -i glusterfsd" shows two glusterfsd processes for the test4 volume (gluster1-2 hosts two of its four bricks). Probably relevant to the problem.

# ps -ef|grep -i glusterfsd
root 1895 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test4.gluster1-2.export-brick1-test4 -p /var/lib/glusterd/vols/test4/run/gluster1-2-export-brick1-test4.pid -S /var/run/0e97d6ffbb276bbdda66eefdfa0177a3.socket --brick-name /export/brick1/test4 -l /var/log/glusterfs/bricks/export-brick1-test4.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49163 --xlator-option test4-server.listen-port=49163
root 1899 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test4.gluster1-2.export-brick2-test4 -p /var/lib/glusterd/vols/test4/run/gluster1-2-export-brick2-test4.pid -S /var/run/bfabdcacf2d8f16138631e941242b7c3.socket --brick-name /export/brick2/test4 -l /var/log/glusterfs/bricks/export-brick2-test4.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49164 --xlator-option test4-server.listen-port=49164
root 1909 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test3.gluster1-2.export-brick2-test3 -p /var/lib/glusterd/vols/test3/run/gluster1-2-export-brick2-test3.pid -S /var/run/e351ec752b247414834521b8ee755418.socket --brick-name /export/brick2/test3 -l /var/log/glusterfs/bricks/export-brick2-test3.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49162 --xlator-option test3-server.listen-port=49162
root 1913 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test1.gluster1-2.export-brick1-test1 -p /var/lib/glusterd/vols/test1/run/gluster1-2-export-brick1-test1.pid -S /var/run/265f49a2e421e2c1c52658c1215db1e8.socket --brick-name /export/brick1/test1 -l /var/log/glusterfs/bricks/export-brick1-test1.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49160 --xlator-option test1-server.listen-port=49160
root 1922 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test5.gluster1-2.export-brick1-test5 -p /var/lib/glusterd/vols/test5/run/gluster1-2-export-brick1-test5.pid -S /var/run/1a66008adc837acb4b95ef45312a69f1.socket --brick-name /export/brick1/test5 -l /var/log/glusterfs/bricks/export-brick1-test5.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49165 --xlator-option test5-server.listen-port=49165
root 1927 1 0 06:29 ? 00:00:00 /usr/sbin/glusterfsd -s gluster1-2 --volfile-id test2.gluster1-2.export-brick2-test2 -p /var/lib/glusterd/vols/test2/run/gluster1-2-export-brick2-test2.pid -S /var/run/4f0257c2b42a1968d918e757d0ad9779.socket --brick-name /export/brick2/test2 -l /var/log/glusterfs/bricks/export-brick2-test2.log --xlator-option *-posix.glusterd-uuid=6776b03c-9406-46bd-ae4f-4b6c63a46503 --brick-port 49161 --xlator-option test2-server.listen-port=49161
#
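For anyone else reproducing this, the brick daemons and their ports can be cross-checked against glusterd's view with the volume status command (sketch only; output was not captured from these nodes):

# gluster volume status test4

That should list each brick of test4 with its port, online state, and PID; on gluster1-2 both /export/brick1/test4 and /export/brick2/test4 should appear, matching the two glusterfsd processes above.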
Created attachment 765399 [details]
/var/lib/glusterd/ from gluster storage node 1
Created attachment 765400 [details]
/var/lib/glusterd/ from gluster storage node 2
The "pre-release" version is ambiguous and is about to be removed as a choice. If you believe this is still a bug, please change the status back to NEW and choose the appropriate, applicable version for it.