+++ This bug was initially created as a clone of Bug #1222442 +++

Description of problem:
I/Os hang on tiered volumes.

[root@dhcp42-250 gluster]# gluster vol info v1

Volume Name: v1
Type: Tier
Volume ID: cdebe3d4-bf02-4f19-9803-96852a9973a1
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: 10.70.43.107:/rhs/brick2
Brick2: 10.70.42.250:/rhs/brick2
Cold Bricks:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: 10.70.42.250:/rhs/brick1
Brick4: 10.70.43.107:/rhs/brick1
Options Reconfigured:
performance.readdir-ahead: on

Version-Release number of selected component (if applicable):
glusterfs 3.7.0 built on May 15 2015 01:31:12

How reproducible:

Steps to Reproduce:
1. Create a replica 2 volume.
2. Attach another replica 2 hot tier.
3. Mount the volume (NFS) on a client and start a Linux kernel untar.

Actual results:
I/O hangs; the untar stalls partway through the source tree:

linux-2.6.31.1/drivers/net/skfp/h/hwmtm.h
linux-2.6.31.1/drivers/net/skfp/h/mbuf.h
linux-2.6.31.1/drivers/net/skfp/h/osdef1st.h
linux-2.6.31.1/drivers/net/skfp/h/sba.h
linux-2.6.31.1/drivers/net/skfp/h/sba_def.h
linux-2.6.31.1/drivers/net/skfp/h/skfbi.h
linux-2.6.31.1/drivers/net/skfp/h/skfbiinc.h
linux-2.6.31.1/drivers/net/skfp/h/smc.h
linux-2.6.31.1/drivers/net/skfp/h/smt.h
linux-2.6.31.1/drivers/net/skfp/h/smt_p.h
linux-2.6.31.1/drivers/net/skfp/h/smtstate.h
linux-2.6.31.1/drivers/net/skfp/h/supern_2.h
linux-2.6.31.1/drivers/net/skfp/h/targethw.h
linux-2.6.31.1/drivers/net/skfp/h/targetos.h
linux-2.6.31.1/drivers/net/skfp/h/types.h
linux-2.6.31.1/drivers/net/skfp/hwmtm.c

Expected results:
I/O should not hang.

Additional info:
Attaching sosreport.

--- Additional comment from Anoop on 2015-05-18 04:53:11 EDT ---

--- Additional comment from RHEL Product and Program Management on 2015-05-18 06:13:36 EDT ---

This request has been proposed as a blocker, but a release flag has not been requested. Please set a release flag to ?
to ensure we may track this bug against the appropriate upcoming release, and reset the blocker flag to ?.

--- Additional comment from Anoop on 2015-05-18 12:34:57 EDT ---

I figured out that I run into this issue when I try creating the second tiered volume. This is what I see in the logs:

glusterd.log:

The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2015-05-18 19:09:56.628006] and [2015-05-18 19:11:53.652759]

nfs.log:

[2015-05-18 21:09:08.032573] E [graph.y:153:new_volume] 0-parser: Line 175: volume 'tier-dht' defined again
[2015-05-18 21:09:08.032897] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 21:09:08.033278] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down

This is consistently reproducible.

--- Additional comment from Triveni Rao on 2015-05-19 02:58:44 EDT ---

I see a similar problem on my setup: if I have several tiered volumes and then create a new distribute or distributed-replicate volume, mounting the newly created volume over NFS times out.
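The "volume 'tier-dht' defined again" error above comes from the volfile graph parser (graph.y), which refuses to load a graph containing two nodes with the same name. As an illustration only — this is a toy Python sketch, not GlusterFS source — a minimal volfile-style parser showing why a graph that declares 'tier-dht' twice cannot be constructed:

```python
# Illustration only: a toy parser for GlusterFS-style volfiles.
# The real check lives in graph.y (new_volume); the point is that a
# duplicate node name aborts graph construction exactly as in nfs.log.

def parse_volfile(text):
    """Return a dict of volume-name -> option lines; raise on a duplicate name."""
    graph = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("volume "):
            name = line.split(None, 1)[1]
            if name in graph:
                # Mirrors: E [graph.y:153:new_volume] 0-parser:
                #          Line N: volume 'tier-dht' defined again
                raise ValueError(f"Line {lineno}: volume '{name}' defined again")
            graph[name] = []
            current = name
        elif line == "end-volume":
            current = None
        elif current is not None and line:
            graph[current].append(line)
    return graph

# The NFS server loads one graph covering every volume, so two tiered
# volumes each contribute a node named plain "tier-dht":
broken = """
volume tier-dht
    type cluster/tier
end-volume
volume tier-dht
    type cluster/tier
end-volume
"""

try:
    parse_volfile(broken)
except ValueError as err:
    print(err)  # Line 5: volume 'tier-dht' defined again
```

This matches the observed behavior: the NFS graph aggregates all volumes, so the second tiered volume introduces the duplicate name and the NFS server exits instead of starting.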
[root@rhsqa14-vm1 ~]# gluster v info test

Volume Name: test
Type: Distribute
Volume ID: 345406fa-17c9-4523-bb00-1b489bb552a0
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.233:/rhs/brick1/j0
Brick2: 10.70.46.236:/rhs/brick1/j0
Brick3: 10.70.46.233:/rhs/brick5/j0
Brick4: 10.70.46.236:/rhs/brick5/j0
Options Reconfigured:
performance.readdir-ahead: on
[root@rhsqa14-vm1 ~]#

[root@rhsqa14-vm5 ~]# mount -t nfs 10.70.46.233:/test /mnt2
mount.nfs: Connection timed out
[root@rhsqa14-vm5 ~]#

Log messages:

[2015-05-18 08:37:26.174773] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:26.174850] I [MSGID: 106006] [glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-05-18 08:37:29.175448] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:32.176345] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:35.177158] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:38.177997] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:41.179047] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:44.179887] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:47.180348] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:50.181147] W
[socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:53.182271] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:55.110046] I [glusterd-brick-ops.c:770:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-05-18 08:37:55.120848] I [glusterd-utils.c:8599:glusterd_generate_and_set_task_id] 0-management: Generated task-id 0570cbd5-a643-4f2f-b19f-12add534b25e for key remove-brick-id
[2015-05-18 08:37:55.660814] E [graph.y:153:new_volume] 0-parser: Line 197: volume 'tier-dht' defined again
[2015-05-18 08:37:55.669775] W [glusterd-brick-ops.c:2253:glusterd_op_remove_brick] 0-management: Unable to reconfigure NFS-Server
[2015-05-18 08:37:55.669801] E [glusterd-syncop.c:1372:gd_commit_op_phase] 0-management: Commit of operation 'Volume Remove brick' failed on localhost
[2015-05-18 08:37:55.670829] E [glusterd-handshake.c:191:build_volfile_path] 0-management: Couldn't find volinfo
[2015-05-18 08:37:55.672561] E [glusterd-handshake.c:191:build_volfile_path] 0-management: Couldn't find volinfo
[2015-05-18 08:37:55.675863] E [glusterd-handshake.c:191:build_volfile_path] 0-management: Couldn't find volinfo
[2015-05-18 08:37:56.183110] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:59.183753] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:02.185611] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:05.186239] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:08.186897] W
[socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:11.187513] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)

NFS logs:

[2015-05-18 10:02:56.640509] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:02:56.658596] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-18 10:02:57.663067] E [graph.y:153:new_volume] 0-parser: Line 131: volume 'tier-dht' defined again
[2015-05-18 10:02:57.663212] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:02:57.663564] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-05-18 10:06:01.581244] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:06:01.598237] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-18 10:06:01.602635] E [graph.y:153:new_volume] 0-parser: Line 131: volume 'tier-dht' defined again
[2015-05-18 10:06:01.602771] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:06:01.602987] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-05-18 10:23:45.699277] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started
running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:23:45.720241] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-18 10:23:45.724724] E [graph.y:153:new_volume] 0-parser: Line 167: volume 'tier-dht' defined again
[2015-05-18 10:23:45.724861] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:23:45.725066] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-05-19 06:15:30.577550] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-19 06:15:30.595539] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-19 06:15:30.601199] E [graph.y:153:new_volume] 0-parser: Line 228: volume 'tier-dht' defined again
[2015-05-19 06:15:30.601372] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-19 06:15:30.601569] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in client graph) posted (#1) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in client graph) posted (#2) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in client graph) posted (#3) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in client graph) posted (#4) for review on master by mohammed rafi kc (rkavunga)
REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in client graph) posted (#5) for review on master by mohammed rafi kc (rkavunga)
COMMIT: http://review.gluster.org/10820 committed in master by Kaushal M (kaushal)
------
commit 05566baee6b5f0b2a3b083def4fe9bbdd0f63551
Author: Mohammed Rafi KC <rkavunga>
Date: Tue May 19 14:54:32 2015 +0530

    tiering/nfs: duplication of nodes in client graph

    When creating client volfiles, the xlator tier-dht is loaded for each
    volume. Services like nfs serve one or more volumes, so a tier-dht
    xlator is created for every tiered volume in the graph, and the graph
    parser then fails because of the duplicate node name. With this
    change, tier-dht is renamed to volname-tier-dht.

    Change-Id: I3c9b9c23ddcb853773a8a02be7fd8a5d09a7f972
    BUG: 1222840
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/10820
    Reviewed-by: Atin Mukherjee <amukherj>
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System
    Reviewed-by: Kaushal M <kaushal>
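The committed fix works by namespacing the translator name per volume. The sketch below is a hypothetical illustration in Python (not the actual glusterd volgen code, and "v2" is an invented second volume name) of why prefixing the volume name makes the node names unique when one NFS graph aggregates several tiered volumes:

```python
# Sketch only: how the tier translator's node name might be generated
# before and after the fix. Helper name and "v2" volume are hypothetical.

def tier_xlator_name(volname, fixed=True):
    """Name of a volume's tier translator node in a client/NFS graph."""
    return f"{volname}-tier-dht" if fixed else "tier-dht"

# Two tiered volumes served by the same NFS-server graph:
volumes = ["v1", "v2"]

before = [tier_xlator_name(v, fixed=False) for v in volumes]
after = [tier_xlator_name(v, fixed=True) for v in volumes]

# Before the fix, every tiered volume contributes the same node name,
# so the graph parser sees "volume 'tier-dht' defined again":
assert len(set(before)) == 1            # ['tier-dht', 'tier-dht']

# After the fix, the names are unique per volume and the graph loads:
assert len(set(after)) == len(volumes)  # ['v1-tier-dht', 'v2-tier-dht']
print(after)
```

The design choice mirrors what DHT already does for regular volumes (node names such as volname-dht), so a combined graph never sees two nodes with the same identifier.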
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ, fixed in a GlusterFS release, has been closed. Hence closing this mainline BZ as well.
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user