Bug 1226029

Summary: I/O's hanging on tiered volumes (NFS)
Product: [Community] GlusterFS
Reporter: Mohammed Rafi KC <rkavunga>
Component: tiering
Assignee: bugs <bugs>
Status: CLOSED CURRENTRELEASE
QA Contact: bugs <bugs>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.7.0
CC: annair, bugs, dlambrig, nchilaka, rkavunga, trao
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1222840
Environment:
Last Closed: 2015-06-02 08:03:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1222442, 1222840
Bug Blocks: 1260923

Description Mohammed Rafi KC 2015-05-28 19:16:43 UTC
+++ This bug was initially created as a clone of Bug #1222840 +++

+++ This bug was initially created as a clone of Bug #1222442 +++

Description of problem:

I/O hangs on tiered volumes.

[root@dhcp42-250 gluster]# gluster vol info v1

Volume Name: v1
Type: Tier
Volume ID: cdebe3d4-bf02-4f19-9803-96852a9973a1
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: 10.70.43.107:/rhs/brick2
Brick2: 10.70.42.250:/rhs/brick2
Cold Bricks:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: 10.70.42.250:/rhs/brick1
Brick4: 10.70.43.107:/rhs/brick1
Options Reconfigured:
performance.readdir-ahead: on

Version-Release number of selected component (if applicable):
glusterfs 3.7.0 built on May 15 2015 01:31:12

How reproducible:

Steps to Reproduce:
1. Create a replica 2 volume.
2. Attach a replica 2 hot tier to it.
3. Mount the volume over NFS on a client and start a Linux kernel untar.

Actual results:
I/O hangs; the untar output stalls at:
linux-2.6.31.1/drivers/net/skfp/h/hwmtm.h
linux-2.6.31.1/drivers/net/skfp/h/mbuf.h
linux-2.6.31.1/drivers/net/skfp/h/osdef1st.h
linux-2.6.31.1/drivers/net/skfp/h/sba.h
linux-2.6.31.1/drivers/net/skfp/h/sba_def.h
linux-2.6.31.1/drivers/net/skfp/h/skfbi.h
linux-2.6.31.1/drivers/net/skfp/h/skfbiinc.h
linux-2.6.31.1/drivers/net/skfp/h/smc.h
linux-2.6.31.1/drivers/net/skfp/h/smt.h
linux-2.6.31.1/drivers/net/skfp/h/smt_p.h
linux-2.6.31.1/drivers/net/skfp/h/smtstate.h
linux-2.6.31.1/drivers/net/skfp/h/supern_2.h
linux-2.6.31.1/drivers/net/skfp/h/targethw.h
linux-2.6.31.1/drivers/net/skfp/h/targetos.h
linux-2.6.31.1/drivers/net/skfp/h/types.h
linux-2.6.31.1/drivers/net/skfp/hwmtm.c

Expected results:
I/O should not hang.

Additional info:
Attaching sosreport.

--- Additional comment from Anoop on 2015-05-18 04:53:11 EDT ---



--- Additional comment from RHEL Product and Program Management on 2015-05-18 06:13:36 EDT ---

This request has been proposed as a blocker, but a release flag has
not been requested. Please set a release flag to ? to ensure we may
track this bug against the appropriate upcoming release, and reset
the blocker flag to ?.

--- Additional comment from Anoop on 2015-05-18 12:34:57 EDT ---

I found that I hit this issue when I create a second tiered volume.

This is what I see in the logs:

glusterd.log
The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2015-05-18 19:09:56.628006] and [2015-05-18 19:11:53.652759]

nfs.log
[2015-05-18 21:09:08.032573] E [graph.y:153:new_volume] 0-parser: Line 175: volume 'tier-dht' defined again
[2015-05-18 21:09:08.032897] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 21:09:08.033278] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down

                                                   
This is consistently reproducible.

--- Additional comment from Triveni Rao on 2015-05-19 02:58:44 EDT ---

I see a similar problem on my setup. If I have many tiered volumes and then create a new volume (for example distributed-replicate or distribute), mounting the newly created volume over NFS fails with "Connection timed out".

[root@rhsqa14-vm1 ~]# gluster v info test

Volume Name: test
Type: Distribute
Volume ID: 345406fa-17c9-4523-bb00-1b489bb552a0
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.233:/rhs/brick1/j0
Brick2: 10.70.46.236:/rhs/brick1/j0
Brick3: 10.70.46.233:/rhs/brick5/j0
Brick4: 10.70.46.236:/rhs/brick5/j0
Options Reconfigured:
performance.readdir-ahead: on
[root@rhsqa14-vm1 ~]# 

[root@rhsqa14-vm5 ~]# mount -t nfs 10.70.46.233:/test /mnt2
mount.nfs: Connection timed out
[root@rhsqa14-vm5 ~]#


Log messages:

[2015-05-18 08:37:26.174773] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:26.174850] I [MSGID: 106006] [glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-05-18 08:37:29.175448] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:32.176345] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:35.177158] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:38.177997] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:41.179047] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:44.179887] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:47.180348] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:50.181147] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:53.182271] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:55.110046] I [glusterd-brick-ops.c:770:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-05-18 08:37:55.120848] I [glusterd-utils.c:8599:glusterd_generate_and_set_task_id] 0-management: Generated task-id 0570cbd5-a643-4f2f-b19f-12add534b25e for key remove-brick-id
[2015-05-18 08:37:55.660814] E [graph.y:153:new_volume] 0-parser: Line 197: volume 'tier-dht' defined again
[2015-05-18 08:37:55.669775] W [glusterd-brick-ops.c:2253:glusterd_op_remove_brick] 0-management: Unable to reconfigure NFS-Server
[2015-05-18 08:37:55.669801] E [glusterd-syncop.c:1372:gd_commit_op_phase] 0-management: Commit of operation 'Volume Remove brick' failed on localhost
[2015-05-18 08:37:55.670829] E [glusterd-handshake.c:191:build_volfile_path] 0-management: Couldn't find volinfo
[2015-05-18 08:37:55.672561] E [glusterd-handshake.c:191:build_volfile_path] 0-management: Couldn't find volinfo
[2015-05-18 08:37:55.675863] E [glusterd-handshake.c:191:build_volfile_path] 0-management: Couldn't find volinfo
[2015-05-18 08:37:56.183110] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:59.183753] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:02.185611] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:05.186239] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:08.186897] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:11.187513] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)



NFS logs:

[2015-05-18 10:02:56.640509] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:02:56.658596] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-18 10:02:57.663067] E [graph.y:153:new_volume] 0-parser: Line 131: volume 'tier-dht' defined again
[2015-05-18 10:02:57.663212] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:02:57.663564] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-05-18 10:06:01.581244] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:06:01.598237] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-18 10:06:01.602635] E [graph.y:153:new_volume] 0-parser: Line 131: volume 'tier-dht' defined again
[2015-05-18 10:06:01.602771] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:06:01.602987] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-05-18 10:23:45.699277] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:23:45.720241] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-18 10:23:45.724724] E [graph.y:153:new_volume] 0-parser: Line 167: volume 'tier-dht' defined again
[2015-05-18 10:23:45.724861] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:23:45.725066] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-05-19 06:15:30.577550] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-19 06:15:30.595539] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-19 06:15:30.601199] E [graph.y:153:new_volume] 0-parser: Line 228: volume 'tier-dht' defined again
[2015-05-19 06:15:30.601372] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-19 06:15:30.601569] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
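
The NFS log excerpts above all fail on the same parser error, so a quick way to confirm the symptom on another node is to scan nfs.log for it. The following is a minimal sketch, not part of GlusterFS: the function name and the regular expression are assumptions written against the log line format shown in this report, and other GlusterFS versions may format the line differently.

```python
import re

# Assumed pattern, derived from the log lines quoted in this report, e.g.
# "[...] E [graph.y:153:new_volume] 0-parser: Line 131: volume 'tier-dht' defined again"
DUPLICATE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] E \[graph\.y:\d+:new_volume\] \S+: "
    r"Line (?P<line>\d+): volume '(?P<name>[^']+)' defined again"
)


def find_duplicate_volume_errors(log_lines):
    """Return (timestamp, volfile_line, xlator_name) for each duplicate-definition error."""
    hits = []
    for raw in log_lines:
        match = DUPLICATE_RE.match(raw.strip())
        if match:
            hits.append(
                (match.group("ts"), int(match.group("line")), match.group("name"))
            )
    return hits


# Two lines copied from the NFS log above: the parser error and the follow-up failure.
sample = [
    "[2015-05-18 10:02:57.663067] E [graph.y:153:new_volume] 0-parser: "
    "Line 131: volume 'tier-dht' defined again",
    "[2015-05-18 10:02:57.663212] E [MSGID: 100026] "
    "[glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph",
]

print(find_duplicate_volume_errors(sample))
```

To check a real log, pass the open file instead of the sample list, for example `find_duplicate_volume_errors(open("/var/log/glusterfs/nfs.log"))`.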

--- Additional comment from Anand Avati on 2015-05-19 05:38:51 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in clinet graph) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-20 00:58:04 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in client graph) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-20 01:22:27 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in client graph) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-26 02:49:59 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in client graph) posted (#4) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-28 01:30:26 EDT ---

REVIEW: http://review.gluster.org/10820 (tiering/nfs: duplication of nodes in client graph) posted (#5) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-28 10:02:39 EDT ---

COMMIT: http://review.gluster.org/10820 committed in master by Kaushal M (kaushal) 
------
commit 05566baee6b5f0b2a3b083def4fe9bbdd0f63551
Author: Mohammed Rafi KC <rkavunga>
Date:   Tue May 19 14:54:32 2015 +0530

    tiering/nfs: duplication of nodes in client graph
    
    When creating client volfiles, xlator tier-dht will
    be loaded for each volume. So for services like nfs
    have one or more volumes . So for each volume in the
    graph a tier-dht xlator will be created. So the graph
    parser will fail because of the redundant node in
    graph.
    
    By this change tier-dht will be renamed as volname-tier-dht
    
    Change-Id: I3c9b9c23ddcb853773a8a02be7fd8a5d09a7f972
    BUG: 1222840
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/10820
    Reviewed-by: Atin Mukherjee <amukherj>
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System
    Reviewed-by: Kaushal M <kaushal>
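
The failure mode described in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the glusterd volgen code: `build_graph` and `tier_xlator_name` are hypothetical helpers, and the second volume name `v2` is invented for illustration. It only shows why one fixed xlator name collides when a single graph (such as the NFS server's) covers several tiered volumes, and why prefixing the volume name avoids the collision.

```python
def build_graph(xlator_names):
    """Toy stand-in for the volfile graph parser: reject a name defined twice."""
    seen = set()
    for name in xlator_names:
        if name in seen:
            raise ValueError(f"volume '{name}' defined again")
        seen.add(name)
    return seen


def tier_xlator_name(volname, per_volume):
    # per_volume=False models the old fixed name "tier-dht";
    # per_volume=True models the renamed form "<volname>-tier-dht".
    return f"{volname}-tier-dht" if per_volume else "tier-dht"


# Two tiered volumes served by one combined graph ("v2" is hypothetical).
volumes = ["v1", "v2"]

try:
    build_graph([tier_xlator_name(v, per_volume=False) for v in volumes])
    old_result = "ok"
except ValueError as err:
    old_result = str(err)

new_result = sorted(
    build_graph([tier_xlator_name(v, per_volume=True) for v in volumes])
)

print(old_result)  # volume 'tier-dht' defined again
print(new_result)  # ['v1-tier-dht', 'v2-tier-dht']
```

With a single tiered volume the fixed name is unique, which matches the observation earlier in this report that the problem appears once a second tiered volume is created.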

Comment 1 Anand Avati 2015-05-28 19:20:45 UTC
REVIEW: http://review.gluster.org/10981 (tiering/nfs: duplication of nodes in client graph) posted (#1) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 2 Anand Avati 2015-06-01 04:40:50 UTC
COMMIT: http://review.gluster.org/10981 committed in release-3.7 by Kaushal M (kaushal) 
------
commit 1b3d0bb2d8a75806968532d2ee006f34e9bb6334
Author: Mohammed Rafi KC <rkavunga>
Date:   Tue May 19 14:54:32 2015 +0530

    tiering/nfs: duplication of nodes in client graph
    
            Back port of http://review.gluster.org/10820
    
    When creating client volfiles, xlator tier-dht will
    be loaded for each volume. So for services like nfs
    have one or more volumes . So for each volume in the
    graph a tier-dht xlator will be created. So the graph
    parser will fail because of the redundant node in
    graph.
    
    By this change tier-dht will be renamed as volname-tier-dht
    
     >Change-Id: I3c9b9c23ddcb853773a8a02be7fd8a5d09a7f972
     >BUG: 1222840
     >Signed-off-by: Mohammed Rafi KC <rkavunga>
     >Reviewed-on: http://review.gluster.org/10820
     >Reviewed-by: Atin Mukherjee <amukherj>
     >Tested-by: Gluster Build System <jenkins.com>
     >Tested-by: NetBSD Build System
     >Reviewed-by: Kaushal M <kaushal>
    
    Change-Id: I5629d48d4d1dbec8790f75e2fee66729aa2f6eed
    BUG: 1226029
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/10981
    Tested-by: NetBSD Build System <jenkins.org>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Joseph Fernandes
    Reviewed-by: Kaushal M <kaushal>

Comment 3 Niels de Vos 2015-06-02 08:03:52 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.1, please reopen this bug report.

glusterfs-3.7.1 has been announced on the Gluster Packaging mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.packaging/1
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user