Bug 1222442
| Summary: | I/O's hanging on tiered volumes (NFS) | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Anoop <annair> |
| Component: | tier | Assignee: | Mohammed Rafi KC <rkavunga> |
| Status: | CLOSED ERRATA | QA Contact: | Nag Pavan Chilakam <nchilaka> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | rhgs-3.1 | CC: | asrivast, dlambrig, nsathyan, rhs-bugs, rkavunga, storage-qa-internal, trao, vagarwal |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | glusterfs-3.7.1-1 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1222840 (view as bug list) | Environment: | |
| Last Closed: | 2015-07-29 04:43:21 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1202842, 1222840, 1226029 | | |
| Attachments: | | | |
Description
Anoop
2015-05-18 08:49:20 UTC
Created attachment 1026626 [details]
sosreports
I figured out that I run into this issue when I try creating a second tiered volume.
This is what I see in the logs:
glusterd.log
The message "I [MSGID: 106006] [glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd." repeated 39 times between [2015-05-18 19:09:56.628006] and [2015-05-18 19:11:53.652759]
nfs.log
[2015-05-18 21:09:08.032573] E [graph.y:153:new_volume] 0-parser: Line 175: volume 'tier-dht' defined again
[2015-05-18 21:09:08.032897] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 21:09:08.033278] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
This is consistently reproducible.
I see a similar problem on my setup. If I have many tiered volumes and then create new volumes (distributed-replicate or plain distribute), mounting the newly created volume over NFS fails with a connection timeout.

[root@rhsqa14-vm1 ~]# gluster v info test

Volume Name: test
Type: Distribute
Volume ID: 345406fa-17c9-4523-bb00-1b489bb552a0
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 10.70.46.233:/rhs/brick1/j0
Brick2: 10.70.46.236:/rhs/brick1/j0
Brick3: 10.70.46.233:/rhs/brick5/j0
Brick4: 10.70.46.236:/rhs/brick5/j0
Options Reconfigured:
performance.readdir-ahead: on
[root@rhsqa14-vm1 ~]#

[root@rhsqa14-vm5 ~]# mount -t nfs 10.70.46.233:/test /mnt2
mount.nfs: Connection timed out
[root@rhsqa14-vm5 ~]#

Log messages:

[2015-05-18 08:37:26.174773] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:26.174850] I [MSGID: 106006] [glusterd-svc-mgmt.c:330:glusterd_svc_common_rpc_notify] 0-management: nfs has disconnected from glusterd.
[2015-05-18 08:37:29.175448] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:32.176345] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:35.177158] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:38.177997] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:41.179047] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:44.179887] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:47.180348] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:50.181147] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:53.182271] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:55.110046] I [glusterd-brick-ops.c:770:__glusterd_handle_remove_brick] 0-management: Received rem brick req
[2015-05-18 08:37:55.120848] I [glusterd-utils.c:8599:glusterd_generate_and_set_task_id] 0-management: Generated task-id 0570cbd5-a643-4f2f-b19f-12add534b25e for key remove-brick-id
[2015-05-18 08:37:55.660814] E [graph.y:153:new_volume] 0-parser: Line 197: volume 'tier-dht' defined again
[2015-05-18 08:37:55.669775] W [glusterd-brick-ops.c:2253:glusterd_op_remove_brick] 0-management: Unable to reconfigure NFS-Server
[2015-05-18 08:37:55.669801] E [glusterd-syncop.c:1372:gd_commit_op_phase] 0-management: Commit of operation 'Volume Remove brick' failed on localhost
[2015-05-18 08:37:55.670829] E [glusterd-handshake.c:191:build_volfile_path] 0-management: Couldn't find volinfo
[2015-05-18 08:37:55.672561] E [glusterd-handshake.c:191:build_volfile_path] 0-management: Couldn't find volinfo
[2015-05-18 08:37:55.675863] E [glusterd-handshake.c:191:build_volfile_path] 0-management: Couldn't find volinfo
[2015-05-18 08:37:56.183110] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:37:59.183753] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:02.185611] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:05.186239] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:08.186897] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)
[2015-05-18 08:38:11.187513] W [socket.c:642:__socket_rwv] 0-nfs: readv on /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket failed (Invalid argument)

NFS logs:

[2015-05-18 10:02:56.640509] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:02:56.658596] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-18 10:02:57.663067] E [graph.y:153:new_volume] 0-parser: Line 131: volume 'tier-dht' defined again
[2015-05-18 10:02:57.663212] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:02:57.663564] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-05-18 10:06:01.581244] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:06:01.598237] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-18 10:06:01.602635] E [graph.y:153:new_volume] 0-parser: Line 131: volume 'tier-dht' defined again
[2015-05-18 10:06:01.602771] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:06:01.602987] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-05-18 10:23:45.699277] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-18 10:23:45.720241] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-18 10:23:45.724724] E [graph.y:153:new_volume] 0-parser: Line 167: volume 'tier-dht' defined again
[2015-05-18 10:23:45.724861] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-18 10:23:45.725066] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-05-19 06:15:30.577550] I [MSGID: 100030] [glusterfsd.c:2294:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.0 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/4efb05f465a954b39883d7c3dfd43f78.socket)
[2015-05-19 06:15:30.595539] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2015-05-19 06:15:30.601199] E [graph.y:153:new_volume] 0-parser: Line 228: volume 'tier-dht' defined again
[2015-05-19 06:15:30.601372] E [MSGID: 100026] [glusterfsd.c:2149:glusterfs_process_volfp] 0-: failed to construct the graph
[2015-05-19 06:15:30.601569] W [glusterfsd.c:1219:cleanup_and_exit] (--> 0-: received signum (0), shutting down

Upstream patch: http://review.gluster.org/10820

RCA: A new xlator, "tier-dht", is introduced for a tier volume to choose between the hot and cold tiers and to perform related operations. A client process such as the NFS server can have two or more tier volumes loaded in its graph; in that case each volume contributes a tier-dht xlator, and graph creation fails because of the duplicate node name. As a result, every operation that tries to reconfigure the NFS server also fails.

This bug is verified on both NFS and FUSE mounts and no issues were found:

[root@rhsqa14-vm1 ~]# gluster v create pluto replica 2 10.70.47.165:/rhs/brick1/m0 10.70.47.163:/rhs/brick1/m0
volume create: pluto: success: please start the volume to access data
[root@rhsqa14-vm1 ~]# gluster v start pluto
volume start: pluto: success
[root@rhsqa14-vm1 ~]# gluster v info

Volume Name: pluto
Type: Replicate
Volume ID: 3e850a11-5c59-4265-8c96-202d4205fbd4
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.70.47.165:/rhs/brick1/m0
Brick2: 10.70.47.163:/rhs/brick1/m0
Options Reconfigured:
performance.readdir-ahead: on
[root@rhsqa14-vm1 ~]# gluster v attach-tier pluto replica 2 10.70.47.165:/rhs/brick3/m0 10.70.47.163:/rhs/brick3/m0
Attach tier is recommended only for testing purposes in this release. Do you want to continue?
(y/n) y
volume attach-tier: success
volume rebalance: pluto: success: Rebalance on pluto has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: 8296c3ce-77ac-4fb6-8da1-fd0a8d93c8aa
[root@rhsqa14-vm1 ~]# gluster v info

Volume Name: pluto
Type: Tier
Volume ID: 3e850a11-5c59-4265-8c96-202d4205fbd4
Status: Started
Number of Bricks: 4
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: 10.70.47.163:/rhs/brick3/m0
Brick2: 10.70.47.165:/rhs/brick3/m0
Cold Tier:
Cold Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick3: 10.70.47.165:/rhs/brick1/m0
Brick4: 10.70.47.163:/rhs/brick1/m0
Options Reconfigured:
performance.readdir-ahead: on
[root@rhsqa14-vm1 ~]#
[root@rhsqa14-vm1 ~]# gluster v status
Status of volume: pluto
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.47.163:/rhs/brick3/m0           49153     0          Y       13062
Brick 10.70.47.165:/rhs/brick3/m0           49153     0          Y       14333
Cold Bricks:
Brick 10.70.47.165:/rhs/brick1/m0           49152     0          Y       14279
Brick 10.70.47.163:/rhs/brick1/m0           49152     0          Y       13017
NFS Server on localhost                     2049      0          Y       14353
NFS Server on 10.70.47.163                  2049      0          Y       13081

Task Status of Volume pluto
------------------------------------------------------------------------------
Task                 : Rebalance
ID                   : 8296c3ce-77ac-4fb6-8da1-fd0a8d93c8aa
Status               : in progress

[root@rhsqa14-vm1 ~]# ls -la /rhs/brick*/*
/rhs/brick1/m0:
total 0
drwxr-xr-x.  5 root root  59 Jun 12 01:04 .
drwxr-xr-x.  3 root root  15 Jun 12 00:58 ..
drw-------. 13 root root 155 Jun 12 01:04 .glusterfs
drwx------.  3 root root  26 Jun 12 01:04 linux-4.1-rc7
drwxr-xr-x.  3 root root  24 Jun 12 01:03 .trashcan

/rhs/brick3/m0:
total 130056
drwxr-xr-x.   5 root root       86 Jun 12 01:04 .
drwxr-xr-x.   3 root root       15 Jun 12 01:02 ..
drw-------. 159 root root     8192 Jun 12 01:05 .glusterfs
drwx------.   3 root root       86 Jun 12 01:04 linux-4.1-rc7
-rw-r--r--.   2 root root 83014120 Jun 12 01:04 linux-4.1-rc7.tar.xz
drwxr-xr-x.   3 root root       24 Jun 12 01:03 .trashcan

On the mount point a linux-4.1-rc7 untar is in progress:

linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-pca953x.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-pcf857x.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-poweroff.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-restart.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-samsung.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-stericsson-coh901.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-stmpe.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-stp-xway.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-sx150x.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-twl4030.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-tz1090-pdc.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-tz1090.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-vf610.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-xgene-sb.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-xgene.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-xilinx.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-zevio.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio-zynq.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio_atmel.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/gpio_lpc32xx.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/moxa,moxart-gpio.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/mrvl-gpio.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/nvidia,tegra20-gpio.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/pl061-gpio.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/renesas,gpio-rcar.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/snps-dwapb-gpio.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/sodaville.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpio/spear_spics.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpu/
linux-4.1-rc7/Documentation/devicetree/bindings/gpu/nvidia,gk20a.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpu/nvidia,tegra20-host1x.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpu/samsung-g2d.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpu/samsung-rotator.txt
linux-4.1-rc7/Documentation/devicetree/bindings/gpu/st,stih4xx.txt
linux-4.1-rc7/Documentation/devicetree/bindings/graph.txt
linux-4.1-rc7/Documentation/devicetree/bindings/hid/
linux-4.1-rc7/Documentation/devicetree/bindings/hid/hid-over-i2c.txt

[root@rhsqa14-vm1 ~]# glusterfs --version
glusterfs 3.7.1 built on Jun 9 2015 02:31:54
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.
[root@rhsqa14-vm1 ~]# rpm -qa | grep gluster
glusterfs-3.7.1-1.el6rhs.x86_64
glusterfs-cli-3.7.1-1.el6rhs.x86_64
glusterfs-libs-3.7.1-1.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-1.el6rhs.x86_64
glusterfs-fuse-3.7.1-1.el6rhs.x86_64
glusterfs-server-3.7.1-1.el6rhs.x86_64
glusterfs-api-3.7.1-1.el6rhs.x86_64
[root@rhsqa14-vm1 ~]#

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
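The root cause above is a name collision: every started volume is loaded into the single gluster NFS server graph, and each tier volume contributes a node with the same fixed name, "tier-dht". The following is a hypothetical Python sketch of that failure mode, not GlusterFS code; the real duplicate check is in the volfile parser (graph.y:153:new_volume), and all node names here except "tier-dht" are made up for illustration.

```python
# Hypothetical model of the duplicate-node-name failure; not GlusterFS code.
# The real check is in glusterfs' volfile parser (graph.y:new_volume),
# which logs "volume '...' defined again" and aborts graph construction.

def build_graph(node_names):
    """Build a translator graph, rejecting duplicate node names."""
    graph = {}
    for name in node_names:
        if name in graph:
            # Mirrors the parser error seen in nfs.log.
            raise ValueError("volume '%s' defined again" % name)
        graph[name] = {"name": name}
    return graph

# One tiered volume: its single "tier-dht" node is accepted.
build_graph(["vol1-client-0", "vol1-hot-dht", "vol1-cold-dht", "tier-dht"])

# The NFS server loads every started volume into one graph. A second
# tiered volume contributes another node named "tier-dht", so graph
# construction fails:
try:
    build_graph(["vol1-hot-dht", "tier-dht", "vol2-hot-dht", "tier-dht"])
except ValueError as err:
    print(err)  # volume 'tier-dht' defined again
```

Because the NFS graph can never be constructed, the NFS service stays down, which matches both the repeated "nfs has disconnected from glusterd" messages in glusterd.log and the mount timeouts reported above.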