Bug 1219846 - Data Tiering: glusterd(management) communication issues seen on tiering setup
Summary: Data Tiering: glusterd(management) communication issues seen on tiering setup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: 3.7.0
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On: 1211264
Blocks: qe_tracker_everglades glusterfs-3.7.0 1260923
 
Reported: 2015-05-08 13:11 UTC by Mohammed Rafi KC
Modified: 2016-06-16 12:59 UTC
CC: 5 users

Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Doc Text:
Clone Of: 1211264
Environment:
Last Closed: 2016-06-16 12:59:33 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Mohammed Rafi KC 2015-05-08 13:11:17 UTC
+++ This bug was initially created as a clone of Bug #1211264 +++

Description of problem:
======================
While executing commands such as quota enable, attach-tier, detach-tier, etc. on a cluster
containing at least one tiered volume, errors are observed while updating the volume tables on the other nodes of the cluster.
Some examples are:
1) volume remove-brick unknown: failed: Commit failed on localhost. Please check the log file for more details.

2) Sometimes, when a command such as detach-tier or quota disable is issued on a multi-node cluster, the command gets executed only on the local node and the tables or graphs of the other nodes are not updated.

We have occasionally seen this issue even on a non-tiered volume, but only after tiering commands have been executed on that cluster.
There seems to be an issue with management daemon (glusterd) communication between nodes.

In more detail, I issued a detach-tier command from one node's CLI, and the following is the output seen from each node's respective CLI:

(local node, where I have been executing all the commands so far)
[root@rhs-client6 glusterd]# gluster v info disperse
 
Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse

[root@yarrow glusterd]# gluster v info disperse
 
Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse


It can be clearly seen that the other nodes have not been updated.
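
A quick way to confirm this kind of divergence (a sketch, not from the original report; it assumes password-less ssh to both peers) is to compare the volume definition each peer reports:

# Compare the volume definition as seen by each peer; differing checksums
# indicate that glusterd state is out of sync between the nodes.
for host in rhs-client6 yarrow; do
    printf '%s: ' "$host"
    ssh "$host" "gluster volume info disperse" | md5sum
done

# The volinfo persisted by glusterd can be compared the same way:
#   ssh <host> "cat /var/lib/glusterd/vols/disperse/info" | md5sum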


Version-Release number of selected component (if applicable):
============================================================
[root@rhs-client6 glusterd]# gluster --version
glusterfs 3.7dev built on Apr 13 2015 07:14:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

[root@rhs-client6 glusterd]# rpm -qa|grep gluster
glusterfs-api-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-libs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-fuse-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-cli-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-server-3.7dev-0.994.gitf522001.el6.x86_64




How reproducible:
================
Quite easily.


Steps to Reproduce:
==================
1. Install the latest nightly build.
2. Create a cluster with at least two nodes.
3. Create a tiered volume.
4. Enable and then disable quota; the issue can be seen. Sometimes even a
   detach-tier alone can reproduce the issue (rough script form below).
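
The steps above in rough script form (a sketch only; the volume name, host names and brick paths are placeholders rather than values from this report, and the command syntax mirrors the CLI transcript below):

# Assumes a two-node cluster (node1, node2) whose peers are already probed.
gluster volume create demo disperse 3 redundancy 1 \
        node1:/bricks/b1 node2:/bricks/b2 node2:/bricks/b3 force
gluster volume start demo

# Attach a replica-2 hot tier, then toggle quota and/or detach the tier;
# these are the operations after which the peers were seen to diverge.
gluster volume attach-tier demo replica 2 node1:/bricks/hot1 node2:/bricks/hot2
gluster volume quota demo enable
gluster volume quota demo disable
gluster volume detach-tier demo

# Run on every node and compare: the Type and brick list should match.
gluster volume info demo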

--- Additional comment from nchilaka on 2015-04-13 09:19:18 EDT ---

CLI executed logs:
==================
[root@yarrow glusterfs]# gluster v info disperse
 
Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Created
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse
[root@yarrow glusterfs]# gluster v start disperse
volume start: disperse: success
[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2  yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: Commit failed on localhost. Please check the log file for more details.
[root@yarrow glusterfs]# gluster v info disperse
 
Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse
[root@yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check the log file for more details.
[root@yarrow glusterfs]# gluster v info disperse
 
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse
[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2  yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a volume
[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2  yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a volume
[root@yarrow glusterfs]# gluster v info disperse
 
Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse
[root@yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check the log file for more details.
[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2  yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a volume
[root@yarrow glusterfs]#

--- Additional comment from nchilaka on 2015-04-13 09:20:02 EDT ---

sosreports@rhsqe-repo:/home/repo/sosreports/1211264

--- Additional comment from Dan Lambright on 2015-04-22 11:15:28 EDT ---

We have submitted fix 10108, which has not been merged yet. The issues with detach-tier may no longer exist (I do not see them). Returning to QE to retest.

--- Additional comment from nchilaka on 2015-04-28 03:07:55 EDT ---

Dan,
We are still seeing issues with glusterd communication between nodes with a tiered volume as of 28th April.
Kindly move it to "ON_QA" only when the fix is available for testing.

--- Additional comment from Anand Avati on 2015-04-29 09:06:14 EDT ---

REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-02 09:23:09 EDT ---

REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-05 04:41:29 EDT ---

REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-05 07:27:10 EDT ---

REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#4) for review on master by mohammed rafi  kc (rkavunga)

Comment 1 Anand Avati 2015-05-08 13:12:59 UTC
REVIEW: http://review.gluster.org/10678 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#1) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 2 Anand Avati 2015-05-09 11:46:52 UTC
REVIEW: http://review.gluster.org/10678 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#2) for review on release-3.7 by mohammed rafi  kc (rkavunga)

Comment 3 Anand Avati 2015-05-09 14:07:14 UTC
REVIEW: http://review.gluster.org/10724 (tests: Wait for NFS exports to be ready before mount) posted (#1) for review on master by Raghavendra Talur (rtalur)

Comment 4 Anand Avati 2015-05-09 17:35:23 UTC
REVIEW: http://review.gluster.org/10724 (tests: Wait for NFS exports to be ready before mount) posted (#2) for review on master by Vijay Bellur (vbellur)

Comment 5 Anand Avati 2015-05-09 17:36:25 UTC
REVIEW: http://review.gluster.org/10678 (glusterd/tiering: Exchange tier info during glusterd handshake) posted (#3) for review on release-3.7 by Vijay Bellur (vbellur)

Comment 6 Anand Avati 2015-05-09 22:10:45 UTC
REVIEW: http://review.gluster.org/10724 (tests: Wait for NFS exports to be ready before mount) posted (#3) for review on master by Niels de Vos (ndevos)

Comment 7 Anand Avati 2015-05-10 05:24:46 UTC
REVIEW: http://review.gluster.org/10678 (glusterd/tiering: Exchange tier info during glusterd handshake) posted (#4) for review on release-3.7 by Vijay Bellur (vbellur)

Comment 8 Anand Avati 2015-05-10 07:37:07 UTC
COMMIT: http://review.gluster.org/10678 committed in release-3.7 by Vijay Bellur (vbellur) 
------
commit 2b7048a58ced7dea2d40016b5c6880fcca89f0f0
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed Apr 29 12:00:40 2015 +0530

    glusterd/tiering: Exchange tier info during glusterd handshake
    
            Back port of http://review.gluster.org/#/c/10449
    
    Change-Id: Ibc2f8eeb32d3e5dfd6945ca8a6d5f0f80a78ebac
    BUG: 1219846
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/10678
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

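With this fix in, a simple sanity check (a sketch; not part of the upstream change or its test case) is that all peers report the same tier layout once glusterd has re-handshaked, for example after restarting the daemon on the stale node:

# Restart glusterd on the node that was out of sync (EL6-style service
# command, matching the reporter's systems; adjust for systemd hosts).
service glusterd restart

# From any machine with ssh access to both peers, the layout should now agree:
ssh rhs-client6 "gluster volume info disperse" | grep -E 'Type|Number of Bricks'
ssh yarrow "gluster volume info disperse" | grep -E 'Type|Number of Bricks'
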
Comment 9 Anand Avati 2015-05-11 20:38:46 UTC
COMMIT: http://review.gluster.org/10724 committed in master by Niels de Vos (ndevos) 
------
commit e32e5fcb0757cd5cc8bfa04f45bab410f6547feb
Author: Raghavendra Talur <rtalur>
Date:   Sat May 9 19:35:31 2015 +0530

    tests: Wait for NFS exports to be ready before mount
    
    Change-Id: Ie71e8c80d6a43dd618c9decb946a459b211295ce
    BUG: 1219846
    Signed-off-by: Raghavendra Talur <rtalur>
    Reviewed-on: http://review.gluster.org/10724
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Niels de Vos <ndevos>

Comment 10 Niels de Vos 2015-05-14 17:29:40 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user


Comment 14 Niels de Vos 2016-06-16 12:59:33 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

