Bug 1211264 - Data Tiering: glusterd(management) communication issues seen on tiering setup
Summary: Data Tiering: glusterd(management) communication issues seen on tiering setup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: tiering
Version: mainline
Hardware: Unspecified
OS: Linux
Severity: urgent
Priority: urgent
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact: bugs@gluster.org
URL:
Whiteboard:
Depends On:
Blocks: qe_tracker_everglades 1219846 1260923
 
Reported: 2015-04-13 13:12 UTC by Nag Pavan Chilakam
Modified: 2016-06-16 12:50 UTC
CC: 3 users

Fixed In Version: glusterfs-3.8rc2
Clone Of:
: 1219846 (view as bug list)
Environment:
Last Closed: 2016-06-16 12:50:21 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Nag Pavan Chilakam 2015-04-13 13:12:35 UTC
Description of problem:
======================
While executing commands such as quota enable, attach-tier, and detach-tier on a cluster
with at least one tiered volume, errors are seen in updating the volume tables on the other nodes of the cluster.
Some examples are:
1) volume remove-brick unknown: failed: Commit failed on localhost. Please check the log file for more details.

2) Sometimes, when a command like detach-tier or quota disable is issued on a multi-node cluster, the command gets executed only on the local node and fails to get updated in the tables or graphs of the other nodes.

We have sometimes seen this issue even on a non-tiered volume, but only after tiering commands have been executed on that cluster.
There seems to be an issue with the management daemon (glusterd) communication between nodes.

In more detail, I issued a detach-tier command from one node's CLI; the following is the output seen from each node's respective CLI:

(local node, where I have been executing all the commands so far)
[root@rhs-client6 glusterd]# gluster v info disperse
 
Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse

[root@yarrow glusterd]# gluster v info disperse
 
Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse


It can be clearly seen that the other nodes haven't been updated.
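
A quick way to confirm the mismatch (a hypothetical check, not part of the original triage; host names and volume name are taken from the outputs above) is to compare the on-disk volume definition that glusterd keeps on each peer:

# Mismatched type/version values mean the peers disagree about the
# volume, as the gluster v info outputs above show.
for h in rhs-client6 yarrow; do
    echo "== $h =="
    ssh root@$h "grep -E '^(type|version)=' /var/lib/glusterd/vols/disperse/info"
done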


Version-Release number of selected component (if applicable):
============================================================
[root@rhs-client6 glusterd]# gluster --version
glusterfs 3.7dev built on Apr 13 2015 07:14:27
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

[root@rhs-client6 glusterd]# rpm -qa|grep gluster
glusterfs-api-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-libs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-fuse-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-cli-3.7dev-0.994.gitf522001.el6.x86_64
glusterfs-server-3.7dev-0.994.gitf522001.el6.x86_64




How reproducible:
================
Quite easily


Steps to Reproduce:
==================
1. Install the latest nightly build.
2. Create a cluster with at least two nodes.
3. Create a tiered volume.
4. Enable and then disable quota; the issue can be seen (a command-level sketch
   follows below). Sometimes even detach-tier alone can reproduce the issue.
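
A minimal command-level sketch of the steps above (host names, brick paths, and the volume name are placeholders, not from the original report):

# On node1, with node2 already running the same nightly build:
gluster peer probe node2
gluster v create tiervol disperse 3 redundancy 1 \
    node1:/bricks/b1 node1:/bricks/b2 node2:/bricks/b3 force
gluster v start tiervol
gluster v attach-tier tiervol replica 2 node1:/bricks/hot1 node2:/bricks/hot2
gluster v quota tiervol enable
gluster v quota tiervol disable   # may fail, or fail to propagate to node2
gluster v info tiervol            # run on both nodes and compare the outputs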

Comment 1 Nag Pavan Chilakam 2015-04-13 13:19:18 UTC
CLI executed logs:
==================
[root@yarrow glusterfs]# gluster v info disperse
 
Volume Name: disperse
Type: Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Created
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse
[root@yarrow glusterfs]# gluster v start disperse
volume start: disperse: success
[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2  yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: Commit failed on localhost. Please check the log file for more details.
[root@yarrow glusterfs]# gluster v info disperse
 
Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse
[root@yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check the log file for more details.
[root@yarrow glusterfs]# gluster v info disperse
 
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: yarrow:/yarrow_200G_7/disperse
Brick2: yarrow:/yarrow_200G_8/disperse
Brick3: rhs-client6:/brick15/disperse
[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2  yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a volume
[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2  yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a volume
[root@yarrow glusterfs]# gluster v info disperse
 
Volume Name: disperse
Type: Tier
Volume ID: a6e4f8dd-bbf8-484f-9d1c-c6267899bb0a
Status: Started
Number of Bricks: 2 x 2 = 5
Transport-type: tcp
Bricks:
Brick1: rhs-client6:/brick16/disperse
Brick2: yarrow:/yarrow_ssd_75G_2/disperse
Brick3: yarrow:/yarrow_200G_7/disperse
Brick4: yarrow:/yarrow_200G_8/disperse
Brick5: rhs-client6:/brick15/disperse
[root@yarrow glusterfs]# gluster v detach-tier disperse
volume remove-brick unknown: failed: Commit failed on localhost. Please check the log file for more details.
[root@yarrow glusterfs]# gluster v attach-tier disperse replica 2  yarrow:/yarrow_ssd_75G_2/disperse rhs-client6:/brick16/disperse force
volume add-brick: failed: /yarrow_ssd_75G_2/disperse is already part of a volume
[root@yarrow glusterfs]#

Comment 2 Nag Pavan Chilakam 2015-04-13 13:20:02 UTC
sosreports@rhsqe-repo:/home/repo/sosreports/1211264

Comment 3 Dan Lambright 2015-04-22 15:15:28 UTC
We have submitted fix 10108, which is not yet merged. The issues with detach-tier may no longer exist (I do not see them). Returning to QE to retest.

Comment 4 Nag Pavan Chilakam 2015-04-28 07:07:55 UTC
Dan,
We are still seeing issues with glusterd communication between nodes on a tiered volume as of 28th April.
Kindly move this to "ON_QA" only when the fix is available for testing.

Comment 5 Anand Avati 2015-04-29 13:06:14 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

Comment 6 Anand Avati 2015-05-02 13:23:09 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

Comment 7 Anand Avati 2015-05-05 08:41:29 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

Comment 8 Anand Avati 2015-05-05 11:27:10 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#4) for review on master by mohammed rafi  kc (rkavunga)

Comment 9 Anand Avati 2015-05-08 21:37:15 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#5) for review on master by mohammed rafi  kc (rkavunga)

Comment 10 Anand Avati 2015-05-09 10:26:16 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#6) for review on master by mohammed rafi  kc (rkavunga)

Comment 11 Anand Avati 2015-05-09 13:39:56 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#7) for review on master by mohammed rafi  kc (rkavunga)

Comment 12 Anand Avati 2015-05-10 08:08:29 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#8) for review on master by Vijay Bellur (vbellur)

Comment 13 Anand Avati 2015-05-11 06:53:13 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#9) for review on master by mohammed rafi  kc (rkavunga)

Comment 14 Anand Avati 2015-05-13 11:25:12 UTC
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

Comment 15 Anand Avati 2015-05-22 08:54:36 UTC
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

Comment 16 Anand Avati 2015-05-22 13:28:11 UTC
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#3) for review on master by Vijay Bellur (vbellur)

Comment 17 Anand Avati 2015-05-26 06:01:10 UTC
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#4) for review on master by mohammed rafi  kc (rkavunga)

Comment 18 Anand Avati 2015-05-28 05:29:32 UTC
REVIEW: http://review.gluster.org/10449 (glusterd/tiering: Exchange tier info during glusted handshake) posted (#10) for review on master by mohammed rafi  kc (rkavunga)

Comment 19 Anand Avati 2015-05-28 05:29:43 UTC
REVIEW: http://review.gluster.org/10768 (tiering: Correct errors in cli and glusterd) posted (#5) for review on master by mohammed rafi  kc (rkavunga)

Comment 20 Anand Avati 2015-05-28 13:56:00 UTC
COMMIT: http://review.gluster.org/10449 committed in master by Kaushal M (kaushal) 
------
commit d133071e7ced1794e09ffe4ef8cb14cf5b9f7e75
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed Apr 29 12:00:40 2015 +0530

    glusterd/tiering: Exchange tier info during glusted handshake
    
    Change-Id: Ibc2f8eeb32d3e5dfd6945ca8a6d5f0f80a78ebac
    BUG: 1211264
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/10449
    Reviewed-by: Kaushal M <kaushal>
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System
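
Since the handshake runs when peers (re)connect, a plausible way to verify this fix (an assumption, not taken from this report) is to force a reconnect on the stale peer and recheck the volume info:

# On the out-of-date peer (el6 here, hence 'service' rather than systemctl):
service glusterd restart
gluster v info disperse   # should now match the node where attach-tier/detach-tier was run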

Comment 21 Anand Avati 2015-05-28 14:00:49 UTC
COMMIT: http://review.gluster.org/10768 committed in master by Kaushal M (kaushal) 
------
commit f1fb71bbf677be40b7bab997221f832c7fa7527a
Author: Mohammed Rafi KC <rkavunga>
Date:   Wed May 13 16:53:22 2015 +0530

    tiering: Correct errors in cli and glusterd
    
    Problem 1:
    
    volume info shows Cold Bricks instead of Tier type
      eg:
    Volume Name: patchy2
    Type: Tier
    Volume ID: 28c25b8d-b8a1-45dc-b4b7-cbd0b344f58f
    Status: Started
    Number of Bricks: 3
    Transport-type: tcp
    Hot Tier :
    Hot Tier Type : Distribute
    Number of Bricks: 1
    Brick1: 10.70.1.35:/home/brick43
    Cold Bricks:
    Cold Tier Type : Distribute
    Number of Bricks: 2
    Brick2: 10.70.1.35:/home/brick19
    Brick3: 10.70.1.35:/home/brick16
    Options Reconfigured:
    
    Problem 2: Detach-tier sending enums of Rebalance
    
       detach-tier has its own enum to send with the detach-tier
    command; using that enum is more appropriate.
    
    Problem 3:
    
    The hot_brick count is wrongly set while copying the dictionary for the response.
    
    Change-Id: Icc054a999a679456881bc70511470d32ff8a86e4
    BUG: 1211264
    Signed-off-by: Mohammed Rafi KC <rkavunga>
    Reviewed-on: http://review.gluster.org/10768
    Reviewed-by: Atin Mukherjee <amukherj>
    Reviewed-by: Kaushal M <kaushal>
    Tested-by: Gluster Build System <jenkins.com>
    Tested-by: NetBSD Build System
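
For reference, a sketch of the expected volume info layout after this fix (an assumption based on Problem 1 above: the "Cold Bricks:" label becomes a "Cold Tier:" section; the exact wording comes from the patched cli):

Volume Name: patchy2
Type: Tier
Status: Started
Number of Bricks: 3
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 1
Brick1: 10.70.1.35:/home/brick43
Cold Tier:
Cold Tier Type : Distribute
Number of Bricks: 2
Brick2: 10.70.1.35:/home/brick19
Brick3: 10.70.1.35:/home/brick16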

Comment 22 Nagaprasad Sathyanarayana 2015-10-25 15:21:58 UTC
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ, fixed in a GlusterFS release, has been closed. Hence this mainline BZ is being closed as well.

Comment 23 Niels de Vos 2016-06-16 12:50:21 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

