Description of problem:
=======================
Because of the issue mentioned in bz 1276245, the tierd daemon is down on one of the nodes. "gluster volume status" on this node still shows the tier task as in progress. One known way to start the tier is "gluster volume rebal tiervolume tier start", but it failed to start the daemon. The tierd could be started on the local host by using "volume start force".

Initial status:
===============
[root@dhcp37-165 glusterfs]# gluster volume status
Status of volume: tiervolume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
:
:
:
Task Status of Volume tiervolume
------------------------------------------------------------------------------
Task                 : Tier migration
ID                   : d4aec4e9-c4b9-4ef3-926b-af2b29c22096
Status               : in progress

[root@dhcp37-165 glusterfs]# ps -eaf | grep glusterfs | grep tier
root      6806     1  9 13:47 ?        00:09:25 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick3-tiervolume_hot -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick3-tiervolume_hot.pid -S /var/run/gluster/4f46770e383fab1ee7789ff7a656a342.socket --brick-name /rhs/brick3/tiervolume_hot -l /var/log/glusterfs/bricks/rhs-brick3-tiervolume_hot.log --xlator-option *-posix.glusterd-uuid=506b81d5-08d6-421a-9aff-94e57d3740bb --brick-port 49153 --xlator-option tiervolume-server.listen-port=49153
root      6824     1 21 13:47 ?        00:22:12 /usr/sbin/glusterfsd -s 10.70.37.165 --volfile-id tiervolume.10.70.37.165.rhs-brick1-tiervolume_ct-disp1 -p /var/lib/glusterd/vols/tiervolume/run/10.70.37.165-rhs-brick1-tiervolume_ct-disp1.pid -S /var/run/gluster/4cdd38c5ea86fe823baaa5dcde1b4b57.socket --brick-name /rhs/brick1/tiervolume_ct-disp1 -l /var/log/glusterfs/bricks/rhs-brick1-tiervolume_ct-disp1.log --xlator-option *-posix.glusterd-uuid=506b81d5-08d6-421a-9aff-94e57d3740bb --brick-port 49152 --xlator-option tiervolume-server.listen-port=49152
[root@dhcp37-165 glusterfs]#

^^^^^ Note: no tierd glusterfs process is running.

[root@dhcp37-165 glusterfs]# gluster volume tier tiervolume status
Node                 Promoted files       Demoted files        Status
---------            ---------            ---------            ---------
localhost            562                  0                    failed
10.70.37.133         0                    18824                in progress
10.70.37.160         0                    0                    in progress
10.70.37.158         0                    19867                in progress
10.70.37.110         0                    0                    in progress
10.70.37.155         0                    22756                in progress
10.70.37.99          41                   0                    in progress
10.70.37.88          0                    23585                in progress
10.70.37.112         0                    0                    in progress
10.70.37.199         0                    20903                in progress
10.70.37.162         0                    0                    in progress
10.70.37.87          0                    21816                in progress
volume rebalance: tiervolume: success:

[root@dhcp37-165 glusterfs]# gluster volume rebal tiervolume tier start
volume rebalance: tiervolume: success: Rebalance on tiervolume has been started successfully. Use rebalance status command to check status of the rebalance process.
ID: d666a0ae-f03b-4862-8f54-9b2545cfcdc3

[root@dhcp37-165 glusterfs]# gluster volume rebal tiervolume tier status
Node                 Promoted files       Demoted files        Status
---------            ---------            ---------            ---------
localhost            562                  0                    failed
10.70.37.133         0                    18824                in progress
10.70.37.160         0                    0                    in progress
10.70.37.158         0                    19867                in progress
10.70.37.110         0                    0                    in progress
10.70.37.155         0                    22756                in progress
10.70.37.99          41                   0                    in progress
10.70.37.88          0                    23585                in progress
10.70.37.112         0                    0                    in progress
10.70.37.199         0                    20903                in progress
10.70.37.162         0                    0                    in progress
10.70.37.87          0                    21816                in progress
volume rebalance: tiervolume: success:

"rebal ... tier start" reports that rebalance was started successfully, but the status still shows the tierd on localhost as failed.

[root@dhcp37-165 glusterfs]# gluster volume start tiervolume force
volume start: tiervolume: success

[root@dhcp37-165 glusterfs]# gluster volume rebal tiervolume tier status
Node                 Promoted files       Demoted files        Status
---------            ---------            ---------            ---------
localhost            562                  0                    in progress
10.70.37.133         0                    18824                in progress
10.70.37.160         0                    0                    in progress
10.70.37.158         0                    19867                in progress
10.70.37.110         0                    0                    in progress
10.70.37.155         0                    22756                in progress
10.70.37.99          41                   0                    in progress
10.70.37.88          0                    23585                in progress
10.70.37.112         0                    0                    in progress
10.70.37.199         0                    20903                in progress
10.70.37.162         0                    0                    in progress
10.70.37.87          0                    21816                in progress
volume rebalance: tiervolume: success:

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-0.3.el7rhgs.x86_64
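The failed node(s) in the tier status output above can be picked out mechanically. A minimal sketch, assuming the column layout shown in the transcripts; the `failed_nodes` helper and the captured sample are illustrative, not part of the gluster CLI:

```shell
# Print the Node column for every row whose Status is "failed".
# "failed" is a single awk field, while "in progress" splits into two,
# so matching the last field is enough for this layout.
failed_nodes() {
    awk '$NF == "failed" { print $1 }'
}

# Captured sample in the same layout as "gluster volume tier tiervolume status".
status_sample='localhost            562                  0                    failed
10.70.37.133         0                    18824                in progress
10.70.37.160         0                    0                    in progress'

printf '%s\n' "$status_sample" | failed_nodes
```

In practice the input would be piped straight from the status command instead of a captured sample.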
"gluster volume rebalance volname tier start" will not start the failed process if the tier has already been started; it should instead throw an error saying "Tier process is already running". Apparently that is not happening because of bug 1285170. We also need a way to start the tier forcefully, overriding that check; an RFC is filed for this as bug 1284751. The workaround for this bug is to start the volume forcefully, which starts the tier daemon if it has failed or is not running. It won't restart any process that is already running.
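The workaround above can be sketched as a small shell check. This is illustrative only: the `tierd_check` function name is made up, and the pgrep pattern assumes the tier daemon runs as a glusterfs process with a rebalance-style volfile-id, which may differ across builds:

```shell
#!/bin/sh
# Sketch of the workaround: if no tierd process is found for the volume,
# suggest "gluster volume start <vol> force", which restarts only daemons
# that are down and leaves already-running processes untouched.
tierd_check() {
    vol=$1
    # Assumed process pattern for the tier daemon; verify on your build.
    if pgrep -f "glusterfs .*--volfile-id rebalance/${vol}" >/dev/null 2>&1; then
        echo "tierd for ${vol} is running; nothing to do"
    else
        echo "tierd for ${vol} is down; run: gluster volume start ${vol} force"
    fi
}

tierd_check tiervolume
```

The sketch only reports; the actual restart is the "volume start force" command described above, and it still needs the bug-1284751 RFC for a true per-daemon force start.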
*** Bug 1284751 has been marked as a duplicate of this bug. ***
The upstream patch is: http://review.gluster.org/#/c/12983/
The downstream patch is: https://code.engineering.redhat.com/gerrit/#/c/64383/
Verified with the build: glusterfs-3.7.5-14.el7rhgs.x86_64

Killed a few tierd glusterfs processes, which marked tierd as failed on those nodes. "tier <volume> tier start force" started tierd only on the nodes where it was marked failed, without restarting the tierd glusterfs processes on the remaining nodes. Moving this bug to verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html