Bug 1229270

Summary: tiering: tier daemon not restarting during volume/glusterd restart
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: tier
Assignee: Mohammed Rafi KC <rkavunga>
Status: CLOSED DUPLICATE
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: bugs, dlambrig, josferna, knarra, rhs-bugs, rkavunga, storage-qa-internal, vagarwal
Target Milestone: ---
Keywords: Reopened, Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1225330
Environment:
Last Closed: 2015-11-24 06:13:54 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 994405, 1225330, 1233151, 1235202, 1265890, 1273354
Bug Blocks:

Description Nag Pavan Chilakam 2015-06-08 10:53:10 UTC
+++ This bug was initially created as a clone of Bug #1225330 +++

Description of problem:

The tier daemon should always be running on the node so that files can be promoted and demoted. When the volume is stopped, the daemon is stopped too; when the volume is started again, the daemon should start with it, but it does not. The same applies to a glusterd restart after the tier daemon has gone offline.

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Create a tiered volume
2. Stop the volume
3. Start the volume
4. Check for the tier process
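
The stop/start/check part of these steps can be scripted. The following is only a minimal sketch, assuming the volume already exists and that the check in step 4 is done with the tier status command; the helper names restart_commands and run_restart are hypothetical and not part of gluster.

```python
import subprocess

# --mode=script makes the gluster CLI skip the interactive confirmation
# that "volume stop" would otherwise ask for.
GLUSTER = ["gluster", "--mode=script"]


def restart_commands(volume):
    """Return the CLI invocations for steps 2-4: stop, start, tier status."""
    return [
        GLUSTER + ["volume", "stop", volume],
        GLUSTER + ["volume", "start", volume],
        GLUSTER + ["volume", "tier", volume, "status"],
    ]


def run_restart(volume, runner=subprocess.run):
    """Run the commands in order and return the text of the final status call.

    `runner` is injectable (hypothetical parameter) so the sequence can be
    exercised without a gluster installation.
    """
    result = None
    for cmd in restart_commands(volume):
        result = runner(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling run_restart("vol_tier") on a node of the cluster returns the tier status text, which can then be inspected for a failed or missing daemon.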

Actual results:

The tier daemon was not running.

Expected results:

A volume restart should start the tier (rebalance) process again.

Additional info:

--- Additional comment from Anand Avati on 2015-05-27 03:14:14 EDT ---

REVIEW: http://review.gluster.org/10933 (glusterd/tier: configure tier daemon during volume restart) posted (#1) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-27 03:17:59 EDT ---

REVIEW: http://review.gluster.org/10933 (glusterd/tier: configure tier daemon during volume restart) posted (#2) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Anand Avati on 2015-05-29 03:42:45 EDT ---

REVIEW: http://review.gluster.org/10933 (glusterd/tier: configure tier daemon during volume restart) posted (#3) for review on master by mohammed rafi  kc (rkavunga)

--- Additional comment from Mohammed Rafi KC on 2015-06-03 10:52:28 EDT ---

Apart from http://review.gluster.org/10933, this requires one more fix.

Comment 2 Joseph Elwin Fernandes 2015-06-10 09:12:27 UTC
*** Bug 1229271 has been marked as a duplicate of this bug. ***

Comment 3 Mohammed Rafi KC 2015-06-10 13:59:51 UTC
Upstream patch: http://review.gluster.org/#/c/10933/

Comment 6 RamaKasturi 2015-11-20 06:33:11 UTC
I am seeing the above-mentioned issue with build glusterfs-3.7.5-6.el7rhgs.x86_64.

These are the steps I performed:

1) Had a tiered volume in the system.

2) Stopped the volume.

3) Started the volume again.

4) Checked gluster vol tier <vol_name> status, which displays the following output.

[root@rhs-client2 ~]# gluster vol tier vol_tier status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            1                    0                    failed              
10.70.36.62          0                    1                    in progress         
Tiering Migration Functionality: vol_tier: success

The tier daemon fails to start on the node from which the volume was stopped.
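
To spot this condition automatically, the status table above can be parsed. This is a rough sketch that assumes the column layout shown in this comment (node, promoted count, demoted count, status); parse_tier_status and failed_nodes are hypothetical helper names, and SAMPLE is copied from the output above.

```python
SAMPLE = """\
Node                 Promoted files       Demoted files        Status
---------            ---------            ---------            ---------
localhost            1                    0                    failed
10.70.36.62          0                    1                    in progress
Tiering Migration Functionality: vol_tier: success
"""


def parse_tier_status(output):
    """Parse tier status text into a list of per-node dicts."""
    rows = []
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("---"):
            # The dashed separator marks the start of the data rows.
            in_table = True
            continue
        if not in_table or not stripped:
            continue
        if stripped.startswith("Tiering Migration Functionality"):
            break
        # Split into at most four fields so a status such as
        # "in progress" stays in one piece.
        parts = stripped.split(None, 3)
        if len(parts) < 4:
            continue
        node, promoted, demoted, status = parts
        rows.append({
            "node": node,
            "promoted": int(promoted),
            "demoted": int(demoted),
            "status": status,
        })
    return rows


def failed_nodes(output):
    """Return the nodes whose tier status is reported as failed."""
    return [r["node"] for r in parse_tier_status(output) if r["status"] == "failed"]
```

For the output in this comment, failed_nodes(SAMPLE) returns ["localhost"].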

I do not see a pid file under the directory "/var/lib/glusterd/vols/vol_tier/tier":

[root@rhs-client2 tier]# ls -l
total 0
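
The same check can be done from a script. A minimal sketch, assuming only the directory layout quoted above; the function name tier_pid_files and the glusterd_dir parameter are hypothetical, and no assumption is made about how the pid file is named.

```python
import os


def tier_pid_files(volume, glusterd_dir="/var/lib/glusterd"):
    """List the entries in the volume's tier directory.

    An empty list means no pid file is present, which is the state
    shown in the listing above.
    """
    tier_dir = os.path.join(glusterd_dir, "vols", volume, "tier")
    if not os.path.isdir(tier_dir):
        return []
    return sorted(os.listdir(tier_dir))
```

On the affected node, tier_pid_files("vol_tier") would return an empty list while the daemon is down.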


Once the volume is force-started, I can see that the tier daemon starts running.

So, reopening this bug.

Comment 7 RamaKasturi 2015-11-20 07:24:33 UTC
Output of gluster volume info:
==============================

[root@rhs-client2 tier]# gluster vol info
 
Volume Name: vol_tier
Type: Tier
Volume ID: 0093a2a0-7ac1-4319-9a57-f125190db6a9
Status: Started
Number of Bricks: 14
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick6/b14
Brick2: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick6/b13
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick3: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick0/b1
Brick4: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick0/b2
Brick5: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick1/b3
Brick6: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick1/b4
Brick7: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick2/b5
Brick8: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick2/b6
Brick9: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick3/b7
Brick10: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick3/b8
Brick11: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick4/b9
Brick12: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick4/b10
Brick13: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick5/b11
Brick14: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick5/b12
Options Reconfigured:
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-promote-frequency: 240
cluster.tier-demote-frequency: 240
features.bitrot: on
features.scrub: Active


Output of gluster volume status:
================================

[root@rhs-client2 tier]# gluster volume status
Status of volume: vol_tier
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick6/b14                           49167     0          Y       19767
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick6/b13                            49169     0          Y       20074
Cold Bricks:
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick0/b1                             49163     0          Y       20092
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick0/b2                            49161     0          Y       19785
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick1/b3                             49164     0          Y       20110
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick1/b4                            49162     0          Y       19803
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick2/b5                             49165     0          Y       20128
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick2/b6                            49163     0          Y       19821
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick3/b7                             49166     0          Y       20146
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick3/b8                            49164     0          Y       19839
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick4/b9                             49167     0          Y       20164
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick4/b10                           49165     0          Y       19857
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick5/b11                            49168     0          Y       20182
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick5/b12                           49166     0          Y       19875
NFS Server on localhost                     2049      0          Y       20355
Self-heal Daemon on localhost               N/A       N/A        Y       20363
Bitrot Daemon on localhost                  N/A       N/A        Y       20371
Scrubber Daemon on localhost                N/A       N/A        Y       20383
NFS Server on 10.70.36.62                   2049      0          Y       20041
Self-heal Daemon on 10.70.36.62             N/A       N/A        Y       20049
Bitrot Daemon on 10.70.36.62                N/A       N/A        Y       20057
Scrubber Daemon on 10.70.36.62              N/A       N/A        Y       20068
 
Task Status of Volume vol_tier
------------------------------------------------------------------------------
Task                 : Tier migration      
ID                   : ab8e4cb8-b79b-4b85-b673-1e04e3af42b7
Status               : in progress

Comment 8 RamaKasturi 2015-11-20 07:32:18 UTC
The sos reports can be found at the link below:

http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1229270/

Comment 9 Mohammed Rafi KC 2015-11-20 12:08:16 UTC
The tier daemon tried to start during volume start but failed because the brick was not up at that moment. I will post a fix soon.
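
The ordering problem can be illustrated with a small sketch: wait until the bricks report online, and only then start the tier daemon. This is not the actual glusterd fix, only an illustration of the idea; wait_until, start_tier_when_bricks_up, bricks_online and start_tier are all hypothetical names, and the two callables are supplied by the caller.

```python
import time


def wait_until(predicate, timeout=30.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def start_tier_when_bricks_up(bricks_online, start_tier, **wait_kwargs):
    """Call start_tier() only after bricks_online() has reported True.

    Returns False, without starting anything, if the bricks never come up
    within the timeout.
    """
    if not wait_until(bricks_online, **wait_kwargs):
        return False
    start_tier()
    return True
```

With this ordering, a start attempt made before the brick is up is delayed rather than failing outright.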

Comment 10 Mohammed Rafi KC 2015-11-24 06:13:54 UTC

*** This bug has been marked as a duplicate of bug 1276245 ***