Bug 1229270 - tiering: tier daemon not restarting during volume/glusterd restart
Summary: tiering: tier daemon not restarting during volume/glusterd restart
Status: CLOSED DUPLICATE of bug 1276245
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: tier
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Mohammed Rafi KC
QA Contact: Nag Pavan Chilakam
: 1229271 (view as bug list)
Depends On: 994405 1225330 1233151 1235202 1265890 1273354
Reported: 2015-06-08 10:53 UTC by Nag Pavan Chilakam
Modified: 2018-11-30 05:44 UTC
8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 1225330
Last Closed: 2015-11-24 06:13:54 UTC
Target Upstream Version:

Attachments (Terms of Use)

Description Nag Pavan Chilakam 2015-06-08 10:53:10 UTC
+++ This bug was initially created as a clone of Bug #1225330 +++

Description of problem:

The tier daemon should always run on the node to promote/demote files. When the volume is stopped, the daemon is stopped as well, but when the volume is started again, the daemon should also start. The same applies to a glusterd restart after the tier daemon has gone offline.

Version-Release number of selected component (if applicable):

How reproducible:


Steps to Reproduce:
1. Create a tiered volume
2. Stop the volume
3. Start the volume
4. Check for the tier process
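The reproduce steps above can be sketched as a small script. The volume name "tiervol" and the process pattern "glusterfs.*tier" are illustrative assumptions, not taken from the report; the gluster commands need a live cluster, so they are shown commented out.

```shell
#!/bin/sh
# Sketch of the reproduce steps (names and patterns are placeholders).

# check_daemon PATTERN: return 0 if a running process matches PATTERN,
# non-zero otherwise.
check_daemon() {
    ps ax | grep -v grep | grep -q "$1"
}

# gluster volume stop tiervol          # step 2
# gluster volume start tiervol         # step 3
# check_daemon "glusterfs.*tier" \     # step 4
#     && echo "tier daemon running" || echo "BUG: tier daemon missing"
```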

Actual results:

The tier daemon was not running.

Expected results:

A volume restart should start the tier daemon (rebalance process) again.

Additional info:

--- Additional comment from Anand Avati on 2015-05-27 03:14:14 EDT ---

REVIEW: http://review.gluster.org/10933 (glusterd/tier: configure tier daemon during volume restart) posted (#1) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Anand Avati on 2015-05-27 03:17:59 EDT ---

REVIEW: http://review.gluster.org/10933 (glusterd/tier: configure tier daemon during volume restart) posted (#2) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Anand Avati on 2015-05-29 03:42:45 EDT ---

REVIEW: http://review.gluster.org/10933 (glusterd/tier: configure tier daemon during volume restart) posted (#3) for review on master by mohammed rafi  kc (rkavunga@redhat.com)

--- Additional comment from Mohammed Rafi KC on 2015-06-03 10:52:28 EDT ---

Apart from http://review.gluster.org/10933, this requires one more fix.

Comment 2 Joseph Elwin Fernandes 2015-06-10 09:12:27 UTC
*** Bug 1229271 has been marked as a duplicate of this bug. ***

Comment 3 Mohammed Rafi KC 2015-06-10 13:59:51 UTC
upstream patch : http://review.gluster.org/#/c/10933/

Comment 6 RamaKasturi 2015-11-20 06:33:11 UTC
I am seeing the above-mentioned issue with build glusterfs-3.7.5-6.el7rhgs.x86_64.

Following are the steps I performed:

1) Had a tiered volume in the system.

2) Stopped the volume.

3) Started the volume again.

4) When I check "gluster vol tier <vol_name> status", it displays the following output:

[root@rhs-client2 ~]# gluster vol tier vol_tier status
Node                 Promoted files       Demoted files        Status              
---------            ---------            ---------            ---------           
localhost            1                    0                    failed
                     0                    1                    in progress
Tiering Migration Functionality: vol_tier: success

The tier daemon fails to start on the node from which the volume was stopped.

I do not see a pid file under the directory "/var/lib/glusterd/vols/vol_tier/tier":

[root@rhs-client2 tier]# ls -l
total 0

Once the volume is started with force, I can see that the tier daemon starts running.
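The pid-file check described above can be scripted. The directory layout follows the comment ("/var/lib/glusterd/vols/<volname>/tier"); the "*.pid" file naming is an assumption for illustration, since the report only shows that the directory was empty.

```shell
#!/bin/sh
# tier_daemon_ok VOLDIR: return 0 if VOLDIR/tier contains a pidfile whose
# process is still alive, 1 otherwise.
tier_daemon_ok() {
    for pf in "$1"/tier/*.pid; do
        [ -f "$pf" ] || return 1                   # empty dir: the bug
        kill -0 "$(cat "$pf")" 2>/dev/null && return 0
    done
    return 1
}

# Example: tier_daemon_ok /var/lib/glusterd/vols/vol_tier
```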

So, reopening this bug.

Comment 7 RamaKasturi 2015-11-20 07:24:33 UTC
output of gluster volume info :

[root@rhs-client2 tier]# gluster vol info
Volume Name: vol_tier
Type: Tier
Volume ID: 0093a2a0-7ac1-4319-9a57-f125190db6a9
Status: Started
Number of Bricks: 14
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick6/b14
Brick2: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick6/b13
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (4 + 2) = 12
Brick3: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick0/b1
Brick4: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick0/b2
Brick5: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick1/b3
Brick6: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick1/b4
Brick7: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick2/b5
Brick8: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick2/b6
Brick9: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick3/b7
Brick10: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick3/b8
Brick11: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick4/b9
Brick12: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick4/b10
Brick13: rhs-client2.lab.eng.blr.redhat.com:/bricks/brick5/b11
Brick14: rhs-client38.lab.eng.blr.redhat.com:/bricks/brick5/b12
Options Reconfigured:
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-promote-frequency: 240
cluster.tier-demote-frequency: 240
features.bitrot: on
features.scrub: Active

output of gluster volume status:

[root@rhs-client2 tier]# gluster volume status
Status of volume: vol_tier
Gluster process                             TCP Port  RDMA Port  Online  Pid
Hot Bricks:
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick6/b14                           49167     0          Y       19767
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick6/b13                            49169     0          Y       20074
Cold Bricks:
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick0/b1                             49163     0          Y       20092
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick0/b2                            49161     0          Y       19785
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick1/b3                             49164     0          Y       20110
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick1/b4                            49162     0          Y       19803
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick2/b5                             49165     0          Y       20128
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick2/b6                            49163     0          Y       19821
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick3/b7                             49166     0          Y       20146
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick3/b8                            49164     0          Y       19839
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick4/b9                             49167     0          Y       20164
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick4/b10                           49165     0          Y       19857
Brick rhs-client2.lab.eng.blr.redhat.com:/b
ricks/brick5/b11                            49168     0          Y       20182
Brick rhs-client38.lab.eng.blr.redhat.com:/
bricks/brick5/b12                           49166     0          Y       19875
NFS Server on localhost                     2049      0          Y       20355
Self-heal Daemon on localhost               N/A       N/A        Y       20363
Bitrot Daemon on localhost                  N/A       N/A        Y       20371
Scrubber Daemon on localhost                N/A       N/A        Y       20383
NFS Server on                   2049      0          Y       20041
Self-heal Daemon on             N/A       N/A        Y       20049
Bitrot Daemon on                N/A       N/A        Y       20057
Scrubber Daemon on              N/A       N/A        Y       20068
Task Status of Volume vol_tier
Task                 : Tier migration      
ID                   : ab8e4cb8-b79b-4b85-b673-1e04e3af42b7
Status               : in progress

Comment 8 RamaKasturi 2015-11-20 07:32:18 UTC
sos reports can be found at the link below:


Comment 9 Mohammed Rafi KC 2015-11-20 12:08:16 UTC
The tier daemon tried to start during volume start, but failed because the brick was not up at that moment. A fix will be posted soon.

Comment 10 Mohammed Rafi KC 2015-11-24 06:13:54 UTC

*** This bug has been marked as a duplicate of bug 1276245 ***
