Description of problem:
======================
In a 4-node cluster with a tiered volume (1 x (4+2) disperse as the cold tier, 2 x 2 distributed-replicate as the hot tier), executing 'gluster volume tier <volname> detach start' emits a VOLUME_REBALANCE_COMPLETE event along with TIER_DETACH_START. At that point the file migration from hot to cold is still in progress, so the VOLUME_REBALANCE_COMPLETE event misleads the consumer of events into believing that all files have already been moved from hot to cold.

A couple of concerns here:
Firstly, should we really be emitting a VOLUME_REBALANCE event at all on a tier detach?
Secondly, shouldn't a REBALANCE_COMPLETE event be seen _after_ the file migration from hot to cold is complete, and not at any other time?

Proposal: Emit a TIER_DETACH_COMPLETE event once file migration is complete, rather than VOLUME_REBALANCE_COMPLETE. At the very least, that name is more accurate.

Version-Release number of selected component (if applicable):
==========================================================
3.8.4-5

How reproducible:
=================
Always

Steps to Reproduce:
===================
1. Have a cluster with eventing enabled and a webhook registered as a listener (a minimal listener sketch follows these steps).
2. Have a tiered volume with about 2000 files present.
3. Execute 'gluster volume tier <volname> detach start' and monitor the events received.
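For reference, the listener in step 1 can be as simple as an HTTP endpoint that logs each JSON payload the gluster events daemon POSTs to it, registered via 'gluster-eventsapi webhook-add'. The sketch below is illustrative only; the port, path, and Python 2 style are my assumptions, not the harness actually used for this report.

#!/usr/bin/env python
# Minimal gluster-events webhook listener (illustrative sketch).
# Register with:
#   gluster-eventsapi webhook-add http://<this-host>:9000/listen
import json
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer

class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The events daemon POSTs one JSON document per event.
        length = int(self.headers.getheader('content-length') or 0)
        event = json.loads(self.rfile.read(length))
        print '%(ts)s %(event)s %(message)s' % event
        self.send_response(200)
        self.end_headers()

if __name__ == '__main__':
    HTTPServer(('0.0.0.0', 9000), EventHandler).serve_forever()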
Actual results:
==============
Step 3 triggers a VOLUME_REBALANCE_COMPLETE along with TIER_DETACH_START.

Expected results:
================
Only the TIER_DETACH_START event should be seen, and a REBALANCE_COMPLETE should be seen only after all files have been moved from hot to cold.

Additional info:
===================
[root@dhcp46-239 ~]# rpm -qa | grep gluster
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
glusterfs-api-3.8.4-5.el7rhgs.x86_64
python-gluster-3.8.4-5.el7rhgs.noarch
glusterfs-client-xlators-3.8.4-5.el7rhgs.x86_64
glusterfs-server-3.8.4-5.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-5.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-devel-3.8.4-5.el7rhgs.x86_64
gluster-nagios-addons-0.2.8-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-5.el7rhgs.x86_64
glusterfs-fuse-3.8.4-5.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-5.el7rhgs.x86_64
glusterfs-rdma-3.8.4-5.el7rhgs.x86_64
glusterfs-3.8.4-5.el7rhgs.x86_64
glusterfs-cli-3.8.4-5.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-5.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-4.el7rhgs.x86_64
glusterfs-events-3.8.4-5.el7rhgs.x86_64

[root@dhcp46-239 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.46.240
Uuid: 72c4f894-61f7-433e-a546-4ad2d7f0a176
State: Peer in Cluster (Connected)

Hostname: 10.70.46.242
Uuid: 1e8967ae-51b2-4c27-907e-a22a83107fd0
State: Peer in Cluster (Connected)

Hostname: 10.70.46.218
Uuid: 0dea52e0-8c32-4616-8ef8-16db16120eaa
State: Peer in Cluster (Connected)

[root@dhcp46-239 yum.repos.d]# gluster v info

Volume Name: ozone
Type: Tier
Volume ID: 376cdde0-194f-460a-b273-3904a704a7dd
Status: Started
Snapshot Count: 0
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: 10.70.46.218:/bricks/brick2/ozone_tier3
Brick2: 10.70.46.218:/bricks/brick2/ozone_tier2
Brick3: 10.70.46.218:/bricks/brick2/ozone_tier1
Brick4: 10.70.46.218:/bricks/brick2/ozone_tier0
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick5: 10.70.46.239:/bricks/brick0/ozone0
Brick6: 10.70.46.240:/bricks/brick0/ozone2
Brick7: 10.70.46.242:/bricks/brick0/ozone2
Brick8: 10.70.46.239:/bricks/brick1/ozone3
Brick9: 10.70.46.240:/bricks/brick1/ozone4
Brick10: 10.70.46.242:/bricks/brick1/ozone5
Options Reconfigured:
features.scrub-freq: minute
features.scrub: Active
features.bitrot: on
cluster.tier-mode: cache
features.ctr-enabled: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.enable-shared-storage: disable

[root@dhcp46-239 ~]# gluster v tier ozone detach start
volume detach-tier start: success
ID: 41e86ff1-c890-45d9-a8c3-2672b4694eeb

[root@dhcp46-239 ~]# gluster v tier ozone detach status
Node            Rebalanced-files    size       scanned    failures    skipped    status         run time in h:m:s
---------       ----------------    -------    -------    --------    -------    -----------    -----------------
10.70.46.218    0                   0Bytes     0          0           0          in progress    0:0:0

[root@dhcp46-239 ~]# gluster v tier ozone detach status
Node            Rebalanced-files    size       scanned    failures    skipped    status         run time in h:m:s
---------       ----------------    -------    -------    --------    -------    -----------    -----------------
10.70.46.218    0                   0Bytes     0          0           0          in progress    0:0:0

[root@dhcp46-239 ~]# gluster v tier ozone detach status
Node            Rebalanced-files    size       scanned    failures    skipped    status         run time in h:m:s
---------       ----------------    -------    -------    --------    -------    -----------    -----------------
10.70.46.218    2131                102.1MB    2131       0           0          completed      0:16:14

EVENTS
---------
bash-4.3$ grep -v "200" tier_detach_start | grep -v "####" | grep -v "CLIENT_" | grep -v "EC_" | grep -v "SVC" | grep -v "AFR"
{u'message': {u'volume': u'ozone'}, u'event': u'VOLUME_REBALANCE_COMPLETE', u'ts': 1479207699, u'nodeid': u'ed362eb3-421c-4a25-ad0e-82ef157ea328'}
{u'message': {u'vol': u'ozone'}, u'event': u'TIER_DETACH_START', u'ts': 1479207709, u'nodeid': u'ed362eb3-421c-4a25-ad0e-82ef157ea328'}
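Until a TIER_DETACH_COMPLETE event exists, a consumer could defend itself against the premature VOLUME_REBALANCE_COMPLETE by cross-checking 'gluster volume tier <volname> detach status' before treating the migration as finished. The sketch below is a possible consumer-side workaround, not anything gluster ships: the helper names and polling approach are mine; only the volume name and the payload shape come from the capture above.

#!/usr/bin/env python
# Consumer-side guard (sketch): verify a VOLUME_REBALANCE_COMPLETE against
# the detach status CLI before acting on it. Helper names are hypothetical.
import subprocess

def detach_migration_done(volname):
    # 'completed' appears in the status column only once all files have
    # been moved off the hot tier (see the detach status output above).
    out = subprocess.check_output(
        ['gluster', 'volume', 'tier', volname, 'detach', 'status'])
    return 'completed' in out and 'in progress' not in out

def on_event(event):
    if event.get('event') == 'VOLUME_REBALANCE_COMPLETE':
        # Note the payload uses 'volume' here but 'vol' for TIER_DETACH_START.
        vol = event['message'].get('volume')
        if vol and not detach_migration_done(vol):
            print 'ignoring premature VOLUME_REBALANCE_COMPLETE for %s' % vol
            return
    # ... normal event handling ...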
Engineering discussion is still ongoing (in a mail chain) to reach an agreement between dev and QE, and I believe RCA is in progress. The bug is fairly easy to reproduce. Clearing the needinfo for now, as this BZ is not waiting on me.
Upstream patch http://review.gluster.org/#/c/15919/ has been posted for review.