Bug 1386472

Summary: [Eventing]: 'VOLUME_REBALANCE' event messages have an incorrect volume name
Product: Red Hat Gluster Storage [Red Hat Storage]
Component: glusterfs
Version: rhgs-3.2
Reporter: Sweta Anandpara <sanandpa>
Assignee: Nithya Balachandran <nbalacha>
QA Contact: Sweta Anandpara <sanandpa>
CC: amukherj, rhinduja, vbellur
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Release: RHGS 3.2.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.8.4-4
Clone Of:
: 1388010 (view as bug list)
Last Closed: 2017-03-23 06:12:36 UTC
Type: Bug
Bug Depends On: 1388010, 1388563
Bug Blocks: 1351528

Description Sweta Anandpara 2016-10-19 04:56:56 UTC
Description of problem:
======================
On a 4-node cluster with eventing enabled, the VOLUME_REBALANCE_COMPLETE and VOLUME_REBALANCE_FAILED event messages carry a 'volume' attribute with the first letter of the volume name missing. The event can no longer be tied to the volume it refers to, as the consumer (in this case, USM) will not be able to act/respond correctly when it receives an event with an incorrect volume name.


Version-Release number of selected component (if applicable):
============================================================
3.8.4-2


How reproducible:
=================
Always


Steps to Reproduce:
==================
1. Have a 4 node cluster with eventing enabled.
2. Create a disperse volume 'disp' and attach a 2*2 hot tier
3. Perform a tier_detach

Step 3 internally triggers a rebalance, which causes the VOLUME_REBALANCE events to be generated.


Actual results:
===============
tier_detach_force:{u'message': {u'volume': u'isp'}, u'event': u'VOLUME_REBALANCE_FAILED', u'ts': 1476851588, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}

tier_detach_start_after_stop:{u'message': {u'volume': u'isp'}, u'event': u'VOLUME_REBALANCE_COMPLETE', u'ts': 1476850670, u'nodeid': u'ed362eb3-421c-4a25-ad0e-82ef157ea328'}


Expected results:
=================
The volume name in the above two events should have been 'disp', not 'isp'.


Additional info:
===============

[root@dhcp46-239 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.46.240
Uuid: 72c4f894-61f7-433e-a546-4ad2d7f0a176
State: Peer in Cluster (Connected)

Hostname: 10.70.46.242
Uuid: 1e8967ae-51b2-4c27-907e-a22a83107fd0
State: Peer in Cluster (Connected)

Hostname: 10.70.46.218
Uuid: 0dea52e0-8c32-4616-8ef8-16db16120eaa
State: Peer in Cluster (Connected)
[root@dhcp46-239 ~]# 
[root@dhcp46-239 ~]# 
[root@dhcp46-239 ~]# rpm -qa | grep gluster
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
glusterfs-3.8.4-2.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-2.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-2.el7rhgs.x86_64
glusterfs-api-3.8.4-2.el7rhgs.x86_64
python-gluster-3.8.4-2.el7rhgs.noarch
glusterfs-geo-replication-3.8.4-2.el7rhgs.x86_64
glusterfs-rdma-3.8.4-2.el7rhgs.x86_64
glusterfs-fuse-3.8.4-2.el7rhgs.x86_64
glusterfs-cli-3.8.4-2.el7rhgs.x86_64
glusterfs-server-3.8.4-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-2.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-2.el7rhgs.x86_64
glusterfs-devel-3.8.4-2.el7rhgs.x86_64
glusterfs-events-3.8.4-2.el7rhgs.x86_64
[root@dhcp46-239 ~]# 
[root@dhcp46-239 ~]# gluster v info
 
Volume Name: disp
Type: Tier
Volume ID: a9999464-b094-4213-a422-c11fed555674
Status: Started
Snapshot Count: 0
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 4
Brick1: 10.70.46.218:/bricks/brick2/disp_tier4
Brick2: 10.70.46.242:/bricks/brick2/disp_tier3
Brick3: 10.70.46.240:/bricks/brick2/disp_tier2
Brick4: 10.70.46.239:/bricks/brick2/disp_tier1
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick5: 10.70.46.239:/bricks/brick0/disp1
Brick6: 10.70.46.240:/bricks/brick0/disp2
Brick7: 10.70.46.242:/bricks/brick0/disp3
Brick8: 10.70.46.218:/bricks/brick0/disp4
Brick9: 10.70.46.239:/bricks/brick1/disp5
Brick10: 10.70.46.240:/bricks/brick1/disp6
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
transport.address-family: inet
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
[root@dhcp46-239 ~]# 
[root@dhcp46-239 ~]# 
[root@dhcp46-239 ~]# 
[root@dhcp46-239 ~]# gluster v tier disp detach start
volume detach-tier start: success
ID: b6bd807e-1c0c-4f23-a70c-0134d93506f3
[root@dhcp46-239 ~]#

Comment 4 Nithya Balachandran 2016-10-24 08:31:03 UTC
Upstream patch:
master: http://review.gluster.org/#/c/15712

Comment 5 Nithya Balachandran 2016-10-25 10:32:50 UTC
RCA:

Gluster translators do not store the actual volume name anywhere. Each translator appends a specific string to the volume name and stores this value in this->name.
For dht, the suffix is "-dht" so this->name actually contains <volname>-dht.

The event framework requires the actual volume name to be sent. The rebalance code incorrectly used strtok() to parse the volume name, passing "-dht" as the delimiter string. strtok() treats every character in the delimiter string as a delimiter, so the parsing fails for any volume whose name contains 'd', 'h', or 't'.
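
For example, a minimal standalone C sketch (not the actual rebalance code, just an illustration of the strtok() behaviour) reproduces the mangling for the volume 'disp':

#include <stdio.h>
#include <string.h>

int main (void)
{
        /* this->name for the dht xlator of volume 'disp' */
        char name[] = "disp-dht";

        /* strtok() treats each character of "-dht" as a delimiter: the
           leading 'd' of "disp" is skipped as a delimiter, and the token
           returned is everything up to the next delimiter character. */
        char *token = strtok (name, "-dht");

        printf ("%s\n", token);   /* prints "isp", not "disp" */
        return 0;
}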

Fix:
The code was rewritten to use strstr() instead, which matches the whole "-dht" substring rather than individual characters.
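
A minimal sketch of the strstr()-based approach, with a hypothetical helper name (an illustration of the technique, not the literal upstream patch):

#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the volume name out of an xlator name such as
   "disp-dht" by cutting at the "-dht" suffix. */
static void
get_volname (char *dst, size_t dstlen, const char *xl_name)
{
        const char *suffix = strstr (xl_name, "-dht");
        size_t      len    = suffix ? (size_t)(suffix - xl_name)
                                    : strlen (xl_name);

        if (len >= dstlen)
                len = dstlen - 1;
        memcpy (dst, xl_name, len);
        dst[len] = '\0';
}

int main (void)
{
        char volname[64];

        get_volname (volname, sizeof (volname), "disp-dht");
        printf ("%s\n", volname);   /* prints "disp" */
        return 0;
}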

Comment 6 Nithya Balachandran 2016-10-26 03:57:50 UTC
Upstream patches:

master: http://review.gluster.org/15712
release-3.9: http://review.gluster.org/#/c/15725/

Comment 9 Sweta Anandpara 2016-11-14 10:09:37 UTC
Tested and verified this on the build 3.8.4-5

Followed the steps in the description, triggered a rebalance via 'tier detach', and saw the correct volume name in the corresponding 'VOLUME_REBALANCE' events. Moving this BZ to Verified for 3.2.

{u'message': {u'volume': u'ozone'}, u'event': u'VOLUME_REBALANCE_COMPLETE', u'ts': 1479110111, u'nodeid': u'ed362eb3-421c-4a25-ad0e-82ef157ea328'}
{u'message': {u'volume': u'ozone'}, u'event': u'VOLUME_REBALANCE_FAILED', u'ts': 1479110225, u'nodeid': u'ed362eb3-421c-4a25-ad0e-82ef157ea328'}

Comment 11 errata-xmlrpc 2017-03-23 06:12:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html