Description of problem:
======================
Have a 4 node cluster with eventing enabled. VOLUME_REBALANCE_COMPLETE and VOLUME_REBALANCE_FAILED event messages have the attribute 'volume' with the first letter missing. This event no longer remains a distinguishable event, as the consumer (in this case, USM) will not be able to act/respond when it receives one with an incorrect volume name.

Version-Release number of selected component (if applicable):
============================================================
3.8.4-2

How reproducible:
=================
Always

Steps to Reproduce:
==================
1. Have a 4 node cluster with eventing enabled.
2. Create a disperse volume 'disp' and attach a 2*2 hot tier
3. Perform a tier_detach

Step3 internally triggers rebalance, prompting an event to be generated.

Actual results:
===============
tier_detach_force:{u'message': {u'volume': u'isp'}, u'event': u'VOLUME_REBALANCE_FAILED', u'ts': 1476851588, u'nodeid': u'72c4f894-61f7-433e-a546-4ad2d7f0a176'}
tier_detach_start_after_stop:{u'message': {u'volume': u'isp'}, u'event': u'VOLUME_REBALANCE_COMPLETE', u'ts': 1476850670, u'nodeid': u'ed362eb3-421c-4a25-ad0e-82ef157ea328'}

Expected results:
=================
Volume name in the above 2 events should have been 'disp' and not 'isp'

Additional info:
===============
[root@dhcp46-239 ~]# gluster peer status
Number of Peers: 3

Hostname: 10.70.46.240
Uuid: 72c4f894-61f7-433e-a546-4ad2d7f0a176
State: Peer in Cluster (Connected)

Hostname: 10.70.46.242
Uuid: 1e8967ae-51b2-4c27-907e-a22a83107fd0
State: Peer in Cluster (Connected)

Hostname: 10.70.46.218
Uuid: 0dea52e0-8c32-4616-8ef8-16db16120eaa
State: Peer in Cluster (Connected)
[root@dhcp46-239 ~]#
[root@dhcp46-239 ~]#
[root@dhcp46-239 ~]# rpm -qa | grep gluster
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
glusterfs-3.8.4-2.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-2.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-1.el7rhgs.x86_64
glusterfs-libs-3.8.4-2.el7rhgs.x86_64
glusterfs-api-3.8.4-2.el7rhgs.x86_64
python-gluster-3.8.4-2.el7rhgs.noarch
glusterfs-geo-replication-3.8.4-2.el7rhgs.x86_64
glusterfs-rdma-3.8.4-2.el7rhgs.x86_64
glusterfs-fuse-3.8.4-2.el7rhgs.x86_64
glusterfs-cli-3.8.4-2.el7rhgs.x86_64
glusterfs-server-3.8.4-2.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-2.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-2.el7rhgs.x86_64
glusterfs-devel-3.8.4-2.el7rhgs.x86_64
glusterfs-events-3.8.4-2.el7rhgs.x86_64
[root@dhcp46-239 ~]#
[root@dhcp46-239 ~]# gluster v info

Volume Name: disp
Type: Tier
Volume ID: a9999464-b094-4213-a422-c11fed555674
Status: Started
Snapshot Count: 0
Number of Bricks: 10
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distribute
Number of Bricks: 4
Brick1: 10.70.46.218:/bricks/brick2/disp_tier4
Brick2: 10.70.46.242:/bricks/brick2/disp_tier3
Brick3: 10.70.46.240:/bricks/brick2/disp_tier2
Brick4: 10.70.46.239:/bricks/brick2/disp_tier1
Cold Tier:
Cold Tier Type : Disperse
Number of Bricks: 1 x (4 + 2) = 6
Brick5: 10.70.46.239:/bricks/brick0/disp1
Brick6: 10.70.46.240:/bricks/brick0/disp2
Brick7: 10.70.46.242:/bricks/brick0/disp3
Brick8: 10.70.46.218:/bricks/brick0/disp4
Brick9: 10.70.46.239:/bricks/brick1/disp5
Brick10: 10.70.46.240:/bricks/brick1/disp6
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
transport.address-family: inet
performance.readdir-ahead: on
cluster.enable-shared-storage: enable
[root@dhcp46-239 ~]#
[root@dhcp46-239 ~]#
[root@dhcp46-239 ~]#
[root@dhcp46-239 ~]# gluster v tier disp detach start
volume detach-tier start: success
ID: b6bd807e-1c0c-4f23-a70c-0134d93506f3
[root@dhcp46-239 ~]#
Upstream patch: master: http://review.gluster.org/#/c/15712
RCA:
Gluster translators do not store the actual volume name anywhere. Each translator appends a specific suffix to the volume name and stores the result in this->name. For dht, the suffix is "-dht", so this->name actually contains <volname>-dht. The event framework requires the actual volume name to be sent. The rebalance code incorrectly used strtok() to extract the volume name, passing "-dht" as the delimiter argument. strtok() treats every character in the delimiter string as a separate delimiter, so the parsing fails for any volume whose name contains '-', 'd', 'h', or 't'. For 'disp', the leading 'd' is skipped as a delimiter and the token ends at the '-', which yields 'isp'.

Fix:
The code was rewritten to use strstr() instead.
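
The difference between the two calls can be reproduced outside of Gluster. The following is only a minimal sketch of the behaviour described in the RCA, not the code from the upstream patch: the helper names volname_with_strtok and volname_with_strstr, the DHT_SUFFIX macro, and the buffer sizes are all made up for illustration, and cutting at the last occurrence of the suffix is an assumption about how a strstr()-based parse could be written.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DHT_SUFFIX "-dht"   /* suffix named in the RCA; macro name is hypothetical */

/* Buggy approach (hypothetical helper): strtok() splits on ANY of the
 * characters '-', 'd', 'h', 't', not on the string "-dht" as a whole. */
static void volname_with_strtok(const char *xl_name, char *out, size_t outlen)
{
    char buf[256];
    char *tok;

    assert(xl_name != NULL && out != NULL && outlen > 0);
    snprintf(buf, sizeof(buf), "%s", xl_name);  /* strtok() modifies its input */
    tok = strtok(buf, DHT_SUFFIX);
    snprintf(out, outlen, "%s", tok ? tok : "");
}

/* strstr()-based approach (hypothetical helper): treat "-dht" as a substring
 * and cut the name at its last occurrence.
 * Returns 0 on success, -1 if the suffix is absent or out is too small. */
static int volname_with_strstr(const char *xl_name, char *out, size_t outlen)
{
    const char *hit = NULL;
    const char *p = xl_name;
    size_t len;

    assert(xl_name != NULL && out != NULL && outlen > 0);
    while ((p = strstr(p, DHT_SUFFIX)) != NULL) {
        hit = p;    /* remember this match and keep looking for a later one */
        p++;
    }
    if (hit == NULL)
        return -1;

    len = (size_t)(hit - xl_name);
    if (len >= outlen)
        return -1;

    memcpy(out, xl_name, len);
    out[len] = '\0';
    return 0;
}
```

Called with "disp-dht", the strtok() variant produces 'isp', which matches the volume name seen in the events above, while the strstr() variant produces 'disp'.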
Upstream patches: master: http://review.gluster.org/15712 release-3.9: http://review.gluster.org/#/c/15725/
Tested and verified this on the build 3.8.4-5.

Followed the steps in the description, triggered a rebalance by doing 'tier detach' and was able to see the correct volume name in the corresponding 'VOLUME_REBALANCE' events.

Moving this BZ to verified in 3.2.

{u'message': {u'volume': u'ozone'}, u'event': u'VOLUME_REBALANCE_COMPLETE', u'ts': 1479110111, u'nodeid': u'ed362eb3-421c-4a25-ad0e-82ef157ea328'}
{u'message': {u'volume': u'ozone'}, u'event': u'VOLUME_REBALANCE_FAILED', u'ts': 1479110225, u'nodeid': u'ed362eb3-421c-4a25-ad0e-82ef157ea328'}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html