Bug 1517131

Summary: Grafana continues to list/display old bricks subsequent to a snapshot restore
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vinayak Papnoi <vpapnoi>
Component: web-admin-tendrl-monitoring-integration
Assignee: Darshan <dnarayan>
Status: CLOSED ERRATA
QA Contact: Rochelle <rallan>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.3
CC: bmekala, dnarayan, mkarnik, negupta, nthomas, rallan, rhinduja, rhs-bugs, sanandpa, sankarshan
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-18 04:38:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: The volume is 2x2, but the Bricks section shows 8 bricks; the old bricks (/rhs/brick*) are still present (Flags: none)

Description Vinayak Papnoi 2017-11-24 09:51:00 UTC
Created attachment 1358578 [details]
The volume is 2x2, but the Bricks section shows 8 bricks; the old bricks (/rhs/brick*) are still present

Description of problem:
=======================

After a snapshot restore of a volume, the old bricks are not removed from that volume's list of bricks in the Grafana dashboard.


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.8.4-52.el7rhgs.x86_64
tendrl-grafana-plugins-1.5.4-5.el7rhgs.noarch
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.5.4-3.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch


How reproducible:
=================
1/1


Steps to Reproduce:
===================

1. Create a volume
2. Create a snapshot of the volume
3. Stop the volume, then activate and restore the snapshot
4. Start the volume and check the volume's Bricks panel in the Grafana dashboard (a CLI sketch of these steps follows)
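
For illustration, the steps above can be scripted against the gluster CLI. This is a minimal sketch; the volume name, snapshot name, hosts, and brick paths are placeholders rather than values taken from this report, and it assumes a two-node cluster with gluster already installed.

#!/usr/bin/env python3
import subprocess

VOLUME = "testvol"   # placeholder name
SNAP = "snap1"       # placeholder name
BRICKS = [           # placeholder 2x2 distributed-replicate layout
    "host1:/bricks/brick1/b1", "host2:/bricks/brick2/b2",
    "host1:/bricks/brick3/b3", "host2:/bricks/brick4/b4",
]

def gluster(*args):
    # --mode=script suppresses the CLI's interactive confirmations
    subprocess.check_call(["gluster", "--mode=script"] + list(args))

# 1. Create and start the volume.
gluster("volume", "create", VOLUME, "replica", "2", *BRICKS)
gluster("volume", "start", VOLUME)

# 2. Create a snapshot of the volume.
gluster("snapshot", "create", SNAP, VOLUME)

# 3. Stop the volume, then activate and restore the snapshot.
gluster("volume", "stop", VOLUME)
gluster("snapshot", "activate", SNAP)
gluster("snapshot", "restore", SNAP)

# 4. Start the volume again, then check its Bricks panel in Grafana.
gluster("volume", "start", VOLUME)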

Actual results:
===============

The old (pre-restore) bricks are still listed for the volume in the Grafana dashboard.


Expected results:
=================

The old bricks should not be present; only the restored volume's current bricks should be listed.


Additional info:
================

[root@dhcp43-93 glusterfs]# gluster v status speedster
Status of volume: speedster
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp43-93.lab.eng.blr.redhat.com:/run
/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf038
73b4/brick1/b1                              49152     0          Y       7721 
Brick dhcp41-170.lab.eng.blr.redhat.com:/ru
n/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03
873b4/brick2/b2                             49152     0          Y       27235
Brick dhcp43-93.lab.eng.blr.redhat.com:/run
/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf038
73b4/brick3/b3                              49153     0          Y       7754 
Brick dhcp41-170.lab.eng.blr.redhat.com:/ru
n/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03
873b4/brick4/b4                             49153     0          Y       27256
Self-heal Daemon on localhost               N/A       N/A        Y       2882 
Self-heal Daemon on dhcp41-170.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       21118
 
Task Status of Volume speedster
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp43-93 glusterfs]# gluster v info
Volume Name: speedster
Type: Distributed-Replicate
Volume ID: c4a9eacd-1c97-4c44-97fb-8619bf348dde
Status: Started
Snapshot Count: 254
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: dhcp43-93.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick1/b1
Brick2: dhcp41-170.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick2/b2
Brick3: dhcp43-93.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick3/b3
Brick4: dhcp41-170.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick4/b4
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on
features.quota: off
features.inode-quota: off
features.quota-deem-statfs: off
snap-activate-on-create: enable
auto-delete: enable

Comment 4 Darshan 2017-11-24 12:52:44 UTC
The fix for this would involve the following steps (a sketch follows the list):

1. Listen for the "snapshot_restore" gluster API event in tendrl. This event only provides the volume name.

2. When the above event is received, tendrl has to make another get-state call to fetch the current list of bricks for the restored volume.

3. Read the bricks recorded for this volume from the data store (etcd).

4. Identify bricks that are in the data store but not in the latest get-state output; these are the bricks to be removed.

5. Remove those bricks from the data store (etcd).

6. Submit a job for monitoring-integration to remove the bricks from Graphite.
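
A minimal sketch of steps 4 through 6 in Python, assuming the brick lists from get-state (step 2) and etcd (step 3) have already been collected. The callables passed in are hypothetical stand-ins for the real tendrl/etcd/monitoring-integration plumbing, not actual tendrl APIs.

def cleanup_stale_bricks(current_bricks, stored_bricks,
                         delete_from_etcd, submit_graphite_job):
    """Remove bricks that etcd still records but the latest get-state
    no longer reports for the restored volume (steps 4-6 above)."""
    # Step 4: bricks present in etcd but absent from get-state are stale.
    stale = set(stored_bricks) - set(current_bricks)
    # Step 5: drop each stale brick entry from the data store.
    for brick in sorted(stale):
        delete_from_etcd(brick)
    # Step 6: one job for monitoring-integration to purge the matching
    # Graphite series, so Grafana stops listing the old bricks.
    if stale:
        submit_graphite_job(sorted(stale))
    return stale

# Example with this report's volume: the old /rhs/brick* paths become
# stale once get-state reports only the snapshot brick paths.
cleanup_stale_bricks(
    current_bricks=["host1:/run/gluster/snaps/<snap-id>/brick1/b1"],
    stored_bricks=["host1:/run/gluster/snaps/<snap-id>/brick1/b1",
                   "host1:/rhs/brick1/b1"],
    delete_from_etcd=lambda b: print("etcd delete:", b),
    submit_graphite_job=lambda bricks: print("graphite delete job:", bricks),
)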

Comment 6 Bala Konda Reddy M 2017-12-05 16:08:03 UTC
Verified with tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch

On a successfully imported cluster, I created volumes and a snapshot, stopped the volume, and performed a snapshot restore.

I am able to see the new bricks on the bricks dashboard after the snapshot restore.

Hence, marking this bug as verified.

Comment 8 errata-xmlrpc 2017-12-18 04:38:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3478