Created attachment 1358578 [details]
The volume is 2x2 (4 bricks), but the Bricks section is showing 8 bricks; the old bricks (/rhs/brick*) are also still present.

Description of problem:
=======================
After a snapshot restore of a volume, the old bricks are not removed from that volume's list of bricks in the Grafana dashboard.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.8.4-52.el7rhgs.x86_64
tendrl-grafana-plugins-1.5.4-5.el7rhgs.noarch
tendrl-ansible-1.5.4-1.el7rhgs.noarch
tendrl-selinux-1.5.3-2.el7rhgs.noarch
tendrl-node-agent-1.5.4-5.el7rhgs.noarch
tendrl-monitoring-integration-1.5.4-5.el7rhgs.noarch
tendrl-grafana-selinux-1.5.3-2.el7rhgs.noarch
tendrl-commons-1.5.4-4.el7rhgs.noarch
tendrl-api-1.5.4-2.el7rhgs.noarch
tendrl-api-httpd-1.5.4-2.el7rhgs.noarch
tendrl-notifier-1.5.4-3.el7rhgs.noarch
tendrl-ui-1.5.4-4.el7rhgs.noarch

How reproducible:
=================
1/1

Steps to Reproduce:
===================
1. Create a volume
2. Create a snapshot of the volume
3. Stop the volume, then activate and restore the snapshot
4. Start the volume
(a scripted version of these steps is sketched at the end of this comment)

Actual results:
===============
The old bricks are still present in the dashboard.

Expected results:
=================
The old bricks should no longer be present.

Additional info:
================
[root@dhcp43-93 glusterfs]# gluster v status speedster
Status of volume: speedster
Gluster process                                                                                        TCP Port  RDMA Port  Online  Pid
----------------------------------------------------------------------------------------------------------------------------------------
Brick dhcp43-93.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick1/b1   49152     0          Y       7721
Brick dhcp41-170.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick2/b2  49152     0          Y       27235
Brick dhcp43-93.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick3/b3   49153     0          Y       7754
Brick dhcp41-170.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick4/b4  49153     0          Y       27256
Self-heal Daemon on localhost                                                                          N/A       N/A        Y       2882
Self-heal Daemon on dhcp41-170.lab.eng.blr.redhat.com                                                  N/A       N/A        Y       21118

Task Status of Volume speedster
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp43-93 glusterfs]# gluster v info
Volume Name: speedster
Type: Distributed-Replicate
Volume ID: c4a9eacd-1c97-4c44-97fb-8619bf348dde
Status: Started
Snapshot Count: 254
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: dhcp43-93.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick1/b1
Brick2: dhcp41-170.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick2/b2
Brick3: dhcp43-93.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick3/b3
Brick4: dhcp41-170.lab.eng.blr.redhat.com:/run/gluster/snaps/4f853d0c5e5c4fb0ad31b1edf03873b4/brick4/b4
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
transport.address-family: inet
nfs.disable: on
features.quota: off
features.inode-quota: off
features.quota-deem-statfs: off
snap-activate-on-create: enable
auto-delete: enable
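For reference, the reproduction steps above as a small script. This is a minimal sketch, assuming a two-node trusted pool with thin-provisioned brick LVs (a prerequisite for gluster snapshots); the snapshot name is illustrative, and the original brick paths are reconstructed from the /rhs/brick* hint above, not taken verbatim from the setup:

#!/usr/bin/env python
# Minimal reproduction sketch; snapshot name and brick layout are illustrative.
import subprocess

def gluster(*args):
    # --mode=script suppresses the CLI's interactive confirmation prompts
    subprocess.check_call(("gluster", "--mode=script") + args)

VOL, SNAP = "speedster", "snap1"
N1 = "dhcp43-93.lab.eng.blr.redhat.com"
N2 = "dhcp41-170.lab.eng.blr.redhat.com"

# 1. Create and start a 2x2 distributed-replicate volume
gluster("volume", "create", VOL, "replica", "2",
        "%s:/rhs/brick1/b1" % N1, "%s:/rhs/brick2/b2" % N2,
        "%s:/rhs/brick3/b3" % N1, "%s:/rhs/brick4/b4" % N2)
gluster("volume", "start", VOL)

# 2. Create a snapshot of the volume
gluster("snapshot", "create", SNAP, VOL)

# 3. Stop the volume, then activate and restore the snapshot.
gluster("volume", "stop", VOL)
# Activation may be a no-op here because snap-activate-on-create is enabled,
# so tolerate a nonzero exit from the explicit activate.
subprocess.call(("gluster", "--mode=script", "snapshot", "activate", SNAP))
gluster("snapshot", "restore", SNAP)

# 4. Start the volume again; its bricks now live under /run/gluster/snaps/<id>/
gluster("volume", "start", VOL)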
The fix for this would involve the following steps:
1. Listen for the "snapshot_restore" gluster API event in tendrl. The event provides only the volume name.
2. When the event is received, tendrl makes another get-state call to fetch the current list of bricks for the restored volume.
3. Read the bricks recorded for this volume from the data store (etcd).
4. Compare the two lists: any brick that is in the data store but not in the latest get-state output is stale and has to be removed.
5. Remove those bricks from the data store (etcd).
6. Submit a job for monitoring-integration to remove the bricks from Graphite.
A rough sketch of the reconciliation in steps 3-5 follows below.
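The sketch below is only illustrative Python; the helper names, the etcd key layout, and the brick-id format are hypothetical, not tendrl's actual API (tendrl 1.5 talks to etcd v2, e.g. via the python-etcd client used here):

# Hypothetical sketch of steps 3-5 above; the etcd key layout is illustrative,
# not the key schema tendrl actually uses.
import etcd  # python-etcd, an etcd v2 client

client = etcd.Client(host="127.0.0.1", port=2379)

def stored_bricks(cluster_id, volume_id):
    # Step 3: bricks currently recorded for the volume in the data store.
    key = "/clusters/%s/Volumes/%s/Bricks" % (cluster_id, volume_id)
    return {node.key.rsplit("/", 1)[-1] for node in client.read(key).children}

def reconcile(cluster_id, volume_id, live_bricks):
    # live_bricks: the brick list from the fresh get-state call (step 2).
    stale = stored_bricks(cluster_id, volume_id) - set(live_bricks)  # step 4
    for brick in stale:
        # Step 5: drop the stale brick record from etcd.
        client.delete("/clusters/%s/Volumes/%s/Bricks/%s"
                      % (cluster_id, volume_id, brick), recursive=True)
    # Step 6 would submit a monitoring-integration job to purge these
    # brick series from Graphite.
    return stale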
Verified with tendrl-monitoring-integration-1.5.4-11.el7rhgs.noarch.

On a successfully imported cluster, I created volumes and a snapshot, stopped the volume, and performed a snapshot restore. The new bricks are shown on the bricks dashboard after the snapshot restore; hence, marking this as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3478