Bug 1559421
| Summary: | Sometimes delete flag for the deleted volumes is changed to False | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | gowtham <gshanmug> |
| Component: | web-admin-tendrl-gluster-integration | Assignee: | gowtham <gshanmug> |
| Status: | CLOSED ERRATA | QA Contact: | Filip Balák <fbalak> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.4 | CC: | fbalak, gshanmug, mbukatov, nthomas, rhs-bugs, sankarshan |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | tendrl-monitoring-integration-1.6.1-3.el7rhgs, tendrl-gluster-integration-1.6.1-3.el7rhgs | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-04 07:02:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1503137 | | |
Description
gowtham
2018-03-22 14:11:50 UTC
Could you provide more details about:

* How to inspect the value of the flag in question
* What does "sometimes" mean here? Once in how many retries?
* Link to the upstream merge request
* Version where the bug is present
* Is it possible to reproduce this on previously released RHGS WA?

It is reproducible in rhgs 1.6.1-1. In the central store we store volume information in /cluster/{cid}/Volumes/{vid}/deleted. Each volume object has a member variable called `deleted` with a default value of False. When the volume is deleted, the deleted flag is set to True, but after a few minutes the flag is changed back to False. monitoring-integration creates an alert dashboard panel for each volume based on this deleted flag only: when a volume is deleted, the alert panel for that volume is removed and the flag is marked True. If, a few minutes after the volume deletion, the flag is reset by some other thread, the alert panel for the deleted volume is created again in Grafana.
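To make the race concrete, here is a minimal sketch (not taken from the Tendrl code base; it assumes the `etcd3` Python client and the key path quoted above, and the ids are hypothetical) of how a stale sync writer can flip the flag back after the delete handler has set it:

```python
# Minimal illustration of the race described above. Assumes the `etcd3`
# Python client and the key layout /cluster/{cid}/Volumes/{vid}/deleted;
# the real Tendrl objects go through their own persister layer.
import etcd3

client = etcd3.client(host="127.0.0.1", port=2379)

# Hypothetical ids, for the example only.
DELETED_KEY = "/cluster/{cid}/Volumes/{vid}/deleted".format(cid="c1", vid="v1")


def handle_volume_delete_event():
    """Delete handler: mark the volume as deleted and drop its alert panel."""
    client.put(DELETED_KEY, "True")
    # ... remove the Grafana alert panel for this volume here ...


def stale_sync_writer(volume_snapshot):
    """A sync thread that read the volume *before* the deletion.

    If it blindly re-writes the object it read earlier, it overwrites
    deleted=True with the stale default, and the next monitoring sync
    re-creates the alert panel for a volume that no longer exists.
    """
    client.put(DELETED_KEY, volume_snapshot["deleted"])  # writes "False" again


handle_volume_delete_event()
stale_sync_writer({"deleted": "False"})  # the race: flag flips back to False
print(client.get(DELETED_KEY)[0])        # b'False' -> panel gets re-created
```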
To see the alert dashboard in Grafana:
1. Sign in to Grafana with valid credentials.
2. Switch organization to the "Alert Dashboard" organization.
3. Press Home to list the dashboards.
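The same check can also be scripted against the standard Grafana HTTP API; this is only a sketch, and the credentials, host, and organization id are assumptions, not values taken from this bug:

```python
# Sketch: list dashboards in the alerts organization via the Grafana HTTP API.
import requests

GRAFANA = "http://localhost:3000"
AUTH = ("admin", "admin")   # assumed default credentials
ALERT_ORG_ID = 2            # assumed id of the "Alert Dashboard" organization

session = requests.Session()
session.auth = AUTH

# Switch the user context to the alerts organization
# (equivalent of the organization switch in the UI).
session.post("{}/api/user/using/{}".format(GRAFANA, ALERT_ORG_ID)).raise_for_status()

# List dashboards in that organization (equivalent of pressing Home).
dashboards = session.get("{}/api/search".format(GRAFANA),
                         params={"type": "dash-db"}).json()
for dash in dashboards:
    print(dash["title"])
```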
But it won't happen frequently; it is actually happening because of a race condition.

Sorry, it can be reproduced in version 3.3.1 as well. By 1.6.1-1 I meant RHGS-WA NVR 1.6.1-1.

Based on the information provided here I'm assuming that:

* it's not reproducible in RHGS WA 3.3.1, and as such it's connected to a new feature of RHGS WA 3.4
* the qe team will verify this by running the scenario (using additional details in comment 3) multiple times, but can't approach it as a standard bug verification as it's not reproducible in older builds
* the alert dashboard in Grafana is mentioned only as a hint for the qe team and this feature is still not supported (it is not documented, there is no feature BZ for it, and my understanding was that this is an internal implementation detail to support alerts[1])

[1] see e.g. this note from Mrugesh:

> Alerts org will contain the panels created for alert callbacks and will be
> hidden from the end users.

from https://github.com/Tendrl/specifications/issues/191#issuecomment-326197800

Is my understanding correct? If yes, I'm going to provide the qe ack.

(In reply to Martin Bukatovic from comment #8)

> * it's not reproducible in RHGS WA 3.3.1, and as such it's connected to a new
>   feature of RHGS WA 3.4

Out-of-band deletion of a volume is supported from 3.3.1, so you might see this issue in 3.3.1 as well, unless it was introduced during 3.4.0 development.

> Is my understanding correct? If yes, I'm going to provide the qe ack.

Yes, the alert dashboard is not supposed to be used by the end users and is hidden.

When a volume is deleted in the current build, the record of the volume is erased from etcd (/clusters/<cluster-id>/Volumes) and from the alert dashboard. Before it is erased, there are fields `deleted` and `deleted_at` in /clusters/<cluster-id>/Volumes/<volume-id>/data that are set to "" by default and are filled with data ("deleted_at": "<timestamp>", "deleted": true) when the volume is deleted.
When these records related to the volume are erased from etcd, there is no way for WA to restore them for the deleted volume, so the issue cannot happen in the current build, right?
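A quick way to inspect those markers for a single volume, as a sketch only: it assumes the `etcd3` Python client and that the `data` key holds a JSON document (the exact storage format is Tendrl-internal, so treat both as assumptions):

```python
# Sketch: inspect the deletion markers for one volume in etcd.
# The key path comes from the comment above; JSON decoding is an assumption.
import json
import etcd3

client = etcd3.client(host="127.0.0.1", port=2379)

cluster_id = "<cluster-id>"   # fill in real ids
volume_id = "<volume-id>"
key = "/clusters/{}/Volumes/{}/data".format(cluster_id, volume_id)

value, _meta = client.get(key)
if value is None:
    print("volume record already erased from etcd")
else:
    data = json.loads(value)
    # Both fields default to "" and are filled in on deletion.
    print("deleted:    ", data.get("deleted"))
    print("deleted_at: ", data.get("deleted_at"))
```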
Tested with:
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch
In the old builds we also have a TTL for volumes, so a volume entry is removed some time after the volume is deleted from the CLI. The reason we use the deleted flag is that deletion by TTL takes a while, so in the gluster-integration sync and the monitoring-integration sync we need a flag to omit the deleted volume when calculating volume details and creating panels in the alert dashboard; we use this deleted flag for that purpose. The problem in an old build is: when a volume is deleted, we capture the gluster event for the deletion and remove its panel from the Grafana alert dashboard, but if the volume is then marked with deleted = "" again, a panel for that volume is created again in the alert dashboard and remains forever.

In the new build we fixed this: the deleted flag is set properly after deletion. Also, even if something goes wrong with the deletion flag, monitoring-integration now has the intelligence to remove a volume panel when the volume is removed by TTL, so the issue won't occur. The logic in monitoring-integration is: if a sync collects a list of volumes, say A, B, C, it creates volume panels in the alert dashboard for A, B, and C; if the next sync collects only A and B, it compares the dashboard panels with the newly collected data, sees that C is missing, and removes the panel for C (see the sketch at the end of this report). Even if monitoring-integration is down when the volume is deleted and the volume detail is removed from etcd by TTL, it can remove the deleted volume's panel from the Grafana dashboard when it comes back up.

Tested 10 times and it seems to be fixed. Records from etcd and alert dashboards were deleted every time. --> VERIFIED

Tested with:
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616
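As referenced above, a minimal sketch of the panel-reconciliation logic described in the comments (function and variable names are hypothetical; the real implementation lives in tendrl-monitoring-integration):

```python
# Hypothetical sketch of the reconciliation described above: each sync
# compares the volumes currently present in etcd with the panels that exist
# in the Grafana alert dashboard and removes panels for vanished volumes.
def reconcile_alert_panels(synced_volumes, existing_panels,
                           create_panel, delete_panel):
    """synced_volumes: set of volume names collected in this sync cycle.
    existing_panels: set of volume names that currently have a panel.
    create_panel / delete_panel: callables acting on the alert dashboard.
    """
    for volume in synced_volumes - existing_panels:
        create_panel(volume)      # new volume -> add a panel
    for volume in existing_panels - synced_volumes:
        delete_panel(volume)      # volume gone (deleted or TTL-expired)


# The scenario from the comment above: the first sync saw A, B, C;
# the next sync collects only A and B, so the panel for C is removed.
panels = {"A", "B", "C"}
reconcile_alert_panels({"A", "B"}, set(panels),
                       create_panel=panels.add,
                       delete_panel=panels.discard)
print(sorted(panels))  # ['A', 'B']
```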