Description of problem:
When a volume is deleted, the deleted flag for that volume is updated to True. But sometimes the deleted flag is later updated back to False.

Version-Release number of selected component (if applicable):

How reproducible:
Delete a volume from the CLI and check that the deleted flag is True for that volume in the central store. After a few seconds the flag is updated back to False. Because of this problem, monitoring-integration again creates a panel in the dashboard for the deleted volume.

Steps to Reproduce:
1. Delete a volume from the CLI
2. Keep checking the deleted flag for that volume in the central store
3. After a few seconds the deleted flag is updated from True to False

Actual results:
The deleted flag is False for deleted volumes

Expected results:
The deleted flag should always be True for deleted volumes

Additional info:
Could you provide more details about:
* How to inspect the value of the flag in question
* What does "sometimes" mean here? Once in how many retries?
* Link to the upstream merge request
* Version where the bug is present
* Is it possible to reproduce this on a previously released RHGS WA?
It is reproducible in RHGS WA 1.6.1-1. In the central store we store the volume's deleted flag under /cluster/{cid}/Volumes/{vid}/deleted. Each volume object has a member variable called deleted with a default value of False. When the volume is deleted, the deleted flag is set to True, but after a few minutes the flag is updated back to False. monitoring-integration creates an alert dashboard panel for each volume based on this deleted flag alone: when a volume is deleted, the alert panel for that volume is removed and the flag is marked True. If, a few minutes after the volume deletion, the flag is reset by some thread, the alert panel for the deleted volume is created again in Grafana.

To see the alert dashboard in Grafana:
1. Sign in to Grafana with valid credentials
2. Switch organization to the "Alert Dashboard" organization
3. Press Home to list dashboards
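For inspecting the flag (comment 2 asked how), a quick way is to poll the key path above with an etcd client. This is only a sketch, assuming the python-etcd client and a local etcd endpoint on the default port; the cluster and volume IDs are placeholders, not values from this report:

# Sketch only: python-etcd client and a local etcd endpoint are assumptions,
# and the cluster/volume IDs are placeholders.
import time
import etcd

client = etcd.Client(host="127.0.0.1", port=2379)
key = "/cluster/{cid}/Volumes/{vid}/deleted".format(
    cid="<cluster-id>", vid="<volume-id>")

# Poll the flag every few seconds to catch it flipping back to False.
while True:
    try:
        print(time.strftime("%H:%M:%S"), client.read(key).value)
    except etcd.EtcdKeyNotFound:
        print(time.strftime("%H:%M:%S"), "key no longer present in etcd")
        break
    time.sleep(5)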
It won't happen frequently; it is actually happening because of a race condition.
Sorry, it can be reproduced in version 3.3.1 as well.
I meant that 1.6.1-1 is actually the RHGS WA NVR 1.6.1-1.
Based on the information provided here I'm assuming that:

* it's not reproducible in RHGS WA 3.3.1, and as such it's connected to a new feature of RHGS WA 3.4
* the qe team will verify this by running the scenario (using the additional details in comment 3) multiple times, but can't approach it as a standard bug verification, as it's not reproducible in older builds
* the alert dashboard in Grafana is mentioned only as a hint for the qe team and this feature is still not supported (it is not documented, there is no feature BZ for it, and my understanding was that this is an internal implementation detail to support alerts [1])

[1] see e.g. this note from Mrugesh:

> Alerts org will contain the panels created for alert callbacks and will be
> hidden from the end users.

from https://github.com/Tendrl/specifications/issues/191#issuecomment-326197800

Is my understanding correct? If yes, I'm going to provide the qe ack.
(In reply to Martin Bukatovic from comment #8)
> Based on the information provided here I'm assuming that:
>
> * it's not reproducible in RHGS WA 3.3.1, and as such it's connected to a new
>   feature of RHGS WA 3.4
>
Out-of-band deletion of volumes is supported from 3.3.1, so you might see this issue in 3.3.1 as well, unless it was introduced during 3.4.0 development.

> * qe team will verify this by running scenario (using additional details in
>   comment 3) multiple times, but can't approach it as standard bug verification
>   as it's not reproducible in older builds
>
> * alert dashbaord in grafana is mentioned only as a hint for qe team and this
>   feature is still not supported (as this is not documented, there is no feature
>   BZ for it and my understanding was that this is internal implementation details
>   to support alerts[1])
>
> [1] see eg. this note from Mrugesh:
>
> > Alerts org will contain the panels created for alert callbacks and will be
> > hidden from the end users.
>
> from https://github.com/Tendrl/specifications/issues/191#issuecomment-326197800
>
> Is my understanding correct? If yes, I'm going to provide the qe ack.

Yes, the alert dashboard is not supposed to be used by end users and is hidden.
When a volume is deleted in the current build, the record of the volume is erased from etcd (/clusters/<cluster-id>/Volumes) and from the alert dashboard. Before it is erased, there are fields under /clusters/<cluster-id>/Volumes/<volume-id>/data, `deleted` and `deleted_at`, that are set to "" by default and are filled with data ("deleted_at": "<timestamp>", "deleted": true) when the volume is deleted. Once these records related to the volume are erased from etcd, there is no way WA can restore them for the deleted volume, so the issue cannot happen in the current build, right?

Tested with:
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch
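As a reference for how the above can be checked, a small script like the one below prints the deleted/deleted_at leaves while the volume record still exists and reports when the whole subtree has been erased. It is only a sketch assuming the python-etcd client and a local etcd endpoint; the IDs are placeholders and the key layout is taken from the paths mentioned above:

# Sketch only: python-etcd client, local etcd endpoint and placeholder IDs
# are assumptions, not details from this report.
import etcd

client = etcd.Client(host="127.0.0.1", port=2379)
vol_dir = "/clusters/<cluster-id>/Volumes/<volume-id>"

try:
    # Walk the volume subtree and print the deleted/deleted_at leaves.
    for leaf in client.read(vol_dir, recursive=True).leaves:
        if leaf.key.endswith(("deleted", "deleted_at")):
            print(leaf.key, "=", repr(leaf.value))
except etcd.EtcdKeyNotFound:
    # Once the whole subtree is erased, there is nothing left to flip back.
    print("volume record erased from etcd")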
In old builds we also have a TTL on volume records so that they are removed after the volume is deleted from the CLI. The reason we use a deleted flag is that removal by TTL takes some time, so in the gluster-integration sync and the monitoring-integration sync we need a flag to omit the deleted volume when calculating volume details and creating panels in the alert dashboard; we used this deleted flag for that purpose. The problem in the old build is that when a volume is deleted we capture the gluster event for the volume delete and remove it from the Grafana alert dashboard, but if the deleted flag is later reset to "", a panel for that volume is created again in the alert dashboard and remains there forever. In the new build we fixed this problem so that the deleted flag is set properly after deletion. And even if there is any problem with the deletion flag, monitoring-integration now has the intelligence to remove the volume panel once the volume record is removed by TTL. So it won't occur.
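To illustrate the TTL mechanism mentioned above: etcd can expire a key on its own when it is written with a time-to-live. The snippet below is only an illustration, with an invented TTL value and placeholder IDs, assuming the python-etcd client; it is not the actual gluster-integration code:

# Sketch of TTL-based expiry; key path, value and TTL are illustrative
# and do not come from the original report.
import etcd

client = etcd.Client(host="127.0.0.1", port=2379)

# Write the deleted marker with a TTL so etcd removes it automatically
# after 60 seconds even if no component cleans it up explicitly.
client.write("/clusters/<cluster-id>/Volumes/<volume-id>/deleted",
             "True", ttl=60)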
The logic in monitoring-integration is: if a sync collects the list of volumes A, B, C, it creates volume panels in the alert dashboard for all of A, B, C. If the next sync collects only A and B, it compares the dashboard panels with the newly collected data, sees that C is missing, and removes the panel for C. Even if monitoring-integration is down when the volume is deleted and the volume detail is removed from etcd by TTL, it can still remove the deleted volume's panel from the Grafana dashboard when monitoring-integration comes back up.
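The sync comparison described above is essentially a set difference between the volumes collected in the current sync and the panels that already exist. A minimal sketch of that reconciliation, with hypothetical create_panel/remove_panel stand-ins for the real Grafana calls, could look like this:

# Minimal sketch of the panel reconciliation logic described above.
# create_panel/remove_panel are hypothetical stand-ins, not the real
# monitoring-integration API.

def reconcile_panels(synced_volumes, existing_panels,
                     create_panel, remove_panel):
    synced = set(synced_volumes)
    existing = set(existing_panels)

    # Volumes seen in this sync but without a panel yet -> create panels.
    for volume in synced - existing:
        create_panel(volume)

    # Panels whose volume was not seen in this sync (e.g. deleted or
    # already expired from etcd by TTL) -> remove panels.
    for volume in existing - synced:
        remove_panel(volume)


# Example: the previous sync created panels for A, B, C; the current sync
# collects only A and B, so the panel for C gets removed.
reconcile_panels(["A", "B"], ["A", "B", "C"],
                 create_panel=lambda v: print("create", v),
                 remove_panel=lambda v: print("remove", v))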
Tested 10 times and it seems to be fixed. Records from etcd and alert dashboards were deleted every time. --> VERIFIED

Tested with:
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2616