Bug 1559421
| Summary: | Sometimes delete flag for the deleted volumes is changed to False | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | gowtham <gshanmug> |
| Component: | web-admin-tendrl-gluster-integration | Assignee: | gowtham <gshanmug> |
| Status: | CLOSED ERRATA | QA Contact: | Filip Balák <fbalak> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | rhgs-3.4 | CC: | fbalak, gshanmug, mbukatov, nthomas, rhs-bugs, sankarshan |
| Target Milestone: | --- | | |
| Target Release: | RHGS 3.4.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | tendrl-monitoring-integration-1.6.1-3.el7rhgs, tendrl-gluster-integration-1.6.1-3.el7rhgs | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-04 07:02:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1503137 | | |
Description
gowtham
2018-03-22 14:11:50 UTC
Could you provide more details about:

* How to inspect the value of the flag in question
* What does "sometimes" mean here? Once in how many retries?
* Link to the upstream merge request
* Version where the bug is present
* Is it possible to reproduce this on previously released RHGS WA?

It is reproducible in rhgs 1.6.1-1. In the central store we store volume information in /cluster/{cid}/Volumes/{vid}/deleted. Each volume object has a member variable called `deleted` with a default value of False. When the volume is deleted, the deleted flag is set to True, but after a few minutes the flag is changed back to False. monitoring-integration creates an alert dashboard panel for each volume based on this deleted flag only: when a volume is deleted, the alert panel for that volume is removed and the flag is marked True. If, a few minutes after the volume deletion, the flag is reset by some other thread, the alert panel for the deleted volume is created again in Grafana.
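To make the race concrete, here is a minimal sketch (not taken from the Tendrl code base; it assumes the `etcd3` Python client and the key path quoted above, and the ids are hypothetical) of how a stale sync writer can flip the flag back after the delete handler has set it:

```python
# Minimal illustration of the race described above. Assumes the `etcd3`
# Python client and the key layout /cluster/{cid}/Volumes/{vid}/deleted;
# the real Tendrl objects go through their own persister layer.
import etcd3

client = etcd3.client(host="127.0.0.1", port=2379)

# Hypothetical ids, for the example only.
DELETED_KEY = "/cluster/{cid}/Volumes/{vid}/deleted".format(cid="c1", vid="v1")


def handle_volume_delete_event():
    """Delete handler: mark the volume as deleted and drop its alert panel."""
    client.put(DELETED_KEY, "True")
    # ... remove the Grafana alert panel for this volume here ...


def stale_sync_writer(volume_snapshot):
    """A sync thread that read the volume *before* the deletion.

    If it blindly re-writes the object it read earlier, it overwrites
    deleted=True with the stale default, and the next monitoring sync
    re-creates the alert panel for a volume that no longer exists.
    """
    client.put(DELETED_KEY, volume_snapshot["deleted"])  # writes "False" again


handle_volume_delete_event()
stale_sync_writer({"deleted": "False"})  # the race: flag flips back to False
print(client.get(DELETED_KEY)[0])        # b'False' -> panel gets re-created
```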
To see the alert dashboard in Grafana:
1. Sign in to Grafana with valid credentials.
2. Switch organization to the "Alert Dashboard" organization.
3. Press Home to list the dashboards.
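The same check can also be scripted against the standard Grafana HTTP API; this is only a sketch, and the credentials, host, and organization id are assumptions, not values taken from this bug:

```python
# Sketch: list dashboards in the alerts organization via the Grafana HTTP API.
import requests

GRAFANA = "http://localhost:3000"
AUTH = ("admin", "admin")   # assumed default credentials
ALERT_ORG_ID = 2            # assumed id of the "Alert Dashboard" organization

session = requests.Session()
session.auth = AUTH

# Switch the user context to the alerts organization
# (equivalent of the organization switch in the UI).
session.post("{}/api/user/using/{}".format(GRAFANA, ALERT_ORG_ID)).raise_for_status()

# List dashboards in that organization (equivalent of pressing Home).
dashboards = session.get("{}/api/search".format(GRAFANA),
                         params={"type": "dash-db"}).json()
for dash in dashboards:
    print(dash["title"])
```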
But it won't happen frequently; it is actually happening because of a race condition.

Sorry, it can be reproduced in version 3.3.1 as well. By 1.6.1-1 I meant RHGS-WA NVR 1.6.1-1.

Based on the information provided here I'm assuming that:

* it's not reproducible in RHGS WA 3.3.1, and as such it's connected to a new feature of RHGS WA 3.4
* the qe team will verify this by running the scenario (using additional details in comment 3) multiple times, but can't approach it as a standard bug verification as it's not reproducible in older builds
* the alert dashboard in Grafana is mentioned only as a hint for the qe team and this feature is still not supported (it is not documented, there is no feature BZ for it, and my understanding was that this is an internal implementation detail to support alerts[1])

[1] see e.g. this note from Mrugesh:

> Alerts org will contain the panels created for alert callbacks and will be
> hidden from the end users.

from https://github.com/Tendrl/specifications/issues/191#issuecomment-326197800

Is my understanding correct? If yes, I'm going to provide the qe ack.

(In reply to Martin Bukatovic from comment #8)

> * it's not reproducible in RHGS WA 3.3.1, and as such it's connected to a new
>   feature of RHGS WA 3.4

Out-of-band deletion of a volume is supported from 3.3.1, so you might see this issue in 3.3.1 as well, unless it was introduced during 3.4.0 development.

> Is my understanding correct? If yes, I'm going to provide the qe ack.

Yes, the alert dashboard is not supposed to be used by the end users and is hidden.

When a volume is deleted in the current build, the record of the volume is erased from etcd (/clusters/<cluster-id>/Volumes) and from the alert dashboard. Before it is erased, there are fields `deleted` and `deleted_at` in /clusters/<cluster-id>/Volumes/<volume-id>/data that are set to "" by default and are filled with data ("deleted_at": "<timestamp>", "deleted": true) when the volume is deleted.
When these records related to the volume are erased from etcd, there is no way for WA to restore them for the deleted volume, so the issue cannot happen in the current build, right?
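A quick way to inspect those markers for a single volume, as a sketch only: it assumes the `etcd3` Python client and that the `data` key holds a JSON document (the exact storage format is Tendrl-internal, so treat both as assumptions):

```python
# Sketch: inspect the deletion markers for one volume in etcd.
# The key path comes from the comment above; JSON decoding is an assumption.
import json
import etcd3

client = etcd3.client(host="127.0.0.1", port=2379)

cluster_id = "<cluster-id>"   # fill in real ids
volume_id = "<volume-id>"
key = "/clusters/{}/Volumes/{}/data".format(cluster_id, volume_id)

value, _meta = client.get(key)
if value is None:
    print("volume record already erased from etcd")
else:
    data = json.loads(value)
    # Both fields default to "" and are filled in on deletion.
    print("deleted:    ", data.get("deleted"))
    print("deleted_at: ", data.get("deleted_at"))
```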
Tested with:
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch
In the old builds we also have a TTL for volumes, so a volume entry is removed some time after the volume is deleted from the CLI. The reason we use the deleted flag is that deletion by TTL takes a while, so in the gluster-integration sync and the monitoring-integration sync we need a flag to omit the deleted volume when calculating volume details and creating panels in the alert dashboard; we use this deleted flag for that purpose. The problem in an old build is: when a volume is deleted, we capture the gluster event for the deletion and remove its panel from the Grafana alert dashboard, but if the volume is then marked with deleted = "" again, a panel for that volume is created again in the alert dashboard and remains forever.

In the new build we fixed this: the deleted flag is set properly after deletion. Also, even if something goes wrong with the deletion flag, monitoring-integration now has the intelligence to remove a volume panel when the volume is removed by TTL, so the issue won't occur. The logic in monitoring-integration is: if a sync collects a list of volumes, say A, B, C, it creates volume panels in the alert dashboard for A, B, and C; if the next sync collects only A and B, it compares the dashboard panels with the newly collected data, sees that C is missing, and removes the panel for C (see the sketch at the end of this report). Even if monitoring-integration is down when the volume is deleted and the volume detail is removed from etcd by TTL, it can remove the deleted volume's panel from the Grafana dashboard when it comes back up.

Tested 10 times and it seems to be fixed. Records from etcd and alert dashboards were deleted every time. --> VERIFIED

Tested with:
tendrl-ansible-1.6.3-5.el7rhgs.noarch
tendrl-api-1.6.3-4.el7rhgs.noarch
tendrl-api-httpd-1.6.3-4.el7rhgs.noarch
tendrl-commons-1.6.3-9.el7rhgs.noarch
tendrl-gluster-integration-1.6.3-7.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-7.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-7.el7rhgs.noarch
tendrl-node-agent-1.6.3-9.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-8.el7rhgs.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2616
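As referenced above, a minimal sketch of the panel-reconciliation logic described in the comments (function and variable names are hypothetical; the real implementation lives in tendrl-monitoring-integration):

```python
# Hypothetical sketch of the reconciliation described above: each sync
# compares the volumes currently present in etcd with the panels that exist
# in the Grafana alert dashboard and removes panels for vanished volumes.
def reconcile_alert_panels(synced_volumes, existing_panels,
                           create_panel, delete_panel):
    """synced_volumes: set of volume names collected in this sync cycle.
    existing_panels: set of volume names that currently have a panel.
    create_panel / delete_panel: callables acting on the alert dashboard.
    """
    for volume in synced_volumes - existing_panels:
        create_panel(volume)      # new volume -> add a panel
    for volume in existing_panels - synced_volumes:
        delete_panel(volume)      # volume gone (deleted or TTL-expired)


# The scenario from the comment above: the first sync saw A, B, C;
# the next sync collects only A and B, so the panel for C is removed.
panels = {"A", "B", "C"}
reconcile_alert_panels({"A", "B"}, set(panels),
                       create_panel=panels.add,
                       delete_panel=panels.discard)
print(sorted(panels))  # ['A', 'B']
```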