Description of problem:
With batch update 1 for RHGSWA, tendrl-monitoring-integration provides updates to the Grafana dashboards. These updates have to be applied without interfering with the clusters being managed and without re-installing the whole WA stack.

Version-Release number of selected component (if applicable):
tendrl-monitoring-integration-1.6.3-11.el7rhgs.noarch

How reproducible:
100 %

Procedure:
1. Stop the tendrl-monitoring-integration service.
2. Run yum update to install the latest version of the tendrl-monitoring-integration package. This way the settings and configuration files are preserved and need no change.
3. Run the provided upgrade script. The script requires 3 inputs from the user:
   a) grafana admin_user (default: /etc/tendrl/monitoring-integration/grafana/grafana.ini line 143)
   b) grafana admin_password (default: /etc/tendrl/monitoring-integration/grafana/grafana.ini line 146)
   c) WA server-node IP
4. Restart the tendrl-monitoring-integration service.

How the script would work:
The script will be used only to delete the existing dashboards, so that on restart of the tendrl-monitoring-integration service the new dashboards replace the old ones. The script will call the Grafana APIs using REST calls to delete the dashboards. Reference to the delete API: http://docs.grafana.org/http_api/dashboard/#delete-dashboard-by-slug
The dashboards that will be deleted are: "cluster-dashboard", "brick-dashboard", "host-dashboard", "volume-dashboard".
Along with the above dashboards, the Alerts dashboards also have to be deleted. This can be achieved by deleting the Alert_dashboard organization (as the Alert_dashboard organization is created by monitoring-integration). The script will find the Alert_dashboard organization id by using a Grafana API ( http://docs.grafana.org/http_api/org/#get-organisation-by-name ). It will then delete the Alert_dashboard organization by its id ( http://docs.grafana.org/http_api/org/#delete-organisation ).
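The REST calls described above could look roughly like the following minimal Python 3 sketch, using only the standard library. It is not the shipped tendrl-upgrade code: the base URL (e.g. http://localhost:3000), HTTP Basic auth with the Grafana admin user, and treating a 404 as "already deleted" are assumptions; the endpoint paths follow the linked Grafana HTTP API pages (delete dashboard by slug, get organisation by name, delete organisation). The `send` parameter is injectable so the flow can be exercised without a live Grafana.

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request

# Dashboards and organization named in the bug description.
DASHBOARD_SLUGS = ["cluster-dashboard", "brick-dashboard",
                   "host-dashboard", "volume-dashboard"]
ALERT_ORG_NAME = "Alert_dashboard"


def basic_auth_header(user, password):
    """HTTP Basic auth value for the Grafana admin account (assumed auth scheme)."""
    raw = ("%s:%s" % (user, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(base_url, method, path, user, password):
    """Prepare (but do not send) a Grafana HTTP API request."""
    req = urllib.request.Request(base_url.rstrip("/") + path, method=method)
    req.add_header("Authorization", basic_auth_header(user, password))
    req.add_header("Accept", "application/json")
    return req


def send_json(req):
    """Send the request and decode the JSON reply; a 404 is returned as None."""
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: nothing to delete is not an error
            return None
        raise
    return json.loads(body) if body else {}


def remove_old_dashboards(base_url, user, password, send=send_json):
    """Delete the four dashboards by slug, then the Alert_dashboard organization."""
    for slug in DASHBOARD_SLUGS:
        path = "/api/dashboards/db/" + urllib.parse.quote(slug)
        send(build_request(base_url, "DELETE", path, user, password))
    # Look up the organization id by name, then delete the organization by id.
    path = "/api/orgs/name/" + urllib.parse.quote(ALERT_ORG_NAME)
    org = send(build_request(base_url, "GET", path, user, password))
    if org is not None:
        path = "/api/orgs/%d" % org["id"]
        send(build_request(base_url, "DELETE", path, user, password))
```

For example, `remove_old_dashboards("http://localhost:3000", "admin", "admin")` would issue four DELETE calls, one GET and a final DELETE, in that order. Deleting the organization on most Grafana versions requires a server admin account, which is why Basic auth with the admin user is assumed here rather than an API key.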
The script is planned to be shipped along with the new builds of the tendrl-monitoring-integration package. However, the location of the script on the file system, and how it is executed on package install, need to be discussed by the stakeholders here.

Actual results:

Expected results:
The above procedure should upgrade the tendrl-monitoring-integration package, and the Grafana dashboards should be updated.

Additional info:
Assuming the user runs the script on the server node itself, the admin_user and admin_password inputs can be made optional and loaded from the default paths, and there would be no need for the server-node IP.
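Loading those defaults on the server node could look roughly like this Python 3 sketch. It assumes the credentials sit under the `[security]` section as `admin_user` / `admin_password` (the usual Grafana layout) in the grafana.ini path given in the description. Keys that appear before the first section header are tolerated by prepending a dummy section, and interpolation is disabled so a `%` in a password is read literally. The function names are hypothetical.

```python
import configparser

# Path taken from the bug description.
GRAFANA_INI = "/etc/tendrl/monitoring-integration/grafana/grafana.ini"


def parse_admin_credentials(text):
    """Return (admin_user, admin_password) from grafana.ini text; None if absent.

    Lines starting with ';' or '#' are comments, so a commented-out key counts
    as missing.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    # grafana.ini may have keys before the first [section]; give them a home.
    parser.read_string("[__top__]\n" + text)
    user = parser.get("security", "admin_user", fallback=None)
    password = parser.get("security", "admin_password", fallback=None)
    return user, password


def read_admin_credentials(path=GRAFANA_INI):
    """Load the defaults from the grafana.ini file on the WA server node."""
    with open(path, encoding="utf-8") as ini_file:
        return parse_admin_credentials(ini_file.read())
```

Returning `None` for missing values leaves the decision (prompt, fail, or require a command-line option) to the caller.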
@anmol,
The scripts are to be placed in /usr/local/bin/tendrl

@rahul, Martin, comments?
Fixed via https://github.com/Tendrl/monitoring-integration/pull/574
I am assuming that the dashboard changes in 3.4.1 are mainly descriptions in titles and help. Correct? It would be a bad experience if 3.4.1 simply deleted certain metrics/panels after upgrading to it...
(In reply to Anand Paladugu from comment #5)
> I am assuming that changes to dashboard in 3.4.1 are mainly descriptions in
> titles and help ? Correct ? It would be bad experience if 3.4.1 simply
> deletes certain metrics/ panels after upgrading to it...

Yes, the dashboard changes in 3.4.1 are mainly descriptions in titles and help. Since it is assumed that the user is already using WA, the dashboards will already be present. To update them, we have to remove them and replace them with the updated dashboards. The change takes effect as soon as the tendrl-monitoring-integration service starts and only requires a page refresh in the UI.
For QE to validate the build tendrl-monitoring-integration-1.6.3-13.el7rhgs, here are the steps.

1) Stop the tendrl-monitoring-integration service.

2) Upgrade the tendrl-monitoring-integration package.

3) Run the file /usr/bin/tendrl-upgrade on the command line as:
` python tendrl-upgrade --username=username --password=password `
The username and password arguments are optional; if they are not supplied, the script takes the default ones from the grafana.ini file.

4) Restart the tendrl-monitoring-integration service.

5) Refresh the Grafana UI page.
(In reply to anmol sachan from comment #8)
> For QE to validate the build tendrl-monitoring-integration-1.6.3-13.el7rhgs,
> here are the steps.
>
> 1) Stop service tendrl-monitoring-integration.
>
> 2) Upgrade tendrl-monitoring-integration package.
>
> 3) Run the file /usr/bin/tendrl-upgrade on command-line as :
> ` python tendrl-upgrade --username=username --password=password `

` tendrl-upgrade --username=username --password=password `

Use the above instead; the `python` prefix in the previous comment was a mistake.

> The username and password arguments are optional, and if not supplied,
> the script will take the default ones from grafana.ini file.
>
> 4) Restart the tendrl-monitoring-integration service.
>
> 5)Refresh Grafana UI page
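The corrected invocation, with both options optional, can be sketched with argparse as follows (Python 3). The help strings and `resolve_credentials` are hypothetical; `ini_defaults` stands for whatever (user, password) pair was read from grafana.ini, and the only behavior assumed is that a value given on the command line wins over the file default.

```python
import argparse


def build_parser():
    """CLI mirroring the documented usage: both options are optional."""
    parser = argparse.ArgumentParser(
        prog="tendrl-upgrade",
        description="Delete old WA Grafana dashboards so they are recreated on restart.")
    parser.add_argument("--username", help="Grafana admin user (default: from grafana.ini)")
    parser.add_argument("--password", help="Grafana admin password (default: from grafana.ini)")
    return parser


def resolve_credentials(argv, ini_defaults):
    """Prefer command-line values; fall back to (user, password) from grafana.ini."""
    args = build_parser().parse_args(argv)
    default_user, default_password = ini_defaults
    user = args.username if args.username is not None else default_user
    password = args.password if args.password is not None else default_password
    if user is None or password is None:
        # Neither the command line nor grafana.ini provided a value.
        raise SystemExit("Grafana admin credentials not found; pass --username/--password")
    return user, password
```

Resolving each option separately means a user can override just the password while keeping the configured admin user.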
The description drafted in comment 8 is incomplete, as we are trying to align this with the upgrade of the whole RHGS BU1, which includes both the tendrl server and the gluster storage machines, and both gluster and WA components.

Here are the missing pieces I was able to identify so far:

* How is this connected to tendrl-ansible, would we need to rerun tendrl-ansible during the BU1 upgrade?
* Procedure to upgrade storage machines, including both offline and in-service gluster upgrade paths.
* How is the WA server update aligned with updating the storage machines?
* Is anything else required during the upgrade of the WA server (compared to what is provided in comment 8)?
* How is the upgrade expected to work, when BZ 1515276 hasn't been considered?

That said, we can at least check the draft suggested in comment 8 in isolation, in the hope that such feedback will be helpful. But we can't verify this BZ without the missing details, hence moving back to ASSIGNED.
(In reply to Nishanth Thomas from comment #3)
> @anmol,
> The scripts to be placed in /usr/local/bin/tendrl
>
> @rahul, Martin, comments?

Using /usr/local for an rpm package would be wrong. I would suggest using just /usr/bin.

Also make sure that the script only shows help when executed without any arguments, to prevent deleting dashboards by mistake.
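The suggested safeguard could be sketched like this (Python 3, hypothetical structure): with no arguments the script prints its help and returns without calling the deletion step. `delete_dashboards` is a placeholder callback, not a real tendrl function. One consequence worth weighing: under this guard, a run relying only on the grafana.ini defaults would still need at least one explicit option.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="tendrl-upgrade",
        description="Delete the WA Grafana dashboards so they are recreated on restart.")
    parser.add_argument("--username", help="Grafana admin user")
    parser.add_argument("--password", help="Grafana admin password")
    return parser


def main(argv, delete_dashboards):
    """Show help and do nothing when called without arguments."""
    parser = build_parser()
    if not argv:
        parser.print_help()
        return 0  # nothing was deleted
    args = parser.parse_args(argv)
    # Only an explicit invocation reaches the destructive step.
    delete_dashboards(args.username, args.password)
    return 0
```

As a console entry point this would be called as `main(sys.argv[1:], <deletion function>)`.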
As Martin mentioned above, for full validation we will need documentation for the whole process, including the Gluster Storage upgrade (if it is related). But for now, I can confirm that the process described in comment 8 makes changes to the Grafana dashboards, so I consider the dashboards updated.
Martin/Daniel,

Let me detail the steps to be followed for BU1 (from 3.4.0 to 3.4.1):

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On WA Server Node
---------------------
1. Stop the tendrl-monitoring-integration service
   # systemctl stop tendrl-monitoring-integration

2. Update the RHGS-WA packages
   # yum update tendrl-*

3. Run the migration script
   # tendrl-upgrade --username=username --password=password

4. Restart the RHGS-WA services
   # systemctl restart tendrl-node-agent
   # systemctl restart tendrl-monitoring-integration
   # systemctl restart tendrl-notifier
   # systemctl restart tendrl-api

5. Restart the httpd service
   # systemctl restart httpd

On Storage Nodes
----------------
1. Update the RHGS-WA packages
   # yum update tendrl-*

2. Restart the RHGS-WA services
   # systemctl restart tendrl-node-agent
   # systemctl restart tendrl-gluster-integration
++++++++++++++++++++++++++++++++++++++++++++++

Also, regarding the questions from Martin:

>> How is this connected to tendrl-ansible, would we need to rerun tendrl
>> ansible during BU1 upgrade?

No. It's not related to tendrl-ansible. The only packages affected as part of batch update 1 are tendrl-monitoring-integration, tendrl-gluster-integration and tendrl-api.

>> Procedure to upgrade storage machines, including both offline and in service
>> gluster upgrade paths.

The steps above cover updating the tendrl-gluster-integration package as part of updating the RHGS-WA packages on storage nodes. The upgrade of the RHGS bits is to be handled as per the guidelines from the RHGS team. For RHGS-WA components, the services need to be restarted after the rpms are upgraded.

>> How is WA server update aligned with updating storage machines?

The RHGS-WA packages should be upgraded independently on the storage nodes and on the RHGS-WA server node.

>> Is anything else required during upgrade of WA server (compared to what is
>> provided in comment 8)?

A detailed list of steps is provided above.

>> How is the upgrade expected to work, when BZ 1515276 hasn't been considered?

There are no changes in the tendrl-notifier package in this batch update, so it has no effect.
-----------------------------
Martin, I hope this makes it as clear as you expected.
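For repeated test runs, the BU1 sequence above can be scripted. The following Python 3 helper is only a sketch, not part of any tendrl package: the command lists are copied from the steps above, `tendrl-upgrade` is called without options on the assumption (per comment 8) that it falls back to the grafana.ini credentials, and `dry_run=True` is the default so nothing is executed unless asked.

```python
import subprocess

# Commands copied from the BU1 procedure, in order.
WA_SERVER_STEPS = [
    ["systemctl", "stop", "tendrl-monitoring-integration"],
    ["yum", "update", "tendrl-*"],
    ["tendrl-upgrade"],  # assumed to read credentials from grafana.ini
    ["systemctl", "restart", "tendrl-node-agent"],
    ["systemctl", "restart", "tendrl-monitoring-integration"],
    ["systemctl", "restart", "tendrl-notifier"],
    ["systemctl", "restart", "tendrl-api"],
    ["systemctl", "restart", "httpd"],
]

STORAGE_NODE_STEPS = [
    ["yum", "update", "tendrl-*"],
    ["systemctl", "restart", "tendrl-node-agent"],
    ["systemctl", "restart", "tendrl-gluster-integration"],
]


def run_steps(steps, dry_run=True):
    """Run each command in order, stopping at the first failure.

    With dry_run=True nothing is executed; the command lines are only returned.
    """
    executed = []
    for cmd in steps:
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
        executed.append(" ".join(cmd))
    return executed
```

Passing `tendrl-*` as a plain argument (no shell) hands the glob to yum itself instead of letting a shell expand it against files in the current directory.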
(In reply to Martin Bukatovic from comment #10)
> The description drafted in comment 8 is incomplete, as we are trying to
> align this with upgrade of whole RHGS BU1, which includes both tendrl server
> and gluster storage machines, and both gluster and WA components.

This bug is related to the tendrl-monitoring-integration package, not the complete WA upgrade. That is why the rest of the packages weren't mentioned here. There is already a bug in DOC-BZ ( https://bugzilla.redhat.com/show_bug.cgi?id=1631264 ) to discuss the whole RHGSWA upgrade.

>
> Here are the missing pieces I was able to identify so far:
>
> * How is this connected to tendrl-ansible, would we need to rerun tendrl
> ansible during BU1 upgrade?
> * Procedure to upgrade storage machines, including both offline and
> inservice
> gluster upgrade paths.
> * How is WA server update aligned with updating storage machines?
> * Is anything else required during upgrade of WA server (compared to what is
> provided in comment 8)?
> * How is the upgrade expected to work, when BZ 1515276 hasn't been
> considered?
>
> That said, we can check at least the draft suggested in comment 8 in
> isolation,
> in a hope that such feedback will be helpful. But we can't verify this BZ
> without
> the missing details, hence moving back to assigned.

The procedure given in comment 8 is just for tendrl-monitoring-integration, and ideally only that should be tested. FailedQA is not the correct resolution here. Since it is already done, I would wait for @Daniel to verify https://bugzilla.redhat.com/show_bug.cgi?id=1631260#c12 . Shubhendu has already provided the complete steps in https://bugzilla.redhat.com/show_bug.cgi?id=1631260#c13 . If there is anything more to be discussed regarding the complete WA upgrade, it can be discussed here : https://bugzilla.redhat.com/show_bug.cgi?id=163126 .
> here : https://bugzilla.redhat.com/show_bug.cgi?id=163126 .

Please ignore this. The WA upgrade is to be discussed here: https://bugzilla.redhat.com/show_bug.cgi?id=1631264
The process described in comment 8 and comment 13 seems to work as expected, and the Grafana dashboards are correctly upgraded.
Some additional information might be needed on how this process aligns with the general RHGS update and with the update of the base RHEL system (7.5->7.6).

I have just a few questions/notes related to the upgrade script tendrl-upgrade:

1) Is there a real possibility that the script will not be able to automatically detect the Grafana username and password?
If not, we probably don't need to highlight the possibility of specifying those parameters on the command line, and can suggest just running the command without any additional parameters. That will be easier than documenting additional steps on where to find that information.

2) It might be worth first printing an informational message (what the purpose of the script is, what the prerequisites are, e.g. a stopped tendrl-monitoring-integration service, and what it will do) and then asking the user for a Yes/No confirmation before continuing.
Of course, there might be some command line parameter, which will automatically except yes for the question (something like -y, --assumeyes in yum).

3) Currently the tendrl-upgrade script is specifically related to monitoring integration. Is it possible that in some future version it might have a wider scope (for example, updating the ETCD db schema or something else)? If that might be the case, is it OK to include it in the monitoring integration package (instead of some more general tendrl package)?
> Of course, there might be some command line parameter, which will automatically
> except yes for the question (something like -y, --assumeyes in yum).

which will ...automatically expect yes...
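Point 2 could look roughly like the following Python 3 sketch. The message text and the `confirm` helper are hypothetical; only the `-y`/`--assumeyes` spelling is borrowed from yum, as suggested. `read` is injectable so the prompt can be tested without a terminal.

```python
import argparse

INTRO = (
    "tendrl-upgrade deletes the WA Grafana dashboards and the Alert_dashboard\n"
    "organization so that tendrl-monitoring-integration recreates them on restart.\n"
    "Prerequisite: the tendrl-monitoring-integration service must be stopped.")


def build_parser():
    parser = argparse.ArgumentParser(prog="tendrl-upgrade", description=INTRO)
    parser.add_argument("-y", "--assumeyes", action="store_true",
                        help="answer yes to the confirmation question")
    return parser


def confirm(assume_yes, read=input):
    """Print what the script does, then ask unless --assumeyes was given."""
    print(INTRO)
    if assume_yes:
        return True
    answer = read("Continue? [y/N] ").strip().lower()
    # Anything other than an explicit yes (including just Enter) aborts.
    return answer in ("y", "yes")
```

Defaulting to "No" on an empty answer keeps an accidental Enter from deleting dashboards, while `-y` keeps the script usable from automation.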
(In reply to Daniel Horák from comment #16)
> The process described in comment 8 and comment 13 seems to work as expected
> and Grafana Dashboards are correctly upgraded.
> There might be needed some additional information related to the alignment of
> this process with general RHGS Update and also update or base RHEL system
> (7.5->7.6).
>
> I have just few questions/notes related to the upgrade script tendrl-upgrade:
>
> 1) Is there real possibility, that the script will not be able to
> automatically
> detect the Grafana username and password?
> If not, we probably don't need to highlight the possibility of specifying
> those parameters from command line and suggest just execution of the command
> without any additional parameters. It will be easier, than documenting
> additional steps, where to find those information.
>
> 2) It might be worth, to firstly print some information message - what is
> the purpose of the script, what is the prerequisite (e.g. stopped
> tendrl-monitoring service) and what it will do and then ask the user for
> Yes/No agreement if he want to continue.
> Of course, there might be some command line parameter, which will
> automatically
> except yes for the question (something like -y, --assumeyes in yum).
>
> 3) Currently the tendrl-update script is specifically related to monitoring
> integration. Is it possible, that in some future version it might have wider
> scope (for example update of ETCD db schema or something else)? If that might
> be the case, is it ok, to include it into monitoring integration package
> (instead of some more tendrl general package)?

To me, none of these points gives any justification for why this engineering BZ needs to be failedQA. The above points can be taken as additional reference for enhancements to be done in the commit message, but in no way do I see this script behaving functionally wrong.
If there are any documentation gaps, they need to be tracked through the doc bug we already have for documenting the upgrade steps for 3.4.1, I believe.

Nishanth/Shubhendu - Please move this BZ to ON_QA and clear the failedQA tag. In case QE has additional justification, feel free to add it here.
(In reply to Atin Mukherjee from comment #18)
> To me, none of these points get to me any justification on why this
> engineering BZ needs to be failedQA. The above points can be taken as
> additional reference for enhancements to be done in the commit message but
> in no way I see that this script is functionally behaving wrong.

Atin, my understanding is that the tendrl-upgrade script is the main part of this BZ, and at least the second point is directly related to the upgrade script. So if it is decided to incorporate my suggestions, this BZ will be moved to MODIFIED once the changes are done. Of course, if they are declined, it can be moved to MODIFIED immediately. Also, if the change looks big enough to be handled as a separate BZ (including the 3-ack process), feel free to mention it here and move this to MODIFIED, and I'll create a new BZ for it (though from my point of view, it would be more efficient to solve it as part of this bug).

In this case, the failedQA flag doesn't mean anything; it is set automatically for any BZ moved back to ASSIGNED.
(In reply to Daniel Horák from comment #19)
> (In reply to Atin Mukherjee from comment #18)
> > To me, none of these points get to me any justification on why this
> > engineering BZ needs to be failedQA. The above points can be taken as
> > additional reference for enhancements to be done in the commit message but
> > in no way I see that this script is functionally behaving wrong.
>
> Atin, my understanding is, that the tendrl-upgrade script is main part of
> this BZ and at least the second point is directly related to the upgrade
> script. So if it will be decided to incorporate my suggestions, this BZ will
> be moved to MODIFIED once the changes will be done. Of course, if it will be
> declined, it can be moved to MODIFIED immediately. Also if the change
> looks too big to be considered as separated BZ (including the 3 ack process),
> fell free to mention it here and move this to MODIFIED and I'll create
> new BZ for that (but from my point of view, it will be more efficient to
> solve
> it as part of this one bug).

Daniel, I would prefer a separate BZ for adding more meaningful information to be printed at the start of the script and then asking whether to continue. As such, there is no technical issue with the script at the moment, and we can go ahead with verification. Any suggestion for improvement can be taken up as separate work. Kindly do the needful.

>
> In this case, the failedQA doesn't meant anything and is set automatically
> for any BZ moved back to ASSIGNED.
(In reply to Shubhendu Tripathi from comment #20)
> Daniel, I would prefer a separate BZ for adding more meaningful information
> to be printed in the start of the script and then asking for continuing. As
> such there is no technical issue with the script at the moment and we can go
> ahead with verification. Any suggestion for improvement, can be taken as
> separate work on this.

New Bug 1633125 created.
Moving this back to ON_QA.
(In reply to Atin Mukherjee from comment #22)
> Moving this back to ON_QA.

Note that without a complete draft of the upgrade path, this BZ can't be moved to VERIFIED state.

With this in mind, the ON_QA state is not correct, as it usually means that the BZ is ready for testing, which is not the case here.
(In reply to Martin Bukatovic from comment #23)
> (In reply to Atin Mukherjee from comment #22)
> > Moving this back to ON_QA.
>
> Note that without complete draft of upgrade path, this BZ can't be moved to
> VERIFIED state.
>
> With this in mind, ON_QA state is not correct, as this usually means that
> the BZ is ready for testing, which is not the case here.

Martin, the complete upgrade procedure is present in the DOC BZ here: https://bugzilla.redhat.com/show_bug.cgi?id=1631264#c8 . I hope this is enough. Please comment if you expect any other documentation to verify this BZ.
* The tendrl-upgrade script is part of the new version of the tendrl-monitoring-integration package.
* The script, in the default configuration and environment (when used according to the documentation), correctly deletes all Grafana dashboards, and the dashboards are correctly recreated (from the updated specification) during the next startup of the tendrl-monitoring-integration service.

Based on those two points, I'm verifying this bug.

There are some issues related to the new script: some of them are described in Bug 1633125, and others are valid only for non-default usage, which is out of scope for this release and may be covered in further bugs.

Tested and VERIFIED with:
  grafana-4.3.2-3.el7rhgs.x86_64
  graphite-web-0.9.15-1.el7rhgs.noarch
  tendrl-ansible-1.6.3-8.el7rhgs.noarch
  tendrl-api-1.6.3-7.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-7.el7rhgs.noarch
  tendrl-commons-1.6.3-13.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-14.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-14.el7rhgs.noarch
  tendrl-node-agent-1.6.3-10.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-11.el7rhgs.noarch

>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3427