Bug 1631260 - Provide automated way for upgrading RHGS WA to 3.4.1 from earlier GAed releases
Summary: Provide automated way for upgrading RHGS WA to 3.4.1 from earlier GAed releases
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: web-admin-tendrl-monitoring-integration
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.4.z Batch Update 1
Assignee: Anmol Sachan
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-09-20 10:26 UTC by Anmol Sachan
Modified: 2018-10-31 08:45 UTC (History)

Fixed In Version: tendrl-monitoring-integration-1.6.3-13.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-31 08:45:18 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github Tendrl monitoring-integration issues 575 0 None None None 2018-09-21 09:47:32 UTC
Red Hat Bugzilla 1515276 0 unspecified CLOSED notifier - configuration files are not set in rpm 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1631264 0 unspecified CLOSED Update procedure - RHGSWA Version 3.4.0 to 3.4.1 (WA batch update 1) 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1633125 0 unspecified CLOSED tendrl-upgrade script should print additional information and request confirmation before any action 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2018:3427 0 None None None 2018-10-31 08:45:49 UTC

Internal Links: 1515276 1631264 1633125

Description Anmol Sachan 2018-09-20 10:26:00 UTC
Description of problem: With batch update 1 for RHGSWA, tendrl-monitoring-integration provides updates to the Grafana dashboards. These updates have to be applied without interfering with the clusters being managed and without reinstalling the whole WA stack.


Version-Release number of selected component (if applicable): tendrl-monitoring-integration-1.6.3-11.el7rhgs.noarch


How reproducible: 100 %

Procedure:

1. Stop tendrl-monitoring-integration service.

2. Do a yum update to install the latest version of the tendrl-monitoring-integration package. This way the settings and configuration files are preserved and require no change.

3. Run the upgrade script provided.
   The script would require 3 inputs from the user : 
       a) grafana admin_user (default : /etc/tendrl/monitoring-integration/grafana/grafana.ini line 143)
       b) grafana admin_password  (default : /etc/tendrl/monitoring-integration/grafana/grafana.ini line 146)
       c) WA server-node ip.

4. Restart tendrl-monitoring-integration service.


How the script would work:

The script will be used only to delete existing dashboards, so that on restart of tendrl-monitoring-integration service the new dashboards could replace the old ones.

The script will call the Grafana APIs using REST calls to delete the dashboards.
Reference to delete API : http://docs.grafana.org/http_api/dashboard/#delete-dashboard-by-slug
The dashboards which will be deleted are: "cluster-dashboard", "brick-dashboard", "host-dashboard", "volume-dashboard".

Along with above dashboards the Alerts dashboards also have to be deleted. 
This can be achieved by deleting the Alert_dashboard organization(as the Alert_dashboard organization is created by monitoring-integration). The script will find the Alert_dashboard organization id by using a grafana Api ( http://docs.grafana.org/http_api/org/#get-organisation-by-name ). It will then delete the Alert_dashboard organization by its id ( http://docs.grafana.org/http_api/org/#delete-organisation ).
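
A minimal sketch of the call sequence described above, not the shipped script. It assumes Python 3, HTTP basic auth with the Grafana admin account, and the endpoint paths from the Grafana HTTP API pages referenced above; the function names and the injected `call` parameter are hypothetical and used only for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request

# Dashboard slugs and organization name as listed in the description.
DASHBOARD_SLUGS = ["cluster-dashboard", "brick-dashboard",
                   "host-dashboard", "volume-dashboard"]
ALERT_ORG = "Alert_dashboard"


def grafana_call(base_url, user, password, method, path):
    """Send one basic-auth request to Grafana and return the parsed JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        base_url + path, method=method,
        headers={"Authorization": "Basic " + token})
    # urlopen raises HTTPError on 4xx/5xx, e.g. when a dashboard is already gone.
    with urllib.request.urlopen(request) as response:
        body = response.read().decode()
    return json.loads(body) if body else {}


def delete_old_dashboards(call):
    """Delete the four dashboards, then the Alert_dashboard organization.

    `call(method, path)` performs the HTTP request; it is passed in so the
    sequence can be exercised without a running Grafana.
    """
    for slug in DASHBOARD_SLUGS:
        call("DELETE", "/api/dashboards/db/" + slug)
    # Look the organization up by name to learn its id, then delete by id.
    org = call("GET", "/api/orgs/name/" + urllib.parse.quote(ALERT_ORG))
    call("DELETE", "/api/orgs/%d" % org["id"])
```

To run this against a real server, `call` could be bound with `functools.partial(grafana_call, base_url, user, password)`, where `base_url` points at the Grafana instance on the WA server (Grafana listens on port 3000 by default).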

The script is planned to be shipped along with the new builds of the tendrl-monitoring-integration package. However, the location of the script on the file system and its execution on install of the package need to be discussed by the stakeholders here.

Actual results:


Expected results: The above procedure should upgrade the tendrl-monitoring-integration package and the grafana dashboards should be updated.


Additional info:

Comment 2 Anmol Sachan 2018-09-20 13:04:59 UTC
Assuming the user runs the script on server node itself then user inputs for admin_user and admin_password can be made optional and can be loaded from default paths and there would be no need of the server-node-ip.
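
A rough sketch of how such a fallback could look, assuming Python 3 and that admin_user and admin_password live in the [security] section of grafana.ini (the usual Grafana layout; the description only gives line numbers). The function names are hypothetical.

```python
import configparser

# Path taken from the description above.
GRAFANA_INI = "/etc/tendrl/monitoring-integration/grafana/grafana.ini"


def parse_admin_credentials(ini_text):
    """Return (admin_user, admin_password) from the [security] section."""
    # interpolation=None keeps a literal '%' in the password intact.
    parser = configparser.ConfigParser(interpolation=None)
    # Keys placed before the first section header would raise
    # MissingSectionHeaderError, so park them in a placeholder section.
    parser.read_string("[__top__]\n" + ini_text)
    return (parser.get("security", "admin_user"),
            parser.get("security", "admin_password"))


def resolve_credentials(username=None, password=None, path=GRAFANA_INI):
    """Prefer explicit arguments; read grafana.ini only for missing values."""
    if username is not None and password is not None:
        return username, password
    with open(path) as handle:
        ini_user, ini_password = parse_admin_credentials(handle.read())
    return username or ini_user, password or ini_password
```

If the two options are commented out in the file, `parser.get` raises `configparser.NoOptionError`, which a real script would want to report clearly.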

Comment 3 Nishanth Thomas 2018-09-21 06:10:51 UTC
@anmol,
The scripts to be placed in /usr/local/bin/tendrl

@rahul, Martin, comments?

Comment 4 Nishanth Thomas 2018-09-21 10:19:25 UTC
Fixed via https://github.com/Tendrl/monitoring-integration/pull/574

Comment 5 Anand Paladugu 2018-09-21 13:48:07 UTC
I am assuming that changes to dashboard in 3.4.1 are mainly descriptions in titles and help ? Correct ?   It would be bad experience if 3.4.1 simply deletes certain metrics/ panels after upgrading to it...

Comment 7 Anmol Sachan 2018-09-24 10:03:01 UTC
(In reply to Anand Paladugu from comment #5)
> I am assuming that changes to dashboard in 3.4.1 are mainly descriptions in
> titles and help ? Correct ?   It would be bad experience if 3.4.1 simply
> deletes certain metrics/ panels after upgrading to it...

Yes, changes to the dashboards in 3.4.1 are mainly descriptions in titles and help.

Since it is assumed that the user is already using WA, the dashboards will already be present there. To update them, we have to actually remove them and replace them with the updated dashboards. It will take effect immediately as the tendrl-monitoring-integration service starts, and would just require a page refresh on the UI.

Comment 8 Anmol Sachan 2018-09-24 10:09:15 UTC
For QE to validate the build tendrl-monitoring-integration-1.6.3-13.el7rhgs, here are the steps.

1) Stop service tendrl-monitoring-integration.

2) Upgrade tendrl-monitoring-integration package.

3) Run the file /usr/bin/tendrl-upgrade on command-line as :
   ` python tendrl-upgrade --username=username --password=password `
   The username and password arguments are optional, and if not supplied, the script will take the default ones from grafana.ini file.

4) Restart the tendrl-monitoring-integration service.

5) Refresh Grafana UI page

Comment 9 Anmol Sachan 2018-09-24 10:15:50 UTC
(In reply to anmol sachan from comment #8)
> For QE to validate the build tendrl-monitoring-integration-1.6.3-13.el7rhgs,
> here are the steps.
> 
> 1) Stop service tendrl-monitoring-integration.
> 
> 2) Upgrade tendrl-monitoring-integration package.
> 
> 3) Run the file /usr/bin/tendrl-upgrade on command-line as :

>    ` python tendrl-upgrade --username=username --password=password `


` tendrl-upgrade --username=username --password=password `

use the above, it was a mistake in previous comment

>    The username and password arguments are optional, and if not supplied,
> the script will take the default ones from grafana.ini file.
> 
> 4) Restart the tendrl-monitoring-integration service.
> 
> 5)Refresh Grafana UI page

Comment 10 Martin Bukatovic 2018-09-24 13:00:13 UTC
The description drafted in comment 8 is incomplete, as we are trying to
align this with upgrade of whole RHGS BU1, which includes both tendrl server
and gluster storage machines, and both gluster and WA components.

Here are the missing pieces I was able to identify so far:

 * How is this connected to tendrl-ansible, would we need to rerun tendrl
   ansible during BU1 upgrade?
 * Procedure to upgrade storage machines, including both offline and inservice
   gluster upgrade paths.
 * How is WA server update aligned with updating storage machines?
 * Is anything else required during upgrade of WA server (compared to what is
   provided in comment 8)?
 * How is the upgrade expected to work, when BZ 1515276 hasn't been considered?

That said, we can check at least the draft suggested in comment 8 in isolation,
in a hope that such feedback will be helpful. But we can't verify this BZ without
the missing details, hence moving back to assigned.

Comment 11 Martin Bukatovic 2018-09-24 13:02:54 UTC
(In reply to Nishanth Thomas from comment #3)
> @anmol,
> The scripts to be placed in /usr/local/bin/tendrl
> 
> @rahul, Martin, comments?

Using /usr/local for an rpm package would be wrong. I would suggest using
just /usr/bin. Also make sure that the script only shows help when executed
without any arguments, to prevent deleting dashboards by mistake.
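
One way the "help only when run without arguments" behaviour could be sketched with argparse, assuming Python 3. The --username and --password options come from comment 8; the --use-defaults flag and the function names are hypothetical, added here only so that the grafana.ini defaults can still be requested explicitly.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="tendrl-upgrade",
        description="Delete Grafana dashboards so that they are recreated "
                    "when tendrl-monitoring-integration is restarted.")
    parser.add_argument("--username", help="Grafana admin user")
    parser.add_argument("--password", help="Grafana admin password")
    # Hypothetical flag: explicit opt-in to the credentials from grafana.ini.
    parser.add_argument("--use-defaults", action="store_true",
                        help="take credentials from grafana.ini")
    return parser


def parse_or_help(argv):
    """Return parsed options, or print help and return None if argv is empty."""
    parser = build_parser()
    if not argv:
        parser.print_help()
        return None
    return parser.parse_args(argv)
```

A caller would pass `sys.argv[1:]` and exit without touching Grafana when the result is None.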

Comment 12 Daniel Horák 2018-09-24 14:10:00 UTC
As Martin mentioned above for full validation we will need documentation for the whole process including Gluster Storage upgrade (if it will be related).
But for now, I can verify, that the process described in comment 8 performs some changes on the Grafana dashboards - so I consider it, that the dashboards were updated.

Comment 13 Shubhendu Tripathi 2018-09-25 03:23:56 UTC
Martin/Daniel,

Let me detail the steps to be followed for BU1 (from 3.4.0 to 3.4.1)

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
On WA Server Node
---------------------
1. Stop the tendrl-monitoring-integration service
  #systemctl stop tendrl-monitoring-integration

2. Update the RHGS-WA packages
  #yum update tendrl-*

3. Run the migration script
  #tendrl-upgrade --username=username --password=password

4. Restart the RHGS-WA services
  #systemctl restart tendrl-node-agent
  #systemctl restart tendrl-monitoring-integration
  #systemctl restart tendrl-notifier
  #systemctl restart tendrl-api

5. Restart httpd service
  #systemctl restart httpd

On Storage Nodes
----------------
1. Update the RHGS-WA packages
  #yum update tendrl-*

2. Restart the RHGS-WA services
  #systemctl restart tendrl-node-agent
  #systemctl restart tendrl-gluster-integration

++++++++++++++++++++++++++++++++++++++++++++++
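
The two command sequences above can be written down as data and replayed, for example as a dry run before touching a node. This is only an illustrative sketch, assuming Python 3 and root privileges on the node; the names SERVER_STEPS, STORAGE_STEPS and run_steps are hypothetical, and tendrl-upgrade is called without arguments because comment 8 describes the credentials as optional.

```python
import subprocess

# Commands for the WA server node, in the order given above.
SERVER_STEPS = [
    ["systemctl", "stop", "tendrl-monitoring-integration"],
    ["yum", "update", "tendrl-*"],  # the pattern is passed to yum unexpanded
    ["tendrl-upgrade"],             # credentials default to grafana.ini
    ["systemctl", "restart", "tendrl-node-agent"],
    ["systemctl", "restart", "tendrl-monitoring-integration"],
    ["systemctl", "restart", "tendrl-notifier"],
    ["systemctl", "restart", "tendrl-api"],
    ["systemctl", "restart", "httpd"],
]

# Commands for each storage node.
STORAGE_STEPS = [
    ["yum", "update", "tendrl-*"],
    ["systemctl", "restart", "tendrl-node-agent"],
    ["systemctl", "restart", "tendrl-gluster-integration"],
]


def run_steps(steps, dry_run=True, runner=subprocess.check_call):
    """Print each command; execute it only when dry_run is False."""
    for command in steps:
        print("# " + " ".join(command))
        if not dry_run:
            # check_call raises CalledProcessError, stopping at the first failure.
            runner(command)
```

Calling `run_steps(SERVER_STEPS)` only prints the plan; `run_steps(SERVER_STEPS, dry_run=False)` executes it and stops on the first failing command.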


Also regarding the questions from Martin,

>> How is this connected to tendrl-ansible, would we need to rerun tendrl
>>   ansible during BU1 upgrade?

No. It's not related to tendrl-ansible. The only packages affected as part of batch update 1 are tendrl-monitoring-integration, tendrl-gluster-integration and tendrl-api


>> Procedure to upgrade storage machines, including both offline and in service
>> gluster upgrade paths.

The above comments talk about updating the tendrl-gluster-integration packages as part of updating RHGS-WA packages on storage nodes. The upgrade of RHGS bits is to be taken care of as per the guidelines from the RHGS team. For RHGS-WA components, after upgrading the rpms, we need to restart the services.

>> How is WA server update aligned with updating storage machines?

The upgrades of RHGS-WA packages should be done independently for storage nodes and RHGS-WA server node.

>> Is anything else required during upgrade of WA server (compared to what is
>> provided in comment 8)?

A detailed list of steps provided above.

>> How is the upgrade expected to work, when BZ 1515276 hasn't been considered?

There are no changes in tendrl-notifier package in this batch update, so no effect.

-----------------------------

Martin, hope this makes it clearer as you expected.

Comment 14 Anmol Sachan 2018-09-25 07:37:49 UTC
(In reply to Martin Bukatovic from comment #10)
> The description drafted in comment 8 is incomplete, as we are trying to
> align this with upgrade of whole RHGS BU1, which includes both tendrl server
> and gluster storage machines, and both gluster and WA components.

This bug is related to the tendrl-monitoring-integration package, and not the complete WA upgrade. That is why the rest of the packages weren't mentioned here. There is already a DOC BZ ( https://bugzilla.redhat.com/show_bug.cgi?id=1631264 ) to discuss the whole RHGSWA upgrade.

> 
> Here are the missing pieces I was able to identify so far:
> 
>  * How is this connected to tendrl-ansible, would we need to rerun tendrl
>    ansible during BU1 upgrade?
>  * Procedure to upgrade storage machines, including both offline and
> inservice
>    gluster upgrade paths.
>  * How is WA server update aligned with updating storage machines?
>  * Is anything else required during upgrade of WA server (compared to what is
>    provided in comment 8)?
>  * How is the upgrade expected to work, when BZ 1515276 hasn't been
> considered?
> 
> That said, we can check at least the draft suggested in comment 8 in
> isolation,
> in a hope that such feedback will be helpful. But we can't verify this BZ
> without
> the missing details, hence moving back to assigned.

The procedure given in comment 8 is just for tendrl-monitoring-integration, and ideally only that should be tested. Failed QA is not a correct resolution here. Since it is already done, I will wait for @Daniel to verify https://bugzilla.redhat.com/show_bug.cgi?id=1631260#c12 . Shubhendu has already provided the complete steps in https://bugzilla.redhat.com/show_bug.cgi?id=1631260#c13 . If there is anything more to be discussed regarding the complete WA upgrade, it can be discussed here : https://bugzilla.redhat.com/show_bug.cgi?id=163126 .

Comment 15 Anmol Sachan 2018-09-25 07:40:40 UTC
> here : https://bugzilla.redhat.com/show_bug.cgi?id=163126 . 
Please ignore this.


WA upgrade to be discussed here : https://bugzilla.redhat.com/show_bug.cgi?id=1631264

Comment 16 Daniel Horák 2018-09-25 09:44:42 UTC
The process described in comment 8 and comment 13 seems to work as expected
and Grafana Dashboards are correctly upgraded.
There might be needed some additional information related to the alignment of
this process with general RHGS Update and also update of base RHEL system
(7.5->7.6).

I have just few questions/notes related to the upgrade script tendrl-upgrade:

1) Is there real possibility, that the script will not be able to automatically
detect the Grafana username and password?
If not, we probably don't need to highlight the possibility of specifying
those parameters from command line and suggest just execution of the command
without any additional parameters. It will be easier, than documenting
additional steps, where to find those information.

2) It might be worth, to firstly print some information message - what is
the purpose of the script, what is the prerequisite (e.g. stopped
tendrl-monitoring service) and what it will do and then ask the user for
Yes/No agreement if they want to continue.
Of course, there might be some command line parameter, which will automatically
except yes for the question (something like -y, --assumeyes in yum).

3) Currently the tendrl-upgrade script is specifically related to monitoring
integration. Is it possible, that in some future version it might have wider
scope (for example update of ETCD db schema or something else)? If that might
be the case, is it ok, to include it into monitoring integration package
(instead of some more tendrl general package)?
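
The information message and Yes/No question from point 2 could look roughly like the sketch below, assuming Python 3. The banner wording, the `confirm` helper and the `assume_yes` parameter (which a -y/--assumeyes style option would set) are hypothetical and not part of the current script.

```python
BANNER = (
    "This script deletes the existing Grafana dashboards so that\n"
    "tendrl-monitoring-integration recreates them on its next start.\n"
    "Prerequisite: the tendrl-monitoring-integration service is stopped."
)


def confirm(question, assume_yes=False, ask=input):
    """Return True if the user agrees; skip the question when assume_yes is set."""
    if assume_yes:
        return True
    answer = ask(question + " [y/N]: ").strip().lower()
    # Anything other than an explicit yes counts as a refusal.
    return answer in ("y", "yes")
```

A script would print BANNER first and continue with the deletion only if `confirm("Continue?", assume_yes=...)` returns True, so that pressing Enter by mistake does not delete anything.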

Comment 17 Daniel Horák 2018-09-25 09:48:29 UTC
> Of course, there might be some command line parameter, which will automatically
> except yes for the question (something like -y, --assumeyes in yum).which will 

...automatically expect yes...

Comment 18 Atin Mukherjee 2018-09-26 06:44:55 UTC
(In reply to Daniel Horák from comment #16)
> The process described in comment 8 and comment 13 seems to work as expected
> and Grafana Dashboards are correctly upgraded.
> There might be needed some additional information related to the alignment of
> this process with general RHGS Update and also update or base RHEL system
> (7.5->7.6).
> 
> I have just few questions/notes related to the upgrade script tendrl-upgrade:
> 
> 1) Is there real possibility, that the script will not be able to
> automatically
> detect the Grafana username and password?
> If not, we probably don't need to highlight the possibility of specifying
> those parameters from command line and suggest just execution of the command
> without any additional parameters. It will be easier, than documenting
> additional steps, where to find those information.
> 
> 2) It might be worth, to firstly print some information message - what is
> the purpose of the script, what is the prerequisite (e.g. stopped
> tendrl-monitoring service) and what it will do and then ask the user for
> Yes/No agreement if he want to continue.
> Of course, there might be some command line parameter, which will
> automatically
> except yes for the question (something like -y, --assumeyes in yum).
> 
> 3) Currently the tendrl-update script is specifically related to monitoring
> integration. Is it possible, that in some future version it might have wider
> scope (for example update of ETCD db schema or something else)? If that might
> be the case, is it ok, to include it into monitoring integration package
> (instead of some more tendrl general package)?

To me, none of these points give any justification for why this engineering BZ needs to be failedQA. The above points can be taken as additional reference for enhancements to be done in the commit message, but in no way do I see that this script is functionally behaving wrong.

If there are any documentation gaps, then the same needs to be tracked through the doc bug, which I believe we already have for documenting the upgrade steps for 3.4.1.

Nishanth/Shubhendu - Please move this BZ to ON_QA with clearing out the failedQA tag. In case QE has additional justification, feel free to add here.

Comment 19 Daniel Horák 2018-09-26 07:21:31 UTC
(In reply to Atin Mukherjee from comment #18)
> To me, none of these points get to me any justification on why this
> engineering BZ needs to be failedQA. The above points can be taken as
> additional reference for enhancements to be done in the commit message but
> in no way I see that this script is functionally behaving wrong.

Atin, my understanding is, that the tendrl-upgrade script is main part of
this BZ and at least the second point is directly related to the upgrade
script. So if it will be decided to incorporate my suggestions, this BZ will
be moved to MODIFIED once the changes will be done. Of course, if it will be
declined, it can be moved to MODIFIED immediately. Also if the change
looks too big to be considered as separated BZ (including the 3 ack process),
feel free to mention it here and move this to MODIFIED and I'll create
new BZ for that (but from my point of view, it will be more efficient to solve
it as part of this one bug).

In this case, the failedQA doesn't mean anything and is set automatically for any BZ moved back to ASSIGNED.

Comment 20 Shubhendu Tripathi 2018-09-26 08:41:04 UTC
(In reply to Daniel Horák from comment #19)
> (In reply to Atin Mukherjee from comment #18)
> > To me, none of these points get to me any justification on why this
> > engineering BZ needs to be failedQA. The above points can be taken as
> > additional reference for enhancements to be done in the commit message but
> > in no way I see that this script is functionally behaving wrong.
> 
> Atin, my understanding is, that the tendrl-upgrade script is main part of
> this BZ and at least the second point is directly related to the upgrade
> script. So if it will be decided to incorporate my suggestions, this BZ will
> be moved to MODIFIED once the changes will be done. Of course, if it will be
> declined, it can be moved to MODIFIED immediately. Also if the change
> looks too big to be considered as separated BZ (including the 3 ack process),
> fell free to mention it here and move this to MODIFIED and I'll create
> new BZ for that (but from my point of view, it will be more efficient to
> solve
> it as part of this one bug).

Daniel, I would prefer a separate BZ for adding more meaningful information to be printed in the start of the script and then asking for continuing. As such there is no technical issue with the script at the moment and we can go ahead with verification. Any suggestion for improvement, can be taken as separate work on this. 

Kindly do the needful.

> 
> In this case, the failedQA doesn't meant anything and is set automatically
> for any BZ moved back to ASSIGNED.

Comment 21 Daniel Horák 2018-09-26 09:00:34 UTC
(In reply to Shubhendu Tripathi from comment #20)
> Daniel, I would prefer a separate BZ for adding more meaningful information
> to be printed in the start of the script and then asking for continuing. As
> such there is no technical issue with the script at the moment and we can go
> ahead with verification. Any suggestion for improvement, can be taken as
> separate work on this. 

New Bug 1633125 created.

Comment 22 Atin Mukherjee 2018-09-26 12:40:05 UTC
Moving this back to ON_QA.

Comment 23 Martin Bukatovic 2018-09-26 12:57:14 UTC
(In reply to Atin Mukherjee from comment #22)
> Moving this back to ON_QA.

Note that without complete draft of upgrade path, this BZ can't be moved to
VERIFIED state.

With this in mind, ON_QA state is not correct, as this usually means that
the BZ is ready for testing, which is not the case here.

Comment 24 Anmol Sachan 2018-09-26 14:19:03 UTC
(In reply to Martin Bukatovic from comment #23)
> (In reply to Atin Mukherjee from comment #22)
> > Moving this back to ON_QA.
> 
> Note that without complete draft of upgrade path, this BZ can't be moved to
> VERIFIED state.
> 
> With this in mind, ON_QA state is not correct, as this usually means that
> the BZ is ready for testing, which is not the case here.

Martin, the complete upgrade procedure is present on the DOC BZ here : https://bugzilla.redhat.com/show_bug.cgi?id=1631264#c8 . I hope this is enough. Please comment if you expect any other type of documentation to verify this BZ.

Comment 26 Daniel Horák 2018-10-23 16:36:42 UTC
* Script tendrl-upgrade is part of new version of package
  tendrl-monitoring-integration.

* The script in default configuration and environment (when used according to
  documentation) correctly deletes all Grafana dashboards, and the dashboards
  are correctly recreated (from updated specification) during next startup of
  tendrl-monitoring-integration service.

Based on those two points, I'm verifying this Bug.

There are some issues related to the new script, some of them are described in
Bug 1633125 and others are valid only for non-default usage, which is out of
scope for this release and will be prospectively covered in further Bugs.

Tested and VERIFIED with:
  grafana-4.3.2-3.el7rhgs.x86_64
  graphite-web-0.9.15-1.el7rhgs.noarch
  tendrl-ansible-1.6.3-8.el7rhgs.noarch
  tendrl-api-1.6.3-7.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-7.el7rhgs.noarch
  tendrl-commons-1.6.3-13.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-14.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-14.el7rhgs.noarch
  tendrl-node-agent-1.6.3-10.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-11.el7rhgs.noarch

>> VERIFIED

Comment 28 errata-xmlrpc 2018-10-31 08:45:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3427

