Bug 1658245

Summary: graphite data migration process from graphite-web-0.X.X to graphite-web-1.X.X should be done from tendrl-upgrade script
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: gowtham <gshanmug>
Component: web-admin-tendrl-monitoring-integration Assignee: gowtham <gshanmug>
Status: CLOSED ERRATA QA Contact: Daniel Horák <dahorak>
Severity: medium Docs Contact:
Priority: unspecified    
Version: rhgs-3.4 CC: dahorak, fbalak, mbukatov, nthomas, rhs-bugs, sankarshan
Target Milestone: --- Keywords: ZStream
Target Release: RHGS 3.4.z Batch Update 3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tendrl-monitoring-integration-1.6.3-20.el7rhgs Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-02-04 07:43:46 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1665030    

Description gowtham 2018-12-11 15:31:50 UTC
Description of problem:

While switching from graphite-web-0.X.X to graphite-web-1.X.X, tendrl-server needs to migrate the Graphite data. A complete migration needs some extra steps, and these should be performed by the tendrl-upgrade script.
  
Version-Release number of selected component (if applicable):

tendrl-monitoring-integration-1.6.3-16.el7rhgs.noarch

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:
Grafana should display all monitoring data after updating to a new version of graphite-web.

Additional info:

Comment 2 gowtham 2018-12-12 15:19:47 UTC
PR is under review: https://github.com/Tendrl/monitoring-integration/pull/583

Comment 4 Martin Bukatovic 2018-12-12 18:35:16 UTC
Providing QA ack.

Note that the upgrade script should not break RHGSWA if run again after migration.

Comment 8 Daniel Horák 2019-01-10 09:35:52 UTC
The migration process from graphite-web 0.9.15 to 1.1.4 (between RHGS 3.4.2 and
3.4.3) seems to be quite unclear.

The tendrl-upgrade script runs the following two commands:
# django-admin migrate --fake dashboard --settings=graphite.settings --run-syncdb
# django-admin migrate --fake-initial --settings=graphite.settings --run-syncdb

But it is questionable whether this is the correct migration process, because
the Graphite documentation mentions a slightly different command in its
Upgrading section[1]. Unfortunately, that command does not seem to work
correctly.

Also, when I compared a dump of /var/lib/graphite-web/graphite.db from a
freshly installed cluster with a dump from a cluster upgraded from the
previous version, there were some differences which suggest that the
migration process was not completed correctly. For example, the following
two tables are completely missing:
  "dashboard_template" and
  "dashboard_template_owners".

The description of the '--fake' argument is also rather worrying (from the
command line help, or the documentation[2]):
  --fake  Mark migrations as run without actually running them.
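
A rough sketch of how the two observations above could be checked directly on
the SQLite files, instead of diffing full dumps. It relies on two facts:
SQLite lists its tables in sqlite_master, and Django records applied (or
faked) migrations in the django_migrations table. The function names and the
idea of keeping two copies of graphite.db side by side are my own
assumptions, not part of tendrl-upgrade.

```python
import sqlite3


def list_tables(db_path):
    """Return the set of table names defined in an SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    return {name for (name,) in rows}


def missing_tables(fresh_db, upgraded_db):
    """Tables that exist in the fresh database but not in the upgraded one."""
    return sorted(list_tables(fresh_db) - list_tables(upgraded_db))


def recorded_migrations(db_path):
    """Return (app, name) pairs that Django has recorded as applied.

    A migration marked with --fake shows up here even though its schema
    changes were never executed.
    """
    if "django_migrations" not in list_tables(db_path):
        return []
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT app, name FROM django_migrations ORDER BY id"
        ).fetchall()
    finally:
        conn.close()
```

Calling missing_tables() on copies of graphite.db from a fresh and an upgraded
cluster would list tables such as the two named above, and
recorded_migrations() on the upgraded copy shows which migrations Django
believes were applied. Note that sqlite3.connect() creates an empty file if
the path does not exist, so this should only be pointed at copies.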

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On the other hand, we did not find any obvious issue on the updated
cluster: all Grafana dashboards seem to show correct data.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So my question is whether we can accept/approve this approach without a deep
understanding of the migration process, with the risk that something might be
broken (which we might even miss during our testing).

Because of this ambiguity, I'm moving this bug back to ASSIGNED.
If the migration process is approved without any change, please switch it
back to ON_QA.
>> ASSIGNED

[1] https://graphite.readthedocs.io/en/latest/releases/1_0_0.html#upgrading
[2] https://docs.djangoproject.com/en/2.1/ref/django-admin/#django-admin-migrate

Comment 9 Daniel Horák 2019-01-10 09:44:22 UTC
Just a suggestion for consideration; I might be missing some important facts.

Based on examination of the content of graphite.db, there seems to be no really
relevant data which should be preserved between the old and new version.
What about simply deleting the database file during the upgrade process and
initializing it afresh, the same way as during a fresh installation?

Comment 10 gowtham 2019-01-10 13:58:18 UTC
Actually, the fake command just marks the migration as done without actually migrating the database schema. If we use the same initialization command instead, it fails with an error that the table already exists. I have raised an upstream issue about this in the graphite-web repo. They closed it with a comment saying that it is not possible to migrate and that a new database should be created: https://github.com/graphite-project/graphite-web/issues/2389

I have tried several different approaches, but I am still not able to find a way to migrate to the new schema.

Comment 11 gowtham 2019-01-10 13:59:08 UTC
Is it OK to delete the graphite DB and recreate it?

Comment 12 Daniel Horák 2019-01-11 09:54:51 UTC
I've tried the scenario with deleting /var/lib/graphite-web/graphite.db and
there seems to be one possible issue we have to take care of:

If graphite.db is deleted while the httpd service is running, it might be
recreated without correct initialization. tendrl-ansible then skips the
initialization step, because the db file already exists.
In other words, the httpd service has to be stopped while the graphite.db
file is deleted and reinitialized.

We also have to consider that the httpd service has to be running for the
other task of the tendrl-upgrade script (clearing Grafana dashboards).

The following suggestion is not a really clean and nice solution, but the other
approaches seem to have more problems than this one:

  So I think that the tendrl-upgrade script should do all the required steps:
  1) stop httpd service
  2) delete graphite.db
  3) initialize the graphite.db
  4) start httpd service
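
To make the ordering concrete, the four steps could be sketched roughly as
below. This is only an illustration, not the actual tendrl-upgrade code. The
initialization command is an assumption (the migrate command quoted in
comment 8 without the --fake options), and the function name and the "run"
parameter are hypothetical helpers added so the sequence can be exercised
without touching a real system.

```python
import os
import subprocess

GRAPHITE_DB = "/var/lib/graphite-web/graphite.db"

# Assumption: the fresh-install initialization is the plain migrate command,
# i.e. the one from comment 8 without --fake / --fake-initial.
INIT_COMMAND = [
    "django-admin", "migrate",
    "--settings=graphite.settings", "--run-syncdb",
]


def reinitialize_graphite_db(db_path=GRAPHITE_DB, run=subprocess.check_call):
    """Stop httpd, drop the old database, initialize a new one, start httpd."""
    # 1) stop httpd so it cannot recreate an uninitialized graphite.db
    run(["systemctl", "stop", "httpd"])
    try:
        # 2) delete graphite.db (a missing file is fine, e.g. on a re-run)
        if os.path.exists(db_path):
            os.remove(db_path)
        # 3) initialize the new database
        run(INIT_COMMAND)
    finally:
        # 4) start httpd again even if initialization failed, because later
        #    upgrade tasks need the service running
        run(["systemctl", "start", "httpd"])
```

Because the database is always removed and recreated, running this a second
time after a successful migration just rebuilds an empty database again, which
should be in line with the requirement from comment 4.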

Or do you see any better option?

Comment 14 Martin Bukatovic 2019-01-17 09:11:46 UTC
Testing of this BZ should include the use case described in BZ 1665030,
because httpd is now restarted in the update script.

Comment 15 Daniel Horák 2019-01-18 13:27:31 UTC
I've tested the update scenario from RHGS WA 3.4.2 to RHGS WA 3.4.3:
Updating from:
  Red Hat Enterprise Linux Server release 7.6 (Maipo)
  carbon-selinux-1.5.4-2.el7rhgs.noarch
  grafana-4.6.4-1.el7rhgs.x86_64
  graphite-web-0.9.15-1.el7rhgs.noarch
  python-carbon-0.9.15-2.1.el7rhgs.noarch
  python-django-1.6.11-7.el7rhgs.noarch
  python-django-bash-completion-1.6.11-7.el7rhgs.noarch
  python-django-tagging-0.3.1-11.1.el7rhgs.noarch
  tendrl-ansible-1.6.3-10.el7rhgs.noarch
  tendrl-api-1.6.3-8.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-8.el7rhgs.noarch
  tendrl-commons-1.6.3-13.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-16.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-16.el7rhgs.noarch
  tendrl-node-agent-1.6.3-11.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-2.el7rhgs.noarch
  tendrl-ui-1.6.3-14.el7rhgs.noarch

Updating to:
  Red Hat Enterprise Linux Server release 7.6 (Maipo)
  carbon-selinux-1.5.4-3.el7rhgs.noarch
  grafana-4.6.4-1.el7rhgs.x86_64
  graphite-web-1.1.4-1.el7rhgs.noarch
  python2-django-1.11.15-3.el7rhgs.noarch
  python-carbon-1.1.4-1.el7rhgs.noarch
  python-django-bash-completion-1.11.15-3.el7rhgs.noarch
  python-django-tagging-0.4.6-1.el7rhgs.noarch
  tendrl-ansible-1.6.3-11.el7rhgs.noarch
  tendrl-api-1.6.3-8.el7rhgs.noarch
  tendrl-api-httpd-1.6.3-8.el7rhgs.noarch
  tendrl-commons-1.6.3-14.el7rhgs.noarch
  tendrl-grafana-plugins-1.6.3-20.el7rhgs.noarch
  tendrl-grafana-selinux-1.5.4-3.el7rhgs.noarch
  tendrl-monitoring-integration-1.6.3-20.el7rhgs.noarch
  tendrl-node-agent-1.6.3-13.el7rhgs.noarch
  tendrl-notifier-1.6.3-4.el7rhgs.noarch
  tendrl-selinux-1.5.4-3.el7rhgs.noarch
  tendrl-ui-1.6.3-14.el7rhgs.noarch

The tendrl-upgrade script correctly performs all the steps required for
migration to the new Graphite (stop all related services, remove the old
database, initialize the new database, set proper ownership, and start the
previously stopped services).
After the whole update process is finished, Grafana dashboards show proper
data.

For full verification of this bug, it is necessary to validate the scenario
from Bug 1665030, as Martin mentioned in the previous comment.

Comment 16 Filip Balák 2019-01-18 14:24:32 UTC
I have tested the update scenario from RHGS WA 3.4.1 to RHGS WA 3.4.3:
Updating from:
carbon-selinux-1.5.4-2.el7rhgs.noarch
grafana-4.3.2-3.el7rhgs.x86_64
graphite-web-0.9.15-1.el7rhgs.noarch
python-django-1.6.11-7.el7rhgs.noarch
python-django-bash-completion-1.6.11-7.el7rhgs.noarch
python-django-tagging-0.3.1-11.1.el7rhgs.noarch
tendrl-ansible-1.6.3-8.el7rhgs.noarch
tendrl-api-1.6.3-7.el7rhgs.noarch
tendrl-api-httpd-1.6.3-7.el7rhgs.noarch
tendrl-commons-1.6.3-13.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-14.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-14.el7rhgs.noarch
tendrl-node-agent-1.6.3-10.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-11.el7rhgs.noarch

The tendrl-upgrade script correctly performed all the steps required for migration to the new Graphite, and all dashboards are showing data. Links from Tendrl point to the correct Grafana dashboards.

Comment 17 Daniel Horák 2019-01-21 07:49:01 UTC
Verifying based on comment 15 and comment 16.

>> VERIFIED

Comment 19 errata-xmlrpc 2019-02-04 07:43:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0265