Description of problem:
When switching from graphite-web-0.X.X to graphite-web-1.X.X, tendrl-server needs to migrate the Graphite data. A complete migration requires some extra steps, which should be performed by the tendrl-upgrade script.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
Grafana should display all monitoring data after updating to a new version of graphite-web.
PR is under review: https://github.com/Tendrl/monitoring-integration/pull/583
Providing QA ack.
Note that the upgrade script should not break RHGSWA if run again after migration.
The migration process from graphite-web 0.9.15 to 1.1.4 (between RHGS 3.4.2 and
3.4.3) seems to be quite unclear.
The tendrl-upgrade script runs the following two commands:
# django-admin migrate --fake dashboard --settings=graphite.settings --run-syncdb
# django-admin migrate --fake-initial --settings=graphite.settings --run-syncdb
However, it is questionable whether this is the correct migration process,
because the Graphite documentation mentions a slightly different command in its
Upgrading section. Unfortunately, that command does not seem to work correctly.
Also, when I compared a dump of /var/lib/graphite-web/graphite.db from a
freshly installed cluster with a dump from a cluster upgraded from the
previous version, there were some differences which suggest that the migration
process was not completed correctly. For example, some items present in the
fresh dump are completely missing in the upgraded one.
Also, the description of the '--fake' argument is quite worrying (from the
command-line help or documentation):
--fake Mark migrations as run without actually running them.
On the other hand, we did not find any obvious issue on the updated
cluster: all Grafana dashboards seem to show correct data.
So my question is whether we can accept/approve this approach without a deep
understanding of the migration process, with the risk that something might be
broken (which we might even miss during our testing)?
Because of this ambiguity, I'm moving this bug back to ASSIGNED.
If the migration process is approved without any change, please switch it
back to ON_QA.
This is just a suggestion for consideration; I might be missing some important facts.
Based on examination of the content of graphite.db, there seems to be no really
relevant data which should be preserved between the old and new version.
What about simply deleting the database file during the upgrade process and
initializing it afresh, the same way as during a fresh installation?
Actually, the --fake option just marks the migration as done without actually migrating the database schema. If we use the same initialization command as for a fresh installation, it fails with an error saying that the table already exists. I raised an upstream issue about this in the graphite-web repository. It was closed with a comment to the effect that it is not possible to migrate, and to create a new one: https://github.com/graphite-project/graphite-web/issues/2389
I also tried several different approaches, but I still have not found a way to migrate to the new schema.
Is it OK to delete the Graphite database and recreate it?
I've tried the scenario with deleting /var/lib/graphite-web/graphite.db and
there seems to be one possible issue that we have to take care of:
If graphite.db is deleted while the httpd service is running, it might be
recreated without correct initialization. tendrl-ansible then skips the
initialization step, because the db file already exists.
In other words, the httpd service has to be stopped while the graphite.db
file is deleted and reinitialized.
We also have to consider that for the other task of the tendrl-upgrade script
(clearing Grafana dashboards), the httpd service has to be running.
The following suggestion is not really a clean and nice solution, but the
other approaches seem to have more problems than this one.
So I think that the tendrl-upgrade script should do all the required steps:
1) stop httpd service
2) delete graphite.db
3) initialize the graphite.db
4) start httpd service
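
The four steps above could be sketched roughly as follows. This is not the actual tendrl-upgrade code, only an illustration under stated assumptions: the initialization command used by a fresh installation is not shown in this report, so the `django-admin migrate ... --run-syncdb` call below is a guess, and the `run` and `reinit_graphite_db` helpers and the `DRY_RUN` switch are hypothetical names. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
# Path taken from this report; override via the environment if it differs.
GRAPHITE_DB="${GRAPHITE_DB:-/var/lib/graphite-web/graphite.db}"
# Hypothetical safety switch: 1 = only print commands, 0 = execute them.
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reinit_graphite_db() {
    # 1) stop httpd so that it cannot recreate an uninitialized graphite.db
    run systemctl stop httpd &&
    # 2) delete the old database
    run rm -f "$GRAPHITE_DB" &&
    # 3) initialize a new database (ASSUMPTION: the fresh-install command)
    run django-admin migrate --settings=graphite.settings --run-syncdb &&
    # 4) start httpd again, as needed by the later dashboard-clearing task
    run systemctl start httpd
}
```

Because the steps are chained with `&&`, a failure in any step leaves httpd stopped rather than starting it against a missing or half-initialized database. Since the database is always deleted and recreated, running the sequence a second time should end in the same state.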
Or do you see any other better option?
Testing of this BZ should include the use case described in BZ 1665030,
because httpd is now restarted in the update script.
I've tested the scenario of update from RHGS WA 3.4.2 to RHGS WA 3.4.3:
Red Hat Enterprise Linux Server release 7.6 (Maipo)
The tendrl-upgrade script correctly performs all the steps required for
migration to the new graphite (stop all related services, remove the old
database, initialize a new database, set proper ownership, and start the
previously stopped services).
After the whole update process is finished, Grafana dashboards show proper
data.
For full verification of this bug, it is necessary to validate the scenario
from Bug 1665030, as Martin mentioned in the previous comment.
I have tested the scenario of update from RHGS WA 3.4.1 to RHGS WA 3.4.3:
The tendrl-upgrade script correctly performed all the steps required for migration to new graphite and all dashboards are showing data. Links from tendrl point to correct grafana dashboards.
Verifying based on comment 15 and comment 16.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.