Bug 1656806 - Monitoring-integration unable to upload default dashboard into grafana
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: web-admin-tendrl-ansible
Version: rhgs-3.4
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Nishanth Thomas
QA Contact: sds-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-12-06 11:20 UTC by gowtham
Modified: 2019-05-09 08:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-09 08:40:03 UTC
Target Upstream Version:


Description gowtham 2018-12-06 11:20:05 UTC
Description of problem:
After the Apache vhost for Grafana was introduced, monitoring-integration can talk to Grafana only through that vhost. The problem is that during the tendrl-ansible installation flow, monitoring-integration is started first and httpd is restarted afterwards. So when monitoring-integration starts, it tries to initialize the default dashboards in Grafana but cannot upload them, because the vhost is not ready yet. Monitoring-integration does not retry, so no dashboards are present in Grafana. If I restart monitoring-integration manually, the dashboards appear in Grafana.

So we need to restart httpd between the tendrl-api and monitoring-integration installation tasks.
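
To check from the shell whether the Grafana vhost is already answering, one option is to poll the /grafana URL before (re)starting monitoring-integration. This is only a diagnostic sketch, not the fix proposed in this bug: the function name wait_for_url, its attempt/delay parameters and the server address in the usage comment are assumptions made for illustration.

```shell
# Poll a URL until it answers without an HTTP error, or give up.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# (hypothetical helper, not part of tendrl)
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-2}
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: fail on HTTP status >= 400, -s: silent, -o: discard the body
        if curl -fs --max-time 5 -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (placeholder host name):
# wait_for_url "http://server.example.com/grafana" \
#     && systemctl restart tendrl-monitoring-integration
```

If the call returns non-zero, httpd is most likely still serving without the Grafana vhost and needs a restart first.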

Version-Release number of selected component (if applicable):
tendrl-monitoring-integration-1.6.3-15.el7rhgs
tendrl-ansible-1.6.3-10.el7rhgs

How reproducible:
100%

Steps to Reproduce:
1. Run the tendrl-ansible playbook.
2. Open Grafana at the URL http://{ip}/grafana
3. No Grafana dashboards have been created.

Actual results:
Monitoring-integration unable to upload default dashboard into grafana

Expected results:
When monitoring-integration starts, it should create the default dashboards.

Additional info:

Comment 2 Martin Bukatovic 2018-12-06 15:23:55 UTC
This needs further investigation. Especially I'm interested in the reason why
this hasn't been noticed during testing so far, as we did installation and
update testing without noticing this issue.

Comment 3 gowtham 2018-12-07 09:57:47 UTC
Yes, it does not always occur. It happens in my setup because I installed and started httpd manually before running tendrl-ansible.

In that case httpd is already active, so tendrl-ansible does not restart it, and the Apache configuration changes are not picked up.
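
The failing setup described above can be sketched as the following command sequence. It is an illustration under assumptions: the use of yum to install httpd, the function name and the DRY_RUN switch are not from this report; the ansible-playbook invocation matches the one shown in comment 4.

```shell
# Reproduce the failing case: httpd is already installed and running
# before the tendrl-ansible playbook starts.
# Set DRY_RUN=1 to print the commands instead of executing them.
reproduce_preinstalled_httpd() {
    run=${DRY_RUN:+echo}
    # Assumption: httpd is installed by hand with yum.
    $run yum install -y httpd || return 1
    $run systemctl start httpd || return 1
    # httpd is now active, so the playbook will not restart it.
    $run ansible-playbook -i cluster.hosts -u root site.yml
}
```

Running `DRY_RUN=1 reproduce_preinstalled_httpd` only prints the three commands, which is a safe way to review the sequence first.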

Comment 4 Daniel Horák 2018-12-07 11:01:28 UTC
I've tried to reproduce the scenario, but a default fresh installation works
correctly. As you can see from the truncated output of the tendrl-ansible
site.yml playbook, the httpd service is enabled and started before
tendrl-monitoring-integration is restarted.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ ansible-playbook -i cluster.hosts -u root site.yml 

PLAY [tendrl_server] ********************************************************************

<truncated>

TASK [tendrl-ansible.tendrl-server : Enable tendrl-notifier service] ********************
changed: [server.example.com]

TASK [tendrl-ansible.tendrl-server : Start tendrl-notifier service] *********************
changed: [server.example.com]

TASK [tendrl-ansible.tendrl-server : Enable httpd service] ******************************
changed: [server.example.com]

TASK [tendrl-ansible.tendrl-server : Start httpd service] *******************************
changed: [server.example.com]

RUNNING HANDLER [tendrl-ansible.tendrl-server : restart tendrl-api] *********************
changed: [server.example.com]

RUNNING HANDLER [tendrl-ansible.tendrl-server : restart rsyslog] ************************
changed: [server.example.com]

RUNNING HANDLER [tendrl-ansible.tendrl-server : restart tendrl-node-agent] **************
changed: [server.example.com]

RUNNING HANDLER [tendrl-ansible.tendrl-server : restart tendrl-monitoring-integration] **
changed: [server.example.com]

RUNNING HANDLER [tendrl-ansible.tendrl-server : restart grafana-server] *****************
changed: [server.example.com]

RUNNING HANDLER [tendrl-ansible.tendrl-server : restart tendrl-notifier] ****************
changed: [server.example.com]

<truncated>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So, as Gowtham specified in comment 3, the problem arises only if the httpd
service is already installed and running before the tendrl-ansible site.yml
playbook is launched. (The playbook then only ensures that the httpd service is
running but does not restart it, so the new configuration is not loaded.)

I don't have a strong opinion on whether this might cause a real problem, or on
the best place to fix it:
* Postinstall/postupdate script in tendrl-api-httpd package, which will ensure
  that httpd will be reloaded (if running), because the new configuration is
  provided by this package.
* Another handler in tendrl-ansible which will restart the httpd service,
  before the rest of the handlers.
* A documentation note that, if this problem occurs, the user should restart
  the httpd service and then the tendrl-monitoring-integration service.
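
As a rough illustration of the first option, a post-install step could reload httpd only when it is already running. This is a sketch under assumptions, not the actual scriptlet of the tendrl-api-httpd package: the function name and the SYSTEMCTL override (used only to make the function testable) are hypothetical.

```shell
# Reload httpd only if it is already running, so that a newly installed
# vhost configuration is picked up; do nothing when httpd is stopped.
# SYSTEMCTL may be overridden for testing (hypothetical knob).
reload_httpd_if_running() {
    sc=${SYSTEMCTL:-systemctl}
    if $sc is-active --quiet httpd; then
        $sc reload httpd
    fi
}
```

Leaving a stopped httpd untouched keeps the fresh-install path unchanged, where the playbook starts the service itself.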

We didn't spot this issue during update scenario testing, but it might be
worth changing the order of restarted services at the end of the update
scenario[1] (the last step for the WA Server). Currently the httpd service is
at the end of the list of restarted services, so we might consider putting it
first.

[1] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/quick_start_guide/#red_hat_gluster_storage_web_administration_3_4_x_to_3_4_y

Comment 6 Nishanth Thomas 2018-12-10 05:03:33 UTC
This is not a blocker for 3.4.2, as it will not occur in normal use cases (the QE team has not noticed it in their fresh install/upgrade testing).

To avoid this:
Stop httpd service before installing/upgrading tendrl

If this happens:
user should restart httpd service and then tendrl-monitoring-integration service

My suggestion is to add a note in the documentation; the fix can be targeted for a future BU.
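
The recovery steps above (restart httpd, then tendrl-monitoring-integration) can be sketched as a small shell function. The function name and the DRY_RUN switch are assumptions added for illustration; only the two service names and their order come from this comment.

```shell
# Recovery order: httpd first, so the Grafana vhost is loaded,
# then monitoring-integration, so it uploads the default dashboards again.
# Set DRY_RUN=1 to print the commands instead of executing them.
recover_dashboards() {
    run=${DRY_RUN:+echo}
    $run systemctl restart httpd || return 1
    $run systemctl restart tendrl-monitoring-integration
}
```

The function stops after a failed httpd restart, since restarting monitoring-integration without a working vhost would not help.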

Comment 7 Daniel Horák 2018-12-10 09:20:05 UTC
I've reviewed the upgrade scenario, and it seems possible that this issue
might occur under some circumstances. The reason we didn't spot it earlier is
that we launch the `systemctl restart ...` commands[1] in quick succession, so
even though httpd was restarted last, this happened shortly enough after
tendrl-monitoring-integration was restarted.

I would propose fixing it in the update documentation by reordering the list
of restarted services (putting httpd first).

[1] last step in section "On Web Administration Server"
    https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/quick_start_guide/#red_hat_gluster_storage_web_administration_3_4_x_to_3_4_y

Comment 8 Daniel Horák 2018-12-10 09:24:00 UTC
(In reply to Daniel Horák from comment #7)
> I would propose to fix it in the update documentation, by reordering the list
> of restarted services (put httpd at first place)[1]. 
>
> [1] last step in section "On Web Administration Server"
> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/quick_start_guide/#red_hat_gluster_storage_web_administration_3_4_x_to_3_4_y

Gowtham and Nishanth, would this be a sufficient solution for now?

Comment 9 Daniel Horák 2018-12-10 10:06:07 UTC
Created Doc Bug 1657698, as discussed at today's stand-up.

Comment 10 Martin Bukatovic 2018-12-10 14:01:22 UTC
Based on comment 6 and comment 9, I'm retracting blocker? flag.

Comment 11 Martin Bukatovic 2018-12-10 17:15:51 UTC
Resolution for BU2 via doc update reported as bz 1657698.

