Created attachment 1506414 [details]
grafana-server journal log

Description of problem:
The grafana-server service fails and tendrl-monitoring-integration is inactive when WA is upgraded according to [1] with repositories that contain BU2 packages.

Version-Release number of selected component (if applicable):
tendrl-ansible-1.6.3-9.el7rhgs.noarch
tendrl-api-1.6.3-8.el7rhgs.noarch
tendrl-api-httpd-1.6.3-8.el7rhgs.noarch
tendrl-commons-1.6.3-13.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-15.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-15.el7rhgs.noarch
tendrl-node-agent-1.6.3-11.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-12.el7rhgs.noarch
grafana-4.6.4-1.el7rhgs.x86_64

How reproducible:
100% (tested 2 times)

Steps to Reproduce:
1. Install WA 3.4.0.
2. Import cluster with some volumes.
3. Update WA according to [1]. Use repositories with the packages listed in the "Version-Release number of selected component" section of this BZ.

Actual results:
The UI works as expected, but none of the links to Grafana work and the grafana-server service is in a failed state. The journal log of grafana-server contains lines like:
(...)
Nov 16 14:04:53 wa-server systemd[1]: Started Grafana instance.
Nov 16 14:04:53 wa-server grafana-server[6500]: t=2018-11-16T14:04:53+0100 lvl=info msg="Starting Grafana" logger=server version=4.6.4 commit=unknown-dev compiled=2018-11-05T08:43:17+0100
Nov 16 14:04:53 wa-server grafana-server[6500]: t=2018-11-16T14:04:53+0100 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
Nov 16 14:04:53 wa-server grafana-server[6500]: t=2018-11-16T14:04:53+0100 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
Nov 16 14:04:53 wa-server grafana-server[6500]: t=2018-11-16T14:04:53+0100 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
Nov 16 14:04:53 wa-server grafana-server[6500]: t=2018-11-16T14:04:53+0100 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
Nov 16 14:04:53 wa-server systemd[1]: grafana-server.service: main process exited, code=exited, status=1/FAILURE
Nov 16 14:04:53 wa-server systemd[1]: Unit grafana-server.service entered failed state.
(...)
Nov 16 14:04:54 wa-server grafana-server[6548]: t=2018-11-16T14:04:54+0100 lvl=info msg="Starting Grafana" logger=server version=4.6.4 commit=unknown-dev compiled=2018-11-05T08:43:17+0100
Nov 16 14:04:54 wa-server grafana-server[6548]: t=2018-11-16T14:04:54+0100 lvl=info msg="Config loaded from" logger=settings file=/usr/share/grafana/conf/defaults.ini
Nov 16 14:04:54 wa-server grafana-server[6548]: t=2018-11-16T14:04:54+0100 lvl=info msg="Config loaded from" logger=settings file=/etc/grafana/grafana.ini
Nov 16 14:04:54 wa-server grafana-server[6548]: t=2018-11-16T14:04:54+0100 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.data=/var/lib/grafana"
Nov 16 14:04:54 wa-server grafana-server[6548]: t=2018-11-16T14:04:54+0100 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.logs=/var/log/grafana"
Nov 16 14:04:54 wa-server grafana-server[6548]: t=2018-11-16T14:04:54+0100 lvl=info msg="Config overridden from command line" logger=settings arg="default.paths.plugins=/var/lib/grafana/plug
Nov 16 14:04:54 wa-server grafana-server[6548]: t=2018-11-16T14:04:54+0100 lvl=info msg="Path Home" logger=settings path=/usr/share/grafana
Nov 16 14:04:54 wa-server grafana-server[6548]: t=2018-11-16T14:04:54+0100 lvl=info msg="Path Data" logger=settings path=/usr/share/grafana/data
Nov 16 14:04:54 wa-server systemd[1]: grafana-server.service: main process exited, code=exited, status=1/FAILURE
Nov 16 14:04:54 wa-server systemd[1]: Unit grafana-server.service entered failed state.
Nov 16 14:04:54 wa-server systemd[1]: grafana-server.service failed.
Nov 16 14:04:54 wa-server systemd[1]: grafana-server.service holdoff time over, scheduling restart.
Nov 16 14:04:54 wa-server systemd[1]: Stopped Grafana instance.
Nov 16 14:04:54 wa-server systemd[1]: start request repeated too quickly for grafana-server.service
Nov 16 14:04:54 wa-server systemd[1]: Failed to start Grafana instance.
Nov 16 14:04:54 wa-server systemd[1]: Unit grafana-server.service entered failed state.
Nov 16 14:04:54 wa-server systemd[1]: grafana-server.service failed.

The tendrl-monitoring-integration journal log contains only lines related to the periodic starting and stopping of tendrl-monitoring-integration, except for these two lines:
(...)
Nov 16 14:04:54 wa-server systemd[1]: Dependency failed for Monitoring Integration.
Nov 16 14:04:54 wa-server systemd[1]: Job tendrl-monitoring-integration.service/start failed with result 'dependency'.
(...)

Expected results:
Grafana should work.

Additional info:
[1] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/installation_guide/index#Updating_Red_Hat_Storage_in_the_Offline_Mode
Gowtham, I understand this happened because the Grafana configuration was not correct. Please update your findings here.
I debugged this on Filip's machine and found that the Grafana configuration file path is not updated; the service refers only to its default configuration file.

In /etc/sysconfig/grafana-server:
CONF_DIR=/etc/grafana
CONF_FILE=/etc/grafana/grafana.ini

This path is not correct. The correct path is:
CONF_DIR=/etc/tendrl/monitoring-integration/grafana/
CONF_FILE=/etc/tendrl/monitoring-integration/grafana/grafana.ini
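A quick way to check whether a server is in this state is to read CONF_FILE from the sysconfig file and compare it with the Tendrl path quoted above. The following is only a sketch: the function name check_grafana_sysconfig is made up for illustration, and it assumes the file contains plain, unquoted KEY=value lines as shown in this comment.

```shell
#!/bin/sh
# Hypothetical helper (not part of tendrl or grafana): report whether the
# sysconfig file given as $1 points grafana-server at the Tendrl config.
check_grafana_sysconfig() {
    file="${1:-/etc/sysconfig/grafana-server}"
    expected="/etc/tendrl/monitoring-integration/grafana/grafana.ini"
    # Use the last CONF_FILE= assignment if there is more than one.
    actual=$(sed -n 's/^CONF_FILE=//p' "$file" | tail -n 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: CONF_FILE=$actual"
        return 0
    fi
    echo "MISMATCH: CONF_FILE=$actual (expected $expected)"
    return 1
}
```

Calling check_grafana_sysconfig without arguments checks /etc/sysconfig/grafana-server; a non-zero return status means the file does not point at the Tendrl configuration.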
I tracked the file /etc/sysconfig/grafana-server during the whole upgrade process as described in the documentation and found that the file is changed by step 2 in [1], in the section `On Web Administration Server:`, which is:
# yum update

Before this step:
# cat /etc/sysconfig/grafana-server| grep CONF
CONF_DIR=/etc/tendrl/monitoring-integration/grafana/
CONF_FILE=/etc/tendrl/monitoring-integration/grafana/grafana.ini

After updating all packages on the WA server:
# cat /etc/sysconfig/grafana-server| grep CONF
CONF_DIR=/etc/grafana
CONF_FILE=/etc/grafana/grafana.ini

The next step, which calls:
# tendrl-upgrade
is therefore already executed with the changed configuration.

[1] https://access.qa.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/quick_start_guide/#red_hat_gluster_storage_web_administration_3_4_x_to_3_4_y
I haven't tried it, so I'm just guessing now, but wouldn't it be sufficient to just run tendrl-ansible after the `yum update`? It shouldn't break anything (provided there are no additional manual changes in the configuration) and it might fix this issue (and potentially any similar issue in the future). If it works, I would consider it the best and most systematic solution for this kind of issue (rather than manual steps to update the particular config file).
I agree with Daniel here. Can you please try this out and update here? Then we need to update the documentation as well.
I agree with Daniel, it won't break anything, it will work fine.
Created attachment 1507647 [details] Grafana screen with inspected element after update
I tried running tendrl-ansible after the upgrade was done. It fixed the Grafana configuration and started the grafana-server service, but Grafana for WA is still not working correctly. As seen in attachment 1507647 [details], no JS or CSS libraries are loaded.

I compared the source of this page on the updated machine with the same page on a machine freshly installed with the current version (without updating). The main difference is in the base element, which is currently set to href="/" but probably should be href="/grafana/". The URL also seems wrong when linked from WA.

Tested with:
tendrl-ansible-1.6.3-9.el7rhgs.noarch
tendrl-api-1.6.3-8.el7rhgs.noarch
tendrl-api-httpd-1.6.3-8.el7rhgs.noarch
tendrl-commons-1.6.3-13.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-15.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-15.el7rhgs.noarch
tendrl-node-agent-1.6.3-11.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-13.el7rhgs.noarch
The root cause of this issue seems to be that the Grafana configuration files are not properly marked as configuration files in the RPM. There are three files under /etc which should probably be considered "configuration" files in the grafana package:
# rpm -ql grafana | grep /etc
/etc/grafana
/etc/grafana/grafana.ini
/etc/grafana/ldap.toml
/etc/sysconfig/grafana-server

But none of them is marked as a "configuration" file in the RPM:
# rpm -qc grafana | grep /etc
#

As a result, a package update overwrites any changes in these files with the default values. Without fixing this, anything else will be just a workaround with possible additional consequences in the future.

So the main question now is: are we able to fix this problem in the grafana package spec file?
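The comparison above between `rpm -ql` and `rpm -qc` can be scripted so that it is easy to repeat for other packages. This is only a sketch under stated assumptions: the function name etc_files_not_marked_config is hypothetical, and it works on two saved listings (one path per line) rather than calling rpm itself. Directories such as /etc/grafana show up in the result as well, since `rpm -ql` lists them.

```shell
#!/bin/sh
# Hypothetical helper: print every /etc path from an `rpm -ql` listing ($1)
# that does not appear in the matching `rpm -qc` listing ($2).
etc_files_not_marked_config() {
    all_list="$1"
    conf_list="$2"
    # An empty config listing means nothing is marked: report all /etc paths.
    if [ ! -s "$conf_list" ]; then
        grep '^/etc/' "$all_list"
        return
    fi
    # -x whole-line match, -F fixed strings, -v invert, -f patterns from file.
    grep '^/etc/' "$all_list" | grep -vxFf "$conf_list"
}

# Example usage (run manually on the WA server):
#   rpm -ql grafana > /tmp/grafana.all
#   rpm -qc grafana > /tmp/grafana.conf
#   etc_files_not_marked_config /tmp/grafana.all /tmp/grafana.conf
```

The return status is that of grep, so it is non-zero when every /etc path is already marked. In a spec file, such files are normally listed in the %files section with the %config(noreplace) directive; whether that is feasible for this grafana package is exactly the open question above.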
(In reply to Filip Balák from comment #10)
> I tried running tendrl-ansible after upgrade is done. It fixed grafana
> configuration and started grafana-server service but grafana for WA is still
> not working correctly. As seen in attachment 1507647 [details] there are not
> loaded any js or css libraries.
>
> I inspected source of this page that was updated and also of page that was
> freshly installed to current version without updating and main difference is
> in base element where is currently set href="/" but probably should be
> href="/grafana/". Url also seems wrong when linked from WA.

The root cause of this issue is that the existing configuration file:
/etc/tendrl/monitoring-integration/grafana/grafana.ini
is not replaced by the new version from the updated tendrl-monitoring-integration package (mainly, the line "root_url = http://localhost:3000/grafana" is missing).

This behaviour is correct, because:
1) The grafana.ini file is correctly marked as a configuration file in the tendrl-monitoring-integration package:
# rpm -qc tendrl-monitoring-integration | grep grafana.ini
/etc/tendrl/monitoring-integration/grafana/grafana.ini
2) The configuration file was changed in comparison with the version from the rpm package (the value of admin_password was changed by tendrl-ansible during the installation process).
Because of the two points above, when the tendrl-monitoring-integration package is updated, the existing configuration file is not overwritten and the new version is only saved with the .rpmnew suffix:
# ls /etc/tendrl/monitoring-integration/grafana/grafana.ini*
/etc/tendrl/monitoring-integration/grafana/grafana.ini
/etc/tendrl/monitoring-integration/grafana/grafana.ini.rpmnew

# diff /etc/tendrl/monitoring-integration/grafana/grafana.ini \
    /etc/tendrl/monitoring-integration/grafana/grafana.ini.rpmnew
47c47
< ;root_url = http://localhost:3000
---
> root_url = http://localhost:3000/grafana
146c146
< admin_password = IPLmZggPtIagGxlejRyqrhGXoPFlTb
---
> admin_password = admin

Generally in this situation it is up to the admin to check the difference between the two configuration files and update the existing configuration where necessary.

In our specific case, I see two options:
a) Document this as a known issue and let the user manually edit the configuration file and add the line:
root_url = http://localhost:3000/grafana
b) Add a new task to tendrl-ansible to ensure that this line is present in the configuration file.
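For illustration, the manual edit from option a) could be scripted roughly as follows. This is a sketch, not the task that was added to tendrl-ansible: the function name ensure_root_url is hypothetical, it assumes the file already contains a root_url line (commented out with ";" or not, as in the diff above), and it assumes the URL contains no "|" or "&" characters, which would confuse the sed expression.

```shell
#!/bin/sh
# Hypothetical helper: make sure grafana.ini ($1) contains an active
# "root_url = <url>" line; <url> ($2) defaults to the value from the diff.
ensure_root_url() {
    ini="$1"
    url="${2:-http://localhost:3000/grafana}"
    # Already set to the wanted value: nothing to do.
    if grep -qxF "root_url = $url" "$ini"; then
        return 0
    fi
    # No root_url line at all: do not guess which section it belongs in.
    if ! grep -q '^;\{0,1\}root_url[[:space:]]*=' "$ini"; then
        echo "no root_url line found in $ini" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Rewrite the existing line (commented or not) with the wanted value,
    # then copy the result back so ownership and permissions are kept.
    sed "s|^;\{0,1\}root_url[[:space:]]*=.*|root_url = $url|" "$ini" > "$tmp" &&
        cat "$tmp" > "$ini"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example usage (run manually, then restart grafana-server):
#   ensure_root_url /etc/tendrl/monitoring-integration/grafana/grafana.ini
```

The function is idempotent, so running it a second time leaves the file unchanged. Other settings, such as admin_password, are not touched.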
We will add a new task to tendrl-ansible to ensure that the correct root_url is present in the configuration file (grafana.ini). We also need to add a step to the upgrade guide to run tendrl-ansible after the upgrade steps are completed. @Daniel/Filip, please create a documentation bug for this.
I have tested the update with the solution described in comment 14. tendrl-ansible was executed right after the tendrl-upgrade script finished. Grafana works as expected after this, but I will wait for the documentation (bz 1652561) before verifying this bz.

Tested with:
tendrl-ansible-1.6.3-10.el7rhgs.noarch
tendrl-api-1.6.3-8.el7rhgs.noarch
tendrl-api-httpd-1.6.3-8.el7rhgs.noarch
tendrl-commons-1.6.3-13.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-15.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-15.el7rhgs.noarch
tendrl-node-agent-1.6.3-11.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-14.el7rhgs.noarch
I have retested the update to BU2 with the documentation from bz 1652561. Grafana is now configured correctly and seems to work fine. --> VERIFIED

During testing of this bz, issues with the web cache were found. One of them affects links to Grafana, as described in https://bugzilla.redhat.com/show_bug.cgi?id=1654331#c3, so the user needs to clear the web cache to see correct links to Grafana. This is tracked in bz 1654331.

Tested with:
tendrl-ansible-1.6.3-10.el7rhgs.noarch
tendrl-api-1.6.3-8.el7rhgs.noarch
tendrl-api-httpd-1.6.3-8.el7rhgs.noarch
tendrl-commons-1.6.3-13.el7rhgs.noarch
tendrl-grafana-plugins-1.6.3-15.el7rhgs.noarch
tendrl-grafana-selinux-1.5.4-2.el7rhgs.noarch
tendrl-monitoring-integration-1.6.3-15.el7rhgs.noarch
tendrl-node-agent-1.6.3-11.el7rhgs.noarch
tendrl-notifier-1.6.3-4.el7rhgs.noarch
tendrl-selinux-1.5.4-2.el7rhgs.noarch
tendrl-ui-1.6.3-14.el7rhgs.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:3829