Bug 1984150

Summary: metrics role: Grafana dashboard not working after metrics role run unless services manually restarted
Product: Red Hat Enterprise Linux 9 Reporter: Rich Megginson <rmeggins>
Component: rhel-system-rolesAssignee: Rich Megginson <rmeggins>
Status: CLOSED CURRENTRELEASE QA Contact: Jan Kurik <jkurik>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 9.0CC: agerstmayr, briasmit, djez, mcermak, mgoodwin, myllynen, nathans, nhosoi, rhel-cs-system-management-subsystem-qe, spetrosi
Target Milestone: betaKeywords: Triaged
Target Release: 9.0 Beta   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: role:metrics
Fixed In Version: rhel-system-roles-1.7.5-1.el9 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1978357 Environment:
Last Closed: 2021-12-07 22:04:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1978357    
Bug Blocks:    

Description Rich Megginson 2021-07-20 20:01:50 UTC
+++ This bug was initially created as a clone of Bug #1978357 +++

Description of problem:
If I run the metrics role to setup a grafana host and server clients to be monitored, the playbook completes successfully, but when I pull up the "PCP Redis: Host Overview" dashboard in Grafana I get these error messages:

Templating [host]
Error updating options: timeout while connecting to 'http://localhost:44322/series/labels?names=hostname'
Error
timeout while connecting to 'http://localhost:44322/series/labels?names=hostname'


Version-Release number of selected component (if applicable):
$ rpm -qa | egrep "rhel-system-roles|pcp|grafana"
pcp-conf-5.2.5-4.el8.x86_64
pcp-pmda-nfsclient-5.2.5-4.el8.x86_64
pcp-zeroconf-5.2.5-4.el8.x86_64
grafana-pcp-3.0.2-1.el8.x86_64
pcp-doc-5.2.5-4.el8.noarch
pcp-libs-5.2.5-4.el8.x86_64
python3-pcp-5.2.5-4.el8.x86_64
pcp-pmda-openmetrics-5.2.5-4.el8.x86_64
pcp-pmda-dm-5.2.5-4.el8.x86_64
grafana-7.3.6-2.el8.x86_64
rhel-system-roles-1.0.1-1.el8.noarch
pcp-selinux-5.2.5-4.el8.x86_64
pcp-5.2.5-4.el8.x86_64
pcp-system-tools-5.2.5-4.el8.x86_64


How reproducible:
Every time


Steps to Reproduce:
1. Start with 5 newly built RHEL hosts
2. Manually apply fix for https://github.com/linux-system-roles/metrics/pull/87
3. Run metrics role to setup PCP on the hosts and setup Grafana on one of the hosts
4. Import the "PCP Redis: Host Overview" dashboard in Grafana
5. View the "PCP Redis: Host Overview" dashboard

Actual results:
Templating [host]
Error updating options: timeout while connecting to 'http://localhost:44322/series/labels?names=hostname'
Error
timeout while connecting to 'http://localhost:44322/series/labels?names=hostname'


Expected results:
Dashboard works 


Additional info:
If I run "sudo systemctl restart pmcd pmlogger pmproxy" on the grafana host after the metrics role completes, everything works as expected.

...

--- Additional comment from Brian Smith on 2021-07-20 15:36:20 UTC ---

Hi Nathan,
I did some more testing this morning.  With the https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd patch and also moving the firewall tasks to the top of the playbook, everything worked properly on two runs of the playbook (restoring the VM's back to a clean snapshot each time).  

The https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd is definitely needed.  I tried running the playbook with the firewall tasks at the top of the playbook, and without this patch, and ran in to the original issue in the description of the BZ.  

I'm working on getting the blog post updated so that it shows the firewall tasks at the top of the playbook, and will also still have the workaround of restarting pmlogger and pmproxy to workaround this BZ.  

Thanks for your help getting this resolved!