Bug 1978357 - metrics role: Grafana dashboard not working after metrics role run unless services manually restarted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: rhel-system-roles
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: beta
Target Release: 8.5
Assignee: Rich Megginson
QA Contact: Jan Kurik
URL:
Whiteboard: role:metrics
Depends On:
Blocks: 1984150
Reported: 2021-07-01 16:10 UTC by Brian Smith
Modified: 2022-08-02 18:06 UTC (History)

Fixed In Version: rhel-system-roles-1.6.4-1.el8
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1984150
Environment:
Last Closed: 2021-11-09 17:46:02 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Links
System                 | ID             | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2021:4159 | 0       | None     | None   | None    | 2021-11-09 17:46:15 UTC

Description Brian Smith 2021-07-01 16:10:25 UTC
Description of problem:
If I run the metrics role to set up a Grafana host and server clients to be monitored, the playbook completes successfully, but when I open the "PCP Redis: Host Overview" dashboard in Grafana I get these error messages:

Templating [host]
Error updating options: timeout while connecting to 'http://localhost:44322/series/labels?names=hostname'
Error
timeout while connecting to 'http://localhost:44322/series/labels?names=hostname'


Version-Release number of selected component (if applicable):
$ rpm -qa | egrep "rhel-system-roles|pcp|grafana"
pcp-conf-5.2.5-4.el8.x86_64
pcp-pmda-nfsclient-5.2.5-4.el8.x86_64
pcp-zeroconf-5.2.5-4.el8.x86_64
grafana-pcp-3.0.2-1.el8.x86_64
pcp-doc-5.2.5-4.el8.noarch
pcp-libs-5.2.5-4.el8.x86_64
python3-pcp-5.2.5-4.el8.x86_64
pcp-pmda-openmetrics-5.2.5-4.el8.x86_64
pcp-pmda-dm-5.2.5-4.el8.x86_64
grafana-7.3.6-2.el8.x86_64
rhel-system-roles-1.0.1-1.el8.noarch
pcp-selinux-5.2.5-4.el8.x86_64
pcp-5.2.5-4.el8.x86_64
pcp-system-tools-5.2.5-4.el8.x86_64


How reproducible:
Every time


Steps to Reproduce:
1. Start with 5 newly built RHEL hosts
2. Manually apply fix for https://github.com/linux-system-roles/metrics/pull/87
3. Run the metrics role to set up PCP on the hosts and Grafana on one of the hosts
4. Import the "PCP Redis: Host Overview" dashboard in Grafana
5. View the "PCP Redis: Host Overview" dashboard

Actual results:
Templating [host]
Error updating options: timeout while connecting to 'http://localhost:44322/series/labels?names=hostname'
Error
timeout while connecting to 'http://localhost:44322/series/labels?names=hostname'


Expected results:
Dashboard works 


Additional info:
If I run "sudo systemctl restart pmcd pmlogger pmproxy" on the grafana host after the metrics role completes, everything works as expected.

Comment 1 Rich Megginson 2021-07-01 16:28:35 UTC
Nathan?

Comment 2 Nathan Scott 2021-07-06 00:28:35 UTC
| If I run "sudo systemctl restart pmcd pmlogger pmproxy" on the grafana host after the metrics role completes, everything works as expected.

Just back from PTO and reading this one now, Brian (thanks for the bug report), while reviewing your draft blog post (thanks for that too!).

I'll have to set aside a bigger chunk of time to analyze it (it's all running fine on my Fedora laptop, so I'll need to set up some RHEL hosts), but until then: do you have to restart all three of those services, or does just restarting pmproxy suffice?  pmproxy is definitely not yet accepting connections (the "timeout while connecting" is Grafana trying to talk to pmproxy).  What does the pcp(1) command say for you right after running the role?  Does it give "timeout"/"connection refused"?  (The pcp client tool talks to pmcd.)  Thanks.

Mark and Andreas, are there any known RHEL 8.4 issues with pcp systemd service starting that might explain this behaviour?  (was there a zstream update helping along those lines, from a vague memory?)  Thanks.
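
For anyone retracing this, the question of whether pmproxy is answering can be checked directly against the URL from the Grafana error. The following is only a minimal sketch: the function name, the status strings, and the injectable `opener` parameter are illustrative choices, and it assumes the endpoint replies with JSON when healthy.

```python
import json
import socket
import urllib.error
import urllib.request

# URL copied from the Grafana error message in this report.
PMPROXY_LABELS_URL = "http://localhost:44322/series/labels?names=hostname"


def probe_pmproxy(url=PMPROXY_LABELS_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return a short status string describing how the endpoint responded.

    `opener` is injectable (hypothetical convenience) so the function can be
    exercised without a running pmproxy; by default it does a real HTTP GET.
    """
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an HTTP error status.
        return f"http error {exc.code}"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except urllib.error.URLError as exc:
        # URLError wraps lower-level failures such as "connection refused".
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return f"unreachable: {exc.reason}"
    try:
        json.loads(body)
    except ValueError:
        return "responded, but not with JSON"
    return "ok"
```

Calling `probe_pmproxy()` on the Grafana host right after the role run, and again after restarting pmproxy, would show whether the time-series endpoint itself is timing out.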

Comment 3 Andreas Gerstmayr 2021-07-06 09:33:15 UTC
(In reply to Nathan Scott from comment #2)
> Mark and Andreas, are there any known RHEL 8.4 issues with pcp systemd
> service starting that might explain this behaviour?  (was there a zstream
> update helping along those lines, from a vague memory?)  Thanks.

Sounds like a timing issue to me.
I'm wondering what could be keeping pmproxy from running; the first thing I'd do is check the pmproxy logs and journal entries.

Another thing to watch out for:
In linux-system-roles, the Redis role is set up after the pcp role: https://github.com/linux-system-roles/metrics/blob/main/tasks/main.yml#L56-L73
So when pmproxy starts up for the first time, Redis is not available, and it will disable time-series functionality.

Also, after running the role for the first time, all services are started because of

- name: ...
  service:
    name: ...
    state: started
    enabled: yes

and then (because of the updated config files), as far as I can see, all handlers run as well, restarting all services again. I'm not sure how deterministic the order of those restarts is, or whether it can cause any conflict.

For the systemd unit, we had one recent change, making sure that pmproxy starts *after* Redis: https://github.com/performancecopilot/pcp/commit/44a3ecaa8b1dc5ab518d26d00bcafd5ebb29b3ef
However, that only applies if both are started at the same time/in the same transaction (e.g. at boot); Redis is *not* a dependency of pmproxy and doesn't get started automatically when starting pmproxy.
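
To illustrate the ordering concern outside a systemd transaction: one generic way to avoid starting pmproxy before Redis is ready is to wait until the Redis port accepts TCP connections. This is a sketch only, not what the role or the PCP unit files do; `wait_for_port` is a hypothetical helper, and the port you pass (6379 is Redis's usual default) is an assumption about the local setup.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have passed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Open and immediately close a connection; success means
            # something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A wrapper could call `wait_for_port("localhost", 6379)` and restart pmproxy only when it returns True. A listening port does not prove Redis is fully initialised, so treat this as a coarse readiness check.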

Comment 4 Brian Smith 2021-07-06 21:31:06 UTC
Hi Nathan and Andreas, 
Here are the answers to your questions.  Starting from fresh RHEL systems, I run the metrics role, log in to Grafana, import the "PCP Redis: Host Overview" dashboard, and get the dashboard errors mentioned in the bug description.

At this point, the output of pcp:

[ansible@controlnode ~]$ pcp
Performance Co-Pilot configuration on controlnode.example.com:

 platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64
 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM
 timezone: UTC
 services: pmcd pmproxy
     pmcd: Version 5.2.5-4, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2
           dm openmetrics
 pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210706.20.30
           rhel7-server1.example.com: /var/log/pcp/pmlogger/rhel7-server1/20210706.20.30
           rhel7-server2.example.com: /var/log/pcp/pmlogger/rhel7-server2/20210706.20.30
           rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210706.20.30
           rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210706.20.30
     pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log
           rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log
           rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log
           rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log
           rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log


-----------------------------

[ansible@controlnode ~]$ cat /var/log/pcp/pmproxy/pmproxy.log
Log for pmproxy on controlnode.example.com started Tue Jul  6 20:23:12 2021

pmproxy: PID = 30921, PDU version = 2, user = pcp (993)
pmproxy request port(s):
  sts fd   port  family address
  === ==== ===== ====== =======
  ok    14       unix   /run/pcp/pmproxy.socket
  ok    15 44322 inet   INADDR_ANY
  ok    19 44322 ipv6   INADDR_ANY
  ok    20 44323 inet   INADDR_ANY
  ok    21 44323 ipv6   INADDR_ANY
[Tue Jul  6 20:23:12] pmproxy(30921) Info: OpenSSL 1.1.1g FIPS  21 Apr 2020 - no certificates found
[Tue Jul  6 20:23:12] pmproxy(30921) Info: Redis slots, command keys, schema version setup
Tue Jul  6 20:23:14 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:23:18 2021 discovery callback: finished log-rolling
Tue Jul  6 20:23:19 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:23:27 2021 discovery callback: finished log-rolling
Tue Jul  6 20:23:27 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:23:34 2021 discovery callback: finished log-rolling
Tue Jul  6 20:23:34 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:23:39 2021 discovery callback: finished log-rolling
Tue Jul  6 20:23:39 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:23:44 2021 discovery callback: finished log-rolling
Tue Jul  6 20:25:24 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:25:30 2021 discovery callback: finished log-rolling
Tue Jul  6 20:25:30 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:25:30 2021 discovery callback: finished log-rolling
Tue Jul  6 20:25:30 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:25:30 2021 discovery callback: finished log-rolling
Tue Jul  6 20:25:30 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:25:31 2021 discovery callback: finished log-rolling
Tue Jul  6 20:30:17 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:30:20 2021 discovery callback: finished log-rolling
Tue Jul  6 20:30:21 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:30:27 2021 discovery callback: finished log-rolling
Tue Jul  6 20:30:27 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:30:33 2021 discovery callback: finished log-rolling
Tue Jul  6 20:30:33 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:30:38 2021 discovery callback: finished log-rolling
Tue Jul  6 20:30:38 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:30:42 2021 discovery callback: finished log-rolling


-----------------------------

[ansible@controlnode ~]$ sudo journalctl -u pmproxy
-- Logs begin at Mon 2021-06-28 18:20:06 UTC, end at Tue 2021-07-06 20:44:24 UTC. --
Jul 06 20:22:02 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:22:02 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Stopping Proxy for Performance Metrics Collector Daemon...
Jul 06 20:23:12 controlnode.example.com systemd[1]: pmproxy.service: Succeeded.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Stopped Proxy for Performance Metrics Collector Daemon.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:23:12 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.


-----------------------------

At this point, I restarted only pmproxy and refreshed the "PCP Redis: Host Overview" dashboard, and the dashboard now works (I can see graphs from all 5 hosts).

-----------------------------

Checking the pmproxy.log at this point, now that the dashboard is working:

[ansible@controlnode ~]$ cat /var/log/pcp/pmproxy/pmproxy.log
Log for pmproxy on controlnode.example.com started Tue Jul  6 20:46:11 2021

pmproxy: PID = 36478, PDU version = 2, user = pcp (993)
pmproxy request port(s):
  sts fd   port  family address
  === ==== ===== ====== =======
  ok    14       unix   /run/pcp/pmproxy.socket
  ok    15 44322 inet   INADDR_ANY
  ok    19 44322 ipv6   INADDR_ANY
  ok    20 44323 inet   INADDR_ANY
  ok    21 44323 ipv6   INADDR_ANY
[Tue Jul  6 20:46:11] pmproxy(36478) Info: OpenSSL 1.1.1g FIPS  21 Apr 2020 - no certificates found
[Tue Jul  6 20:46:11] pmproxy(36478) Info: Redis slots, command keys, schema version setup
Tue Jul  6 20:55:24 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:55:24 2021 discovery callback: finished log-rolling
Tue Jul  6 20:55:24 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:55:24 2021 discovery callback: finished log-rolling
Tue Jul  6 20:55:24 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:55:24 2021 discovery callback: finished log-rolling
Tue Jul  6 20:55:24 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul  6 20:55:25 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul  6 20:55:25 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul  6 20:55:25 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul  6 20:55:25 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul  6 20:55:26 2021 discovery callback: log-rolling in progress
Tue Jul  6 20:55:26 2021 discovery callback: finished log-rolling
Tue Jul  6 21:25:24 2021 discovery callback: log-rolling in progress
Tue Jul  6 21:25:24 2021 discovery callback: finished log-rolling
Tue Jul  6 21:25:24 2021 discovery callback: log-rolling in progress
Tue Jul  6 21:25:24 2021 discovery callback: finished log-rolling
Tue Jul  6 21:25:24 2021 discovery callback: log-rolling in progress
Tue Jul  6 21:25:24 2021 discovery callback: finished log-rolling
Tue Jul  6 21:25:24 2021 discovery callback: log-rolling in progress
Tue Jul  6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul  6 21:25:25 2021 discovery callback: log-rolling in progress
Tue Jul  6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul  6 21:25:25 2021 discovery callback: log-rolling in progress
Tue Jul  6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul  6 21:25:25 2021 discovery callback: log-rolling in progress
Tue Jul  6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul  6 21:25:25 2021 discovery callback: log-rolling in progress
Tue Jul  6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul  6 21:25:26 2021 discovery callback: log-rolling in progress
Tue Jul  6 21:25:26 2021 discovery callback: finished log-rolling

--------------------------------

And the journal: 

$ sudo journalctl -u pmproxy
-- Logs begin at Mon 2021-06-28 18:20:06 UTC, end at Tue 2021-07-06 21:28:53 UTC. --
Jul 06 20:22:02 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:22:02 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Stopping Proxy for Performance Metrics Collector Daemon...
Jul 06 20:23:12 controlnode.example.com systemd[1]: pmproxy.service: Succeeded.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Stopped Proxy for Performance Metrics Collector Daemon.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:23:12 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.
Jul 06 20:46:11 controlnode.example.com systemd[1]: Stopping Proxy for Performance Metrics Collector Daemon...
Jul 06 20:46:11 controlnode.example.com systemd[1]: pmproxy.service: Succeeded.
Jul 06 20:46:11 controlnode.example.com systemd[1]: Stopped Proxy for Performance Metrics Collector Daemon.
Jul 06 20:46:11 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:46:11 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.
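
The journal above shows three pmproxy starts: 20:22:02, 20:23:12 (the restart during the role run), and 20:46:11 (the manual restart that made the dashboard work). To pull those timestamps out of `journalctl -u pmproxy` output mechanically, a small sketch like the following could be used. The function name is illustrative, and the pattern assumes the short journal line format and unit description shown in this report.

```python
import re

# Matches lines such as:
# "Jul 06 20:23:12 controlnode.example.com systemd[1]: Started Proxy for ..."
_STARTED = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ systemd\[\d+\]: "
    r"Started Proxy for Performance Metrics Collector Daemon\.$"
)


def pmproxy_start_times(journal_text):
    """Return the timestamp of every pmproxy 'Started' line, in log order."""
    stamps = []
    for line in journal_text.splitlines():
        match = _STARTED.match(line.strip())
        if match:
            stamps.append(match.group("stamp"))
    return stamps
```

Comparing the last entry with the time Redis came up would show whether pmproxy's final start during the role run preceded Redis being available.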

Comment 5 Nathan Scott 2021-07-07 00:27:03 UTC
| [...] making sure that pmproxy starts *after* Redis

I've tweaked the metrics role to ensure this ordering is enforced - maybe this is the issue, on a fresh install (good thinking, Batman^WAndreas).  Brian, any chance you can make the trivial change locally that I've committed here, to try it out?

https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd

cheers.

Comment 6 Brian Smith 2021-07-07 13:14:16 UTC
Hi Nathan,
I tried out the changes in commit https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd; they fixed the initial issue, but appear to introduce a different one.

I tried this out two times (both times restoring my 5 VMs from a clean snapshot), and got these exact results both times.

After the metrics role ran with your patch, I was able to access the "PCP Redis: Host Overview" Grafana dashboard with no errors.

However, in my environment, I have the controlnode, two RHEL 8 clients, and two RHEL 7 clients.  The RHEL 8 clients showed up in Grafana, but the RHEL 7 clients did not.  

PCP command output is also missing the RHEL 7 servers under 'pmlogger':

[ansible@controlnode pcp]$ pcp
Performance Co-Pilot configuration on controlnode.example.com:

 platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64
 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM
 timezone: UTC
 services: pmcd pmproxy
     pmcd: Version 5.2.5-4, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2
           dm openmetrics
 pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210707.12.59
           rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210707.13.00
           rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210707.13.00
     pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log
           rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log
           rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log
           rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log
           rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log
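
The gap is easy to miss by eye: the pmie section lists all four remote hosts, while pmlogger lists only the two RHEL 8 ones. As a rough sketch, the comparison can be automated by parsing the `pcp` summary. The function names are illustrative, the section labels are simply those visible in the output above, and the check assumes pmie lists every configured host, as it does here.

```python
# Section labels as they appear in the `pcp` summary shown in this report.
SECTION_LABELS = {
    "platform", "hardware", "timezone", "services",
    "pmcd", "pmda", "pmlogger", "pmie",
}


def hosts_by_section(pcp_output):
    """Map 'pmlogger' and 'pmie' to the set of remote hosts listed under each."""
    sections = {"pmlogger": set(), "pmie": set()}
    current = None
    for raw in pcp_output.splitlines():
        line = raw.strip()
        label, sep, rest = line.partition(":")
        if sep and label in SECTION_LABELS:
            # A new section starts; the rest of the line may hold an entry.
            current = label
            label, sep, rest = rest.strip().partition(":")
        # Entries look like "<host>: /path/to/log"; skip the local "primary" one.
        is_entry = bool(sep) and rest.strip().startswith("/")
        if current in sections and is_entry and not label.startswith("primary"):
            sections[current].add(label.strip())
    return sections


def hosts_missing_logger(pcp_output):
    """Return hosts that have a pmie entry but no pmlogger entry, sorted."""
    sections = hosts_by_section(pcp_output)
    return sorted(sections["pmie"] - sections["pmlogger"])
```

Fed the output above, `hosts_missing_logger` reports the two RHEL 7 hosts.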

---------------------------

Log files:

[ansible@controlnode ~]$ cat /var/log/pcp/pmlogger/rhel7-server1/pmlogger.log 
Log for pmlogger on controlnode.example.com started Wed Jul  7 12:59:58 2021

pmlogger: Cannot connect to PMCD on host "rhel7-server1": Connection refused

Log finished Wed Jul  7 12:59:59 2021

[ansible@controlnode ~]$ cat /var/log/pcp/pmlogger/rhel7-server2/pmlogger.log 
Log for pmlogger on controlnode.example.com started Wed Jul  7 13:00:05 2021

pmlogger: Cannot open config file "config.rhel7-server2": No such file or directory

Log finished Wed Jul  7 13:00:05 2021

---------------------------

[ansible@controlnode ~]$ ls -al /etc/pcp/pmlogger/control.d/
total 20
drwxr-xr-x. 2 root root 103 Jul  7 12:58 .
drwxr-xr-x. 4 root root  74 Jul  7 12:57 ..
-rw-r--r--. 1 root root 695 Feb 19 09:11 local
-rw-r--r--. 1 root root 136 Jul  7 12:58 rhel7-server1
-rw-r--r--. 1 root root 136 Jul  7 12:58 rhel7-server2
-rw-r--r--. 1 root root 136 Jul  7 12:58 rhel8-server1
-rw-r--r--. 1 root root 136 Jul  7 12:58 rhel8-server2

---------------------------

If I restart pmlogger, the RHEL 7 servers show up in pcp output and grafana dashboard, and everything is working:


[ansible@controlnode ~]$ sudo systemctl restart pmlogger

<wait a minute>

[ansible@controlnode ~]$ pcp
Performance Co-Pilot configuration on controlnode.example.com:

 platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64
 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM
 timezone: UTC
 services: pmcd pmproxy
     pmcd: Version 5.2.5-4, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2
           dm openmetrics
 pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210707.13.09
           rhel7-server1.example.com: /var/log/pcp/pmlogger/rhel7-server1/20210707.13.09
           rhel7-server2.example.com: /var/log/pcp/pmlogger/rhel7-server2/20210707.13.09
           rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210707.13.09
           rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210707.13.09
     pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log
           rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log
           rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log
           rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log
           rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log



----------------------------

If it helps, here is the output from the original metrics playbook run:


[ansible@controlnode metrics]$ ansible-playbook metrics.yml -i inventory/ -b

PLAY [Use metrics system role to configure PCP metrics recording] ************************************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************************************************************************************
ok: [rhel7-server2]
ok: [rhel7-server1]
ok: [rhel8-server2]
ok: [rhel8-server1]

TASK [redhat.rhel_system_roles.metrics : Add Elasticsearch to metrics domain list] *******************************************************************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [redhat.rhel_system_roles.metrics : Add SQL Server to metrics domain list] **********************************************************************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [redhat.rhel_system_roles.metrics : Add bpftrace to metrics domain list] ************************************************************************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [redhat.rhel_system_roles.metrics : Setup metrics access for roles] *****************************************************************************************************************************************************************************************************************************************************
ok: [rhel8-server1]
ok: [rhel8-server2]
ok: [rhel7-server1]
ok: [rhel7-server2]

TASK [Configure Elasticsearch metrics] ***************************************************************************************************************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [Configure SQL Server metrics.] *****************************************************************************************************************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [Setup bpftrace metrics.] ***********************************************************************************************************************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [Setup metric querying service.] ****************************************************************************************************************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [Setup metric collection service.] **************************************************************************************************************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Set platform/version specific variables] ************************************************************************************************************************************************************************************************************************
ok: [rhel8-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel8-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel8-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.yml)
skipping: [rhel8-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.4.yml) 
ok: [rhel8-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel8-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel8-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.yml)
skipping: [rhel8-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.4.yml) 
ok: [rhel7-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel7-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel7-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_7.yml)
skipping: [rhel7-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_7.9.yml) 
ok: [rhel7-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel7-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel7-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_7.yml)
skipping: [rhel7-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_7.9.yml) 

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Install Performance Co-Pilot packages] **************************************************************************************************************************************************************************************************************************
changed: [rhel8-server2]
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel7-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Install authentication packages] ********************************************************************************************************************************************************************************************************************************
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]
changed: [rhel7-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] **************************************************************************************************************************************************************************************************************************************************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmcd.yml for rhel8-server1, rhel8-server2, rhel7-server1, rhel7-server2

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : List optional metric collection agents to be enabled] ***********************************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Extract metric collection configuration file content] ***********************************************************************************************************************************************************************************************************
ok: [rhel7-server1]
ok: [rhel7-server2]
ok: [rhel8-server1]
ok: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure optional metric collection agents are enabled] ***********************************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure explicit metric label path exists] ***********************************************************************************************************************************************************************************************************************
ok: [rhel7-server2]
ok: [rhel7-server1]
ok: [rhel8-server1]
ok: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure implicit metric label path exists] ***********************************************************************************************************************************************************************************************************************
changed: [rhel7-server1]
changed: [rhel7-server2]
ok: [rhel8-server1]
ok: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure any explicit metric labels are configured] ***************************************************************************************************************************************************************************************************************
changed: [rhel7-server1]
changed: [rhel7-server2]
changed: [rhel8-server1]
changed: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure any implicit metric labels are configured] ***************************************************************************************************************************************************************************************************************
changed: [rhel7-server1]
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is configured] **************************************************************************************************************************************************************************************************************
changed: [rhel7-server1]
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector system accounts are configured] *********************************************************************************************************************************************************************************************
changed: [rhel7-server1] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
changed: [rhel7-server2] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
changed: [rhel8-server2] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
changed: [rhel8-server1] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector SASL accounts are configured] ***********************************************************************************************************************************************************************************************
ok: [rhel7-server1] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
ok: [rhel8-server1] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
ok: [rhel7-server2] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
ok: [rhel8-server2] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector authentication is configured] ***********************************************************************************************************************************************************************************************
changed: [rhel7-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]
changed: [rhel8-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Set variable to do pmcd restart if needed] **********************************************************************************************************************************************************************************************************************
ok: [rhel8-server1]
ok: [rhel8-server2]
ok: [rhel7-server1]
ok: [rhel7-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Report performance metric collector restart state] **************************************************************************************************************************************************************************************************************
ok: [rhel8-server1] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}
ok: [rhel8-server2] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}
ok: [rhel7-server1] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}
ok: [rhel7-server2] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is running and enabled on boot] *********************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is restarted and enabled on boot] *******************************************************************************************************************************************************************************************
changed: [rhel7-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]
changed: [rhel8-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] **************************************************************************************************************************************************************************************************************************************************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmie.yml for rhel8-server1, rhel8-server2, rhel7-server1, rhel7-server2

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rule group directories exist] **********************************************************************************************************************************************************************************************************
changed: [rhel7-server1] => (item=network)
changed: [rhel7-server2] => (item=network)
ok: [rhel8-server1] => (item=network)
ok: [rhel8-server2] => (item=network)
changed: [rhel7-server2] => (item=zeroconf)
changed: [rhel7-server1] => (item=zeroconf)
ok: [rhel8-server1] => (item=zeroconf)
ok: [rhel8-server2] => (item=zeroconf)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rule group link directories exist] *****************************************************************************************************************************************************************************************************
changed: [rhel7-server2] => (item=network)
changed: [rhel7-server1] => (item=network)
ok: [rhel8-server1] => (item=network)
ok: [rhel8-server2] => (item=network)
ok: [rhel7-server2] => (item=zeroconf)
ok: [rhel7-server1] => (item=zeroconf)
ok: [rhel8-server1] => (item=zeroconf)
ok: [rhel8-server2] => (item=zeroconf)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rules are installed for targeted hosts] ************************************************************************************************************************************************************************************************
changed: [rhel7-server1] => (item=network/tcplistenoverflows)
changed: [rhel7-server2] => (item=network/tcplistenoverflows)
changed: [rhel8-server2] => (item=network/tcplistenoverflows)
changed: [rhel8-server1] => (item=network/tcplistenoverflows)
changed: [rhel7-server1] => (item=network/tcpqfulldocookies)
changed: [rhel7-server2] => (item=network/tcpqfulldocookies)
changed: [rhel8-server2] => (item=network/tcpqfulldocookies)
changed: [rhel8-server1] => (item=network/tcpqfulldocookies)
changed: [rhel7-server1] => (item=network/tcpqfulldrops)
changed: [rhel7-server2] => (item=network/tcpqfulldrops)
changed: [rhel8-server2] => (item=network/tcpqfulldrops)
changed: [rhel8-server1] => (item=network/tcpqfulldrops)
changed: [rhel7-server1] => (item=zeroconf/all_threads)
changed: [rhel7-server2] => (item=zeroconf/all_threads)
changed: [rhel8-server2] => (item=zeroconf/all_threads)
changed: [rhel8-server1] => (item=zeroconf/all_threads)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra rules symlinks have been created for targeted hosts] ***********************************************************************************************************************************************************************************************
changed: [rhel7-server1] => (item=network/tcplistenoverflows)
changed: [rhel7-server2] => (item=network/tcplistenoverflows)
changed: [rhel8-server1] => (item=network/tcplistenoverflows)
changed: [rhel8-server2] => (item=network/tcplistenoverflows)
changed: [rhel7-server2] => (item=network/tcpqfulldocookies)
changed: [rhel7-server1] => (item=network/tcpqfulldocookies)
changed: [rhel8-server1] => (item=network/tcpqfulldocookies)
changed: [rhel8-server2] => (item=network/tcpqfulldocookies)
changed: [rhel7-server2] => (item=network/tcpqfulldrops)
changed: [rhel7-server1] => (item=network/tcpqfulldrops)
changed: [rhel8-server1] => (item=network/tcpqfulldrops)
changed: [rhel8-server2] => (item=network/tcpqfulldrops)
changed: [rhel7-server2] => (item=zeroconf/all_threads)
changed: [rhel7-server1] => (item=zeroconf/all_threads)
changed: [rhel8-server2] => (item=zeroconf/all_threads)
changed: [rhel8-server1] => (item=zeroconf/all_threads)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric inference is enabled for targeted hosts] **********************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric inference is running and enabled on boot] *********************************************************************************************************************************************************************************************
ok: [rhel7-server1]
ok: [rhel7-server2]
ok: [rhel8-server2]
ok: [rhel8-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] **************************************************************************************************************************************************************************************************************************************************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmlogger.yml for rhel8-server1, rhel8-server2, rhel7-server1, rhel7-server2

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure metric log location is configured] ***********************************************************************************************************************************************************************************************************************
changed: [rhel7-server2]
changed: [rhel7-server1]
ok: [rhel8-server1]
ok: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is configured] ****************************************************************************************************************************************************************************************************************
changed: [rhel7-server1]
changed: [rhel7-server2]
changed: [rhel8-server1]
changed: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging retention period is set] ******************************************************************************************************************************************************************************************************
changed: [rhel7-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]
changed: [rhel8-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is enabled for targeted hosts] ************************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is running and enabled on boot] ***********************************************************************************************************************************************************************************************
ok: [rhel7-server1]
ok: [rhel8-server1]
ok: [rhel7-server2]
ok: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] **************************************************************************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [Setup metric graphing service.] ****************************************************************************************************************************************************************************************************************************************************************************************
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmproxy] *************************************************************************************************************************************************************************************************************************************
changed: [rhel7-server2]
changed: [rhel7-server1]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmlogger] ************************************************************************************************************************************************************************************************************************************
changed: [rhel8-server1]
changed: [rhel8-server2]
changed: [rhel7-server1]
changed: [rhel7-server2]

PLAY [Use metrics system role to configure Grafana] **************************************************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.metrics : Add Elasticsearch to metrics domain list] *******************************************************************************************************************************************************************************************************************************************
skipping: [controlnode]

TASK [redhat.rhel_system_roles.metrics : Add SQL Server to metrics domain list] **********************************************************************************************************************************************************************************************************************************************
skipping: [controlnode]

TASK [redhat.rhel_system_roles.metrics : Add bpftrace to metrics domain list] ************************************************************************************************************************************************************************************************************************************************
skipping: [controlnode]

TASK [redhat.rhel_system_roles.metrics : Setup metrics access for roles] *****************************************************************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [Configure Elasticsearch metrics] ***************************************************************************************************************************************************************************************************************************************************************************************
skipping: [controlnode]

TASK [Configure SQL Server metrics.] *****************************************************************************************************************************************************************************************************************************************************************************************
skipping: [controlnode]

TASK [Setup bpftrace metrics.] ***********************************************************************************************************************************************************************************************************************************************************************************************
skipping: [controlnode]

TASK [Setup metric querying service.] ****************************************************************************************************************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_redis : Set platform/version specific variables] **********************************************************************************************************************************************************************************************************************
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/vars/RedHat_8.yml)
skipping: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/vars/RedHat_8.4.yml) 

TASK [redhat.rhel_system_roles.private_metrics_subrole_redis : Install Redis packages] ***************************************************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_redis : Ensure Redis service is configured] ***************************************************************************************************************************************************************************************************************************
changed: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/templates/RedHat_8_redis.conf.j2)

TASK [redhat.rhel_system_roles.private_metrics_subrole_redis : Ensure Redis service is running and enabled on boot] **********************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [Setup metric collection service.] **************************************************************************************************************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Set platform/version specific variables] ************************************************************************************************************************************************************************************************************************
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.yml)
skipping: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.4.yml) 

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Install Performance Co-Pilot packages] **************************************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Install authentication packages] ********************************************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] **************************************************************************************************************************************************************************************************************************************************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmcd.yml for controlnode

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : List optional metric collection agents to be enabled] ***********************************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Extract metric collection configuration file content] ***********************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure optional metric collection agents are enabled] ***********************************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure explicit metric label path exists] ***********************************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure implicit metric label path exists] ***********************************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure any explicit metric labels are configured] ***************************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure any implicit metric labels are configured] ***************************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is configured] **************************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector system accounts are configured] *********************************************************************************************************************************************************************************************
changed: [controlnode] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector SASL accounts are configured] ***********************************************************************************************************************************************************************************************
ok: [controlnode] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector authentication is configured] ***********************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Set variable to do pmcd restart if needed] **********************************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Report performance metric collector restart state] **************************************************************************************************************************************************************************************************************
ok: [controlnode] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is running and enabled on boot] *********************************************************************************************************************************************************************************************
skipping: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is restarted and enabled on boot] *******************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] **************************************************************************************************************************************************************************************************************************************************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmie.yml for controlnode

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rule group directories exist] **********************************************************************************************************************************************************************************************************
ok: [controlnode] => (item=network)
ok: [controlnode] => (item=zeroconf)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rule group link directories exist] *****************************************************************************************************************************************************************************************************
ok: [controlnode] => (item=network)
ok: [controlnode] => (item=zeroconf)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rules are installed for targeted hosts] ************************************************************************************************************************************************************************************************
changed: [controlnode] => (item=network/tcplistenoverflows)
changed: [controlnode] => (item=network/tcpqfulldocookies)
changed: [controlnode] => (item=network/tcpqfulldrops)
changed: [controlnode] => (item=zeroconf/all_threads)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra rules symlinks have been created for targeted hosts] ***********************************************************************************************************************************************************************************************
changed: [controlnode] => (item=network/tcplistenoverflows)
changed: [controlnode] => (item=network/tcpqfulldocookies)
changed: [controlnode] => (item=network/tcpqfulldrops)
changed: [controlnode] => (item=zeroconf/all_threads)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric inference is enabled for targeted hosts] **********************************************************************************************************************************************************************************************
changed: [controlnode] => (item=rhel8-server1)
changed: [controlnode] => (item=rhel8-server2)
changed: [controlnode] => (item=rhel7-server1)
changed: [controlnode] => (item=rhel7-server2)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric inference is running and enabled on boot] *********************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] **************************************************************************************************************************************************************************************************************************************************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmlogger.yml for controlnode

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure metric log location is configured] ***********************************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is configured] ****************************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging retention period is set] ******************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is enabled for targeted hosts] ************************************************************************************************************************************************************************************************
changed: [controlnode] => (item=rhel8-server1)
changed: [controlnode] => (item=rhel8-server2)
changed: [controlnode] => (item=rhel7-server1)
changed: [controlnode] => (item=rhel7-server2)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is running and enabled on boot] ***********************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] **************************************************************************************************************************************************************************************************************************************************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmproxy.yml for controlnode

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure REST API, proxy and metric log discovery is configured] **************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure REST API, proxy and log discovery is running and enabled on boot] ****************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [Setup metric graphing service.] ****************************************************************************************************************************************************************************************************************************************************************************************

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Set platform/version specific variables] ********************************************************************************************************************************************************************************************************************
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_grafana/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_grafana/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_grafana/vars/RedHat_8.yml)
skipping: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_grafana/vars/RedHat_8.4.yml) 

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Install Grafana packages] ***********************************************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Template Grafana configuration] *****************************************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Ensure Grafana configuration directory exists] **************************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Ensure Grafana service is configured with datasources] ******************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Ensure graphing service is running and enabled on boot] *****************************************************************************************************************************************************************************************************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Ensure graphing service runtime settings are configured] ****************************************************************************************************************************************************************************************************
ok: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_redis : restart redis] *************************************************************************************************************************************************************************************************************************************
changed: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmie] ****************************************************************************************************************************************************************************************************************************************
changed: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmproxy] *************************************************************************************************************************************************************************************************************************************
changed: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmlogger] ************************************************************************************************************************************************************************************************************************************
changed: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_grafana : restart grafana] *********************************************************************************************************************************************************************************************************************************
changed: [controlnode]

PLAY [Open Firewall for pmcd] ************************************************************************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************************************************************************************
ok: [rhel8-server1]
ok: [rhel7-server1]
ok: [rhel8-server2]
ok: [rhel7-server2]

TASK [firewalld] *************************************************************************************************************************************************************************************************************************************************************************************************************
changed: [rhel8-server2]
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel7-server1]

PLAY [Open Firewall for grafana] *********************************************************************************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] *******************************************************************************************************************************************************************************************************************************************************************************************************
ok: [controlnode]

TASK [firewalld] *************************************************************************************************************************************************************************************************************************************************************************************************************
changed: [controlnode]

PLAY RECAP *******************************************************************************************************************************************************************************************************************************************************************************************************************
controlnode                : ok=52   changed=28   unreachable=0    failed=0    skipped=9    rescued=0    ignored=0   
rhel7-server1              : ok=33   changed=19   unreachable=0    failed=0    skipped=14   rescued=0    ignored=0   
rhel7-server2              : ok=33   changed=19   unreachable=0    failed=0    skipped=14   rescued=0    ignored=0   
rhel8-server1              : ok=32   changed=14   unreachable=0    failed=0    skipped=14   rescued=0    ignored=0   
rhel8-server2              : ok=32   changed=14   unreachable=0    failed=0    skipped=14   rescued=0    ignored=0

Comment 7 Brian Smith 2021-07-07 20:14:26 UTC
I did some more testing. After starting over without the patch, one of the RHEL 7 servers did not show up after the metrics role completed:

[ansible@controlnode ~]$ pcp
Performance Co-Pilot configuration on controlnode.example.com:

 platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64
 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM
 timezone: UTC
 services: pmcd pmproxy
     pmcd: Version 5.2.5-4, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2
           dm openmetrics
 pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210707.20.03
           rhel7-server2.example.com: /var/log/pcp/pmlogger/rhel7-server2/20210707.20.03
           rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210707.20.03
           rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210707.20.03
     pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log
           rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log
           rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log
           rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log
           rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log


[ansible@controlnode ~]$ cat /var/log/pcp/pmlogger/rhel7-server1/pmlogger.log 
Log for pmlogger on controlnode.example.com started Wed Jul  7 20:03:19 2021

pmlogger: Cannot open config file "config.rhel7-server1": No such file or directory

Log finished Wed Jul  7 20:03:22 2021

[ansible@controlnode ~]$ ls -al /var/lib/pcp/config/pmlogger/
total 164
drwxrwxr-x.  2 pcp  pcp    133 Jul  7 20:03 .
drwxr-xr-x. 10 root root   127 Jul  7 20:00 ..
-rw-r--r--.  1 pcp  pcp  41113 Jul  7 20:03 config.default
lrwxrwxrwx.  1 root root    45 Feb 19 09:11 config.pmstat -> ../../../../../etc/pcp/pmlogger/config.pmstat
-rw-r--r--.  1 pcp  pcp  39056 Jul  7 20:03 config.rhel7-server2
-rw-r--r--.  1 pcp  pcp  39165 Jul  7 20:03 config.rhel8-server1
-rw-r--r--.  1 pcp  pcp  39165 Jul  7 20:03 config.rhel8-server2

[ansible@controlnode ~]$ sudo systemctl restart pmlogger

[ansible@controlnode ~]$ ls -al /var/lib/pcp/config/pmlogger/
total 204
drwxrwxr-x.  2 pcp  pcp    161 Jul  7 20:12 .
drwxr-xr-x. 10 root root   127 Jul  7 20:00 ..
-rw-r--r--.  1 pcp  pcp  41113 Jul  7 20:03 config.default
lrwxrwxrwx.  1 root root    45 Feb 19 09:11 config.pmstat -> ../../../../../etc/pcp/pmlogger/config.pmstat
-rw-r--r--.  1 pcp  pcp  39056 Jul  7 20:12 config.rhel7-server1
-rw-r--r--.  1 pcp  pcp  39056 Jul  7 20:03 config.rhel7-server2
-rw-r--r--.  1 pcp  pcp  39165 Jul  7 20:03 config.rhel8-server1
-rw-r--r--.  1 pcp  pcp  39165 Jul  7 20:03 config.rhel8-server2

[ansible@controlnode ~]$ pcp
Performance Co-Pilot configuration on controlnode.example.com:

 platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64
 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM
 timezone: UTC
 services: pmcd pmproxy
     pmcd: Version 5.2.5-4, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2
           dm openmetrics
 pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210707.20.12
           rhel7-server1.example.com: /var/log/pcp/pmlogger/rhel7-server1/20210707.20.12
           rhel7-server2.example.com: /var/log/pcp/pmlogger/rhel7-server2/20210707.20.12
           rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210707.20.12
           rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210707.20.12
     pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log
           rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log
           rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log
           rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log
           rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log




Based on what I have seen, it appears that there are two issues:

1.  The issue for this BZ, where the Grafana dashboard doesn't work after the metrics role is run (unless pmproxy is restarted).  This appears to be resolved by commit https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd

2.  What appears to be a separate issue: after the metrics role runs, some of the servers don't show up under the "pmlogger" section (until pmlogger is restarted).  I've only seen this happen with the RHEL 7 clients.

Comment 8 Brian Smith 2021-07-07 21:05:03 UTC
Another data point: I restored all my VMs to a clean state and ran the metrics role again (without the https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd patch), and again the 2 RHEL 7 servers didn't show up in the pmlogger section.

The issue in this BZ appears to be resolved with https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd 

And I appear to have another unrelated issue with RHEL 7 servers intermittently not showing up in the "pmlogger" section after the metrics role is run, until pmlogger is restarted.  

Should I go ahead and create another BZ for this other issue? 

Thanks for all the help with this.

Comment 9 Nathan Scott 2021-07-20 04:19:14 UTC
Hi Brian,

Finally I've found time to set up some 8.5 and 7.9 VMs to attempt to test some of the problems here.  I hit the first problem of course, but then everything worked perfectly.  I suspect what we may be seeing here is a race condition with the pmcd service becoming fully available on the RHEL-7 nodes.  There are a few things that could be contributing.

Firstly, re-reading the blog post it occurs to me that the firewall change to open the pmcd port in particular is done *after* we run the metrics role ... I recommend re-ordering that part of the blog post to rule out that being a factor here (we definitely want the pmcd port 44321 unblocked before we attempt remote access).  It could certainly explain some of what we're seeing here, because of the async nature of pmlogger service start in particular (see below).
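
One way to rule out the firewall is to probe the pmcd port from the control node before running the metrics role. The following is a minimal sketch, not part of the role or of PCP: it assumes bash (for its /dev/tcp redirection) and the coreutils `timeout` command, and the function names `pmcd_port_open` and `check_hosts` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp redirection and coreutils `timeout` (assumptions).
pmcd_port_open() {
    local host=$1 port=${2:-44321}
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print a line for every host whose pmcd port (default 44321) is unreachable.
check_hosts() {
    local host
    for host in "$@"; do
        pmcd_port_open "$host" || echo "pmcd port not reachable on $host"
    done
}
```

For the hosts in this report, it could be run as `check_hosts rhel7-server1 rhel7-server2 rhel8-server1 rhel8-server2`. Any host it lists would be expected to fail the pmlogger probe described below.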

There are some subtleties to the way the pmlogger and pmie services are started.  Both services use a model of 'immediate completion' such that at the end of the 'pmlogger service start', for example, the services may still be in the process of starting.  As we add more monitored hosts, we are more likely to see some hosts not yet started immediately after 'pmlogger service start'; this process continues asynchronously in the background.  It has to be this way in order to scale up to large numbers of hosts (else, we'd "hang" bootup).
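
Because the service start returns before every logger is up, a check made immediately after the role finishes can report a host as missing when it is only late. A minimal polling sketch follows, assuming bash; the helper name `wait_until` and the timeout values are hypothetical and not part of PCP.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a command every INTERVAL seconds until it
# succeeds or roughly TIMEOUT seconds have elapsed.
# Returns 0 once the command succeeds, 1 on timeout.
wait_until() {
    local timeout=$1 interval=$2 waited=0
    shift 2
    until "$@"; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

For example, `wait_until 300 10 test -e /var/lib/pcp/config/pmlogger/config.rhel7-server1` would wait up to about five minutes for that host's config file to appear before treating it as a failure.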

Further, in the case of pmlogger (but not pmie), it will generate the pmlogger configuration file as part of starting up each host's logger by *probing* the remote host for available metrics when no config file exists (this is the default).  If that part of pmlogger service startup (probe) cannot connect, we cannot create a config file and the pmlogger for the host will (temporarily) fail.  This is not the end of the world, as the pmlogger_check service will ensure the pmlogger is started within the next few minutes.
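
A failed probe shows up as a missing config file, as in the directory listing in comment 7. The sketch below lists the hosts that have no generated config yet. It assumes bash and the `config.<host>` naming seen in that listing; the function name `missing_pmlogger_configs` is hypothetical.

```shell
#!/usr/bin/env bash
# Print each host that has no generated pmlogger config in the given directory.
# The config.<host> file naming follows the listing shown in comment 7.
missing_pmlogger_configs() {
    local dir=$1 host
    shift
    for host in "$@"; do
        [ -e "${dir}/config.${host}" ] || echo "$host"
    done
}
```

Usage on the control node might look like `missing_pmlogger_configs /var/lib/pcp/config/pmlogger rhel7-server1 rhel7-server2 rhel8-server1 rhel8-server2`. Against the first listing in comment 7, that would print only rhel7-server1.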

Can you let me know if the firewall setup ordering change has any effect on that first remote pmlogger?  In my testing here everything else is working fine (I have no firewalls set up here though, which gives me hope this may be our smoking gun).

Thanks!

Comment 10 Brian Smith 2021-07-20 15:36:20 UTC
Hi Nathan,
I did some more testing this morning.  With the https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd patch and also moving the firewall tasks to the top of the playbook, everything worked properly on two runs of the playbook (restoring the VMs back to a clean snapshot each time).  

The https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd patch is definitely needed.  I tried running the playbook with the firewall tasks at the top of the playbook, and without this patch, and ran into the original issue in the description of the BZ.  

I'm working on getting the blog post updated so that it shows the firewall tasks at the top of the playbook, and will also still have the workaround of restarting pmlogger and pmproxy to work around this BZ.  

Thanks for your help getting this resolved!

Comment 13 Nathan Scott 2021-07-20 23:02:56 UTC
> [...]
> I'm working on getting the blog post updated so that it shows the firewall tasks at the top of the playbook, and will also still have the workaround of restarting pmlogger and pmproxy to work around this BZ.  

Perfect.

> Thanks for your help getting this resolved!

No problem.

Rich, Andreas is on PTO for two weeks - we won't hear from him in the ITM time frame set here.  I'll answer for him and say my earlier testing showed the change to have a positive impact (as Brian's observing too).

cheers.

Comment 23 errata-xmlrpc 2021-11-09 17:46:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (rhel-system-roles bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4159

