Bug 1978357
Summary: | metrics role: Grafana dashboard not working after metrics role run unless services manually restarted | |
---|---|---|---
Product: | Red Hat Enterprise Linux 8 | Reporter: | Brian Smith <briasmit>
Component: | rhel-system-roles | Assignee: | Rich Megginson <rmeggins>
Status: | CLOSED ERRATA | QA Contact: | Jan Kurik <jkurik>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 8.4 | CC: | agerstmayr, djez, mcermak, mgoodwin, myllynen, nathans, nhosoi, spetrosi
Target Milestone: | beta | Keywords: | Triaged
Target Release: | 8.5 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | role:metrics | |
Fixed In Version: | rhel-system-roles-1.6.4-1.el8 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
Clones: | 1984150 | Environment: |
Last Closed: | 2021-11-09 17:46:02 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1984150 | |
Description
Brian Smith
2021-07-01 16:10:25 UTC
Nathan?

If I run "sudo systemctl restart pmcd pmlogger pmproxy" on the Grafana host after the metrics role completes, everything works as expected.

Just back from PTO and reading this one now Brian (thanks for the bug report), while reviewing your draft blog post (thanks for that too!). I'll have to set aside a bigger chunk of time to analyze it (it's all running fine on my Fedora laptop, so I'll need to set up some RHEL hosts), but until then - do you have to restart all three of those services? Or does just restarting pmproxy suffice? Definitely pmproxy is not yet accepting connections (the "timeout while connecting" is Grafana trying to talk to pmproxy). What does the pcp(1) command say for you right after running the role - does it give "timeout"/"connection refused"? (The pcp client tool talks to pmcd.) Thanks.

Mark and Andreas, are there any known RHEL 8.4 issues with pcp systemd service starting that might explain this behaviour? (Was there a zstream update helping along those lines, from a vague memory?) Thanks.

(In reply to Nathan Scott from comment #2)
> Mark and Andreas, are there any known RHEL 8.4 issues with pcp systemd
> service starting that might explain this behaviour? (was there a zstream
> update helping along those lines, from a vague memory?) Thanks.

Sounds like a timing issue to me. I'm wondering what could cause pmproxy not to be running - the first thing I'd do is check the pmproxy logs and journal entries.

Another thing to watch out for: in linux-system-roles, the Redis role is set up after the pcp role:
https://github.com/linux-system-roles/metrics/blob/main/tasks/main.yml#L56-L73
So when pmproxy starts up for the first time, Redis is not available, and pmproxy will disable its time-series functionality.

Also, after running the role for the first time, all services are started because of

- name: ...
  service:
    name: ...
    state: started
    enabled: yes

and then (because of the updated config files) afaics all handlers run as well, restarting all services again. I'm not sure how deterministic the order of restarts is there, and whether it can cause any conflict.

For the systemd unit, we had one recent change, making sure that pmproxy starts *after* Redis:
https://github.com/performancecopilot/pcp/commit/44a3ecaa8b1dc5ab518d26d00bcafd5ebb29b3ef
However, that only applies if both are started at the same time/in the same transaction (e.g. at boot); Redis is *not* a dependency of pmproxy and doesn't get started automatically when starting pmproxy.

Hi Nathan and Andreas,

Here are the answers to your questions. Starting from fresh RHEL systems, I run the metrics role, log in to Grafana, import the "PCP Redis: Host Overview" dashboard, and get the errors in the dashboard mentioned in comment #1. At this point, the output of pcp:

[ansible@controlnode ~]$ pcp
Performance Co-Pilot configuration on controlnode.example.com:
 platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64
 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM
 timezone: UTC
 services: pmcd pmproxy
     pmcd: Version 5.2.5-4, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2 dm openmetrics
 pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210706.20.30
           rhel7-server1.example.com: /var/log/pcp/pmlogger/rhel7-server1/20210706.20.30
           rhel7-server2.example.com: /var/log/pcp/pmlogger/rhel7-server2/20210706.20.30
           rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210706.20.30
           rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210706.20.30
     pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log
           rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log
           rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log
           rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log
           rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log
-----------------------------
[ansible@controlnode ~]$ cat /var/log/pcp/pmproxy/pmproxy.log
Log for pmproxy on controlnode.example.com started Tue Jul 6 20:23:12 2021
pmproxy: PID = 30921, PDU version = 2, user = pcp (993)
pmproxy request port(s):
  sts fd   port  family address
  === ==== ===== ====== =======
  ok    14       unix   /run/pcp/pmproxy.socket
  ok    15 44322 inet   INADDR_ANY
  ok    19 44322 ipv6   INADDR_ANY
  ok    20 44323 inet   INADDR_ANY
  ok    21 44323 ipv6   INADDR_ANY
[Tue Jul 6 20:23:12] pmproxy(30921) Info: OpenSSL 1.1.1g FIPS 21 Apr 2020 - no certificates found
[Tue Jul 6 20:23:12] pmproxy(30921) Info: Redis slots, command keys, schema version setup
Tue Jul 6 20:23:14 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:23:18 2021 discovery callback: finished log-rolling
Tue Jul 6 20:23:19 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:23:27 2021 discovery callback: finished log-rolling
Tue Jul 6 20:23:27 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:23:34 2021 discovery callback: finished log-rolling
Tue Jul 6 20:23:34 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:23:39 2021 discovery callback: finished log-rolling
Tue Jul 6 20:23:39 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:23:44 2021 discovery callback: finished log-rolling
Tue Jul 6 20:25:24 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:25:30 2021 discovery callback: finished log-rolling
Tue Jul 6 20:25:30 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:25:30 2021 discovery callback: finished log-rolling
Tue Jul 6 20:25:30 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:25:30 2021 discovery callback: finished log-rolling
Tue Jul 6 20:25:30 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:25:31 2021 discovery callback: finished log-rolling
Tue Jul 6 20:30:17 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:30:20 2021 discovery callback: finished log-rolling
Tue Jul 6 20:30:21 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:30:27 2021 discovery callback: finished log-rolling
Tue Jul 6 20:30:27 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:30:33 2021 discovery callback: finished log-rolling
Tue Jul 6 20:30:33 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:30:38 2021 discovery callback: finished log-rolling
Tue Jul 6 20:30:38 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:30:42 2021 discovery callback: finished log-rolling
-----------------------------
[ansible@controlnode ~]$ sudo journalctl -u pmproxy
-- Logs begin at Mon 2021-06-28 18:20:06 UTC, end at Tue 2021-07-06 20:44:24 UTC. --
Jul 06 20:22:02 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:22:02 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Stopping Proxy for Performance Metrics Collector Daemon...
Jul 06 20:23:12 controlnode.example.com systemd[1]: pmproxy.service: Succeeded.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Stopped Proxy for Performance Metrics Collector Daemon.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:23:12 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.
-----------------------------
At this point, I tried restarting only pmproxy and refreshing the "PCP Redis: Host Overview" dashboard, and the dashboard now works (I can see graphs from all 5 hosts).
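Nathan's question (is pmproxy accepting connections yet?) and Andreas's point (pmproxy should only come up once Redis is reachable) can both be checked from the Grafana host with a small polling helper. The following is a minimal sketch, not part of PCP or the metrics role; the helper names are hypothetical, and it assumes Redis listens on its default port 6379 and pmproxy on 44322, the port listed in pmproxy.log above. A successful TCP connection only shows that something is listening; it does not show whether pmproxy enabled its Redis time-series support when it started.

```python
import socket
import time

# Assumed default ports (hypothetical constants, not read from the metrics role):
# Redis listens on 6379; pmproxy on 44322, as listed in pmproxy.log.
REDIS_PORT = 6379
PMPROXY_PORT = 44322


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True as soon as a connection is accepted, False on timeout.
    This only proves that something is listening on the port.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (e.g. ConnectionRefusedError) if nothing is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def check_start_order(host: str = "localhost", timeout: float = 30.0) -> dict:
    """Hypothetical helper: wait for Redis first, then for pmproxy."""
    redis_up = wait_for_port(host, REDIS_PORT, timeout)
    pmproxy_up = wait_for_port(host, PMPROXY_PORT, timeout)
    return {"redis": redis_up, "pmproxy": pmproxy_up}
```

Running `check_start_order()` right after the role finishes, and restarting pmproxy (`sudo systemctl restart pmproxy`) only once Redis reports True, mirrors the manual fix above, where restarting pmproxy alone made the dashboard work.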
-----------------------------
Checking the pmproxy.log at this point, now that the dashboard is working:

[ansible@controlnode ~]$ cat /var/log/pcp/pmproxy/pmproxy.log
Log for pmproxy on controlnode.example.com started Tue Jul 6 20:46:11 2021
pmproxy: PID = 36478, PDU version = 2, user = pcp (993)
pmproxy request port(s):
  sts fd   port  family address
  === ==== ===== ====== =======
  ok    14       unix   /run/pcp/pmproxy.socket
  ok    15 44322 inet   INADDR_ANY
  ok    19 44322 ipv6   INADDR_ANY
  ok    20 44323 inet   INADDR_ANY
  ok    21 44323 ipv6   INADDR_ANY
[Tue Jul 6 20:46:11] pmproxy(36478) Info: OpenSSL 1.1.1g FIPS 21 Apr 2020 - no certificates found
[Tue Jul 6 20:46:11] pmproxy(36478) Info: Redis slots, command keys, schema version setup
Tue Jul 6 20:55:24 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:55:24 2021 discovery callback: finished log-rolling
Tue Jul 6 20:55:24 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:55:24 2021 discovery callback: finished log-rolling
Tue Jul 6 20:55:24 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:55:24 2021 discovery callback: finished log-rolling
Tue Jul 6 20:55:24 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul 6 20:55:25 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul 6 20:55:25 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul 6 20:55:25 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul 6 20:55:25 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:55:25 2021 discovery callback: finished log-rolling
Tue Jul 6 20:55:26 2021 discovery callback: log-rolling in progress
Tue Jul 6 20:55:26 2021 discovery callback: finished log-rolling
Tue Jul 6 21:25:24 2021 discovery callback: log-rolling in progress
Tue Jul 6 21:25:24 2021 discovery callback: finished log-rolling
Tue Jul 6 21:25:24 2021 discovery callback: log-rolling in progress
Tue Jul 6 21:25:24 2021 discovery callback: finished log-rolling
Tue Jul 6 21:25:24 2021 discovery callback: log-rolling in progress
Tue Jul 6 21:25:24 2021 discovery callback: finished log-rolling
Tue Jul 6 21:25:24 2021 discovery callback: log-rolling in progress
Tue Jul 6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul 6 21:25:25 2021 discovery callback: log-rolling in progress
Tue Jul 6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul 6 21:25:25 2021 discovery callback: log-rolling in progress
Tue Jul 6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul 6 21:25:25 2021 discovery callback: log-rolling in progress
Tue Jul 6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul 6 21:25:25 2021 discovery callback: log-rolling in progress
Tue Jul 6 21:25:25 2021 discovery callback: finished log-rolling
Tue Jul 6 21:25:26 2021 discovery callback: log-rolling in progress
Tue Jul 6 21:25:26 2021 discovery callback: finished log-rolling
--------------------------------
And the journal:

$ sudo journalctl -u pmproxy
-- Logs begin at Mon 2021-06-28 18:20:06 UTC, end at Tue 2021-07-06 21:28:53 UTC. --
Jul 06 20:22:02 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:22:02 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Stopping Proxy for Performance Metrics Collector Daemon...
Jul 06 20:23:12 controlnode.example.com systemd[1]: pmproxy.service: Succeeded.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Stopped Proxy for Performance Metrics Collector Daemon.
Jul 06 20:23:12 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:23:12 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.
Jul 06 20:46:11 controlnode.example.com systemd[1]: Stopping Proxy for Performance Metrics Collector Daemon...
Jul 06 20:46:11 controlnode.example.com systemd[1]: pmproxy.service: Succeeded.
Jul 06 20:46:11 controlnode.example.com systemd[1]: Stopped Proxy for Performance Metrics Collector Daemon.
Jul 06 20:46:11 controlnode.example.com systemd[1]: Starting Proxy for Performance Metrics Collector Daemon...
Jul 06 20:46:11 controlnode.example.com systemd[1]: Started Proxy for Performance Metrics Collector Daemon.

> [...] making sure that pmproxy starts *after* Redis

I've tweaked the metrics role to ensure this ordering is enforced - maybe this is the issue, on a fresh install (good thinking, Batman^WAndreas). Brian, any chance you can make the trivial change locally that I've committed here, to try it out?
https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd
cheers.

Hi Nathan,

I tried out the changes in the https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd commit, and they fixed the initial issue, but appeared to introduce a different one. I tried this out two times (both times restoring my 5 VMs from a clean snapshot), and got exactly the same results both times.

After the metrics role ran with your patch, I was able to access the "PCP Redis: Host Overview" Grafana dashboard with no errors. However, in my environment, I have the controlnode, two RHEL 8 clients, and two RHEL 7 clients. The RHEL 8 clients showed up in Grafana, but the RHEL 7 clients did not.
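The symptom reported here is visible in the pcp(1) summary that follows: the RHEL 7 hosts are listed under pmie but not under pmlogger. A hypothetical helper like the one below could flag such hosts automatically. It is a sketch, not a PCP tool, and it assumes the pcp(1) layout shown in this bug, where each pmlogger/pmie entry is a `label: path` line and the first entry shares a line with the section key.

```python
from typing import Dict, List, Optional, Set

# Top-level keys printed by pcp(1) in its configuration summary (as seen in this bug).
SUMMARY_KEYS = {"platform", "hardware", "timezone", "services", "pmcd", "pmda", "pmlogger", "pmie"}


def parse_service_hosts(summary: str) -> Dict[str, Set[str]]:
    """Collect the host labels listed under the pmlogger and pmie sections."""
    hosts: Dict[str, Set[str]] = {"pmlogger": set(), "pmie": set()}
    section: Optional[str] = None
    for raw in summary.splitlines():
        line = raw.strip()
        if not line:
            continue
        head, _, rest = line.partition(": ")
        if head in SUMMARY_KEYS:
            section = head  # a top-level key starts a new section
            entry = rest
        else:
            entry = line    # indented continuation line of the current section
        if section in hosts and ": " in entry:
            label = entry.partition(": ")[0]
            # "primary logger" and "primary engine" both mean the local primary instance
            hosts[section].add("primary" if label.startswith("primary") else label)
    return hosts


def hosts_missing_pmlogger(summary: str) -> List[str]:
    """Hosts that have a pmie engine but no pmlogger archive in the pcp summary."""
    hosts = parse_service_hosts(summary)
    return sorted(hosts["pmie"] - hosts["pmlogger"])
```

For example, `hosts_missing_pmlogger(subprocess.run(["pcp"], capture_output=True, text=True).stdout)` would return the two RHEL 7 hosts for the first pcp output below, and an empty list once pmlogger has been restarted.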
The pcp command output is also missing the RHEL 7 servers under 'pmlogger':

[ansible@controlnode pcp]$ pcp
Performance Co-Pilot configuration on controlnode.example.com:
 platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64
 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM
 timezone: UTC
 services: pmcd pmproxy
     pmcd: Version 5.2.5-4, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2 dm openmetrics
 pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210707.12.59
           rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210707.13.00
           rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210707.13.00
     pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log
           rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log
           rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log
           rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log
           rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log
---------------------------
Log files:

[ansible@controlnode ~]$ cat /var/log/pcp/pmlogger/rhel7-server1/pmlogger.log
Log for pmlogger on controlnode.example.com started Wed Jul 7 12:59:58 2021
pmlogger: Cannot connect to PMCD on host "rhel7-server1": Connection refused
Log finished Wed Jul 7 12:59:59 2021
[ansible@controlnode ~]$ cat /var/log/pcp/pmlogger/rhel7-server2/pmlogger.log
Log for pmlogger on controlnode.example.com started Wed Jul 7 13:00:05 2021
pmlogger: Cannot open config file "config.rhel7-server2": No such file or directory
Log finished Wed Jul 7 13:00:05 2021
---------------------------
[ansible@controlnode ~]$ ls -al /etc/pcp/pmlogger/control.d/
total 20
drwxr-xr-x. 2 root root 103 Jul 7 12:58 .
drwxr-xr-x. 4 root root  74 Jul 7 12:57 ..
-rw-r--r--. 1 root root 695 Feb 19 09:11 local
-rw-r--r--. 1 root root 136 Jul 7 12:58 rhel7-server1
-rw-r--r--. 1 root root 136 Jul 7 12:58 rhel7-server2
-rw-r--r--. 1 root root 136 Jul 7 12:58 rhel8-server1
-rw-r--r--. 1 root root 136 Jul 7 12:58 rhel8-server2
---------------------------
If I restart pmlogger, the RHEL 7 servers show up in the pcp output and the Grafana dashboard, and everything is working:

[ansible@controlnode ~]$ sudo systemctl restart pmlogger
<wait a minute>
[ansible@controlnode ~]$ pcp
Performance Co-Pilot configuration on controlnode.example.com:
 platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64
 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM
 timezone: UTC
 services: pmcd pmproxy
     pmcd: Version 5.2.5-4, 12 agents, 6 clients
     pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2 dm openmetrics
 pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210707.13.09
           rhel7-server1.example.com: /var/log/pcp/pmlogger/rhel7-server1/20210707.13.09
           rhel7-server2.example.com: /var/log/pcp/pmlogger/rhel7-server2/20210707.13.09
           rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210707.13.09
           rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210707.13.09
     pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log
           rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log
           rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log
           rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log
           rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log
----------------------------
If it helps, here is the output from the original metrics playbook run:

[ansible@controlnode metrics]$ ansible-playbook metrics.yml -i inventory/ -b

PLAY [Use metrics system role to configure PCP metrics recording] *********

TASK [Gathering Facts]
*********
ok: [rhel7-server2]
ok: [rhel7-server1]
ok: [rhel8-server2]
ok: [rhel8-server1]

TASK [redhat.rhel_system_roles.metrics : Add Elasticsearch to metrics domain list] *********
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [redhat.rhel_system_roles.metrics : Add SQL Server to metrics domain list] *********
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [redhat.rhel_system_roles.metrics : Add bpftrace to metrics domain list] *********
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [redhat.rhel_system_roles.metrics : Setup metrics access for roles] *********
ok: [rhel8-server1]
ok: [rhel8-server2]
ok: [rhel7-server1]
ok: [rhel7-server2]

TASK [Configure Elasticsearch metrics] *********
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [Configure SQL Server metrics.] *********
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [Setup bpftrace metrics.] *********
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [Setup metric querying service.] *********
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [Setup metric collection service.] *********

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Set platform/version specific variables] *********
ok: [rhel8-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel8-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel8-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.yml)
skipping: [rhel8-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.4.yml)
ok: [rhel8-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel8-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel8-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.yml)
skipping: [rhel8-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.4.yml)
ok: [rhel7-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel7-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel7-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_7.yml)
skipping: [rhel7-server1] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_7.9.yml)
ok: [rhel7-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel7-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [rhel7-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_7.yml)
skipping: [rhel7-server2] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_7.9.yml)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Install Performance Co-Pilot packages] *********
changed: [rhel8-server2]
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel7-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Install authentication packages] *********
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]
changed: [rhel7-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] *********
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmcd.yml for rhel8-server1, rhel8-server2, rhel7-server1, rhel7-server2

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : List optional metric collection agents to be enabled] *********

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Extract metric collection configuration file content] *********
ok: [rhel7-server1]
ok: [rhel7-server2]
ok: [rhel8-server1]
ok: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure optional metric collection agents are enabled] *********

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure explicit metric label path exists] *********
ok: [rhel7-server2]
ok: [rhel7-server1]
ok: [rhel8-server1]
ok: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure implicit metric label path exists] *********
changed: [rhel7-server1]
changed: [rhel7-server2]
ok: [rhel8-server1]
ok: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure any explicit metric labels are configured] *********
changed: [rhel7-server1]
changed: [rhel7-server2]
changed: [rhel8-server1]
changed: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure any implicit metric labels are configured] *********
changed: [rhel7-server1]
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is configured] *********
changed: [rhel7-server1]
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector system accounts are configured] *********
changed: [rhel7-server1] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
changed: [rhel7-server2] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
changed: [rhel8-server2] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
changed: [rhel8-server1] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector SASL accounts are configured] *********
ok: [rhel7-server1] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
ok: [rhel8-server1] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
ok: [rhel7-server2] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})
ok: [rhel8-server2] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector authentication is configured] *********
changed: [rhel7-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]
changed: [rhel8-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Set variable to do pmcd restart if needed] *********
ok: [rhel8-server1]
ok: [rhel8-server2]
ok: [rhel7-server1]
ok: [rhel7-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Report performance metric collector restart state] *********
ok: [rhel8-server1] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}
ok: [rhel8-server2] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}
ok: [rhel7-server1] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}
ok: [rhel7-server2] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is running and enabled on boot] *********
skipping: [rhel8-server1]
skipping: [rhel8-server2]
skipping: [rhel7-server1]
skipping: [rhel7-server2]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is restarted and enabled on boot] *********
changed: [rhel7-server1]
changed: [rhel7-server2]
changed: [rhel8-server2]
changed: [rhel8-server1]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] *********
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmie.yml for rhel8-server1, rhel8-server2, rhel7-server1, rhel7-server2

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rule group directories exist] *********
changed: [rhel7-server1] => (item=network)
changed: [rhel7-server2] => (item=network)
ok: [rhel8-server1] => (item=network)
ok: [rhel8-server2] => (item=network)
changed: [rhel7-server2] => (item=zeroconf)
changed: [rhel7-server1] => (item=zeroconf)
ok: [rhel8-server1] => (item=zeroconf)
ok: [rhel8-server2] => (item=zeroconf)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rule group link directories exist] *********
changed: [rhel7-server2] => (item=network)
changed: [rhel7-server1] => (item=network)
ok: [rhel8-server1] => (item=network)
ok: [rhel8-server2] => (item=network)
ok: [rhel7-server2] => (item=zeroconf)
ok: [rhel7-server1] => (item=zeroconf)
ok: [rhel8-server1] => (item=zeroconf)
ok: [rhel8-server2] => (item=zeroconf)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rules are installed for targeted hosts] *********
changed: [rhel7-server1] => (item=network/tcplistenoverflows)
changed: [rhel7-server2] => (item=network/tcplistenoverflows)
changed: [rhel8-server2] => (item=network/tcplistenoverflows)
changed: [rhel8-server1] => (item=network/tcplistenoverflows)
changed: [rhel7-server1] => (item=network/tcpqfulldocookies)
changed: [rhel7-server2] => (item=network/tcpqfulldocookies)
changed: [rhel8-server2] => (item=network/tcpqfulldocookies)
changed: [rhel8-server1] => (item=network/tcpqfulldocookies)
changed: [rhel7-server1] => (item=network/tcpqfulldrops)
changed: [rhel7-server2] => (item=network/tcpqfulldrops)
changed: [rhel8-server2] => (item=network/tcpqfulldrops)
changed: [rhel8-server1] => (item=network/tcpqfulldrops)
changed: [rhel7-server1] => (item=zeroconf/all_threads)
changed: [rhel7-server2] => (item=zeroconf/all_threads)
changed: [rhel8-server2] => (item=zeroconf/all_threads)
changed: [rhel8-server1] => (item=zeroconf/all_threads)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra rules symlinks have been created for targeted hosts] *********
changed: [rhel7-server1] => (item=network/tcplistenoverflows)
changed: [rhel7-server2] => (item=network/tcplistenoverflows)
changed: [rhel8-server1] => (item=network/tcplistenoverflows)
changed: [rhel8-server2] => (item=network/tcplistenoverflows)
changed: [rhel7-server2] => (item=network/tcpqfulldocookies)
changed:
[rhel7-server1] => (item=network/tcpqfulldocookies) changed: [rhel8-server1] => (item=network/tcpqfulldocookies) changed: [rhel8-server2] => (item=network/tcpqfulldocookies) changed: [rhel7-server2] => (item=network/tcpqfulldrops) changed: [rhel7-server1] => (item=network/tcpqfulldrops) changed: [rhel8-server1] => (item=network/tcpqfulldrops) changed: [rhel8-server2] => (item=network/tcpqfulldrops) changed: [rhel7-server2] => (item=zeroconf/all_threads) changed: [rhel7-server1] => (item=zeroconf/all_threads) changed: [rhel8-server2] => (item=zeroconf/all_threads) changed: [rhel8-server1] => (item=zeroconf/all_threads) TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric inference is enabled for targeted hosts] ********************************************************************************************************************************************************************************************** TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric inference is running and enabled on boot] ********************************************************************************************************************************************************************************************* ok: [rhel7-server1] ok: [rhel7-server2] ok: [rhel8-server2] ok: [rhel8-server1] TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] ************************************************************************************************************************************************************************************************************************************************** included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmlogger.yml for rhel8-server1, rhel8-server2, rhel7-server1, rhel7-server2 TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure metric log location is configured] 
*********************************************************************************************************************************************************************************************************************** changed: [rhel7-server2] changed: [rhel7-server1] ok: [rhel8-server1] ok: [rhel8-server2] TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is configured] **************************************************************************************************************************************************************************************************************** changed: [rhel7-server1] changed: [rhel7-server2] changed: [rhel8-server1] changed: [rhel8-server2] TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging retention period is set] ****************************************************************************************************************************************************************************************************** changed: [rhel7-server1] changed: [rhel7-server2] changed: [rhel8-server2] changed: [rhel8-server1] TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is enabled for targeted hosts] ************************************************************************************************************************************************************************************************ TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is running and enabled on boot] *********************************************************************************************************************************************************************************************** ok: [rhel7-server1] ok: [rhel8-server1] ok: [rhel7-server2] ok: [rhel8-server2] TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] 
************************************************************************************************************************************************************************************************************************************************** skipping: [rhel8-server1] skipping: [rhel8-server2] skipping: [rhel7-server1] skipping: [rhel7-server2] TASK [Setup metric graphing service.] **************************************************************************************************************************************************************************************************************************************************************************************** skipping: [rhel8-server1] skipping: [rhel8-server2] skipping: [rhel7-server1] skipping: [rhel7-server2] RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmproxy] ************************************************************************************************************************************************************************************************************************************* changed: [rhel7-server2] changed: [rhel7-server1] RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmlogger] ************************************************************************************************************************************************************************************************************************************ changed: [rhel8-server1] changed: [rhel8-server2] changed: [rhel7-server1] changed: [rhel7-server2] PLAY [Use metrics system role to configure Grafana] ************************************************************************************************************************************************************************************************************************************************************************** TASK [Gathering Facts] 
******************************************************************************************************************************************************************************************************************************************************************************************************* ok: [controlnode] TASK [redhat.rhel_system_roles.metrics : Add Elasticsearch to metrics domain list] ******************************************************************************************************************************************************************************************************************************************* skipping: [controlnode] TASK [redhat.rhel_system_roles.metrics : Add SQL Server to metrics domain list] ********************************************************************************************************************************************************************************************************************************************** skipping: [controlnode] TASK [redhat.rhel_system_roles.metrics : Add bpftrace to metrics domain list] ************************************************************************************************************************************************************************************************************************************************ skipping: [controlnode] TASK [redhat.rhel_system_roles.metrics : Setup metrics access for roles] ***************************************************************************************************************************************************************************************************************************************************** ok: [controlnode] TASK [Configure Elasticsearch metrics] *************************************************************************************************************************************************************************************************************************************************************************************** 
skipping: [controlnode]

TASK [Configure SQL Server metrics.] ************
skipping: [controlnode]

TASK [Setup bpftrace metrics.] ************
skipping: [controlnode]

TASK [Setup metric querying service.] ************

TASK [redhat.rhel_system_roles.private_metrics_subrole_redis : Set platform/version specific variables] ************
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/vars/RedHat_8.yml)
skipping: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/vars/RedHat_8.4.yml)

TASK [redhat.rhel_system_roles.private_metrics_subrole_redis : Install Redis packages] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_redis : Ensure Redis service is configured] ************
changed: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_redis/templates/RedHat_8_redis.conf.j2)

TASK [redhat.rhel_system_roles.private_metrics_subrole_redis : Ensure Redis service is running and enabled on boot] ************
changed: [controlnode]

TASK [Setup metric collection service.] ************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Set platform/version specific variables] ************
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.yml)
skipping: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/vars/RedHat_8.4.yml)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Install Performance Co-Pilot packages] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Install authentication packages] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] ************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmcd.yml for controlnode

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : List optional metric collection agents to be enabled] ************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Extract metric collection configuration file content] ************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure optional metric collection agents are enabled] ************

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure explicit metric label path exists] ************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure implicit metric label path exists] ************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure any explicit metric labels are configured] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure any implicit metric labels are configured] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is configured] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector system accounts are configured] ************
changed: [controlnode] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector SASL accounts are configured] ************
ok: [controlnode] => (item={'user': 'metrics', 'sasluser': 'metrics', 'saslpassword': 'metrics'})

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector authentication is configured] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Set variable to do pmcd restart if needed] ************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Report performance metric collector restart state] ************
ok: [controlnode] => {
    "msg": [
        "optional_agents: False",
        "explicit_labels: True",
        "implicit_labels: True",
        "defaults_config: True",
        "authentication: True",
        "restart_pmcd: True"
    ]
}

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is running and enabled on boot] ************
skipping: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric collector is restarted and enabled on boot] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] ************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmie.yml for controlnode

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rule group directories exist] ************
ok: [controlnode] => (item=network)
ok: [controlnode] => (item=zeroconf)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rule group link directories exist] ************
ok: [controlnode] => (item=network)
ok: [controlnode] => (item=zeroconf)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra performance rules are installed for targeted hosts] ************
changed: [controlnode] => (item=network/tcplistenoverflows)
changed: [controlnode] => (item=network/tcpqfulldocookies)
changed: [controlnode] => (item=network/tcpqfulldrops)
changed: [controlnode] => (item=zeroconf/all_threads)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure extra rules symlinks have been created for targeted hosts] ************
changed: [controlnode] => (item=network/tcplistenoverflows)
changed: [controlnode] => (item=network/tcpqfulldocookies)
changed: [controlnode] => (item=network/tcpqfulldrops)
changed: [controlnode] => (item=zeroconf/all_threads)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric inference is enabled for targeted hosts] ************
changed: [controlnode] => (item=rhel8-server1)
changed: [controlnode] => (item=rhel8-server2)
changed: [controlnode] => (item=rhel7-server1)
changed: [controlnode] => (item=rhel7-server2)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric inference is running and enabled on boot] ************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] ************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmlogger.yml for controlnode

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure metric log location is configured] ************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is configured] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging retention period is set] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is enabled for targeted hosts] ************
changed: [controlnode] => (item=rhel8-server1)
changed: [controlnode] => (item=rhel8-server2)
changed: [controlnode] => (item=rhel7-server1)
changed: [controlnode] => (item=rhel7-server2)

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure performance metric logging is running and enabled on boot] ************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : include_tasks] ************
included: /usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_pcp/tasks/pmproxy.yml for controlnode

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure REST API, proxy and metric log discovery is configured] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_pcp : Ensure REST API, proxy and log discovery is running and enabled on boot] ************
changed: [controlnode]

TASK [Setup metric graphing service.] ************

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Set platform/version specific variables] ************
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_grafana/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_grafana/vars/RedHat.yml)
ok: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_grafana/vars/RedHat_8.yml)
skipping: [controlnode] => (item=/usr/share/ansible/collections/ansible_collections/redhat/rhel_system_roles/roles/private_metrics_subrole_grafana/vars/RedHat_8.4.yml)

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Install Grafana packages] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Template Grafana configuration] ************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Ensure Grafana configuration directory exists] ************
ok: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Ensure Grafana service is configured with datasources] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Ensure graphing service is running and enabled on boot] ************
changed: [controlnode]

TASK [redhat.rhel_system_roles.private_metrics_subrole_grafana : Ensure graphing service runtime settings are configured] ************
ok: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_redis : restart redis] ************
changed: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmie] ************
changed: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmproxy] ************
changed: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_pcp : restart pmlogger] ************
changed: [controlnode]

RUNNING HANDLER [redhat.rhel_system_roles.private_metrics_subrole_grafana : restart grafana] ************
changed: [controlnode]

PLAY [Open Firewall for pmcd] ************

TASK [Gathering Facts] ************
ok: [rhel8-server1]
ok: [rhel7-server1]
ok: [rhel8-server2]
ok: [rhel7-server2]

TASK [firewalld] ************
changed: [rhel8-server2]
changed: [rhel8-server1]
changed: [rhel7-server2]
changed: [rhel7-server1]

PLAY [Open Firewall for grafana] ************

TASK [Gathering Facts] ************
ok: [controlnode]

TASK [firewalld] ************
changed: [controlnode]

PLAY RECAP ************
controlnode : ok=52 changed=28 unreachable=0 failed=0 skipped=9 rescued=0 ignored=0
rhel7-server1 : ok=33 changed=19 unreachable=0 failed=0 skipped=14 rescued=0 ignored=0
rhel7-server2 : ok=33 changed=19 unreachable=0 failed=0 skipped=14 rescued=0 ignored=0
rhel8-server1 : ok=32
changed=14 unreachable=0 failed=0 skipped=14 rescued=0 ignored=0 rhel8-server2 : ok=32 changed=14 unreachable=0 failed=0 skipped=14 rescued=0 ignored=0 Did some more testing, and starting over, without the patch, had one of the RHEL 7 servers not show up after the metrics role completed: [ansible@controlnode ~]$ pcp Performance Co-Pilot configuration on controlnode.example.com: platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM timezone: UTC services: pmcd pmproxy pmcd: Version 5.2.5-4, 12 agents, 6 clients pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2 dm openmetrics pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210707.20.03 rhel7-server2.example.com: /var/log/pcp/pmlogger/rhel7-server2/20210707.20.03 rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210707.20.03 rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210707.20.03 pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log [ansible@controlnode ~]$ cat /var/log/pcp/pmlogger/rhel7-server1/pmlogger.log Log for pmlogger on controlnode.example.com started Wed Jul 7 20:03:19 2021 pmlogger: Cannot open config file "config.rhel7-server1": No such file or directory Log finished Wed Jul 7 20:03:22 2021 [ansible@controlnode ~]$ ls -al /var/lib/pcp/config/pmlogger/ total 164 drwxrwxr-x. 2 pcp pcp 133 Jul 7 20:03 . drwxr-xr-x. 10 root root 127 Jul 7 20:00 .. -rw-r--r--. 1 pcp pcp 41113 Jul 7 20:03 config.default lrwxrwxrwx. 1 root root 45 Feb 19 09:11 config.pmstat -> ../../../../../etc/pcp/pmlogger/config.pmstat -rw-r--r--. 
1 pcp pcp 39056 Jul 7 20:03 config.rhel7-server2 -rw-r--r--. 1 pcp pcp 39165 Jul 7 20:03 config.rhel8-server1 -rw-r--r--. 1 pcp pcp 39165 Jul 7 20:03 config.rhel8-server2 [ansible@controlnode ~]$ sudo systemctl restart pmlogger [ansible@controlnode ~]$ ls -al /var/lib/pcp/config/pmlogger/ total 204 drwxrwxr-x. 2 pcp pcp 161 Jul 7 20:12 . drwxr-xr-x. 10 root root 127 Jul 7 20:00 .. -rw-r--r--. 1 pcp pcp 41113 Jul 7 20:03 config.default lrwxrwxrwx. 1 root root 45 Feb 19 09:11 config.pmstat -> ../../../../../etc/pcp/pmlogger/config.pmstat -rw-r--r--. 1 pcp pcp 39056 Jul 7 20:12 config.rhel7-server1 -rw-r--r--. 1 pcp pcp 39056 Jul 7 20:03 config.rhel7-server2 -rw-r--r--. 1 pcp pcp 39165 Jul 7 20:03 config.rhel8-server1 -rw-r--r--. 1 pcp pcp 39165 Jul 7 20:03 config.rhel8-server2 [ansible@controlnode ~]$ pcp Performance Co-Pilot configuration on controlnode.example.com: platform: Linux controlnode.example.com 4.18.0-305.el8.x86_64 #1 SMP Thu Apr 29 08:54:30 EDT 2021 x86_64 hardware: 1 cpu, 1 disk, 1 node, 1935MB RAM timezone: UTC services: pmcd pmproxy pmcd: Version 5.2.5-4, 12 agents, 6 clients pmda: root pmcd proc pmproxy xfs linux nfsclient mmv kvm jbd2 dm openmetrics pmlogger: primary logger: /var/log/pcp/pmlogger/controlnode.example.com/20210707.20.12 rhel7-server1.example.com: /var/log/pcp/pmlogger/rhel7-server1/20210707.20.12 rhel7-server2.example.com: /var/log/pcp/pmlogger/rhel7-server2/20210707.20.12 rhel8-server1.example.com: /var/log/pcp/pmlogger/rhel8-server1/20210707.20.12 rhel8-server2.example.com: /var/log/pcp/pmlogger/rhel8-server2/20210707.20.12 pmie: primary engine: /var/log/pcp/pmie/controlnode.example.com/pmie.log rhel7-server1.example.com: /var/log/pcp/pmie/rhel7-server1/pmie.log rhel7-server2.example.com: /var/log/pcp/pmie/rhel7-server2/pmie.log rhel8-server1.example.com: /var/log/pcp/pmie/rhel8-server1/pmie.log rhel8-server2.example.com: /var/log/pcp/pmie/rhel8-server2/pmie.log Based on what I have seen, it appears that there are two issues: 1. 
The issue for this BZ where the Grafana dashboard doesn't work after the metrics role was run (unless pmproxy is restarted). This appears to be resolved with the https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd 2. What appears to be a separate issue, that after the metrics role runs, some of the servers don't show up under "pmlogger" section (until pmlogger is restarted). I've only had this happen with the RHEL 7 clients. Another data point, restored all my VM's to clean state, ran the metrics role again (without the https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd patch), and the 2 RHEL 7 servers didn't show up in the pmlogger section again. The issue in this BZ appears to be resolved with https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd And I appear to have another unrelated issue with RHEL 7 servers intermittently not showing up in the "pmlogger" section after the metrics role is run, until pmlogger is restarted. Should I go ahead and create another BZ for this other issue? Thanks for all the help with this. Hi Brian, Finally I've found time to setup some 8.5 and 7.9 VMs to attempt to test some of the problems here. I hit the first problem of course, but then everything worked perfectly. I suspect what we may be seeing here is a race condition with the pmcd service becoming fully available on the RHEL-7 nodes. There's a few things that could be contributing. Firstly, re-reading the blog post it occurs to me that the firewall change to open the pmcd port in particular is done *after* we run the metrics role ... I recommend re-ordering that part of the blog post to rule out that being a factor here (we definitely want the pmcd port 44321 unblocked before we attempt remote access). It could certainly explain some of what we're seeing here, because of the async nature of pmlogger service start in particular (see below). 
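A minimal sketch of that reordering, in the style of the blog post playbook. It assumes a hypothetical inventory with a `monitored` group (the RHEL 7/8 servers) and a `grafana` group (controlnode), uses the `ansible.posix.firewalld` module, and assumes the metrics role's `metrics_monitored_hosts`, `metrics_graph_service` and `metrics_query_service` variables. The play and group names are illustrative, not taken from the blog post. The point is only the order: port 44321 is open on every monitored host before the metrics role starts pmlogger on the Grafana host and pmlogger probes those hosts.

```yaml
# Sketch only: the group names "monitored" and "grafana" are hypothetical.
# 1. Open pmcd's port on the monitored hosts before anything probes them.
- name: Open firewall for pmcd
  hosts: monitored
  become: true
  tasks:
    - name: Allow remote connections to pmcd (44321/tcp)
      ansible.posix.firewalld:
        port: 44321/tcp
        permanent: true
        immediate: true
        state: enabled

# 2. Open the Grafana web UI port on the Grafana host.
- name: Open firewall for grafana
  hosts: grafana
  become: true
  tasks:
    - name: Allow connections to Grafana (3000/tcp)
      ansible.posix.firewalld:
        port: 3000/tcp
        permanent: true
        immediate: true
        state: enabled

# 3. Install and start PCP on the monitored hosts.
- name: Set up PCP on the monitored hosts
  hosts: monitored
  become: true
  roles:
    - redhat.rhel_system_roles.metrics

# 4. Only now configure the Grafana host, whose pmlogger probes the hosts above.
- name: Set up metrics collection and graphing on the Grafana host
  hosts: grafana
  become: true
  vars:
    metrics_monitored_hosts: "{{ groups['monitored'] }}"
    metrics_graph_service: true
    metrics_query_service: true
  roles:
    - redhat.rhel_system_roles.metrics
```

This ordering only removes the firewall as a factor in the first pmlogger probe. It does not change the order in which the role itself starts Redis and pmproxy.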
There are some subtleties to the way the pmlogger and pmie services are started. Both services use a model of 'immediate completion', so at the end of 'pmlogger service start', for example, the services may still be in the process of starting. As we add more monitored hosts, we are more likely to see some hosts not yet started immediately after 'pmlogger service start'; this process continues asynchronously in the background. It has to be this way in order to scale up to large numbers of hosts (otherwise we'd "hang" bootup).

Further, in the case of pmlogger (but not pmie), when no config file exists (this is the default), pmlogger generates the configuration file as part of starting up each host's logger by *probing* the remote host for available metrics. If that part of the pmlogger service startup (the probe) cannot connect, we cannot create a config file, and the pmlogger for that host will (temporarily) fail. This is not the end of the world, as the pmlogger_check service will ensure the pmlogger is started within the next few minutes.

Can you let me know if the firewall setup ordering change has any effect on that first remote pmlogger? In my testing here everything else is working fine (I have no firewalls set up here though, which gives me hope this may be our smoking gun). Thanks!

Hi Nathan,

I did some more testing this morning. With the https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd patch and the firewall tasks moved to the top of the playbook, everything worked properly on two runs of the playbook (restoring the VMs to a clean snapshot each time).

The https://github.com/linux-system-roles/metrics/commit/06f9aa59739cb421056a30ee3edb5276650943cd patch is definitely needed. I tried running the playbook with the firewall tasks at the top but without this patch, and ran into the original issue in the description of the BZ.
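Until the fixed role is available, the manual restart workaround could also be automated as a follow-up play. A minimal sketch, assuming the same hypothetical `monitored` and `grafana` groups and that the inventory names resolve from the Grafana host. It uses `ansible.builtin.wait_for` to wait (up to an arbitrary 120 seconds) until each remote pmcd on port 44321 accepts connections from the Grafana host. It then uses `ansible.builtin.systemd` to restart pmlogger, so pmlogger can probe any host whose config file was not created, and pmproxy, so pmproxy starts again with Redis already running (see the Redis ordering noted earlier in this bug).

```yaml
# Hypothetical follow-up play; run it after the metrics role has completed.
- name: Restart PCP services once the monitored hosts are reachable
  hosts: grafana
  become: true
  tasks:
    # wait_for runs on the Grafana host, so this checks the same network
    # path that pmlogger uses when it probes each remote pmcd.
    - name: Wait until pmcd on each monitored host accepts connections
      ansible.builtin.wait_for:
        host: "{{ item }}"
        port: 44321
        timeout: 120  # arbitrary upper bound, in seconds
      loop: "{{ groups['monitored'] }}"

    # pmlogger re-probes hosts that have no config file yet; pmproxy is
    # started again now that Redis is already running.
    - name: Restart pmlogger and pmproxy
      ansible.builtin.systemd:
        name: "{{ item }}"
        state: restarted
      loop:
        - pmlogger
        - pmproxy
```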
I'm working on getting the blog post updated so that it shows the firewall tasks at the top of the playbook, and will also still have the workaround of restarting pmlogger and pmproxy to work around this BZ. Thanks for your help getting this resolved!

> [...]
> I'm working on getting the blog post updated so that it shows the firewall tasks at the top of the playbook, and will also still have the workaround of restarting pmlogger and pmproxy to work around this BZ.

Perfect.

> Thanks for your help getting this resolved!

No problem.

Rich, Andreas is on PTO for two weeks, so we won't hear from him in the ITM time frame set here. I'll answer for him and say my earlier testing showed the change to have a positive impact (as Brian is observing too). cheers.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (rhel-system-roles bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4159