Bug 1919950
| Summary: | Remote pmloggers are not all started when many loggers are defined in the configuration | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Renaud Métrich <rmetrich> |
| Component: | pcp | Assignee: | Mark Goodwin <mgoodwin> |
| Status: | CLOSED DUPLICATE | QA Contact: | Jan Kurik <jkurik> |
| Severity: | medium | Docs Contact: | Apurva Bhide <abhide> |
| Priority: | medium | ||
| Version: | 8.3 | CC: | agerstmayr, jkurik, mgoodwin, nathans, patrickm |
| Target Milestone: | rc | Keywords: | Bugfix, Triaged |
| Target Release: | 8.0 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-02-01 23:00:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I think I'm reproducing what the customer sees.
I defined 20 VMs on a host, with the primary pmlogger running on the host and pmcd listening in each VM.
System is: dell-r330-13.gsslab.brq.redhat.com (root/redhat)
VMs: vm-client[1-20].libvirt
All are installed with the latest RHEL 8.3.
On host: /etc/pcp/pmlogger/control.d/remote
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
#Host P? S? directory args
vm-client1.libvirt n n PCP_LOG_DIR/pmlogger/remote/vm-client1.libvirt -r -T24h10m -c config.remote
vm-client2.libvirt n n PCP_LOG_DIR/pmlogger/remote/vm-client2.libvirt -r -T24h10m -c config.remote
:
vm-client20.libvirt n n PCP_LOG_DIR/pmlogger/remote/vm-client20.libvirt -r -T24h10m -c config.remote
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
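The elided entries follow the same pattern as the ones shown, so the full list can be generated rather than typed. A minimal sketch, assuming the hosts are named vm-client1.libvirt to vm-clientN.libvirt as above; the function name `gen_remote_control` is hypothetical, and the output goes to stdout so it can be reviewed before being written to the control file:

```shell
#!/bin/sh
# Hypothetical helper: print control entries for vm-client1..vm-clientN.
# The field layout is copied from the entries shown in this report.
gen_remote_control() {
    count=${1:-20}
    echo '#Host P? S? directory args'
    i=1
    while [ "$i" -le "$count" ]; do
        host="vm-client${i}.libvirt"
        printf '%s n n PCP_LOG_DIR/pmlogger/remote/%s -r -T24h10m -c config.remote\n' \
            "$host" "$host"
        i=$((i + 1))
    done
}

# Print the 20 entries used in this reproducer.
gen_remote_control 20
```

Redirecting the output to /etc/pcp/pmlogger/control.d/remote would recreate the file used in the tests below.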
TEST 1 - Starting pmlogger without any tuning: the service starts only the vm-client1.libvirt remote logger
# systemctl status pmlogger
● pmlogger.service - Performance Metrics Archive Logger
Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-01-25 16:55:50 CET; 2min 42s ago
Docs: man:pmlogger(1)
Process: 58109 ExecStop=/usr/share/pcp/lib/pmlogger stop (code=exited, status=0/SUCCESS)
Main PID: 67796 (pmlogger)
Tasks: 2 (limit: 203219)
Memory: 12.0M
CGroup: /system.slice/pmlogger.service
├─67796 /usr/libexec/pcp/bin/pmlogger -N -P -r -T24h10m -c config.default -v 100mb -m pmlogger_check %Y%m%>
└─67918 /usr/libexec/pcp/bin/pmlogger -h vm-client1.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
TEST 2 - Starting pmlogger after adding "$PMLOGGER_CHECK_SKIP_LOGCONF=yes" in /etc/pcp/pmlogger/control: service gets 6 remote loggers out of 20
# systemctl status pmlogger
● pmlogger.service - Performance Metrics Archive Logger
Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-01-25 17:02:29 CET; 30s ago
Docs: man:pmlogger(1)
Process: 70097 ExecStop=/usr/share/pcp/lib/pmlogger stop (code=exited, status=0/SUCCESS)
Main PID: 70872 (pmlogger)
Tasks: 7 (limit: 203219)
Memory: 23.3M
CGroup: /system.slice/pmlogger.service
├─70872 /usr/libexec/pcp/bin/pmlogger -N -P -r -T24h10m -c config.default -v 100mb -m pmlogger_check %Y%m%>
├─71077 /usr/libexec/pcp/bin/pmlogger -h vm-client1.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─71269 /usr/libexec/pcp/bin/pmlogger -h vm-client2.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─71469 /usr/libexec/pcp/bin/pmlogger -h vm-client3.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─71692 /usr/libexec/pcp/bin/pmlogger -h vm-client4.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─71913 /usr/libexec/pcp/bin/pmlogger -h vm-client5.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
└─72147 /usr/libexec/pcp/bin/pmlogger -h vm-client6.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
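The tuning used in TEST 2 is a single variable line in the control file. A minimal sketch of adding it idempotently, assuming the line is simply appended; the function name `set_skip_logconf` is hypothetical, and this report does not say where in the file the line was placed, so the position may matter:

```shell
#!/bin/sh
# Hypothetical helper: append the variable line to a control file unless
# an identical line is already there (-x whole line, -F fixed string).
set_skip_logconf() {
    control=$1
    line='$PMLOGGER_CHECK_SKIP_LOGCONF=yes'
    grep -qxF -- "$line" "$control" 2>/dev/null ||
        printf '%s\n' "$line" >> "$control"
}
```

Calling `set_skip_logconf /etc/pcp/pmlogger/control` followed by `systemctl restart pmlogger` would correspond to the setup of this test.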
TEST 3 - Restarting pmlogger (still with "$PMLOGGER_CHECK_SKIP_LOGCONF=yes" in /etc/pcp/pmlogger/control): service gets ALL remote loggers
# systemctl status pmlogger
● pmlogger.service - Performance Metrics Archive Logger
Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-01-25 17:03:41 CET; 15s ago
Docs: man:pmlogger(1)
Process: 76511 ExecStop=/usr/share/pcp/lib/pmlogger stop (code=exited, status=0/SUCCESS)
Main PID: 76887 (pmlogger)
Tasks: 21 (limit: 203219)
Memory: 56.5M
CGroup: /system.slice/pmlogger.service
├─76887 /usr/libexec/pcp/bin/pmlogger -N -P -r -T24h10m -c config.default -v 100mb -m pmlogger_check %Y%m%>
├─77003 /usr/libexec/pcp/bin/pmlogger -h vm-client1.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─77096 /usr/libexec/pcp/bin/pmlogger -h vm-client2.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─77188 /usr/libexec/pcp/bin/pmlogger -h vm-client3.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─77283 /usr/libexec/pcp/bin/pmlogger -h vm-client4.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─77383 /usr/libexec/pcp/bin/pmlogger -h vm-client5.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─77484 /usr/libexec/pcp/bin/pmlogger -h vm-client6.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─77588 /usr/libexec/pcp/bin/pmlogger -h vm-client7.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─77696 /usr/libexec/pcp/bin/pmlogger -h vm-client8.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─77806 /usr/libexec/pcp/bin/pmlogger -h vm-client9.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
├─77919 /usr/libexec/pcp/bin/pmlogger -h vm-client10.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
├─78035 /usr/libexec/pcp/bin/pmlogger -h vm-client11.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
├─78158 /usr/libexec/pcp/bin/pmlogger -h vm-client12.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
├─78280 /usr/libexec/pcp/bin/pmlogger -h vm-client13.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
├─78405 /usr/libexec/pcp/bin/pmlogger -h vm-client14.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
├─78536 /usr/libexec/pcp/bin/pmlogger -h vm-client15.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
├─78668 /usr/libexec/pcp/bin/pmlogger -h vm-client16.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
├─78805 /usr/libexec/pcp/bin/pmlogger -h vm-client17.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
├─78944 /usr/libexec/pcp/bin/pmlogger -h vm-client18.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
├─79088 /usr/libexec/pcp/bin/pmlogger -h vm-client19.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
└─79233 /usr/libexec/pcp/bin/pmlogger -h vm-client20.libvirt -r -T24h10m -c config.remote -m pmlogger_chec>
TEST 4 - Restarting pmlogger without any tuning again ("$PMLOGGER_CHECK_SKIP_LOGCONF=yes" removed from /etc/pcp/pmlogger/control): the service again starts only the vm-client1.libvirt remote logger
# systemctl status pmlogger
● pmlogger.service - Performance Metrics Archive Logger
Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2021-01-25 17:05:14 CET; 34s ago
Docs: man:pmlogger(1)
Process: 84544 ExecStop=/usr/share/pcp/lib/pmlogger stop (code=exited, status=0/SUCCESS)
Main PID: 84843 (pmlogger)
Tasks: 2 (limit: 203219)
Memory: 11.0M
CGroup: /system.slice/pmlogger.service
├─84843 /usr/libexec/pcp/bin/pmlogger -N -P -r -T24h10m -c config.default -v 100mb -m pmlogger_check %Y%m%>
└─84967 /usr/libexec/pcp/bin/pmlogger -h vm-client1.libvirt -r -T24h10m -c config.remote -m pmlogger_check>
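Counting loggers by eye in the truncated `systemctl status` output is error-prone with 20 or more hosts. A minimal sketch of a check that lists configured hosts with no matching `pmlogger -h HOST` process; the function names are hypothetical, and it assumes the control file layout shown above (host name in the first field, `#` comments, `$` variable lines):

```shell
#!/bin/sh
# List hosts from a pmlogger control file: skip blank lines, comments and
# "$VAR=value" assignments, then print the first field of each entry.
configured_hosts() {
    awk 'NF && $1 !~ /^[#$]/ { print $1 }' "$1"
}

# Read a process listing on stdin and print every configured host that has
# no "pmlogger -h HOST " command line in it. The trailing space keeps
# vm-client1 from matching vm-client10; -F treats dots literally.
missing_loggers() {
    control=$1
    procs=$(cat)
    for host in $(configured_hosts "$control"); do
        printf '%s\n' "$procs" | grep -qF -- "pmlogger -h $host " ||
            echo "$host"
    done
}
```

For example, `ps -eo args | missing_loggers /etc/pcp/pmlogger/control.d/remote` should print nothing once all remote loggers are running.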
I can confirm that pcp-5.2.3-1.el8.x86_64.rpm fixes the issue. However, with stock pcp-5.1.1-3.el8.x86_64, even specifying one dedicated config.remote per remote host doesn't make a difference: only one pmlogger gets spawned, for the first remote host. Mark is back from PTO now and knows this code (and his recent fixes) better than I do, so I'm moving NEEDINFO to mgoodwin.
Description of problem:
A customer has numerous remote pmloggers configured in etc/pcp/pmlogger/control.d/remote:
$ wc -l etc/pcp/pmlogger/control.d/remote
72 etc/pcp/pmlogger/control.d/remote
Trying to deploy a RHEL8 system handling all these, the customer found out that only a few ones were starting (5 of them only):
systemctl status pmlogger
● pmlogger.service - Performance Metrics Archive Logger
Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2020-12-10 16:22:56 UTC; 11s ago
Docs: man:pmlogger(1)
Process: 12273 ExecStop=/usr/share/pcp/lib/pmlogger stop (code=exited, status=0/SUCCESS)
Main PID: 12553 (pmlogger)
Tasks: 6 (limit: 49088)
Memory: 17.2M
CGroup: /system.slice/pmlogger.service
├─12553 /usr/libexec/pcp/bin/pmlogger -N -P -r -T24h10m -c config.default -v 100mb -m pmlogger_check %Y%m%d.%H.%M
├─12626 /usr/libexec/pcp/bin/pmlogger -h CLIENT1 -r -T24h10m -c config.remote -m pmlogger_check %Y%m%d.%H.%M
├─12776 /usr/libexec/pcp/bin/pmlogger -h CLIENT2 -r -T24h10m -c config.remote -m pmlogger_check %Y%m%d.%H.%M
├─12929 /usr/libexec/pcp/bin/pmlogger -h CLIENT3 -r -T24h10m -c config.remote -m pmlogger_check %Y%m%d.%H.%M
├─13143 /usr/libexec/pcp/bin/pmlogger -h CLIENT4 -r -T24h10m -c config.remote -m pmlogger_check %Y%m%d.%H.%M
└─13286 /usr/libexec/pcp/bin/pmlogger -h CLIENT5 -r -T24h10m -c config.remote -m pmlogger_check %Y%m%d.%H.%M
Dec 10 16:22:55 XXX systemd[1]: Starting Performance Metrics Archive Logger...
Dec 10 16:22:56 XXX systemd[1]: Started Performance Metrics Archive Logger.
Dec 10 16:23:05 XXX pmlogger[12362]: Starting pmlogger ...
This was working fine on RHEL7.
The customer investigated and found out that using $PMLOGGER_CHECK_SKIP_LOGCONF=yes was necessary: with the variable set, everything was starting fine.
Version-Release number of selected component (if applicable):
pcp-5.1.1-3.el8.x86_64
How reproducible:
Always on customer site.
Didn't try, not having enough systems.