Summary: [RFE] Improve pcp-zeroconf for large cpu servers
Product: Red Hat Enterprise Linux 7
Reporter: Welterlen Benoit <bwelterl>
Component: pcp
Assignee: Nathan Scott <nathans>
Status: CLOSED ERRATA
QA Contact: Michal Kolar <mkolar>
Version: 7.5
CC: agerstmayr, jentrena, mgoodwin, mkolar, nathans, peter.vreman, tbowling
Doc Type: If docs needed, set a value
Story Points: ---
Last Closed: 2019-08-06 12:48:17 UTC
Type: Bug
Bug Depends On: 1565370
Description Welterlen Benoit 2018-08-06 14:12:09 UTC
Description of problem:

The current pcp-zeroconf configuration gathers proc.psinfo every 10 seconds. On a big machine this generates 20-30GB per day, with the .meta file alone being 7-9GB, and that file is never compressed, not even during archiving. On large-CPU systems the proc.psinfo fetch includes roughly 7 kernel threads per CPU, so on a 384-CPU server about 2688 kernel threads are included:

    $ pminfo -f proc.nprocs
    proc.nprocs
        value 2761

PCP has the hotproc feature to limit the information gathered:

    $ cat /var/lib/pcp/pmdas/proc/samplehotproc.conf
    #pmdahotproc Version 1.0
    uname != "root" || cpuburn > 0.05

    $ sudo cp /var/lib/pcp/pmdas/proc/samplehotproc.conf /var/lib/pcp/pmdas/proc/hotproc.conf
    $ sudo service pmcd restart
    Redirecting to /bin/systemctl restart pmcd.service
    $ pminfo -f proc.nprocs hotproc.nprocs
    proc.nprocs
        value 2762
    hotproc.nprocs
        value 16

Before hotproc, with standard pcp-zeroconf using 'proc':

    $ tail -n5 /var/log/pcp/pmlogger/x/pmlogger.log
        proc.psinfo.pid
    } logged every 10 sec: 1553280 bytes or 12798.63 Mbytes/day

After enabling the sample hotproc configuration and updating the zeroconf-provided atop-proc pmlogger group, replacing 'proc.' with 'hotproc.':

    $ sudo sed 's/^\([[:space:]]\)proc\./\1hotproc./' -i /var/lib/pcp/config/pmlogconf/puppet/atop-proc
    $ sudo service pmlogger restart
    Redirecting to /bin/systemctl restart pmlogger.service
    $ tail -n5 /var/log/pcp/pmlogger/x/pmlogger.log
        hotproc.psinfo.pid
        proc.runq.blocked
        proc.runq.runnable
        proc.nprocs
    } logged every 10 sec: 243124 bytes or 2003.28 Mbytes/day

Is it possible to improve the out-of-the-box recommended/best-practice pcp-zeroconf configuration to enable the (sample) hotproc configuration instead, and to switch the atop-proc pmlogger configuration to use hotproc?
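The sample predicate above keeps a process if it is non-root or is consuming more than 5% CPU. As a sketch only: the hotproc predicate language documented in pmdaproc(1) exposes further per-process variables, so a site could widen the filter, e.g. to also retain memory-heavy processes (the residentsize variable, in Kbytes, is an assumption here and worth verifying against the man page on the target release):

```
#pmdahotproc Version 1.0
uname != "root" || cpuburn > 0.05 || residentsize > 102400
```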
Version-Release number of selected component (if applicable):
RHEL 7

How reproducible:
Easy, always

Steps to Reproduce:
1. Use pcp-zeroconf as shipped, which configures proc rather than hotproc
2. Run a lot of processes, or use a big machine
3. Check the size of the gathered data

Actual results:
~12 GB logged every day; all processes monitored

Expected results:
By default, zeroconf should be set to hotproc rather than proc (unless there is a performance reason not to): ~2 GB per day, with only the noticeable processes monitored

Additional info:
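The Mbytes/day figures quoted from pmlogger.log above can be cross-checked with a little shell arithmetic (bytes_per_day is a hypothetical helper name; the byte counts and the 10-second interval are taken from the log excerpts):

```shell
# Convert pmlogger's "bytes per interval" figure to Mbytes/day,
# using the same units pmlogger.log reports (Mbyte = 1024*1024 bytes).
bytes_per_day() {
  awk -v bytes="$1" -v interval="$2" \
    'BEGIN { printf "%.2f Mbytes/day\n", bytes * (86400 / interval) / (1024 * 1024) }'
}

bytes_per_day 1553280 10   # proc.* logging    -> 12798.63 Mbytes/day
bytes_per_day 243124  10   # hotproc.* logging -> 2003.28 Mbytes/day
```

This reproduces both figures exactly, confirming the roughly 6x reduction from switching the per-process metrics to hotproc.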
Comment 3 Nathan Scott 2018-08-07 06:25:06 UTC
| This generates 20-30GB per day on a big machine, and the .meta being
| 7-9GB that is never compressed also not during archiving.

There has been much progress in this area in the PCP rebase in RHEL 7.6. We now compress .meta files each day. We also have a new strategy for the data volumes, which are compressed during the day soon after each data volume (.0, .1, .2, etc.) reaches the 100MB mark (the sub-volume chunk size is configurable).
Comment 4 Mark Goodwin 2018-08-07 11:27:56 UTC
In addition, PCP archives compress very well, typically 10:1, so that 20-30GB per day should reduce to 2-3 GB/day, which is in the realm of the expected results originally specified in Comment #0.
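The roughly 10:1 figure is easy to sanity-check: archive volumes are highly repetitive, so a general-purpose compressor does well on them. A minimal sketch, using synthetic repetitive data in place of a real archive volume (real volumes live under /var/log/pcp/pmlogger/<hostname>/, and xz is assumed here as the compressor pmlogger_daily applies):

```shell
# Generate repetitive sample data standing in for a PCP archive volume,
# compress it with xz, and compare the sizes.
tmp=$(mktemp)
for i in $(seq 1 20000); do
  printf 'proc.psinfo.pid instance %d value %d\n' $((i % 100)) $((i % 7))
done > "$tmp"

xz -k "$tmp"                 # -k keeps the original file alongside $tmp.xz
orig=$(stat -c %s "$tmp")
comp=$(stat -c %s "$tmp.xz")
echo "original: $orig bytes, compressed: $comp bytes, ratio $((orig / comp)):1"

rm -f "$tmp" "$tmp.xz"
```

Synthetic data like this compresses far better than 10:1; real archives carry more entropy in the metric values, which is where the typical 10:1 figure comes from.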
Comment 6 Michal Kolar 2019-06-18 12:11:41 UTC
(In reply to Nathan Scott from comment #3)
> | This generates 20-30GB per day on a big machine, and the .meta being
> | 7-9GB that is never compressed also not during archiving.
>
> There has been much progress in this area in the PCP (rebase) in RHEL 7.6.
>
> We do now compress .meta files each day. We also have a new strategy around
> the data volumes, which are compressed during the day soon after each data
> volume (.0, .1, .2, etc) reaches the 100Mb mark (sub-volume chunk size is
> configurable).

Hi Nathan, your solution does not correspond to the requested feature. Is this solution acceptable to the reporter?
Comment 7 Peter Vreman 2019-06-18 15:24:37 UTC
Disk space usage is only one aspect; the amount of data to be processed by the reporting tools is another. Does pmwebd (e.g. for Grafana) not have a performance impact?
Comment 8 Nathan Scott 2019-06-18 20:29:07 UTC
Michal, yes, I believe it addresses the original concerns around per-process logging - we nowadays have both hotproc auto-configuration (if hotproc is in use) and proc logging (the latter being more generally useful for Red Hat customer support). Peter, yes, it certainly does have a performance impact. We are actively working on that aspect (Grafana and PCP REST API performance) in other BZs, with a complete revamp of the REST APIs and an update of Grafana to the latest version - but that is beyond the scope of this BZ.
Comment 9 Michal Kolar 2019-06-19 13:39:18 UTC
Verified against pcp-4.3.2-2.el7.
Comment 11 errata-xmlrpc 2019-08-06 12:48:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2111