Bug 1612916 - [RFE] Improve pcp-zeroconf for large cpu servers
Summary: [RFE] Improve pcp-zeroconf for large cpu servers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcp
Version: 7.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: 7.7
Assignee: Nathan Scott
QA Contact: Michal Kolar
URL:
Whiteboard:
Depends On: 1565370
Blocks:
Reported: 2018-08-06 14:12 UTC by Welterlen Benoit
Modified: 2019-08-06 12:48 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-06 12:48:17 UTC
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2019:2111 | None | None | None | 2019-08-06 12:48:36 UTC

Description Welterlen Benoit 2018-08-06 14:12:09 UTC
Description of problem:

The current pcp-zeroconf configuration gathers proc.psinfo every 10 seconds. On a big machine this generates 20-30GB per day, and the .meta file, at 7-9GB, is never compressed, not even during archiving.

On systems with many CPUs, the proc.psinfo data includes roughly 7 kernel threads per CPU.
For a 384-CPU server that means 2688 processes are included.
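
The estimate above is a simple product of the two figures quoted in this report. A minimal sketch of that arithmetic follows; the function name estimate_kthreads is hypothetical and the per-CPU thread count is only approximate.

```shell
# Rough estimate only: both inputs come from the report, and the
# per-CPU kernel thread count is approximate.
estimate_kthreads() {
    # $1 = number of CPUs, $2 = kernel threads per CPU
    echo $(( $1 * $2 ))
}

estimate_kthreads 384 7   # prints 2688
```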

-----------
proc.nprocs
    value 2761
--------

PCP has the hotproc feature to limit the information gathered.
---------
cat /var/lib/pcp/pmdas/proc/samplehotproc.conf
#pmdahotproc
Version 1.0

uname != "root" || cpuburn > 0.05

sudo cp /var/lib/pcp/pmdas/proc/samplehotproc.conf /var/lib/pcp/pmdas/proc/hotproc.conf
sudo service pmcd restart
Redirecting to /bin/systemctl restart  pmcd.service
pminfo -f proc.nprocs hotproc.nprocs

proc.nprocs
    value 2762

hotproc.nprocs
    value 16

-----------
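
To compare the two counters at a glance, the pminfo output above can be reduced to one line per metric. The following is only a sketch: nprocs_summary is a hypothetical helper, and it assumes the "metric name, then indented value line" layout shown in this report.

```shell
# Hypothetical helper: reads `pminfo -f` style output on stdin and
# prints one "metric value" pair per line.
nprocs_summary() {
    awk '
        /^[A-Za-z]/ { metric = $1 }                       # metric name line
        $1 == "value" && metric != "" { print metric, $2 } # indented value line
    '
}

# Sample input copied from the report. On a live host one would pipe
# `pminfo -f proc.nprocs hotproc.nprocs` into the function instead.
nprocs_summary <<'EOF'

proc.nprocs
    value 2762

hotproc.nprocs
    value 16

EOF
```

With the sample input this prints "proc.nprocs 2762" and "hotproc.nprocs 16", which makes the reduction in tracked processes easy to see.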

Before hotproc with standard pcp-zeroconf using 'proc':
-----------
tail -n5 /var/log/pcp/pmlogger/x/pmlogger.log
        proc.psinfo.pid
} logged every 10 sec: 1553280 bytes or 12798.63 Mbytes/day
-----------
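
The Mbytes/day figure in the log can be reproduced from the per-sample size and the sampling interval. This is a sketch under the assumption that 1 Mbyte means 1048576 bytes, which matches the logged number; mbytes_per_day is a hypothetical helper, not part of PCP.

```shell
# Hypothetical helper reproducing the Mbytes/day figure in pmlogger.log.
# Assumes 1 Mbyte = 1048576 bytes, which matches the logged number.
mbytes_per_day() {
    # $1 = bytes written per sample, $2 = sampling interval in seconds
    awk -v bytes="$1" -v interval="$2" 'BEGIN {
        samples = 86400 / interval          # samples per day
        printf "%.2f\n", bytes * samples / 1048576
    }'
}

mbytes_per_day 1553280 10   # prints 12798.63
```

The same function can be applied to any other "logged every N sec: X bytes" line to compare configurations.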

After enabling the sample hotproc configuration and updating the zeroconf-provided atop-proc file, replacing 'proc.' with 'hotproc.':
-----------
sudo sed 's/^\([[:space:]]\)proc\./\1hotproc./' -i /var/lib/pcp/config/pmlogconf/puppet/atop-proc
sudo service pmlogger restart
Redirecting to /bin/systemctl restart  pmlogger.service
tail -n5 /var/log/pcp/pmlogger/x/pmlogger.log
        hotproc.psinfo.pid
        proc.runq.blocked
        proc.runq.runnable
        proc.nprocs
} logged every 10 sec: 243124 bytes or 2003.28 Mbytes/day
-----------
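
Before editing the file in place, the sed expression can be tried on sample input. The sketch below wraps the same expression in a hypothetical to_hotproc function that reads stdin; the input lines are illustrative and are not the real atop-proc contents.

```shell
# Same expression as in the report, applied to stdin instead of
# editing the pmlogconf file in place.
to_hotproc() {
    sed 's/^\([[:space:]]\)proc\./\1hotproc./'
}

# Illustrative input only: one tab, two tabs, and no leading whitespace.
printf '\tproc.psinfo.pid\n\t\tproc.nprocs\nproc.runq.blocked\n' | to_hotproc
```

Only the first line is rewritten: the pattern requires exactly one whitespace character between the start of the line and "proc.", so lines with deeper indentation or no indentation pass through unchanged.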

Is it possible to improve the out-of-the-box recommended/best-practice pcp-zeroconf configuration so that it enables the (sample) hotproc configuration and the atop-proc pmlogger configuration uses hotproc instead of proc?


Version-Release number of selected component (if applicable):
RHEL 7

How reproducible:
Easy, always

Steps to Reproduce:
1. Use the zeroconf configuration that recommends proc and not hotproc
2. Run a lot of processes or use a big machine
3. Check the size of the gathered data

Actual results:
12 GB logged every day
All processes monitored

Expected results:
By default, zeroconf should use hotproc rather than proc, unless there is a performance reason not to
Only 2 GB logged per day, with only noticeable processes monitored

Additional info:

Comment 3 Nathan Scott 2018-08-07 06:25:06 UTC
| This generates 20-30GB per day on a big machine, and the .meta being
| 7-9GB that is never compressed also not during archiving.

There has been much progress in this area in the PCP (rebase) in RHEL 7.6.

We do now compress .meta files each day.  We also have a new strategy around the data volumes, which are compressed during the day soon after each data volume (.0, .1, .2, etc) reaches the 100Mb mark (sub-volume chunk size is configurable).

Comment 4 Mark Goodwin 2018-08-07 11:27:56 UTC
In addition, PCP archives compress very well, typically 10:1 - so that 20-30GB per day should reduce to 2 to 3 GB/day, which is in the realm of the original expected results specified in Comment #0

Comment 6 Michal Kolar 2019-06-18 12:11:41 UTC
(In reply to Nathan Scott from comment #3)
> | This generates 20-30GB per day on a big machine, and the .meta being
> | 7-9GB that is never compressed also not during archiving.
> 
> There has been much progress in this area in the PCP (rebase) in RHEL 7.6.
> 
> We do now compress .meta files each day.  We also have a new strategy around
> the data volumes, which are compressed during the day soon after each data
> volume (.0, .1, .2, etc) reaches the 100Mb mark (sub-volume chunk size is
> configurable).

Hi Nathan

Your solution does not correspond to the requested feature. Is this solution acceptable to the reporter?

Comment 7 Peter Vreman 2019-06-18 15:24:37 UTC
Disk space usage is only one aspect.
The amount of data to be processed by the reporting tools is another. Does this not have a performance impact on pmwebd, e.g. when used with Grafana?

Comment 8 Nathan Scott 2019-06-18 20:29:07 UTC
Michal, yes I believe it addresses the original concerns around per-process logging - we have both hotproc auto-configuration (if in use) and (the more generally useful, for Red Hat customer support) proc logging nowadays.

Peter, yes it certainly does have a performance impact.  We are actively working on that aspect (Grafana and PCP REST API performance) in other BZs, with a complete revamp of the REST APIs and bringing Grafana up to the latest version, but that is beyond the scope of this BZ.

Comment 9 Michal Kolar 2019-06-19 13:39:18 UTC
Verified against pcp-4.3.2-2.el7.

Comment 11 errata-xmlrpc 2019-08-06 12:48:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2111

