Bug 1612916

Summary: [RFE] Improve pcp-zeroconf for large cpu servers
Product: Red Hat Enterprise Linux 7
Reporter: Welterlen Benoit <bwelterl>
Component: pcp
Assignee: Nathan Scott <nathans>
Status: CLOSED ERRATA
QA Contact: Michal Kolar <mkolar>
Severity: low
Priority: unspecified
Version: 7.5
CC: agerstmayr, jentrena, mgoodwin, mkolar, nathans, peter.vreman, tbowling
Target Milestone: rc
Keywords: FutureFeature
Target Release: 7.7
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2019-08-06 12:48:17 UTC
Type: Bug
Bug Depends On: 1565370

Description Welterlen Benoit 2018-08-06 14:12:09 UTC
Description of problem:

The current pcp-zeroconf configuration gathers proc.psinfo every 10 seconds. This generates 20-30GB per day on a big machine, with the .meta file accounting for 7-9GB that is never compressed, not even during archiving.

On large-CPU systems, the proc.psinfo gathering includes roughly 7 kernel threads per CPU.
For a 384-CPU server, that means 2688 processes are included.

-----------
proc.nprocs
    value 2761
--------

PCP has the hotproc feature to limit the information gathered.
---------
cat /var/lib/pcp/pmdas/proc/samplehotproc.conf
#pmdahotproc
Version 1.0

uname != "root" || cpuburn > 0.05

sudo cp /var/lib/pcp/pmdas/proc/samplehotproc.conf /var/lib/pcp/pmdas/proc/hotproc.conf
sudo service pmcd restart
Redirecting to /bin/systemctl restart  pmcd.service
pminfo -f proc.nprocs hotproc.nprocs

proc.nprocs
    value 2762

hotproc.nprocs
    value 16

-----------
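For reference, the hotproc configuration file accepts a boolean predicate over per-process attributes; any process matching the predicate is exported under the hotproc namespace. The sketch below is illustrative only - the thresholds are arbitrary, and the exact set of supported terms (cpuburn, residentsize, etc.) should be checked against the pmdaproc(1) man page for the installed release:

```
#pmdahotproc
Version 1.0

# Keep non-root processes, plus anything burning noticeable CPU
# or holding a large resident set (thresholds are arbitrary examples).
uname != "root" || cpuburn > 0.05 || residentsize > 102400
```

Installing it as /var/lib/pcp/pmdas/proc/hotproc.conf and restarting pmcd activates it, as shown above.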

Before hotproc with standard pcp-zeroconf using 'proc':
-----------
tail -n5 /var/log/pcp/pmlogger/x/pmlogger.log
        proc.psinfo.pid
} logged every 10 sec: 1553280 bytes or 12798.63 Mbytes/day
-----------

After enabling the sample hotproc configuration and updating the zeroconf-provided atop-proc file, replacing 'proc.' with 'hotproc.':
-----------
sudo sed 's/^\([[:space:]]\)proc\./\1hotproc./' -i /var/lib/pcp/config/pmlogconf/puppet/atop-proc
sudo service pmlogger restart
Redirecting to /bin/systemctl restart  pmlogger.service
tail -n5 /var/log/pcp/pmlogger/x/pmlogger.log
        hotproc.psinfo.pid
        proc.runq.blocked
        proc.runq.runnable
        proc.nprocs
} logged every 10 sec: 243124 bytes or 2003.28 Mbytes/day
-----------
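The Mbytes/day figures pmlogger reports follow directly from the per-sample record size: a 10-second interval gives 86400/10 = 8640 samples per day. A quick shell check (integer arithmetic, so the fractional part is truncated):

```shell
# pmlogger reported 1553280 bytes per 10-sec sample with proc,
# 243124 bytes with hotproc; 8640 samples/day; 1048576 bytes/MB.
echo $((1553280 * 8640 / 1048576))   # proc:    12798 Mbytes/day
echo $((243124  * 8640 / 1048576))   # hotproc: 2003 Mbytes/day
```

Both match the pmlogger.log figures above, confirming the ~6x reduction comes purely from the smaller process set.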

Is it possible to improve the out-of-the-box recommended/best-practice pcp-zeroconf configuration to enable the (sample) hotproc instead, and the atop-proc pmlogger configuration to use hotproc?


Version-Release number of selected component (if applicable):
RHEL 7

How reproducible:
Easy, always

Steps to Reproduce:
1. Use the zeroconf configuration, which sets up proc and not hotproc
2. Run a lot of processes, or use a big machine
3. Check the size of the gathered data

Actual results:
12 GB logged every day
All processes monitored

Expected results:
By default, zeroconf should use hotproc rather than proc, unless there is a performance reason not to.
Around 2 GB logged per day, with only noticeable processes monitored

Additional info:

Comment 3 Nathan Scott 2018-08-07 06:25:06 UTC
| This generates 20-30GB per day on a big machine, and the .meta being
| 7-9GB that is never compressed also not during archiving.

There has been much progress in this area in the PCP rebase in RHEL 7.6.

We do now compress .meta files each day.  We also have a new strategy around the data volumes, which are compressed during the day soon after each data volume (.0, .1, .2, etc) reaches the 100MB mark (the sub-volume chunk size is configurable).
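The volume size at which pmlogger rolls over to a new (and then compressible) data volume is set via its -v option, typically in the pmlogger control file. The entry below is an illustrative sketch only - the exact path, default flags, and volume size vary between PCP releases:

```
# /etc/pcp/pmlogger/control.d/local (sketch; check pmlogger control(5) for your release)
# host        primary? socks? archive directory              pmlogger options
LOCALHOSTNAME   y        n    PCP_ARCHIVE_DIR/LOCALHOSTNAME  -r -T24h10m -c config.default -v 100Mb
```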

Comment 4 Mark Goodwin 2018-08-07 11:27:56 UTC
In addition, PCP archives compress very well, typically 10:1, so that 20-30GB per day should reduce to 2-3GB/day, which is in the realm of the expected results specified in the original description.
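As a rough illustration of why the ratio is so high: archive volumes are dominated by repetitive metric records, and general-purpose compressors such as xz do very well on such data. The synthetic example below overstates the ratio, since real archives are far less uniform than a repeated line:

```shell
# Synthetic stand-in for a metric archive: 10 MB of a repeating record.
yes "proc.psinfo.pid 12345 1.00" | head -c 10485760 > sample.dat
xz -k sample.dat                     # -k keeps sample.dat alongside sample.dat.xz
orig=$(stat -c %s sample.dat)
comp=$(stat -c %s sample.dat.xz)
echo "compression ratio: $((orig / comp)):1"
```

On real archives, expect something closer to the 10:1 quoted above rather than the extreme ratio this uniform input produces.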

Comment 6 Michal Kolar 2019-06-18 12:11:41 UTC
(In reply to Nathan Scott from comment #3)
> | This generates 20-30GB per day on a big machine, and the .meta being
> | 7-9GB that is never compressed also not during archiving.
> 
> There has been much progress in this area in the PCP (rebase) in RHEL 7.6.
> 
> We do now compress .meta files each day.  We also have a new strategy around
> the data volumes, which are compressed during the day soon after each data
> volume (.0, .1, .2, etc) reaches the 100Mb mark (sub-volume chunk size is
> configurable).

Hi Nathan

Your solution does not correspond to the requested feature. Is this solution acceptable to the reporter?

Comment 7 Peter Vreman 2019-06-18 15:24:37 UTC
Disk space usage is only one aspect.
The amount of data to be processed by the reporting tools is another. Doesn't pmwebd, e.g. for Grafana, have a performance impact?

Comment 8 Nathan Scott 2019-06-18 20:29:07 UTC
Michal, yes, I believe it addresses the original concerns around per-process logging - nowadays we have both hotproc auto-configuration (if hotproc is in use) and proc logging (the more generally useful option for Red Hat customer support).

Peter, yes, it certainly does have a performance impact.  We are actively working on that aspect (Grafana and PCP REST API performance) in other BZs, with a complete revamp of the REST APIs and an update of Grafana to the latest version - that work is beyond the scope of this BZ, however.

Comment 9 Michal Kolar 2019-06-19 13:39:18 UTC
Verified against pcp-4.3.2-2.el7.

Comment 11 errata-xmlrpc 2019-08-06 12:48:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2111