1612916 – [RFE] Improve pcp-zeroconf for large cpu servers



Summary: [RFE] Improve pcp-zeroconf for large cpu servers

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 7
Classification:	Red Hat
Component:	pcp
Sub Component:
Version:	7.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	low
Target Milestone:	rc
Target Release:	7.7
Assignee:	Nathan Scott
QA Contact:	Michal Kolar
Docs Contact:
URL:
Whiteboard:
Depends On:	1565370
Blocks:

Reported:	2018-08-06 14:12 UTC by Welterlen Benoit
Modified:	2019-08-06 12:48 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-08-06 12:48:17 UTC
Target Upstream Version:
Embargoed:


Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:2111	0	None	None	None	2019-08-06 12:48:36 UTC

Description Welterlen Benoit 2018-08-06 14:12:09 UTC

Description of problem:

The current pcp-zeroconf configuration logs proc.psinfo every 10 seconds. On a big machine this generates 20-30GB per day, of which the .meta file accounts for 7-9GB, and the .meta file is never compressed, not even during archiving.

On systems with many CPUs, proc.psinfo also covers the roughly 7 kernel threads per CPU.
For a 384-CPU server, that means 2688 kernel-thread processes are included.

-----------
proc.nprocs
    value 2761
--------

PCP has the hotproc feature to limit the information collected to a subset of processes.
---------
cat /var/lib/pcp/pmdas/proc/samplehotproc.conf
#pmdahotproc
Version 1.0

uname != "root" || cpuburn > 0.05

sudo cp /var/lib/pcp/pmdas/proc/samplehotproc.conf /var/lib/pcp/pmdas/proc/hotproc.conf
sudo service pmcd restart
Redirecting to /bin/systemctl restart  pmcd.service
pminfo -f proc.nprocs hotproc.nprocs

proc.nprocs
    value 2762

hotproc.nprocs
    value 16

-----------
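To illustrate why hotproc.nprocs drops to 16, the following is a minimal sketch that applies the same predicate as samplehotproc.conf (`uname != "root" || cpuburn > 0.05`) to a hypothetical process table. The `Proc` type, its fields, and the sample data are illustrative assumptions only; pmdaproc evaluates the real predicate internally, and `cpuburn` is simply treated here as a fraction of one CPU.

```python
# Hypothetical sketch of the sample hotproc predicate:
#   uname != "root" || cpuburn > 0.05
# The Proc type and data are illustrative; this is not pmdaproc code.
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    uname: str      # owning user name
    cpuburn: float  # recent CPU usage, assumed here to be a fraction of one CPU


def is_hot(p: Proc) -> bool:
    """True if the process matches the sample hotproc predicate."""
    return p.uname != "root" or p.cpuburn > 0.05


def hot_processes(procs):
    """Keep only the processes the predicate selects."""
    return [p for p in procs if is_hot(p)]


if __name__ == "__main__":
    # 2688 idle root kernel threads (384 CPUs x 7), plus two other processes.
    procs = [Proc(pid=i, uname="root", cpuburn=0.0) for i in range(2, 2690)]
    procs.append(Proc(pid=5000, uname="root", cpuburn=0.50))   # busy root daemon
    procs.append(Proc(pid=5001, uname="oracle", cpuburn=0.0))  # idle non-root process
    hot = hot_processes(procs)
    print(len(procs), len(hot))
```

Idle root kernel threads fail both halves of the predicate, which is why they disappear from the hotproc instance domain while busy or non-root processes remain.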

Before hotproc, with the standard pcp-zeroconf configuration using 'proc':
-----------
tail -n5 /var/log/pcp/pmlogger/x/pmlogger.log
        proc.psinfo.pid
} logged every 10 sec: 1553280 bytes or 12798.63 Mbytes/day
-----------

After enabling the sample hotproc configuration and updating the zeroconf-provided atop-proc file to replace 'proc.' with 'hotproc.':
-----------
sudo sed 's/^\([[:space:]]\)proc\./\1hotproc./' -i /var/lib/pcp/config/pmlogconf/puppet/atop-proc
sudo service pmlogger restart
Redirecting to /bin/systemctl restart  pmlogger.service
tail -n5 /var/log/pcp/pmlogger/x/pmlogger.log
        hotproc.psinfo.pid
        proc.runq.blocked
        proc.runq.runnable
        proc.nprocs
} logged every 10 sec: 243124 bytes or 2003.28 Mbytes/day
-----------
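The two "Mbytes/day" figures from pmlogger.log can be reproduced from the per-sample byte counts. This sketch assumes one sample every 10 seconds (8640 samples per day) and 1 Mbyte = 2**20 bytes; both printed figures match under that assumption.

```python
# Reproduce the pmlogger.log "Mbytes/day" figures from bytes per sample.
# Assumes a fixed 10 s logging interval and 1 Mbyte = 2**20 bytes.
SECONDS_PER_DAY = 24 * 60 * 60


def mbytes_per_day(bytes_per_sample: int, interval_sec: int = 10) -> float:
    samples_per_day = SECONDS_PER_DAY // interval_sec  # 8640 at 10 s
    return bytes_per_sample * samples_per_day / 2**20


proc_mb = mbytes_per_day(1553280)    # proc.* logging (before)
hotproc_mb = mbytes_per_day(243124)  # hotproc.* logging (after)
print(f"proc:      {proc_mb:.2f} Mbytes/day")
print(f"hotproc:   {hotproc_mb:.2f} Mbytes/day")
print(f"reduction: {proc_mb / hotproc_mb:.1f}x")
```

This prints 12798.63 and 2003.28 Mbytes/day, a reduction of roughly 6.4x in uncompressed logging volume.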

Is it possible to improve the out-of-the-box recommended (best-practice) pcp-zeroconf configuration so that it enables the sample hotproc configuration, and so that the atop-proc pmlogger configuration uses hotproc instead of proc?
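For reference, the sed expression used above (`s/^\([[:space:]]\)proc\./\1hotproc./`) can be mirrored in Python as a sketch. The whitespace class is narrowed to spaces and tabs because sed operates line by line, whereas a multiline regex using `\s` could match across a newline. The sample text is a hypothetical fragment, not the shipped atop-proc file.

```python
# Sketch mirroring: sed 's/^\([[:space:]]\)proc\./\1hotproc./'
# Only lines starting with exactly one space or tab followed by "proc."
# are rewritten; [ \t] avoids matching newlines in MULTILINE mode.
import re

_PROC_LINE = re.compile(r"^([ \t])proc\.", re.MULTILINE)


def proc_to_hotproc(config_text: str) -> str:
    """Rewrite singly indented 'proc.' metric names to 'hotproc.'."""
    return _PROC_LINE.sub(r"\1hotproc.", config_text)


# Hypothetical fragment for illustration only.
sample = "\tproc.psinfo.pid\n\tproc.psinfo.cmd\nproc.nprocs\n"
print(proc_to_hotproc(sample))
```

Unindented lines such as `proc.nprocs` are left untouched, matching the behaviour of the anchored sed pattern.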


Version-Release number of selected component (if applicable):
RHEL 7

How reproducible:
Easy, always

Steps to Reproduce:
1. Use the zeroconf configuration, which logs proc rather than hotproc
2. Run many processes, or use a machine with many CPUs
3. Check the disk space used by the logged data

Actual results:
About 12 GB logged every day
All processes monitored

Expected results:
By default, zeroconf should use hotproc rather than proc, unless there is a performance reason not to
About 2 GB logged every day, with only noteworthy processes monitored

Additional info:

Comment 3 Nathan Scott 2018-08-07 06:25:06 UTC

| This generates 20-30GB per day on a big machine, and the .meta being
| 7-9GB that is never compressed also not during archiving.

There has been much progress in this area in the PCP (rebase) in RHEL 7.6.

We do now compress .meta files each day.  We also have a new strategy around the data volumes, which are compressed during the day soon after each data volume (.0, .1, .2, etc) reaches the 100Mb mark (sub-volume chunk size is configurable).

Comment 4 Mark Goodwin 2018-08-07 11:27:56 UTC

In addition, PCP archives compress very well, typically 10:1, so 20-30GB per day should reduce to 2-3GB per day, which is in line with the expected results in Comment #0.
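A back-of-the-envelope sketch combining the figures from this report: the 12798.63 Mbytes/day of uncompressed proc logging from pmlogger.log, data volumes rolled over at the 100Mb mark mentioned in Comment 3 (taken here as 100 Mbytes), and the "typical" 10:1 compression ratio from Comment 4. The ratio is an assumption; real archives will vary.

```python
# Rough daily archive footprint estimate.
# Assumptions: 100 Mbyte volume chunk size (configurable) and a 10:1
# compression ratio, described as "typical" rather than guaranteed.
import math


def daily_footprint(mb_per_day: float, chunk_mb: float = 100.0, ratio: float = 10.0):
    volumes = math.ceil(mb_per_day / chunk_mb)  # data volumes created per day
    compressed_mb = mb_per_day / ratio          # estimated size after compression
    return volumes, compressed_mb


volumes, compressed = daily_footprint(12798.63)
print(f"{volumes} volumes/day, ~{compressed / 1024:.2f} GB/day compressed")
```

Under these assumptions the proc logging from the description would produce about 128 volumes per day and roughly 1.25 GB per day after compression.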

Comment 6 Michal Kolar 2019-06-18 12:11:41 UTC

(In reply to Nathan Scott from comment #3)
> | This generates 20-30GB per day on a big machine, and the .meta being
> | 7-9GB that is never compressed also not during archiving.
> 
> There has been much progress in this area in the PCP (rebase) in RHEL 7.6.
> 
> We do now compress .meta files each day.  We also have a new strategy around
> the data volumes, which are compressed during the day soon after each data
> volume (.0, .1, .2, etc) reaches the 100Mb mark (sub-volume chunk size is
> configurable).

Hi Nathan

Your solution does not correspond to the requested feature. Is this solution acceptable to the reporter?

Comment 7 Peter Vreman 2019-06-18 15:24:37 UTC

Disk space usage is only one aspect.
The amount of data to be processed by the reporting tools is another. Does pmwebd, for example when serving Grafana, not have a performance impact?

Comment 8 Nathan Scott 2019-06-18 20:29:07 UTC

Michal, yes, I believe it addresses the original concerns around per-process logging: we now have both hotproc auto-configuration (if in use) and proc logging (which is more generally useful for Red Hat customer support).

Peter, yes, it certainly does have a performance impact. We are actively working on that aspect (Grafana and PCP REST API performance) in other BZs, with a complete revamp of the REST APIs and an upgrade of Grafana to the latest version; that work is beyond the scope of this BZ.

Comment 9 Michal Kolar 2019-06-19 13:39:18 UTC

Verified against pcp-4.3.2-2.el7.

Comment 11 errata-xmlrpc 2019-08-06 12:48:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2111
