Bug 1243809 - drop kernel.percpu.interrupts from default pmlogconf
Summary: drop kernel.percpu.interrupts from default pmlogconf
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcp
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Nathan Scott
QA Contact: Miloš Prchlík
URL:
Whiteboard:
Depends On:
Blocks: 1249090
Reported: 2015-07-16 11:34 UTC by Frank Ch. Eigler
Modified: 2015-11-19 11:55 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 11:55:37 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2015:2096
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: pcp bug fix and enhancement update
Last Updated: 2015-11-19 10:39:13 UTC

Description Frank Ch. Eigler 2015-07-16 11:34:52 UTC
The kernel.percpu.interrupts metric indom scales badly on larger machines (#cpus * #irqs, which can run into the tens of thousands).  That leads to much larger than usual log files.  We should stop recording this by default in pmlogconf's various files.  While looking for pcp-resident consumers of this information, I found only pmcollectl, and even that was aggregating across interrupt lines.  So the collection & logging effort is approximately 100% wasted.

We already have a kernel.all.intr metric (and are logging it by default) to feed overall system stats.  If there were demand, we could perhaps add a kernel.percpu.intr (indom = cpus) aggregated across interrupt lines and/or a kernel.all.interrupt.FOO (PMNS pseudo-indom = interrupt-line) aggregated across cpus, and get those pmlogconf-defaulted.
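
To make the scaling concern concrete, here is a minimal sketch comparing the instance counts of the three shapes mentioned above. The 480-CPU figure is the machine size reported later in this bug; the count of 100 interrupt lines is purely hypothetical, and the function names are illustrative rather than anything in PCP.

```python
def percpu_interrupts_instances(num_cpus: int, num_irqs: int) -> int:
    """One instance per (cpu, irq) pair, as with kernel.percpu.interrupts."""
    return num_cpus * num_irqs


def percpu_intr_instances(num_cpus: int) -> int:
    """One instance per CPU, as for the proposed kernel.percpu.intr."""
    return num_cpus


def all_interrupt_instances(num_irqs: int) -> int:
    """One value per interrupt line, as for the proposed kernel.all.interrupt.FOO."""
    return num_irqs


NUM_CPUS = 480   # machine size reported later in this bug
NUM_IRQS = 100   # hypothetical number of interrupt lines

print(percpu_interrupts_instances(NUM_CPUS, NUM_IRQS))  # 48000
print(percpu_intr_instances(NUM_CPUS))                  # 480
print(all_interrupt_instances(NUM_IRQS))                # 100
```

Under these assumptions, either aggregated form is two to three orders of magnitude smaller than the per-cpu, per-irq product.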

Comment 4 Nathan Scott 2015-08-05 02:59:33 UTC
pcp-3.10.6-1 el7 build contains this fix.

Comment 6 Joe Mario 2015-08-23 23:31:57 UTC

Frank and Nathan:
 I'm on the large system again (HP Dragonhawk with 480 cpus).  It's running RHEL 7.2 alpha.  The pcp version is pcp-3.10.6-1 el7.

But it doesn't look like the excessive logging decreased by much.  I expected it to be much lower, given comment #4 above says it's fixed.

The system is idle, and with pmlogger enabled, the log file is growing by 841308 bytes per minute.  
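
For scale, a rough extrapolation of that growth rate to a full day, assuming the rate stays constant (the helper name is illustrative only):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_log_bytes(bytes_per_interval: int, interval_seconds: int = 60) -> int:
    """Extrapolate a measured log growth rate to bytes per day."""
    intervals_per_day = SECONDS_PER_DAY // interval_seconds
    return bytes_per_interval * intervals_per_day


observed = daily_log_bytes(841308)  # bytes per minute, as measured above
print(f"{observed} bytes/day, about {observed / 1e9:.2f} GB")
# 1211483520 bytes/day, about 1.21 GB
```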

The "pminfo -f" command still takes 1 minute to run and generates 84 Meg of data.

The pminfo output is at http://perf1.lab.bos.redhat.com/jmario/scratch/pminfo_aug_23_bl920gen8.txt

Let me know if you need access to the system.  I currently have it reserved.

Joe

Comment 7 Nathan Scott 2015-08-24 04:21:08 UTC
Thanks Joe.  From a look around the system, here's a few notes I made:


pmlogger

- the default log size here will drop a fair bit again shortly, with this week's 7.2 pcp rebuild including the BZ 1254509 fix - I'll send you a note when that's ready if you like.

- after that, the default size will be around the 200MB per day mark uncompressed (140624 bytes every 60 sec).  For such a large system, this is pretty good I think - back when I was doing production system analysis we'd typically see daily logs on the order of 150-175MB (smaller, application servers), though that was logging much more frequently (~15 second sampling)

- when the log compression kicks in, after 3 days IIRC, that drops right down to around 1MB (!) - on-the-fly compression from pmlogger is on the long-term PCP roadmap.

- a lot of the space we're consuming currently is due to the per-cpu time metrics
(see "pminfo kernel.percpu.cpu") - simply because 480 cpus * 11 64-bit values need to be part of the logged set at every sampling interval;
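
The figures in these notes can be cross-checked with a little arithmetic. This is a back-of-the-envelope sketch, not a description of pmlogger's on-disk format: it counts raw value payload only, ignores any per-record overhead, and assumes one sample per 60 seconds.

```python
NUM_CPUS = 480                       # from comment 6
METRICS_PER_CPU = 11                 # per-cpu time metrics, as noted above
BYTES_PER_VALUE = 8                  # 64-bit values
PROJECTED_BYTES_PER_MINUTE = 140624  # projected log growth after the BZ 1254509 fix


def percpu_payload_bytes(num_cpus: int, metrics_per_cpu: int,
                         bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Raw bytes of per-cpu values in one sample (payload only)."""
    return num_cpus * metrics_per_cpu * bytes_per_value


payload = percpu_payload_bytes(NUM_CPUS, METRICS_PER_CPU)
daily_total = PROJECTED_BYTES_PER_MINUTE * 24 * 60
share = payload / PROJECTED_BYTES_PER_MINUTE

print(payload)         # 42240 bytes of per-cpu time values per sample
print(daily_total)     # 202498560 bytes per day, i.e. roughly 200MB
print(f"{share:.0%}")  # 30%
```

Under those assumptions the per-cpu time values alone come to roughly 30% of each minute's log growth, which fits the "around the 200MB per day" estimate and the observation that these metrics account for a lot of the space.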


pminfo

- most of the pminfo time is being spent traversing and fetching the kernel.percpu.interrupts metrics (while we've stopped logging these now with my previous change, we've not done any optimisation work here yet) - there's plenty of scope for improving that code still.  But other metric trees are nice and quick to fetch values from, including the other percpu metrics...

# time pminfo -f kernel.percpu.cpu >/dev/null

real	0m0.058s
user	0m0.005s
sys	0m0.001s

So, once we come back to tackling the interrupts metrics, we'll see a noticeable improvement there.  It's not super-high priority yet though, just because running "pminfo -f" across every possible metric is not a common operation.  We do have other planned work in the interrupts metrics though, so it was good to see first hand the pain level there.

Comment 8 Miloš Prchlík 2015-10-20 11:13:50 UTC
Verified for build pcp-3.10.6-2.el7.

Comment 9 errata-xmlrpc 2015-11-19 11:55:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2096.html

