Bug 1243809
| Summary: | drop kernel.percpu.interrupts from default pmlogconf | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Frank Ch. Eigler <fche> |
| Component: | pcp | Assignee: | Nathan Scott <nathans> |
| Status: | CLOSED ERRATA | QA Contact: | Miloš Prchlík <mprchlik> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.2 | CC: | brolley, fche, jmario, lberk, mbenitez, mcermak, mgoodwin, mprchlik, nathans |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2015-11-19 11:55:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1249090 | ||
|
Description
Frank Ch. Eigler
2015-07-16 11:34:52 UTC
pcp-3.10.6-1 el7 build contains this fix.

Frank and Nathan:

I'm on the large system again (HP Dragonhawk with 480 cpus). It's running RHEL 7.2 alpha. The pcp version is pcp-3.10.6-1 el7.

But it doesn't look like the excessive logging decreased by much. I expected it to be much lower, given comment #4 above says it's fixed.

The system is idle, and with pmlogger enabled, the log file is growing by 841308 bytes per minute. The "pminfo -f" command still takes 1 minute to run and generates 84 Meg of data. The pminfo output is at http://perf1.lab.bos.redhat.com/jmario/scratch/pminfo_aug_23_bl920gen8.txt

Let me know if you need access to the system. I currently have it reserved.

Joe

Thanks Joe. From a look around the system, here are a few notes I made:

pmlogger
- the default log size here will drop a fair bit again shortly, with this week's 7.2 pcp rebuild including the BZ 1254509 fix - I'll send you a note when that's ready if you like.
- after that, the default size will be around the 200MB per day mark uncompressed (140624 bytes every 60 sec). For such a large system, this is pretty good I think - back when I was doing production system analysis we'd typically see daily logs on the order of 150-175MB (smaller, application servers) though that was logging much more frequently (~15 second sampling)
- when the log compression kicks in, after 3 days IIRC, that drops right down to around about 1MB (!) - on-the-fly compression from pmlogger is in the long-term PCP roadmap.
- a lot of the space we're consuming currently is due to the per-cpu time metrics (see "pminfo kernel.percpu.cpu") - simply because 480 * 11 64-bit values, at every sampling interval, need to be part of the logged set.

pminfo
- most of the pminfo time is being spent traversing and fetching the kernel.percpu.interrupts metrics (while we've stopped logging these now with my previous change, we've not done any optimisation work here yet) - there's plenty of scope for improving that code still.
But other metric trees are nice and quick to fetch values from, including the other percpu metrics...

# time pminfo -f kernel.percpu.cpu >/dev/null
real 0m0.058s
user 0m0.005s
sys 0m0.001s

So, once we come back to tackling the interrupts metrics, we'll see a noticeable improvement there. It's not super-high priority yet though, just because running "pminfo -f" across every possible metric is not a common operation. We do have other planned work in the interrupts metrics though, so it was good to see first hand the pain level there.

Verified for build pcp-3.10.6-2.el7.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2096.html |
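
As a rough cross-check of the log growth figures quoted in the comments above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the byte counts are the ones reported in this bug, a 60 second sampling interval is assumed throughout, and the per-CPU estimate counts raw 64-bit values only, ignoring any archive format overhead.

```python
# Rough check of the pmlogger growth figures quoted in this report.
# Function names are illustrative; nothing here calls into PCP itself.

SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(bytes_per_sample: int, interval_sec: int = 60) -> int:
    """Daily archive growth if bytes_per_sample is written every interval_sec."""
    return bytes_per_sample * SECONDS_PER_DAY // interval_sec


def percpu_payload_bytes(ncpus: int = 480, metrics: int = 11,
                         value_bytes: int = 8) -> int:
    """Raw value bytes per sample for the per-CPU time metrics.

    Assumes 11 metrics of one 64-bit value per CPU, as described above,
    and ignores timestamps, headers and other archive overhead.
    """
    return ncpus * metrics * value_bytes


if __name__ == "__main__":
    observed = bytes_per_day(841308)   # growth reported on the idle system
    projected = bytes_per_day(140624)  # growth expected after the next rebuild
    print(f"observed:  {observed / 1e6:.1f} MB/day")
    print(f"projected: {projected / 1e6:.1f} MB/day")
    print(f"per-cpu values per sample: {percpu_payload_bytes()} bytes")
```

Under these assumptions the reported 841308 bytes per minute comes to roughly 1.2 GB per day, the projected 140624 bytes per minute to roughly 202 MB per day (consistent with the "200MB per day mark" above), and the per-CPU time metrics to about 42 KB of raw values per sample.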