Bug 1211432

Summary: change pmlogger default interval from 60s down to 10s
Product: Red Hat Enterprise Linux 7 Reporter: Mark Goodwin <mgoodwin>
Component: pcp Assignee: Lukas Berk <lberk>
Status: CLOSED ERRATA QA Contact: Michal Kolar <mkolar>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.1 CC: bmr, brolley, bubrown, fche, mbenitez, mcermak, mgoodwin, michele, mkolar, nathans
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pcp-3.11.8-4.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-01 18:29:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
patch none

Description Mark Goodwin 2015-04-14 01:44:33 UTC
Description of problem: feedback from the field suggests 60s is not enough logging resolution, especially for storage-related metrics. 10s is a better default, but it will obviously increase logging volumes, which could upset other projects such as sos (hence I put this in a bz for comment).
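
To put the volume concern in rough numbers, the Python sketch below counts samples per metric per day at the old and proposed intervals. It only counts samples; actual archive size also depends on the metric set, metadata overhead and compression, none of which is measured in this report.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds):
    """Number of samples logged per metric per day at a fixed interval."""
    return SECONDS_PER_DAY // interval_seconds


old = samples_per_day(60)   # current default interval
new = samples_per_day(10)   # proposed default interval
ratio = new / old           # growth factor in sample count
print(old, new, ratio)
```

The growth factor is simply the ratio of the two intervals, which is the "sixfold" figure referred to later in the thread.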

Comment 1 Mark Goodwin 2015-04-14 02:16:43 UTC
Bryn, will this upset sos data volumes from the PCP plugin? We can re-measure the actual volume/day if you need an accurate measurement on typical sized machines, but what limit would be acceptable? The trade-off is between a useful logging frequency on one side and daily data volume and sos-report download times on the other.

Comment 2 Frank Ch. Eigler 2015-04-14 10:58:48 UTC
Mark, are all pmlogconf'y metrics of deemed sixfold increased interest?

Comment 3 Mark Goodwin 2015-04-14 11:34:32 UTC
(In reply to Frank Ch. Eigler from comment #2)
> Mark, are all pmlogconf'y metrics of deemed sixfold increased interest?

nope. But I doubt the average user is going to be hunting around in /usr/libexec/pcp/bin for pmlogconf, and then figuring out how to set the interval for individual groups of metrics. So it seems easier to just set the default logging interval to 10s and be done with it. This can be done either in the pmlogger source or in the control file.

Personally, I find collectl's -i flag more user friendly :

-i, --interval interval[:interval2[:interval3]]
      This is the sampling interval in seconds.  The default is 10 seconds  
      when  run as  a  daemon and 1 second otherwise.  The process 
      subsystem and slabs (-sY and -sZ) are sampled at the lower rate
      of interval2.  Environmentals  (-sE),  which only  apply  to a
      subset of hardware, are sampled at interval3.  Both interval2 and
      interval3, if specified, must be an even multiple of interval1.  The
      daemon default is -i10:60:300 and all other modes are -i1:60:300.
      To sample only processes once every 10 seconds use -i:10.

Comment 4 Frank Ch. Eigler 2015-04-14 12:02:13 UTC
> > Mark, are all pmlogconf'y metrics of deemed sixfold increased interest?
> 
> nope. But I doubt the average user is going to be hunting around in
> /usr/libexec/pcp/bin for pmlogconf

Can we do it for them (once)?

> Personally, I find collectl's -i flag more user friendly :
> -i, --interval interval[:interval2[:interval3]]

Could we start with grouping our pmlogconf metrics into the
same three classes, and setting corresponding defaults?

Comment 5 Mark Goodwin 2015-04-15 02:30:02 UTC
Hi Frank, we could enhance pmlogconf's 'delta' syntax to support the default interval times N for N >= 1. That way we can scale things up depending on the prevailing default logging interval, which could be made quite small.

e.g. if pmlogger's default is 5s, then we might want disk and network stats to be default2, cpu stats to be default1 (which would be the default anyway), proc stats to be default12, and so on. hinv stuff would be 'once', or maybe default60, since nearly everything is hot-pluggable nowadays so 'once' should no longer be used. If the default logging interval is changed (in pmlogger's control file or on the command line if run manually), then everything scales up (or down) accordingly.
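
A minimal Python sketch of the scaling idea, assuming that "defaultN" above means "N times the default interval". The group names and the MULTIPLIERS table are illustrative only, taken from the example in this comment; they are not an existing pmlogconf or pmlogger interface.

```python
# Hypothetical multipliers per metric group, from the example in this comment.
MULTIPLIERS = {
    "disk": 2,
    "network": 2,
    "cpu": 1,
    "proc": 12,
    "hinv": 60,
}


def scaled_intervals(default_seconds, multipliers):
    """Return the logging interval in seconds for each metric group."""
    for group, n in multipliers.items():
        if not isinstance(n, int) or n < 1:
            raise ValueError(f"multiplier for {group} must be an integer >= 1")
    return {group: n * default_seconds for group, n in multipliers.items()}


at_5s = scaled_intervals(5, MULTIPLIERS)
at_10s = scaled_intervals(10, MULTIPLIERS)
print(at_5s)
print(at_10s)
```

Changing only the default (5s to 10s here) rescales every group at once, which is the property the proposal is after.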

Comment 6 Mark Goodwin 2015-04-15 08:35:19 UTC
Created attachment 1014623 [details]
patch

This should be on the list for RFC; nevertheless, the patch allows a pmlogger config interval to be specified as "N times default" instead of an explicit time interval, for some integer N >= 1. There are no doc updates or pmlogconf updates yet, but comments are welcome. Backward compatibility: all existing syntax options are preserved.

Comment 7 Frank Ch. Eigler 2015-04-15 14:25:12 UTC
That looks like a useful approach, esp. if coupled with pmlogconf fragments that exploit it, and an assessment of the increase of pmcd/pmlogger resource consumption.

Comment 8 Bryn M. Reeves 2015-04-16 10:10:15 UTC
Michele Baldessari wrote the pcp plugin - it currently uses a fixed 100MiB limit by default (applied to PCP_LOG_DIR/pmlogger/`hostname`).

A warning is issued and recorded in the logs if the limit is reached. If sos is run with the '--all-logs' option, this size limiting is disabled.

We can adjust the limits (or set them dynamically based on the interval setting) if needed.

Passing the NEEDINFO to Michele in case he has suggestions.
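
As a rough illustration of setting the limit dynamically, the Python sketch below scales the 100MiB figure by the ratio of intervals. It assumes archive volume is inversely proportional to the logging interval, which is a simplification (fixed metadata overhead and compression are ignored), and the function name is hypothetical rather than part of the sos plugin.

```python
BASE_LIMIT_MIB = 100      # current fixed limit in the sos pcp plugin
BASE_INTERVAL_S = 60      # interval the limit was sized against


def limit_for_interval(interval_seconds,
                       base_limit_mib=BASE_LIMIT_MIB,
                       base_interval_s=BASE_INTERVAL_S):
    """Scale the size limit, assuming volume grows as 1/interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return base_limit_mib * base_interval_s / interval_seconds


limit_at_10s = limit_for_interval(10)
limit_at_60s = limit_for_interval(60)
print(limit_at_10s, limit_at_60s)
```

Measured data from real hosts, as offered elsewhere in this thread, would be a better basis than this linear estimate.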

Comment 9 Michele Baldessari 2015-04-17 06:51:08 UTC
The 100MB seemed a fairly reasonable limit when I wrote the PCP plugin.
I had compared it to a couple of servers I run where PCP is collecting
the default metric set with the default 1min interval.

I'd say we need to define what would be a maximum total size for an sosreport
(i.e. a reasonable approximation) and then we can tweak the 100MB value.
We need some hard data to understand how much bigger the sosreports
would get with different intervals (and stock config). I will do this on
a couple of servers of mine and report here, if it is useful.

Mark, I really like the approach in comment 5 and 6

Comment 10 Mark Goodwin 2015-06-24 06:04:16 UTC
Actually, this one didn't make it into 3.10.5. I will pretty certainly have it ready for 3.10.6 ...

Comment 11 Nathan Scott 2015-07-15 07:12:49 UTC
(reassigning as per comment#10)

Comment 12 Mark Goodwin 2015-08-03 11:02:54 UTC
The originally proposed patch for this issue has not been committed. Instead, we worked with Kenj to add -s mode to pmcpp(1) and to enable its use by default as a pmlogger config preprocessor, which offers the greatest flexibility. That work has been completed and QA coverage has been added, so we can close off this BZ.

The remaining part of this work is to update all of the pmlogconf config files to use macros for various subsystems, e.g. %disk_interval, %cpu_interval and so forth, with the macros themselves being defined in a system-wide include file, e.g. $PCP_SYSCONFIG_DIR/pmlogger/pmlogger.macros or some such. This customization work has not been included in the fix for this BZ. As I see it, that additional work belongs in a new BZ/RFE (which I'll post once we have agreement).
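
The macro idea can be illustrated with a small Python stand-in for the preprocessing step. This is not pmcpp itself and does not reproduce its full behaviour; it only substitutes %name and %{name} tokens. The macro names come from this comment, while the values and the sample config line are hypothetical.

```python
import re

# Hypothetical values; this comment names the macros but not their settings.
MACROS = {
    "disk_interval": "10 seconds",
    "cpu_interval": "5 seconds",
}

# Matches %{name} or %name.
_MACRO = re.compile(r"%\{(\w+)\}|%(\w+)")


def expand(text, macros):
    """Replace known macros; leave unknown ones untouched."""
    def repl(match):
        name = match.group(1) or match.group(2)
        return macros.get(name, match.group(0))
    return _MACRO.sub(repl, text)


line = "log advisory on every %disk_interval { disk.dev.read }"
expanded = expand(line, MACROS)
print(expanded)
```

With the values kept in one system-wide file, changing a single macro definition would retune every config fragment that references it.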

Comment 13 Nathan Scott 2015-08-04 00:04:55 UTC
Sounds good to me - thanks Mark.

Comment 15 Nathan Scott 2016-06-24 03:38:36 UTC
No progress on this as yet; Mark's plan in comment #12 remains the goal though.

Comment 16 Nathan Scott 2017-03-01 05:52:57 UTC
Part of the zeroconf packaging work being handled by Lukas.

Comment 18 Michal Kolar 2017-06-01 10:03:35 UTC
Reproduced against pcp-3.11.3-4.el7 and verified against pcp-3.11.8-5.el7.

Comment 19 errata-xmlrpc 2017-08-01 18:29:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1968