Bug 1293471 - RFE: inline compression of /var/log/pcp data files
Status: ASSIGNED
Product: Fedora
Classification: Fedora
Component: pcp
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Assigned To: Dave Brolley
QA Contact: qe-baseos-tools
Keywords: FutureFeature
Type: Bug
Doc Type: Enhancement
Reported: 2015-12-21 15:10 EST by Dwight (Bud) Brown
Modified: 2017-08-31 19:18 EDT
CC: 8 users

Description Dwight (Bud) Brown 2015-12-21 15:10:35 EST
Description of problem:
I have a modest configuration of 400 disks, but after 30 minutes of pcp collecting data the file within /var/log/pcp is almost 200MB, which works out to almost 10GB per day.  Typical configurations of 2000-4000 disks will create substantially larger files.  Looking for inline compression/decompression when writing/reading the files to help control the size of the collected data.

For example, collectl data collection is compressed by default.

Comment 1 Frank Ch. Eigler 2015-12-21 15:25:04 EST
Can you attach the /var/log/.../pmlogger.log file for our reference?  Some machines with high CPU counts were adversely affected by former defaults.
Comment 4 Frank Ch. Eigler 2015-12-22 10:35:44 EST
Would your machine happen to have a large number of CPUs?  A former default-configuration bug caused unnecessary logging of .percpu. metrics (bug #1243809).

Interesting how the pmlogger.log file's estimate of daily consumption (28 MB/day) is so grossly wrong.  Perhaps the set of devices fluctuates a great deal, so that during the first fetch (near midnight), it finds only very few instances?

I don't know of any prepackaged pcp tools that give an analysis of space consumption of an archive, in terms of the number/sizes of metrics/instances stored.  pmdumplog will decode everything, but of course results in huge output.  Perhaps you could run pmdumplog and transcribe a random megabyte from the middle, or give us access to the archives to look closer.
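
Something like the following could grab a sample of the decoded output without keeping all of it (the archive base name is only an example, and this takes the first megabyte rather than one from the middle, but it is usually enough to see which metrics dominate):

# decode the archive and keep roughly the first megabyte of text
pmdumplog -z /var/log/pcp/pmlogger/myhost/20151222 | head -c 1000000 > pmdumplog-sample.txt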
Comment 5 Dwight (Bud) Brown 2015-12-22 10:47:28 EST
1 physical processor, 8 hyperthreaded:

# cat /proc/cpuinfo | grep processor
processor	: 0
processor	: 1
processor	: 2
processor	: 3
processor	: 4
processor	: 5
processor	: 6
processor	: 7

# grep "physical id" /proc/cpuinfo
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
Comment 7 Dwight (Bud) Brown 2015-12-22 10:52:11 EST
There are a couple of pmlogger.log files on this system because DHCP keeps assigning it a different IP.  I reposted the pmlogger.log file.

} logged every 1 sec: 107792 bytes or 8881.79 Mbytes/day
Comment 8 Frank Ch. Eigler 2015-12-22 11:07:55 EST
OK, good to see that those estimates are closer to reality.  Sampling at 1Hz will clearly drive up storage needs.  You may need to hand-edit a pmlogger configuration that focuses on the metrics you really need at that high rate.  (One can mix high- and low-rate sampling of the same or different metrics in one archive.)
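
As a rough sketch (the metric names below are just examples), a hand-edited pmlogger config can mix rates in one archive:

# high-rate sampling only for the disk metrics of interest
log advisory on every 1 second {
    disk.dev.read
    disk.dev.write
}
# everything else at a much lower rate
log advisory on every 5 minutes {
    kernel.all.load
    mem.util.free
}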

See also http://oss.sgi.com/bugzilla/show_bug.cgi?id=1072

Re. DHCP & different directories, consider setting a fixed hostname(1) for the machine, and/or changing /etc/pcp/control to use a fixed string instead of LOCALHOSTNAME for the log directory.  Other options also exist.
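
For example (the directory name and options are only illustrative, and the exact fields depend on the PCP version), the pmlogger control line can name a fixed directory:

# host          primary?  socks?  directory                      args
LOCALHOSTNAME   y         n       /var/log/pcp/pmlogger/testbox  -r -T24h10m -c config.default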
Comment 9 Dwight (Bud) Brown 2015-12-22 11:14:01 EST
This is a test box that gets restarted/reinstalled/reconfigured almost daily.  Remembering to go in and set a fixed hostname after anaconda, since it will be a manual step, is unlikely to be applied rigorously.
Comment 10 Mark Goodwin 2015-12-22 19:21:58 EST
The PCP log maintenance cron jobs can be configured to compress PCP logs during the daily log rotation. See the -x, -X and -Y options in the pmlogger_daily(1) man page. PCP logs generally compress by around 80%.
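
For example (where exactly pmlogger_daily gets invoked varies by distro, and the values here are only illustrative), the cron entry could run:

# compress data volumes more than 1 day old, using xz
pmlogger_daily -x 1 -X xz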

For hosts that are frequently reinstalled and use a DHCP primary address, I use redhat-ddns-client, which assigns a usersys.redhat.com hostname to the dynamic DHCP address each time the host is booted. This works well on my bounce boxen, though I'm not sure how automatable it is on every fresh install.

Bud, if the existing compression feature (during log rotation) suits your requirements, can we close this out as CURRENTRELEASE? Or are you requesting on-the-fly compression with transparent playback of compressed archives, i.e. using libz in the PCP libraries?
Comment 11 Mark Goodwin 2015-12-23 00:22:50 EST
Actually, data volumes (*.0, *.1, etc) can be compressed and will be used transparently by PCP clients. Compressing volumes is a feature of the log rotation cron jobs (see the -x option in the pmlogger_daily(1) man page), but you can also manually compress them if you want. A 2G data volume will typically come down to about 400MB with xz compression.
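
A minimal manual sketch (the host subdirectory and archive name are just examples):

cd /var/log/pcp/pmlogger/myhost
# compress one of the older data volumes; PCP clients still read it transparently
xz 20151221.00.10.0
# leave the current volume plus the .index and .meta files uncompressed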
Comment 12 Frank Ch. Eigler 2015-12-23 07:42:35 EST
(In reply to Mark Goodwin from comment #11)
> Actually, data volumes (*.0, *.1, etc) can be compressed and will be used
> transparently by PCP clients. [...]

It's only kind of transparent, since pcp clients end up decompressing the
whole archive into /tmp before starting to read it, potentially GBs of I/O.
It would be much better to be able to traverse the file in compressed form.
Comment 13 Dwight (Bud) Brown 2015-12-23 11:18:40 EST
"The PCP log maintenance cron jobs can be configured to compress PCP logs during the daily log rotation. "

That feature is insufficient for our needs.

What we currently have is compression during collection, both with sysstat piped through compression and with collectl, which compresses on the fly -- this minimizes I/O to disk, which is especially useful on smaller storage configurations where it's not possible to log data away from the disks with potential perf issues.

A second part of this: can we redirect PCP log files to a location other than /var/log/pcp, for the same reason as above?  Again, that is possible with sysstat (which PCP is replacing), as well as with collectl.
Comment 14 Frank Ch. Eigler 2015-12-23 11:38:12 EST
> [...]
> A 2nd part of this is, can we redirect PCP log files to a different location
> other than /var/log/pcp for the above same reason?  Again, with sysstat that
> PCP is replacing that is possible (as well as with collectl).

With "service pmlogger", you can direct logging to whereever you like by editing the /etc/pcp/pmlogger/control{,.d/*} file(s).

With "service pmmgr" (an alternative to service-pmlogger), you can do so by editing the /etc/pcp/pmmgr/log-directory file.

If using hand-started pmlogger, you can do so by specifying the destination archive on its command line.
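
For instance (the paths and options here are only illustrative):

# sample the default metric set every 10 seconds into an archive outside /var/log/pcp
pmlogger -r -t 10sec -c config.default /data/pcp-archives/myhost-20151223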
Comment 15 Mark Goodwin 2015-12-23 17:14:49 EST
You can also simply edit /etc/pcp.conf and change:

# directory for PCP logs
# Standard path: /var/log/pcp
# Subdirectories: pmcd pmlogger pmie
PCP_LOG_DIR=/var/log/pcp

All PCP tools and services should honor $PCP_LOG_DIR. If something doesn't then please report a bug.
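
For example (the new directory is just an illustration):

# point PCP logging at a different filesystem, then restart the services
sed -i 's|^PCP_LOG_DIR=.*|PCP_LOG_DIR=/data/pcp/log|' /etc/pcp.conf
systemctl restart pmcd pmlogger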

An alternative would be to have an NFS mount for all PCP pmlogger archives (e.g. mounted on ${PCP_LOG_DIR}/pmlogger). Different hosts can all share the same NFS export, and their primary pmlogger service will write archives to a subdir based on their hostname. This can avoid unwanted local disk traffic, which is sometimes important for compute nodes in an HPC cluster and that sort of thing.
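
An illustrative /etc/fstab entry (the server name and export path are placeholders):

nfsserver:/export/pcp-archives   /var/log/pcp/pmlogger   nfs   defaults,_netdev   0 0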
Comment 16 Nathan Scott 2016-01-07 18:56:08 EST
Shifting this to Fedora since that's where the work will need to arrive first (a fairly significant chunk of libpcp work here).

This feature has been discussed within PCP circles for a long time, Bud - appreciate the feedback and we'll bump the priority as much as possible.
Comment 17 Nathan Scott 2017-08-31 19:18:20 EDT
Dave's making good progress on this feature ... marking it "assigned".
