Bug 1293471

Summary:	RFE: inline compression of /var/log/pcp data files
Product:	[Fedora] Fedora	Reporter:	Dwight (Bud) Brown <bubrown>
Component:	pcp	Assignee:	Dave Brolley <brolley>
Status:	CLOSED ERRATA	QA Contact:	qe-baseos-tools-bugs
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	rawhide	CC:	brolley, bubrown, fche, lberk, mbenitez, mgoodwin, nathans, pcp
Target Milestone:	---	Keywords:	FutureFeature
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	pcp-4.1.0-2.fc27 pcp-4.1.0-2.fc28	Doc Type:	Enhancement
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-06-23 19:56:10 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Dwight (Bud) Brown 2015-12-21 20:10:35 UTC

Description of problem:
I've a modest configuration of 400 disks, but after 30 minutes of PCP collecting data, the file within /var/log/pcp is almost 200MB, which means almost 10GB per day. Typical configurations of 2000-4000 disks will create substantially larger files. Looking for inline compression/decompression when writing/reading these files, to help control the size of the collected data.

For example, collectl data collection is compressed by default.
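As a rough check of the growth figures above, here is a linear extrapolation. It is a minimal sketch that assumes a constant logging rate and, as a hypothetical simplification, that archive size grows in proportion to disk count. The function names are illustrative only.

```python
"""Back-of-the-envelope archive growth, assuming a constant logging rate."""

MB_PER_GB = 1000  # decimal units, matching the informal MB/GB figures in the report


def daily_volume_mb(observed_mb: float, observed_minutes: float) -> float:
    """Scale an archive size observed over `observed_minutes` up to 24 hours."""
    return observed_mb * (24 * 60) / observed_minutes


def scale_by_disks(volume_mb: float, observed_disks: int, target_disks: int) -> float:
    """Hypothetical simplification: per-disk metrics dominate, so size grows linearly."""
    return volume_mb * target_disks / observed_disks


if __name__ == "__main__":
    per_day = daily_volume_mb(200, 30)  # ~200 MB after 30 minutes with 400 disks
    print(f"400 disks: {per_day / MB_PER_GB:.1f} GB/day")  # 9.6 GB/day
    for disks in (2000, 4000):
        scaled = scale_by_disks(per_day, 400, disks)
        print(f"{disks} disks: {scaled / MB_PER_GB:.1f} GB/day")  # 48.0, then 96.0
```

Under these assumptions, 200MB per 30 minutes is about 9.6GB/day, and 2000-4000 disks would land somewhere around 48-96GB/day.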

Comment 1 Frank Ch. Eigler 2015-12-21 20:25:04 UTC

Can you attach the /var/log/.../pmlogger.log file for our reference? Some machines with high CPU counts were adversely affected by former defaults.

Comment 4 Frank Ch. Eigler 2015-12-22 15:35:44 UTC

Would your machine happen to have a large number of CPUs?  A former default-configuration bug causes unnecessary logging of .percpu. metrics (bug #1243809).

Interesting how the pmlogger.log file's estimate of daily consumption (28 MB/day) is so grossly wrong.  Perhaps the set of devices fluctuates a great deal, so that during the first fetch (near midnight), it finds only very few instances?

I don't know of any prepackaged pcp tools that give an analysis of space consumption of an archive, in terms of the number/sizes of metrics/instances stored.  pmdumplog will decode everything, but of course results in huge output.  Perhaps you could run pmdumplog and transcribe a random megabyte from the middle, or give us access to the archives to look closer.

Comment 5 Dwight (Bud) Brown 2015-12-22 15:47:28 UTC

1 physical processor, 8 hyperthreaded CPUs:

# cat /proc/cpuinfo | grep processor
processor	: 0
processor	: 1
processor	: 2
processor	: 3
processor	: 4
processor	: 5
processor	: 6
processor	: 7

# grep "physical id" /proc/cpuinfo
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0
physical id	: 0

Comment 7 Dwight (Bud) Brown 2015-12-22 15:52:11 UTC

There are a couple of pmlogger.log files on this system because DHCP keeps assigning it a different IP address. I reposted the pmlogger.log file.

} logged every 1 sec: 107792 bytes or 8881.79 Mbytes/day
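The pmlogger.log estimate appears to be bytes logged per sample multiplied by samples per day, expressed in units of 2^20 bytes. The sketch below is a reconstruction that reproduces the two numbers in that log line; it is not taken from pmlogger's source, and treating "Mbytes" as MiB is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_mbytes_per_day(bytes_per_sample: int, interval_sec: float) -> float:
    """Daily volume if every sample logs the same number of bytes (assumed MiB units)."""
    samples_per_day = SECONDS_PER_DAY / interval_sec
    return bytes_per_sample * samples_per_day / 2**20


if __name__ == "__main__":
    print(f"{estimated_mbytes_per_day(107792, 1):.2f} Mbytes/day")  # 8881.79 Mbytes/day
```

Because the estimate scales with 1/interval, the sampling rate matters as much as the per-sample size: the same metrics logged every 10 seconds instead of every second would need about a tenth of the space.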

Comment 8 Frank Ch. Eigler 2015-12-22 16:07:55 UTC

OK, good to see that those estimates are closer to reality.  Sampling at 1Hz will clearly drive up storage needs.  You may need to hand-edit a pmlogger configuration that focuses on the metrics you really need at that high rate.  (One can mix high & low rate sampling of same or different metrics in one archive.)

See also http://oss.sgi.com/bugzilla/show_bug.cgi?id=1072

Re. DHCP & different directories, consider setting a fixed hostname(1) for the machine, and/or change /etc/pcp/control to set a fixed string instead of LOCALHOSTNAME for the log directory.  Other options also exist.

Comment 9 Dwight (Bud) Brown 2015-12-22 16:14:01 UTC

This is a test box that gets restarted/reinstalled/reconfigured almost daily. Setting a fixed hostname after the Anaconda install would be a manual step, so it is unlikely to be applied rigorously.

Comment 10 Mark Goodwin 2015-12-23 00:21:58 UTC

The PCP log maintenance cron jobs can be configured to compress PCP logs during the daily log rotation. See the -x, -X and -Y options in the pmlogger_daily(1) man page. PCP logs generally compress by around 80%.

For hosts that are frequently reinstalled and use a DHCP primary address, I use redhat-ddns-client, which assigns a usersys.redhat.com hostname to a dynamic DHCP address each time the host is booted. This works well on my bounce boxes, though I'm not sure how automatable it is on every fresh install.

Bud, if the existing compression feature (during log rotation) suits your requirements, can we close this out CURRENTRELEASE? Or are you requesting on-the-fly compression with transparent playback of compressed archives, i.e. using libz in the PCP libraries?

Comment 11 Mark Goodwin 2015-12-23 05:22:50 UTC

Actually, data volumes (*.0, *.1, etc.) can be compressed and will be used transparently by PCP clients. Compressing volumes is a feature of the log rotation cron jobs (see the -x option in the pmlogger_daily(1) man page), but you can also compress them manually if you want. A 2G data volume will typically come down to about 400MB with xz compression.
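Compressing volumes by hand amounts to running xz(1) on each data volume while leaving the .meta and .index files alone. A minimal Python equivalent using the standard `lzma` module is sketched below. The `compress_data_volumes` helper is hypothetical, it assumes the `.xz` suffix that xz(1) itself produces, and it should only be pointed at archives that pmlogger has finished writing.

```python
import lzma
import re
import shutil
from pathlib import Path
from typing import List

# Data volumes are named <base>.0, <base>.1, ...; .meta and .index files do not match.
DATA_VOLUME = re.compile(r"\.\d+$")


def compress_data_volumes(archive_dir: Path, preset: int = 6) -> List[Path]:
    """xz-compress every uncompressed data volume in archive_dir, like running xz(1)."""
    compressed = []
    for volume in sorted(archive_dir.iterdir()):
        if not volume.is_file() or not DATA_VOLUME.search(volume.name):
            continue  # skip .meta, .index and anything already compressed
        target = volume.with_name(volume.name + ".xz")
        with volume.open("rb") as src, lzma.open(target, "wb", preset=preset) as dst:
            shutil.copyfileobj(src, dst)  # streams the file; never holds it all in memory
        volume.unlink()  # xz(1) also removes the original by default
        compressed.append(target)
    return compressed
```

Usage would be `compress_data_volumes(Path("/var/log/pcp/pmlogger/<host>"))` on a directory of finished archives; the 2G to roughly 400MB figure above is what xz typically achieves, though the ratio depends on the metrics logged.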

Comment 12 Frank Ch. Eigler 2015-12-23 12:42:35 UTC

(In reply to Mark Goodwin from comment #11)
> Actually, data volumes (*.0, *.1, etc) can be compressed and will be used
> transparently by PCP clients. [...]

It's only kind of transparent, since pcp clients end up decompressing the
whole archive into /tmp before starting to read it, potentially GBs of I/O.
It would be much better to be able to traverse the file in compressed form.
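The contrast here is between decompressing a whole volume into a temporary file and decoding it as a stream. A minimal sketch of the streaming side with Python's `lzma` module follows; the `iter_decompressed` name is hypothetical and this is not how libpcp reads archives. It keeps one chunk in memory at a time and writes nothing to /tmp.

```python
import lzma
from typing import Iterator


def iter_decompressed(path: str, chunk_size: int = 1 << 16) -> Iterator[bytes]:
    """Yield decompressed bytes straight from an .xz file, with no temporary copy."""
    with lzma.open(path, "rb") as f:
        # Each read decodes only as much of the compressed stream as it needs.
        while chunk := f.read(chunk_size):
            yield chunk
```

Forward reads are cheap this way, but an xz stream has no built-in random access: Python documents `LZMAFile.seek()` as emulated and potentially very slow, because seeking backwards means decoding again from the start. Backward or random access (for example, reverse playback of an archive) is one reason traversing archives in compressed form is more than a drop-in change.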

Comment 13 Dwight (Bud) Brown 2015-12-23 16:18:40 UTC

"The PCP log maintenance cron jobs can be configured to compress PCP logs during the daily log rotation. "

That feature is insufficient for our needs.

What we currently have is compression during collection, both with sysstat piped through compression and with collectl, which compresses on the fly. This minimizes I/O to disk, which is especially useful on smaller storage configurations where it's not possible to log data away from the disk that has potential performance issues.

A 2nd part of this is: can we redirect PCP log files to a location other than /var/log/pcp, for the same reason as above? Again, that is possible with sysstat, which PCP is replacing (as well as with collectl).
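Compression during collection, as with sysstat output piped through a compressor or with collectl, can be sketched as a writer that compresses each record before it reaches the disk. The sketch swaps in zlib (gzip format) rather than xz, because zlib's `Z_SYNC_FLUSH` keeps the stream open while pushing out all pending output, so a streaming decoder can recover everything written so far. `InlineCompressedWriter` is a hypothetical name; this is not PCP's implementation.

```python
import zlib
from typing import BinaryIO


class InlineCompressedWriter:
    """Hypothetical logger sink: only compressed bytes are ever written to disk."""

    def __init__(self, sink: BinaryIO, level: int = 6):
        self._sink = sink
        # wbits=31 (16 + 15) selects the gzip container instead of a raw zlib stream.
        self._comp = zlib.compressobj(level, zlib.DEFLATED, 31)

    def write_record(self, record: bytes) -> None:
        self._sink.write(self._comp.compress(record))
        # Sync flush: emit all pending output on a byte boundary without ending the stream.
        self._sink.write(self._comp.flush(zlib.Z_SYNC_FLUSH))

    def close(self) -> None:
        self._sink.write(self._comp.flush())  # Z_FINISH: final block plus gzip trailer
        self._sink.flush()
```

Flushing after every record costs some compression ratio; a real logger might instead flush every N samples, trading a slightly larger window of unrecoverable data for better compression.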

Comment 14 Frank Ch. Eigler 2015-12-23 16:38:12 UTC

> [...]
> A 2nd part of this is, can we redirect PCP log files to a different location
> other than /var/log/pcp for the above same reason?  Again, with sysstat that
> PCP is replacing that is possible (as well as with collectl).

With "service pmlogger", you can direct logging wherever you like by editing the /etc/pcp/pmlogger/control{,.d/*} file(s).

With "service pmmgr" (an alternative to service-pmlogger), you can do so by editing the /etc/pcp/pmmgr/log-directory file.

If you start pmlogger by hand, you can do so by specifying the destination archive on its command line.

Comment 15 Mark Goodwin 2015-12-23 22:14:49 UTC

You can also simply edit /etc/pcp.conf and change:

# directory for PCP logs
# Standard path: /var/log/pcp
# Subdirectories: pmcd pmlogger pmie
PCP_LOG_DIR=/var/log/pcp

All PCP tools and services should honor $PCP_LOG_DIR. If something doesn't then please report a bug.

An alternative would be to have an NFS mount for all PCP pmlogger archives (e.g. mounted on ${PCP_LOG_DIR}/pmlogger). Different hosts can all share the same NFS export, and their primary pmlogger service will write archives to a subdirectory based on their hostname. This can avoid unwanted local disk traffic, which is sometimes important for compute nodes in an HPC cluster and similar setups.
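To make the directory layout concrete, the sketch below reads PCP_LOG_DIR from pcp.conf-style text and derives a per-host archive directory under ${PCP_LOG_DIR}/pmlogger. It is a minimal illustration, not how PCP itself loads its configuration: the helper names are hypothetical, it ignores any environment overrides PCP may apply, and the `<PCP_LOG_DIR>/pmlogger/<hostname>` layout is assumed from the comments above. The same hostname-based layout is why a host whose DHCP-assigned name keeps changing ends up with archives scattered across several directories.

```python
import socket
from pathlib import Path
from typing import Dict, Optional


def read_pcp_conf(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines from pcp.conf-style text, skipping comments and blanks."""
    conf: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def pmlogger_dir(conf: Dict[str, str], hostname: Optional[str] = None) -> Path:
    """Per-host archive directory, assuming the PCP_LOG_DIR/pmlogger/<hostname> layout."""
    log_dir = conf.get("PCP_LOG_DIR", "/var/log/pcp")  # the standard path as fallback
    return Path(log_dir) / "pmlogger" / (hostname or socket.gethostname())
```

With `PCP_LOG_DIR=/data/pcp` in the parsed text, `pmlogger_dir(conf, "node1")` gives `/data/pcp/pmlogger/node1`; on a shared NFS mount each host writes under its own subdirectory in the same way.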

Comment 16 Nathan Scott 2016-01-07 23:56:08 UTC

Shifting to Fedora, since that's where the work will need to land first (a fairly significant chunk of libpcp work here).

This feature has been discussed within PCP circles for a long time, Bud. We appreciate the feedback and will bump the priority as much as possible.

Comment 17 Nathan Scott 2017-08-31 23:18:20 UTC

Dave's making good progress on this feature ... marking it "assigned".

Comment 18 Nathan Scott 2018-06-15 00:55:44 UTC

This ended up resolved by Dave and Kenj in the PCP community; the final touches went into pcp-4.1.0 (releasing today).

Comment 19 Fedora Update System 2018-06-15 07:37:20 UTC

pcp-4.1.0-2.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-dfb77e69f1

Comment 20 Fedora Update System 2018-06-15 07:37:45 UTC

pcp-4.1.0-2.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-e351b52702

Comment 21 Fedora Update System 2018-06-15 14:09:26 UTC

pcp-4.1.0-2.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-e351b52702

Comment 22 Fedora Update System 2018-06-15 16:35:27 UTC

pcp-4.1.0-2.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-dfb77e69f1

Comment 23 Fedora Update System 2018-06-23 19:56:10 UTC

pcp-4.1.0-2.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2018-06-23 20:46:55 UTC

pcp-4.1.0-2.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.