Description of problem:
By default, the PCP logs are rotated every day at 00:10, via a cron entry:
# grep pmlogger_daily /etc/cron.d/pcp-pmlogger
10 0 * * * pcp /usr/libexec/pcp/bin/pmlogger_daily -X xz -x 3
When logs are rotated, samples are lost, causing "pmval" to display "No values available" for the time interval in which the log was rotated.
This is a problem when consolidating logs over large intervals such as 8 hours (in that case, a whole 8-hour slice is lost).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start PCP:
systemctl start pmcd.service pmlogger.service
2. Wait 10 minutes or more to gather statistics.
3. Force a log rotation as specified in the cron entry:
sudo -u pcp /usr/libexec/pcp/bin/pmlogger_daily -X xz -x 3
4. Wait 10 minutes or more to gather statistics.
Actual results:
"No values available" is displayed for the time slice in which the log rotation happened.
Expected results:
No loss of statistics.
See the sample below for a log rotation around 13:37.
"Every minute" samples:
# pmval -a /var/log/pcp/pmlogger/vm-rhel73 -t 1m -z network.interface.in.bytes
13:14:02.045 No values available
13:15:02.045 233.5 0.0
13:16:02.045 65.92 0.0
13:17:02.045 121.7 0.0
13:18:02.045 240.9 0.0
13:19:02.045 233.5 0.0
13:20:02.045 335.2 0.0
13:21:02.045 102.6 0.0
13:22:02.045 32.23 0.0
13:23:02.045 213.3 0.0
13:24:02.045 115.1 0.0
13:25:02.045 150.5 0.0
13:26:02.045 54.67 0.0
13:27:02.045 41.17 0.0
13:28:02.045 38.43 0.0
13:29:02.045 147.0 0.0
13:30:02.045 55.83 0.0
13:31:02.045 118.0 0.0
13:32:02.045 186.5 0.0
13:33:02.045 81.88 0.0
13:34:02.045 92.93 0.0
13:35:02.045 60.00 0.0
13:36:02.045 180.8 0.0
13:37:02.045 No values available
13:38:02.045 No values available
13:39:02.045 94.83 0.0
13:40:02.045 91.12 0.0
"Every 10 minutes" consolidation (whole 13:35 -> 13:45 slice is lost):
# pmval -a /var/log/pcp/pmlogger/vm-rhel73 -t 10m -z network.interface.in.bytes
13:15:02.045 No values available
13:25:02.045 161.1 0.0
13:35:02.045 87.64 0.0
13:45:02.045 No values available
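To spot the affected intervals in a longer report, the pmval output can be filtered for the "No values available" marker. This is a minimal sketch, assuming the output layout shown in the samples above (timestamp in the first column); the function name find_gaps is purely illustrative and not part of PCP.

```shell
#!/bin/sh
# find_gaps: read pmval output on stdin and print the timestamp of every
# interval reported as "No values available".
# Assumption: the timestamp is the first whitespace-separated field, as in
# the samples in this report. The function name is hypothetical.
find_gaps() {
    awk '/No values available/ { print $1 }'
}

# Example using the 10-minute report from this bug as inline sample data.
find_gaps <<'EOF'
13:15:02.045 No values available
13:25:02.045 161.1 0.0
13:35:02.045 87.64 0.0
13:45:02.045 No values available
EOF
```

With a live archive, the same filter could be applied as "pmval -a /var/log/pcp/pmlogger/vm-rhel73 -t 10m -z network.interface.in.bytes | find_gaps".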
This is due to the '<mark>' record that gets inserted between archives, either when multiple archives are merged or when you replay more than one archive.
A <mark> record is a pmlogger record that signifies a temporal gap in an archive (caused by such merging and certain other events). libpcp currently returns no values when the replay interval spans a <mark>.
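The effect on wide reporting intervals can be illustrated with a toy model. This is not libpcp's actual algorithm, only a sketch under the assumption that an interval ending at time T with length DELTA is reported as unavailable whenever a mark timestamp falls inside (T - DELTA, T]. The helper name report_intervals and its arguments are hypothetical, and all times are given in seconds since midnight.

```shell
#!/bin/sh
# Toy model (NOT libpcp's actual algorithm): an interval ending at time t
# with length delta is treated as unavailable if the mark timestamp falls
# inside (t - delta, t]. All times are seconds since midnight.
# Usage: report_intervals FIRST LAST DELTA MARK   (hypothetical helper)
report_intervals() {
    awk -v first="$1" -v last="$2" -v delta="$3" -v mark="$4" 'BEGIN {
        for (t = first; t <= last; t += delta) {
            status = (mark > t - delta && mark <= t) ? "No values available" : "ok"
            printf "%02d:%02d %s\n", int(t / 3600), int(t % 3600 / 60), status
        }
    }'
}

# A mark at 13:37 with 10-minute intervals ending 13:25, 13:35, 13:45, 13:55.
report_intervals 48300 50100 600 49020
```

In this model a single mark at 13:37 invalidates the whole interval ending at 13:45, which matches the 10-minute report in this bug. The model is deliberately simplified: the 1-minute report above shows two unavailable intervals around the rotation, which it does not reproduce.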
For context, see BZ #1296750 - incorrect interpolation across <mark> record in a merged archive
We're actively working on this and expect to have a solution in the current upstream release (pcp-3.12.2) soon; there are some circumstances where <mark> records are tolerable for replay purposes.
In the meantime, you should be able to use pmval and other tools in non-interpolating mode via the -U flag; see the man pages.
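A minimal sketch of that workaround with pmval, reusing the archive path and metric from the report. As far as I understand pmval(1), -U takes the archive name in place of -a and values are reported at the timestamps actually logged, so a sampling interval (-t) is not given here; please check the man page for your PCP version.

```shell
#!/bin/sh
# Replay the archive in non-interpolated mode. Assumption: -U replaces -a
# and takes the archive name as its argument (see pmval(1)).
# The archive path is the one used earlier in this report.
archive=/var/log/pcp/pmlogger/vm-rhel73
if command -v pmval >/dev/null 2>&1; then
    pmval -U "$archive" -z network.interface.in.bytes
else
    echo "pmval not found; install the pcp package first" >&2
fi
```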
> By default, the PCP logs are rotated every day at 00:10, via a cron entry:
> # grep pmlogger_daily /etc/cron.d/pcp-pmlogger
> 10 0 * * * pcp /usr/libexec/pcp/bin/pmlogger_daily -X xz -x 3
> When logs are rotated, samples are getting lost, causing "pmval" to display
> "No values available" for the time interval where log got rotated.
> This is annoying when consolidating logs for large intervals, such as 8
> hours (in such case, a whole 8 hour slice gets lost)
You may find pmmgr a useful alternative to pmlogger_daily when it comes to consolidation and sensitivity to daily-processing edge cases, because it gives you greater control over the granularity of the log files. For example:
# yum install pcp-manager
# echo '7days' > /etc/pcp/pmmgr/pmlogmerge
# echo '7days' > /etc/pcp/pmmgr/pmlogmerge-retain
# echo '-t 3600' > /etc/pcp/pmmgr/pmlogreduce
# systemctl stop pmlogger; systemctl disable pmlogger
# systemctl start pmmgr; systemctl enable pmmgr
# ls /var/log/pcp/pmmgr/$HOSTNAME
This would give you 7-day-long archives, with older ones being compressed by
time-wise subsampling (3600s).
We're discussing possible ways to tackle this properly; the underlying issue is exactly as Mark described. In the meantime, you might find the stripmark utility in the PCP testsuite to be of use ...
You can use this to remove "mark" records from archives. Use it only when you know there was no discontinuity in the PCP data (i.e. end-of-day archive processing). The utility strips all mark records, and its use is a manual step that should not be necessary; I suggest it only as a stop-gap measure until we come up with a viable longer-term plan.
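Before stripping anything, it may help to confirm that an archive actually contains mark records. The sketch below assumes that pmdumplog labels such records with the literal text "<mark>" in its output, which is worth verifying against your PCP version. The function name count_marks is hypothetical, and the archive path is the one from the original report.

```shell
#!/bin/sh
# count_marks: count lines containing "<mark>" in pmdumplog output read
# from stdin. Assumption: pmdumplog prints the literal text "<mark>" for
# mark records. The function name is hypothetical.
count_marks() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the exit status clean in that case.
    grep -c '<mark>' || true
}

archive=/var/log/pcp/pmlogger/vm-rhel73
if command -v pmdumplog >/dev/null 2>&1; then
    pmdumplog "$archive" | count_marks
fi
```

A count of zero would suggest there is nothing for stripmark to remove.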
Using pmmgr is not a long-term solution to this issue either. We'll fix it properly without these workarounds as soon as we are able to.
Verified against pcp-3.12.2-5.el7.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.