Bug 1697182
| Summary: | pmlogger_daily is unable to discard or compress older files which failed compression for some reason | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | nikhil kshirsagar <nkshirsa> |
| Component: | pcp | Assignee: | Mark Goodwin <mgoodwin> |
| Status: | CLOSED DUPLICATE | QA Contact: | qe-baseos-tools-bugs |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 7.4 | CC: | fche, kenj, lberk, mgoodwin, nathans, patrickm |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | pcp-4.3.2-1 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-05-09 08:09:36 UTC | Type: | Bug |
|
Description
nikhil kshirsagar
2019-04-08 03:40:30 UTC
According to the sosreport, this customer is running RHEL74 with pcp-3.11.3-4.el7. That is a very old version of PCP, nearly three years old. The log management scripts have been considerably improved since then, especially in the cases where early termination caused log discard and/or compression to be skipped.
pcp-3.11.3-1 was released 17 June 2016. Since then, PCP in RHEL74 has had 4 patches as follows:
rhbz1211432.patch - change pmlogger default interval from 60s down to 10s
rhbz1419490.patch - [Pegas1.0 FEAT] Enable Performance Co-Pilot tool (pcp) to collect POWER9 NEST PMU perf metrics
rhbz1425880.patch - pcp-pmda-nutcracker has missing dependency
rhbz1432086.patch - errors appear during the installation of pcp-selinux
The current upstream version of the pmlogger_daily script (pcp-4.3.2, which is due for release in about two weeks) has the following important updates compared to pcp-3.11.3-4. I have omitted minor changes and updates that are not relevant. The most important updates are:
commit ccafcd170d31e0ee56d7c9a3841ee9714457607a
Author: Ken McDonell <kenj.au>
Date: Sun Oct 14 15:51:46 2018 +1100
pmlogger_daily: change workflow to cull early
Change the workflow to move culling of old files earlier in the
pipeline. So if there is a problem with log rewriting or merging that
is unattended for a long time, we'll eventually cull the offending
archives rather than having the rewriting or merging failure block
the culling and (in extreme cases) lead to full filesystems.
Also add qa/686 (new) to check all of this out.
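The reordering described in this commit can be illustrated with a minimal sketch. The function names below are hypothetical and are not the ones used in pmlogger_daily; the merge step is a stand-in that always fails, to show that culling has already happened by then.

```shell
#!/bin/sh
# Hypothetical sketch: names are illustrative, not taken from pmlogger_daily.

cull_old_archives() {
    # Remove regular files under directory $1 last modified more than $2 days ago.
    find "$1" -type f -mtime +"$2" -exec rm -f {} +
}

merge_archives() {
    # Stand-in for log rewriting/merging; always fails to simulate a bad archive.
    return 1
}

process_dir() {
    dir=$1
    keep_days=$2
    # Cull first, so that a failure in the merge step below cannot block it.
    cull_old_archives "$dir" "$keep_days"
    if ! merge_archives "$dir"; then
        echo "Warning: merge failed in $dir (culling was already done)" >&2
        return 1
    fi
}
```

With this ordering, a persistently failing merge still produces a warning on every run, but old files keep being removed.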
commit 1c48b97bbfc25a77947b58957db8afe24c6504de
Author: Ken McDonell <kenj.au>
Date: Wed Jun 6 12:11:21 2018 +1000
src/pmlogger/pmlogger_daily.sh: fix skipping for current pmlogger files
Small logic error was _not_ skipping the .meta file for a current
pmlogger.
Also fixed a related issue that prevented compressing other .meta
files under some circumstances.
commit db932dbff9528df88790e262dcdea692510a8041
Author: Ken McDonell <kenj.au>
Date: Wed Jun 6 10:58:19 2018 +1000
src/pmlogger/pmlogger_daily.sh: compress .meta files
Use the same -x and -X controls (or their environment variable
equivalents) as for the data volumes.
Small man page update also.
This is a preliminary checkin ... it is mostly working but I have
a couple of corner-cases that need some refinement (especially to
NOT compress the .meta file for the current pmlogger when -x 0
and/or -K is used), but I want to give this some air time and clear
the commit backlog.
commit a7fa9dc5bf761440e0c7bdcac422d8c7dd4b439f
Author: Ken McDonell <kenj.au>
Date: Thu May 10 10:47:45 2018 +1000
src/pmlogger/pmlogger_daily.sh: rework date-and-timestamp and .prev handling
Root cause of qa/623 failures is now understood, namely
pmdate ... >pmlogger_daily.stamp
when pmlogger_daily.stamp exists and is not writeable (as a
result of earlier QA activity).
1. revert changes in pmlogger_daily.sh from these commits:
7515978 qa: skip pmlogger_daily -p without write permissions
0967679 qa: fix hang in test 623 related to mv overwriting
2. save prior pmlogger_daily.stamp with _save_prev_file()
3. change from I/O redirection to rm (thanks to _save_prev_file())
and cp ... avoids permissions problem and if any of this does
not work then emit warnings and fix it up if possible
4. also send errors and warnings before exec(1) to stderr, which
makes it easier to catch 'em in QA scripts
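Step 3 relies on the fact that removing a file needs write permission on the directory, not on the file. A rough sketch of the idea follows; update_stamp is a hypothetical helper, and date(1) stands in for pmdate.

```shell
#!/bin/sh
# Hypothetical sketch: update_stamp is illustrative, date(1) stands in for pmdate.

update_stamp() {
    stamp=$1
    tmp=$(mktemp) || return 1
    date >"$tmp"
    # A plain "date >$stamp" fails if $stamp exists and is not writable.
    # rm only needs write permission on the containing directory.
    if ! rm -f "$stamp"; then
        echo "Warning: cannot remove $stamp" >&2
    fi
    if ! cp "$tmp" "$stamp"; then
        echo "Warning: cannot create $stamp" >&2
        rm -f "$tmp"
        return 1
    fi
    rm -f "$tmp"
}
```

Warnings go to stderr, in line with step 4 of the commit message.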
commit e38d422edf41a8214a3644dd726f6ea59c12137f
Author: Ken McDonell <kenj.au>
Date: Sat May 5 16:02:43 2018 +1000
pmlogger_daily: add -p option for polling
and crontab glue to call this at 30mins past the hour every hour.
The intent here is that if the daily processing does not happen
for some reason at 00:10 (or whatever time the local crontab entry
may have been changed to), then we'd like to do the daily processing
as soon as possible ... if it has already been done, pmlogger_daily -p
exits.
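The polling behaviour can be sketched as below. This is only an illustration: the stamp file format (a YYYYMMDD date) and the function names are assumptions, not the format or logic actually used by pmlogger_daily.

```shell
#!/bin/sh
# Hypothetical sketch: stamp format and function names are assumptions.

already_done_today() {
    # Succeed if stamp file $1 exists and holds today's date.
    [ -f "$1" ] && [ "$(cat "$1")" = "$(date +%Y%m%d)" ]
}

record_run() {
    # Record that daily processing ran today.
    date +%Y%m%d >"$1"
}

daily_poll() {
    if already_done_today "$1"; then
        echo "daily processing already done, nothing to do"
        return 0
    fi
    echo "running daily processing"
    record_run "$1"
}
```

Calling daily_poll every hour is cheap once the stamp is current, since the first test short-circuits the rest.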
commit fc583a681407404ab877cbd21e10d3d132fdc0d8
Author: Ken McDonell <kenj.au>
Date: Fri May 4 07:49:39 2018 +1000
pmlogger_daily.sh: fix problem with compression program check
$COMPRESS may contain arguments after the command name, so strip
args before checking with which(1).
Also, with -N do _not_ hide output using exec and i/o redirection.
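A minimal sketch of the argument stripping described above, assuming the command name and its options are separated by spaces. The helper names are hypothetical, and command -v is used here as the POSIX counterpart of which(1).

```shell
#!/bin/sh
# Hypothetical helpers; COMPRESS may look like "xz -0 --block-size=10MiB".

compress_prog() {
    # Print everything before the first space, i.e. the command name.
    printf '%s\n' "${1%% *}"
}

have_compress_prog() {
    # Succeed if the command named in $1 (ignoring its options) can be found.
    command -v "$(compress_prog "$1")" >/dev/null 2>&1
}
```

Checking the whole string instead of the first word would fail for any setting that carries options, even when the program is installed.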
commit d675fa61dc07446d0e8292073e82a3b058ce9c47
Author: Mark Goodwin <mgoodwin>
Date: Thu Apr 26 14:16:50 2018 +1000
pmlogger_daily: fortify COMPRESS_DEFAULT with xz and options
Early versions of xz (e.g. on RHEL6) do not support xz --block-size.
Check for this and use xz with whatever options are available, overridden
by $PCP_COMPRESS if set. If xz is not installed and $PCP_COMPRESS is not
set, the code around line 1129 falls back to no compression.
This patch does not add any additional packaging dependencies on xz.
If it's not installed or not available on a particular platform, the
pmlogger_daily script will not do any compression.
QA group "logutil" passes (on F27 .. will also check on RHEL6)
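The fallback chain in this commit might look roughly like the sketch below. The function name is hypothetical, and the choice of plain "xz -0" when --block-size is unsupported is an assumption; the commit only says xz is used "with whatever options are available". An empty result means no compression.

```shell
#!/bin/sh
# Hypothetical sketch of choosing a default compression command.

choose_compress() {
    if [ -n "${PCP_COMPRESS:-}" ]; then
        # An explicit setting overrides any default.
        printf '%s\n' "$PCP_COMPRESS"
    elif command -v xz >/dev/null 2>&1; then
        # Probe for --block-size support by compressing a few bytes.
        if echo probe | xz -0 --block-size=10MiB >/dev/null 2>&1; then
            printf '%s\n' "xz -0 --block-size=10MiB"
        else
            printf '%s\n' "xz -0"    # assumed fallback for older xz
        fi
    else
        printf '\n'                  # xz not installed: no compression
    fi
}
```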
commit fdfa25c6ef1d26f665c785ceacb7320476b45c41
Author: Mark Goodwin <mgoodwin>
Date: Tue Apr 10 19:28:12 2018 +1000
pmlogger_daily: set default xz compression options
The default pmlogger_daily xz compression options compress the
merged daily log volumes as a single block, using xz -6, producing
one block (whole file) compressed output. The compression ratio for
PCP data volumes is approximately 90%, which is extremely good.
The -X, -Y, -x and -K options and several environment variables change
the defaults, as documented in pmlogger_daily(1) and xz(1).
In the pmlogger_daily script, this patch changes the default (in the
absence of any other relevant command line or environment variables)
to be xz -0 --block-size=10MiB. The xz -0 option runs substantially faster
and uses less memory than the default (xz -6), and the --block-size option
splits the input into 10MB blocks, thus allowing random seek (with 10MB
granularity of the input file) in the compressed output. You can examine
the compressed block offsets with xz --list -v FILE.xz (this offset table
is stored in the compressed output header and is accessible via the liblzma
API).
Splitting the input into 10MB blocks can be exploited by the transparent
decompression code in libpcp, e.g. - it can use 10MB blocks for the LRU
block cache. The -0 option reduces system resource overheads with only
minimal reduction of compression ratio compared to -6. The changes to the
defaults introduced by this patch are backward compatible with the existing
transparent decompression code in libpcp.
modified: src/pmlogger/pmlogger_daily.sh
commit cc7ff79bae5f04f0e49fba5d611463c08a7b11fb
Author: Ken McDonell <kenj.au>
Date: Fri Mar 16 19:29:54 2018 +1100
pmlogger_daily: change default behaviour for compression
If pmconfig -L reports transparent_decompress=true then we will
compress as soon as possible by default ... this is the same as -x 0
or PCP_COMPRESSAFTER=0. The rationale is that in this environment
we can do on-the-fly decompression, so there is no real reason to
delay compression.
Otherwise compression is never done by default ... this is the
same as -x never or -x forever or PCP_COMPRESSAFTER=never or
PCP_COMPRESSAFTER=forever and matches the previous default for
pmlogger_daily, although we shipped crontabs with -x 3 which changed
the effective default. The -x 3 in the crontabs was removed in a
previous round of commits.
All the PCP_COMPRESSAFTER settings in the control files have been
turned into comments (so users can easily over-ride the defaults
if required).
commit 1181fb2ca9c702f090822bd397c5320ecf787571
Author: Ken McDonell <kenj.au>
Date: Wed Mar 14 07:05:59 2018 +1100
pmlogger_daily: more timely compression changes
There are a bunch of changes here:
- PCP_COMPRESS=foo is an alternate for -X foo
- PCP_COMPRESSAFTER=N is an alternate for -x N
- PCP_COMPRESSREGEX=pat is an alternate for -Y pat
These allow the compression options to be embedded in the pmlogger
control file.
If both the environment variable and the command line argument are
specified for the same option, and they have different values,
a warning is emitted and the environment variable "wins".
If PCP_COMPRESSAFTER=0 (or -x 0), then we try to compress all archive
volumes that are not currently being written.
A new -K option (in conjunction with PCP_COMPRESSAFTER=0) does _just_
the compression tasks, and so may be called repeatedly before the
scheduled daily execution of pmlogger_daily.
Extra checks are included to not run pmnewlog if the current pmlogger
process cannot be identified (otherwise we don't know which one to
stop and restart).
Extra checks are included to not compress if the current pmlogger
process, current archive basename and current volume cannot be
identified (otherwise we risk compressing the current volume from
underneath pmlogger).
A new -f option by-passes the extra checks (for QA use only, not
mentioned in the man page).
Safer handling of the pmlogger_daily.prev file.
Some tidying of error and warning messages to improve consistency
and clarity, remove clutter and mention _which_ control file when
context is reported.
man page updates for all of the above.
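The precedence rule in this commit (environment variable and command line both set, values differ, warn and let the environment win) can be sketched as follows. resolve_option is a hypothetical helper, not a function from pmlogger_daily, and the warning text is illustrative.

```shell
#!/bin/sh
# Hypothetical helper: resolve_option NAME ENV_VALUE CLI_VALUE

resolve_option() {
    name=$1
    env_val=$2
    cli_val=$3
    # Warn only when both are set and they disagree.
    if [ -n "$env_val" ] && [ -n "$cli_val" ] && [ "$env_val" != "$cli_val" ]; then
        echo "Warning: $name=$env_val from the environment overrides $cli_val from the command line" >&2
    fi
    # The environment variable wins whenever it is set.
    if [ -n "$env_val" ]; then
        printf '%s\n' "$env_val"
    else
        printf '%s\n' "$cli_val"
    fi
}
```

For example, resolve_option PCP_COMPRESSAFTER 0 3 prints 0 and emits a warning on stderr, while resolve_option PCP_COMPRESSAFTER "" 3 prints 3 silently.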
commit b68c06873bced0d1600e634cadafbef08b304827
Author: Ken McDonell <kenj.net>
Date: Fri Jun 9 06:47:01 2017 +1000
pmlogger_check and pmlogger_daily: small changes
Fix a logic problem reported by Martins where the creation of
a missing directory incorrectly triggered a non-zero exit status.
Also reported by Martins, for pmlogger_daily clean up the reporting
of warnings so:
(a) the formatting is like pmlogger_check, and
(b) if there is a warning, but processing continues, the misleading
"logging for host "..." unchanged" message is not issued.
commit fb56481639ff5b73792a20d34dff3ac4e191a907
Author: Lukas Berk <lberk>
Date: Thu Apr 6 15:36:18 2017 -0400
RHBZ: 1381301 restore context to pcp_var_run_t after pmcd start
pmcd makes /var/run/pcp on the fly, which gives /var/run/pcp the
var_run_t context (despite the default policy being pcp_var_run_t).
If the command exists, just run restorecon on the directory after we
make it.
commit eaeeffd045546a3b7e4554e1f2ac7120f87dc715
Author: Ken McDonell <kenj.net>
Date: Mon Feb 13 07:16:29 2017 +1100
src/pmlogger/pmlogger_daily.sh & qa/925(new): created output directory hierarchy
Replicate logic from pmlogger_check to create the full directory
hierarchy for PCP archives (if it does not already exist) with the
correct modes and ownership.
This addresses the issue of pmlogger_daily possibly running before
pmlogger_check as reported by Martins Innus <minnus>.
commit ba1355032bfdc91bfabc39d3a2255015fa92ccb9
Author: Mark Goodwin <mgoodwin>
Date: Thu Sep 15 12:05:15 2016 +1000
cron scripts: ignore *.rpmsave and *.rpmnew in control.d dirs
On RPM based distros, if the control files have been modified,
an upgrade will create rpmsave/new files, and the PCP cron scripts
will parse them, leaving scary looking messages in the logs.
So we ignore these files but issue a warning for admins to catch.
Fixes RH BZ #1375415
Despite all of the above changes, there are probably still a few lingering cases where the scripts may abort early and not discard or compress correctly. These are the "continue rather than break" parts of the script that have been identified as potential improvements. I also think Nathan's suggestion to move aside any corrupted archives would be a worthwhile improvement - such archives would then no longer be re-processed in subsequent runs (and would require manual recovery if the data is later considered important enough).
-- Mark
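The "continue rather than break" idea can be illustrated with a small sketch. Both functions are hypothetical; process_one is a stand-in that fails for any name containing "bad".

```shell
#!/bin/sh
# Hypothetical sketch: process_one stands in for per-archive processing.

process_one() {
    case $1 in
        *bad*) return 1 ;;
    esac
}

process_all() {
    failed=0
    for archive in "$@"; do
        if ! process_one "$archive"; then
            echo "Warning: problem with $archive, skipping" >&2
            failed=$((failed + 1))
            continue    # a break here would leave later archives unprocessed
        fi
        echo "processed $archive"
    done
    # Report overall failure without having stopped early.
    [ "$failed" -eq 0 ]
}
```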
I believe commit 6599dfa9 fixes this. I opted NOT to implement the "defer" option ... it only complicates the eventual cleanup and would mean the "you have a bad archive" error would appear in pmlogger_daily.log only once (the first time it was encountered), rather than every day until the bad archive is fixed or goes away. The reprocessing is not a big deal, provided the later archives are processed correctly. Note that any bad archives will eventually be culled as per the normal workflow for good archives. |