Bug 1697182 - pmlogger_daily is unable to discard or compress older files which failed compression for some reason
Keywords:
Status: CLOSED DUPLICATE of bug 1647308
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcp
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Mark Goodwin
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-08 03:40 UTC by nikhil kshirsagar
Modified: 2019-05-09 22:42 UTC (History)

Fixed In Version: pcp-4.3.2-1
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-09 08:09:36 UTC
Target Upstream Version:
Embargoed:



Description nikhil kshirsagar 2019-04-08 03:40:30 UTC
Description of problem:

The ls output of the PCP logs under /var is at http://pastebin.test.redhat.com/747934
It shows older files left behind, much older than the default discard time of 14 days.

pmlogger_daily.log shows this warning:

[root@nkshirsa sosreport-NYEShpnpc001-20190328111119]# cat var/log/pcp/pmlogger/pmlogger_daily.log
pmlogger_daily: Warning: output archive (20180619) already exists
[/etc/pcp/pmlogger/control.d/local:3] ... skip log merging, culling and compressing for host "local:"
[root@nkshirsa sosreport-NYEShpnpc001-20190328111119]# 

The warning seems to come from these parts of the pmlogger_daily script:

                do
                    if [ -f $outfile.0 -o -f $outfile.index -o -f $outfile.meta ]
                    then
                        _skipping "output archive ($outfile) already exists"
                        break
                    else

_skipping()
{
    echo "$prog: Warning: $@"
    echo "[$controlfile:$line] ... skip log merging, culling and compressing for host \"$host\""
    touch $tmp/skip
}

So it seems that if some older files failed compression, for whatever reason, pmlogger_daily aborts and does not compress even the newer, valid files.
The script should be changed to skip the affected file and continue compressing and culling the other files, instead of erroring out on one file and skipping every file.
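
The requested behaviour amounts to skipping one archive and carrying on with the rest. A minimal sketch of that idea follows; it is not the actual pmlogger_daily code, whose loop structure is more involved. The function name process_archives, its arguments, and the echo placeholder for the real work are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: handle each candidate output archive in turn and
# skip only the ones that already exist, instead of abandoning the loop.
process_archives()
{
    dir=$1; shift
    for outfile in "$@"
    do
        if [ -f "$dir/$outfile.0" ] || [ -f "$dir/$outfile.index" ] || [ -f "$dir/$outfile.meta" ]
        then
            echo "Warning: output archive ($outfile) already exists, skipping" >&2
            continue        # the excerpt quoted above uses break at this point
        fi
        # Placeholder for the real merge, cull and compress steps.
        echo "processed $outfile"
    done
}
```

Calling process_archives with a directory and a list of date-stamped basenames reports only the basenames that had no existing output files; the others produce a warning on stderr.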

Version-Release number of selected component (if applicable):
RHEL 7.4

Comment 2 Mark Goodwin 2019-04-15 08:51:49 UTC
After checking the sosreport, this customer is running RHEL 7.4 with pcp-3.11.3-4.el7. That is a very old version of PCP, almost three years old. The log management scripts have improved considerably since then, especially in how they handle early termination and the skipping of log discard and/or compression.

pcp-3.11.3-1 was released on 17 June 2016. Since then, PCP in RHEL 7.4 has received four patches:

rhbz1211432.patch - change pmlogger default interval from 60s down to 10s

rhbz1419490.patch - [Pegas1.0 FEAT] Enable Performance Co-Pilot tool (pcp) to collect POWER9 NEST PMU perf metrics

rhbz1425880.patch - pcp-pmda-nutcracker has missing dependency

rhbz1432086.patch - errors appear during the installation of pcp-selinux

The current upstream version of the pmlogger_daily script (pcp-4.3.2, due for release in about two weeks) has many important updates compared to pcp-3.11.3-4. I have omitted minor changes and updates that are not relevant. The most important updates are:

commit ccafcd170d31e0ee56d7c9a3841ee9714457607a
Author: Ken McDonell <kenj.au>
Date:   Sun Oct 14 15:51:46 2018 +1100

    pmlogger_daily: change workflow to cull early
    
    Change the workflow to move culling of old files earlier in the
    pipeline.  So if there is a problem with log rewriting or merging that
    is unattended for a long time, we'll eventually cull the offending
    archives rather than having the rewriting or merging failure block
    the culling and (in extreme cases) lead to full filesystems.
    
    Also add qa/686 (new) to check all of this out.
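
The reordering described in this commit can be illustrated with a small sketch. This is an assumption-laden illustration, not the upstream implementation: cull_then_merge and merge_step are hypothetical names, and culling is reduced to a find(1) call on modification time.

```shell
#!/bin/sh
# Hypothetical sketch of "cull first, then merge": expired files are
# removed before the step that might fail.

# Stand-in for the real merge/rewrite logic; always fails to show the effect.
merge_step()
{
    return 1
}

cull_then_merge()
{
    dir=$1      # archive directory
    days=$2     # discard files last modified more than this many days ago
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
    # A failure here no longer prevents the culling above.
    merge_step "$dir" || echo "Warning: merge failed for $dir" >&2
}
```

With this ordering, a persistent merge failure still produces a warning on every run, but old files are removed regardless.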

commit 1c48b97bbfc25a77947b58957db8afe24c6504de
Author: Ken McDonell <kenj.au>
Date:   Wed Jun 6 12:11:21 2018 +1000

    src/pmlogger/pmlogger_daily.sh: fix skipping for current pmlogger files
    
    Small logic error was _not_ skipping the .meta file for a current
    pmlogger.
    
    Also fixed a related issue that prevented compressing other .meta
    files under some circumstances.

commit db932dbff9528df88790e262dcdea692510a8041
Author: Ken McDonell <kenj.au>
Date:   Wed Jun 6 10:58:19 2018 +1000

    src/pmlogger/pmlogger_daily.sh: compress .meta files
    
    Use the same -x and -X controls (or their environment variable
    equivalents) as for the data volumes.
    
    Small man page update also.
    
    This is a preliminary checkin ... it is mostly working but I have
    a couple of corner-cases that need some refinement (especially to
    NOT compress the .meta file for the current pmlogger when -x 0
    and/or -K is used), but I want to give this some air time and clear
    the commit backlog.

commit a7fa9dc5bf761440e0c7bdcac422d8c7dd4b439f
Author: Ken McDonell <kenj.au>
Date:   Thu May 10 10:47:45 2018 +1000

    src/pmlogger/pmlogger_daily.sh: rework date-and-timestamp and .prev handling
    
    Root cause of qa/623 failures is now understood, namely
    pmdate ... >pmlogger_daily.stamp
    when pmlogger_daily.stamp exists and is not writeable (as a
    result of earlier QA activity).
    
    1. revert changes in pmlogger_daily.sh from these commits:
       7515978 qa: skip pmlogger_daily -p without write permissions
       0967679 qa: fix hang in test 623 related to mv overwriting
    2. save prior pmlogger_daily.stamp with _save_prev_file()
    3. change from I/O redirection to rm (thanks to _save_prev_file())
       and cp ... avoids permissions problem and if any of this does
       not work then emit warnings and fix it up if possible
    4. also send errors and warnings before exec(1) to stderr, which
       makes it easier to catch 'em in QA scripts

commit e38d422edf41a8214a3644dd726f6ea59c12137f
Author: Ken McDonell <kenj.au>
Date:   Sat May 5 16:02:43 2018 +1000

    pmlogger_daily: add -p option for polling
    
    and crontab glue to call this at 30mins past the hour every hour.
    
    The intent here is that if the daily processing does not happen
    for some reason at 00:10 (or whatever time the local crontab entry
    may have been changed to), then we'd like to do the daily processing
    as soon as possible ... if it has already been done, pmlogger_daily -p
    exits.

commit fc583a681407404ab877cbd21e10d3d132fdc0d8
Author: Ken McDonell <kenj.au>
Date:   Fri May 4 07:49:39 2018 +1000

    pmlogger_daily.sh: fix problem with compression program check
    
    $COMPRESS may contain arguments after the command name, so strip
    args before checking with which(1).
    
    Also, with -N do _not_ hide output using exec and i/o redirection.
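
A sketch of the argument-stripping idea in this commit, under stated assumptions: compress_available is a hypothetical helper, and it uses the shell builtin command -v in place of the which(1) call mentioned in the commit message.

```shell
#!/bin/sh
# Hypothetical sketch: treat the first word of a $COMPRESS-style value as
# the program name and check that it can be found, ignoring any options.
compress_available()
{
    # Intentionally unquoted so the value splits on whitespace.
    set -- $1
    [ $# -gt 0 ] && command -v "$1" >/dev/null 2>&1
}
```

For example, compress_available "xz -0 --block-size=10MiB" looks up only xz, so the trailing options cannot make the lookup fail.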

commit d675fa61dc07446d0e8292073e82a3b058ce9c47
Author: Mark Goodwin <mgoodwin>
Date:   Thu Apr 26 14:16:50 2018 +1000

    pmlogger_daily: fortify COMPRESS_DEFAULT with xz and options
    
    Early versions of xz (e.g. on RHEL6) do not support xz --block-size.
    Check for this and use xz with whatever options are available, overridden
    by $PCP_COMPRESS if set. If xz is not installed and $PCP_COMPRESS is not
    set, the code around line 1129 falls back to no compression.
    
    This patch does not add any additional packaging dependencies on xz.
    If it's not installed or not available on a particular platform, the
    pmlogger_daily script will not do any compression.
    
    QA group "logutil" passes (on F27 .. will also check on RHEL6)
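
One way to probe for --block-size support is to try the option on a small input and fall back if it is rejected. The sketch below is an illustration only: pick_xz_options is a hypothetical helper, and the fallback of plain -0 is an assumption, since the commit message does not say which options are used on older xz versions.

```shell
#!/bin/sh
# Hypothetical sketch: print a compression command line suited to the
# installed xz, or nothing at all if xz cannot be found.
pick_xz_options()
{
    xz_cmd=${1:-xz}
    if ! command -v "$xz_cmd" >/dev/null 2>&1
    then
        return 0        # no xz: the caller falls back to no compression
    fi
    if echo probe | "$xz_cmd" -0 --block-size=10MiB >/dev/null 2>&1
    then
        echo "$xz_cmd -0 --block-size=10MiB"
    else
        echo "$xz_cmd -0"   # assumed fallback for xz without --block-size
    fi
}
```

A caller could use the result as the default only when $PCP_COMPRESS is unset, matching the override order described in the commit message.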

commit fdfa25c6ef1d26f665c785ceacb7320476b45c41
Author: Mark Goodwin <mgoodwin>
Date:   Tue Apr 10 19:28:12 2018 +1000

    pmlogger_daily: set default xz compression options
    
    The default pmlogger_daily xz compression options compress the
    merged daily log volumes as a single block, using xz -6, producing
    one block (whole file) compressed output. The compression ratio for
    PCP data volumes is approximately 90%, which is extremely good.
    The -X, -Y, -x and -K options and several environment variables change
    the defaults, as documented in pmlogger_daily(1) and xz(1).
    
    In the pmlogger_daily script, this patch changes the default (in the
    absence of any other relevant command line or environment variables)
    to be xz -0 --block-size=10MiB. The xz -0 option runs substantially faster
    and uses less memory than the default (xz -6), and the --block-size option
    splits the input into 10MB blocks, thus allowing random seek (with 10MB
    granularity of the input file) in the compressed output. You can examine
    the compressed block offsets with xz --list -v FILE.xz (this offset table
    is stored in the compressed output header and is accessible via the liblzma
    API).
    
    Splitting the input into 10MB blocks can be exploited by the transparent
    decompression code in libpcp, e.g. - it can use 10MB blocks for the LRU
    block cache. The -0 option reduces system resource overheads with only
    minimal reduction of compression ratio compared to -6. The changes to the
    defaults introduced by this patch are backward compatible with the existing
    transparent decompression code in libpcp.
    
    modified:   src/pmlogger/pmlogger_daily.sh

commit cc7ff79bae5f04f0e49fba5d611463c08a7b11fb
Author: Ken McDonell <kenj.au>
Date:   Fri Mar 16 19:29:54 2018 +1100

    pmlogger_daily: change default behaviour for compression
    
    If pmlogconfig -L reports transparent_decompress=true then we will
    compress as soon as possible by default ... this is the same as -x 0
    or PCP_COMPRESSAFTER=0.  The rationale is that in this environment
    we can do on-the-fly decompression, so there is no real reason to
    delay compression.
    
    Otherwise compression is never done by default ... this is the
    same as -x never or -x forever or PCP_COMPRESSAFTER=never or
    PCP_COMPRESSAFTER=forever and matches the previous default for
    pmlogger_daily, although we shipped crontabs with -x 3 which changed
    the effective default.  The -x 3 in the crontabs was removed in a
    previous round of commits.
    
    All the PCP_COMPRESSAFTER settings in the control files have been
    turned into comments (so users can easily over-ride the defaults
    if required).

commit 1181fb2ca9c702f090822bd397c5320ecf787571
Author: Ken McDonell <kenj.au>
Date:   Wed Mar 14 07:05:59 2018 +1100

    pmlogger_daily: more timely compression changes
    
    There are a bunch of changes here:
    
    - PCP_COMPRESS=foo is an alternate for -X foo
    - PCP_COMPRESSAFTER=N is an alternate for -x N
    - PCP_COMPRESSREGEX=pat is an alternate for -Y pat
    
    These allow the compression options to be embedded in the pmlogger
    control file.
    
    If both the environment variable and the command line argument are
    specified for the same option, and they have different values,
    a warning is emitted and the environment variable "wins".
    
    If PCP_COMPRESSAFTER=0 (or -x 0), then we try to compress all archive
    volumes that are not currently being written.
    
    A new -K option (in conjunction with PCP_COMPRESSAFTER=0) does _just_
    the compression tasks, and so may be called repeatedly before the
    scheduled daily execution of pmlogger_daily.
    
    Extra checks are included to not run pmnewlog if the current pmlogger
    process cannot be identified (otherwise we don't know which one to
    stop and restart).
    
    Extra checks are included to not compress if the current pmlogger
    process, current archive basename and current volume cannot be
    identified (otherwise we risk compressing the current volume from
    underneath pmlogger).
    
    A new -f option by-passes the extra checks (for QA use only, not
    mentioned in the man page).
    
    Safer handling of the pmlogger_daily.prev file.
    
    Some tidying of error and warning messages to improve consistency
    and clarity, remove clutter and mention _which_ control file when
    context is reported.
    
    man page updates for all of the above.
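
The precedence rule in this commit (the environment variable wins over a differing command line value, with a warning) can be sketched as follows. The helper name resolve_option and the wording of the warning are hypothetical and are not taken from pmlogger_daily.

```shell
#!/bin/sh
# Hypothetical sketch of the precedence rule: when both sources set a
# value and they differ, warn and let the environment variable win.
resolve_option()
{
    name=$1       # label used in the warning, e.g. PCP_COMPRESSAFTER
    env_val=$2    # value from the environment (may be empty)
    arg_val=$3    # value from the command line (may be empty)
    if [ -n "$env_val" ] && [ -n "$arg_val" ] && [ "$env_val" != "$arg_val" ]
    then
        echo "Warning: $name=$env_val overrides command line value $arg_val" >&2
    fi
    if [ -n "$env_val" ]
    then
        echo "$env_val"
    else
        echo "$arg_val"
    fi
}
```

For instance, resolve_option PCP_COMPRESSAFTER 0 3 prints 0 and emits the warning on stderr, while an empty environment value leaves the command line value in effect.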

commit b68c06873bced0d1600e634cadafbef08b304827
Author: Ken McDonell <kenj.net>
Date:   Fri Jun 9 06:47:01 2017 +1000

    pmlogger_check and pmlogger_daily: small changes
    
    Fix a logic problem reported by Martins where the creation of
    a missing directory incorrectly triggered a non-zero exit status.
    
    Also reported by Martins, for pmlogger_daily clean up the reporting
    of warnings so:
    (a) the formatting is like pmlogger_check, and
    (b) if there is a warning, but processing continues, the misleading
        "logging for host "..." unchanged" message is not issued.

commit fb56481639ff5b73792a20d34dff3ac4e191a907
Author: Lukas Berk <lberk>
Date:   Thu Apr 6 15:36:18 2017 -0400

    RHBZ: 1381301 restore context to pcp_var_run_t after pmcd start
    
    pmcd makes /var/run/pcp on the fly, which gives /var/run/pcp the
    var_run_t context (despite the default policy being pcp_var_run_t).
    If the command exists, just run restorecon on the directory after we
    make it.

commit eaeeffd045546a3b7e4554e1f2ac7120f87dc715
Author: Ken McDonell <kenj.net>
Date:   Mon Feb 13 07:16:29 2017 +1100

    src/pmlogger/pmlogger_daily.sh & qa/925(new): created output directory hierarchy
    
    Replicate logic from pmlogger_check to create the full directory
    hierarchy for PCP archives (if it does not already exist) with the
    correct modes and ownership.
    
    This addresses the issue of pmlogger_daily possibly running before
    pmlogger_check as reported by Martins Innus <minnus>.

commit ba1355032bfdc91bfabc39d3a2255015fa92ccb9
Author: Mark Goodwin <mgoodwin>
Date:   Thu Sep 15 12:05:15 2016 +1000

    cron scripts: ignore *.rpmsave and *.rpmnew in control.d dirs
    
    On RPM based distros, if the control files have been modified,
    an upgrade will create rpmsave/new files, and the PCP cron scripts
    will parse them, leaving scary-looking messages in the logs.
    So we ignore these files but issue a warning for admins to catch.
    
    Fixes RH BZ #1375415


Despite all of the above changes, there are probably still a few lingering cases where the scripts may abort early and not discard or compress correctly. These are the "continue rather than break" parts of the script that have been identified as potential improvements. I also think Nathan's suggestion to move aside any corrupted archives would be a worthwhile improvement: such archives would then no longer be re-processed in subsequent runs (and would require manual recovery if the data is later considered important enough).

-- Mark

Comment 4 Ken McDonell 2019-04-23 00:11:49 UTC
I believe commit 6599dfa9 fixes this.
I opted NOT to implement the "defer" option. It only complicates the eventual cleanup and would mean the "you have a bad archive" error would appear in pmlogger_daily.log only once (the first time it was encountered), rather than every day until the bad archive is fixed or goes away.  The reprocessing is not a big deal, provided the later archives are processed correctly.  Note that any bad archives will eventually be culled as per the normal workflow for good archives.

