Bug 1639166
Summary: | pcp plugin for sosreport treats pmlogger binaries over 100MB as a file with string and corrupts it (be default) | |||
---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Johnray Fuller <jrfuller> | |
Component: | sos | Assignee: | Pavel Moravec <pmoravec> | |
Status: | CLOSED ERRATA | QA Contact: | Miroslav Hradílek <mhradile> | |
Severity: | high | Docs Contact: | ||
Priority: | unspecified | |||
Version: | 7.5 | CC: | agk, bmr, chorn, fche, gavin, jrfuller, mhradile, plambri, pmoravec, sbradley, yuokada | |
Target Milestone: | rc | Keywords: | OtherQA | |
Target Release: | --- | Flags: | chorn:
needinfo+
|
|
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | sos-3.7-2.el7 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1719884 (view as bug list) | Environment: | ||
Last Closed: | 2019-08-06 13:15:33 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1594286, 1719884 |
Description
Johnray Fuller
2018-10-15 08:09:46 UTC
I am setting this to high, GSS is relying on useful data being captured. This will likely increase with 7.6GA when pcp-zeroconf already sets up pcp. This is intentional behaviour - sosreport should have some upper limit to any logfile it collects. The limit for pcp files is increased from default to 100M - the value you noticed. We are open to increasing this default value for PCP log size (see https://github.com/sosreport/sos/blob/master/sos/plugins/pcp.py#L29), but it would be dangerous to allow sosreport to collect arbitrary huge files by default. There are few ways you can change this default behaviour: sosreport -k pcp.pcplogs=1000000000 # or whatever high value sosreport -k pcp.pcplogs=None # not straighforward, but it sets no limit for PCP logs sosreport --all-logs # no limit to any logfile and some more logfiles to be collected - in general Further, we can stop collecting the "tail of binary" files since they are useless (it is enough to add "tailit=False" option to https://github.com/sosreport/sos/blob/master/sos/plugins/pcp.py#L122 ). The only drawback will be confusing deduction "sosreport didnt collect the pcp logfile, so there is no such file on the system" (while the file *is* there..) Let me know if these cmdline options are sufficient, or if the default PCP logsize should be modified, or if some other change like tailit=False is required. FYI I created KCS 3655171 for this. pmoravec++ for creating the kbase. (In reply to Pavel Moravec from comment #3) > This is intentional behaviour [..] I think we did not intend to make the sosreport bigger with data which is not usable. > sosreport -k pcp.pcplogs=1000000000 # or whatever high value > sosreport -k pcp.pcplogs=None # not straighforward, but it sets no > limit for PCP logs > sosreport --all-logs # no limit to any logfile and some > more logfiles to be collected - in general Thanks, but these require of course that one has hit the issue or is aware of it before running sosreport. > Further, we can stop collecting the "tail of binary" files since they are > useless (it is enough to add "tailit=False" option to > https://github.com/sosreport/sos/blob/master/sos/plugins/pcp.py#L122 ). The > only drawback will be confusing deduction "sosreport didnt collect the pcp > logfile, so there is no such file on the system" (while the file *is* > there..) Creating an empty file 'archive-file-not-collected-because-of-size' seems better than current behaviour, maybe containing details on the real size of the (not captured) archive file(s). I looked at pmlogextract, - it's part of package 'pcp', so no new dependency - it can apparently shrink archive files - but has no 'last 100MB please', but one has instead to use timeframes as options. Looks like a valid option? (In reply to Christian Horn from comment #4) > > Further, we can stop collecting the "tail of binary" files since they are > > useless (it is enough to add "tailit=False" option to > > https://github.com/sosreport/sos/blob/master/sos/plugins/pcp.py#L122 ). The > > only drawback will be confusing deduction "sosreport didnt collect the pcp > > logfile, so there is no such file on the system" (while the file *is* > > there..) > Creating an empty file 'archive-file-not-collected-because-of-size' seems > better than current behaviour, maybe containing details on the real size of > the (not captured) archive file(s). Creating empty or small file would be bit tricky, I can offer either current behaviour or no file created and sos_logs/sos.log have new logs like: - "reached limit when processing file XXX, older files for mask YYY skipped" - "tailing file ZZZ skipped because (either it is archive or a plugin doesnt want to tail it)" > > I looked at pmlogextract, > - it's part of package 'pcp', so no new dependency > - it can apparently shrink archive files > - but has no 'last 100MB please', but one has instead to use timeframes as > options. > Looks like a valid option? I am open to any change that will be seen as improvement from user point of view (and won't break sos (too much) :)). As I dont know pcp much, could you suggest - what command(s) with what particular parameters we should collect - what logfiles would become obsolete to collect due to that Also some factors to consider: - how much (extra) time it would take to extract logs via pmlogextract or possibly other tool, compared to grabbing binary data? I dont object against this option, just want to assess possible issues. - also command outputs have size limits and just tails of the cmd output is collected in case limit is reached - what limit would be useful? (we can play with idea of no limit and time-range limit via pmlogextract arguments, but what if someone logs excessive amount of data every second?) I had a chat with the bug reporter, we have not settled with something so far to resolve this. Can you tell us how much data would be reasonable to store in the sosreport? Our best guess of 'how much' customers will be logging is what the pcp-zeroconf package is by default configuring. We were just looking at rhel7.5 laptop, it created 1.1GB of logs for ~13hours. This can be compressed to 10MB though, with xz. So far sosreport tries to cut off the file after 100MB, assuming it's an uncompressed textfile. I would like to also bring this up with the pcp team guys, but before that one question: how much space (compressed) would feel ok to grab here by default? If for example 50mb compressed would be a sane value, then we could from the other side try to come up with a mechanism to somehow reliably say which files to collect (or use commands to extract) that data. I don't have strong opinion on the default size of PCP data to be collected - it is rather question what is practical to have. The higher the limit is, the longer sosreport will collect and compress the data and the bigger the archive will be (less practical for copying / unpacking / keeping on disk). And the more mess data is collected when something gest broken (i.e. when pcp logs are corrupted, you collect more of trash data for bigger limit). The higher the limit is, the less probability of hitting this situation. So the particular default value should rather come from the users of sosreport, what they see as practical. (where I understand you might dont have eough experience so far - it is fine to collect it for some time till say devel freeze of 7.7). Bryn, as sos upstream maintainer, does this follow your point of view? Asked for input in upstream: https://groups.io/g/pcp/message/19995?p=,,,20,0,0,0::Created,,,20,2,0,27802916,d=2&d=2 Ok, tried some things, discussed with the reporter. Pavel, could the following be implemented? - List the last 12 modified files in /var/log/pcp/pmlogger/<hostname>/, i.e. ls -rtl /var/log/pcp/pmlogger/<hostname>/|tail -n 12 - Then take the files from that selection, and put them into the sosreport temp directory, for later tar file generation and compression. I think that will do following: - it will prevent tailed files, which we can not use anyway - this should give us 2 days of logging. - We are not doing extra checks for size, but the these logs can be compressed excellently. - I considered a mechanism like this: - take last 12 modified files in /var/log/pcp/pmlogger/<hostname>/ - if >3GB, just take the last 6 files (so effectively just pcp data from last day) But that is harder to implement, and we anyway have module pcp not activated by default, so lets just go for 12 files. We can adjust this later if required. We should make a note for beta testing that people should run 'sosreport -e pcp' with pcp-zeroconf installed. I raised https://github.com/sosreport/sos/pull/1496 for the change: - plugin option `pcplogs` renamed to `pmmgrlogs` and applies to size of pmmgr logfiles only (should the limit be still 100MB?). - new plugin option `pmloggerfiles` specifies number of newest pmlogger files to collect, defaults to 12. - pmmgr logs collected up to 100MB - 12 newest pmlogger files collected regardless of size The 12 newest files - do you count that 2 of them will be pmlogger.log and pmlogger.log.prior ? I.e. are 10 *.xz files the requested count to be collected? (no response means you are ok with it) Thanks Pavel, (In reply to Pavel Moravec from comment #10) > - plugin option `pcplogs` renamed to `pmmgrlogs` and applies to size of > pmmgr logfiles only (should the limit be still 100MB?). I am not sure if the "100mb" here means - the decision to include a file or not - or truncating after 100MB. We should either collect a file completely or nothing, truncated files are not usable. > - new plugin option `pmloggerfiles` specifies number of newest pmlogger > files to collect, defaults to 12. > - pmmgr logs collected up to 100MB We have 2 kinds of archive files there: a) uncompressed, most recent archives. Likely to be big. b) compressed, a bit older archives I think that limiting to 100MB will in most cases not grab the most recent file, which is most likely to have valuable data. I think that limit should be 7GB: that would also grab the uncompressed files. The size would not hurt us later as sosreport compresses (it might though hurt if temporary copies are created?). > - 12 newest pmlogger files collected regardless of size > > The 12 newest files - do you count that 2 of them will be pmlogger.log and > pmlogger.log.prior ? I.e. are 10 *.xz files the requested count to be > collected? Just taking the 12 files (including pmlogger.log/.prior) should be ok. (In reply to Christian Horn from comment #11) > Thanks Pavel, > > (In reply to Pavel Moravec from comment #10) > > - plugin option `pcplogs` renamed to `pmmgrlogs` and applies to size of > > pmmgr logfiles only (should the limit be still 100MB?). > I am not sure if the "100mb" here means > - the decision to include a file or not > - or truncating after 100MB. > We should either collect a file completely or nothing, truncated files are > not usable. What suffixes can occur in that directory? Current PR will collect up to 100MB of files from newest to older ones. If collecting the next file would exceed the 100MB, the next file will be currently collected&tailed - until it ends on '.gz', '.xz', '.bz', '.bz2'. Anyway I will update the PR to prevent collecting+tailing the "threshold file" in any case. > > > - new plugin option `pmloggerfiles` specifies number of newest pmlogger > > files to collect, defaults to 12. > > - pmmgr logs collected up to 100MB > We have 2 kinds of archive files there: > a) uncompressed, most recent archives. Likely to be big. > b) compressed, a bit older archives > I think that limiting to 100MB will in most cases not grab the most recent > file, which is most likely to have valuable data. > I think that limit should be 7GB: that would also grab the uncompressed > files. The size would not hurt us later as sosreport compresses (it might > though hurt if temporary copies are created?). You mean ${PCP_LOG_DIR}/pmmgr directory can be so huge and we should collect 7GB of logs (from newest to older ones, no truncate in either case) - instead of current limit 100MB? This is a new requirement, so far we changed ${PCP_LOG_DIR}/pmlogger from 100MB to "12 newest files regardless of size)". It is ok to implement a new request, but wont the huge size cause more harm than benefits, like: - increase time of generating sosreport (to copy&compress those data) - if disk space under /var/tmp is not sufficiently large, sosreport can fail completely (the sizelimit is configured via option pcplogs (now) / pmmgrlogs (proposed change) that has default vale 100MB) Is this "pmmgr default logsize to increase from 100MB to 7GB" request still reasonable? Raising needinfo on original reporter (but anyone relevant can respond), I just want to hear an opinion from somebody else if this requirement has broader support. (In reply to Pavel Moravec from comment #12) > (In reply to Christian Horn from comment #11) > > Thanks Pavel, > > > > (In reply to Pavel Moravec from comment #10) > > > - plugin option `pcplogs` renamed to `pmmgrlogs` and applies to size of > > > pmmgr logfiles only (should the limit be still 100MB?). > > I am not sure if the "100mb" here means > > - the decision to include a file or not > > - or truncating after 100MB. > > We should either collect a file completely or nothing, truncated files are > > not usable. > > What suffixes can occur in that directory? > > Current PR will collect up to 100MB of files from newest to older ones. If > collecting the next file would exceed the 100MB, the next file will be > currently collected&tailed - until it ends on '.gz', '.xz', '.bz', '.bz2'. > > Anyway I will update the PR to prevent collecting+tailing the "threshold > file" in any case. > > > > > > > - new plugin option `pmloggerfiles` specifies number of newest pmlogger > > > files to collect, defaults to 12. > > > - pmmgr logs collected up to 100MB > > We have 2 kinds of archive files there: > > a) uncompressed, most recent archives. Likely to be big. > > b) compressed, a bit older archives > > I think that limiting to 100MB will in most cases not grab the most recent > > file, which is most likely to have valuable data. > > I think that limit should be 7GB: that would also grab the uncompressed > > files. The size would not hurt us later as sosreport compresses (it might > > though hurt if temporary copies are created?). > > You mean ${PCP_LOG_DIR}/pmmgr directory can be so huge and we should collect > 7GB of logs (from newest to older ones, no truncate in either case) - > instead of current limit 100MB? Correction: ${PCP_LOG_DIR}/pmmgr/$(hostname) directory. > > This is a new requirement, so far we changed ${PCP_LOG_DIR}/pmlogger from > 100MB to "12 newest files regardless of size)". It is ok to implement a new > request, but wont the huge size cause more harm than benefits, like: > - increase time of generating sosreport (to copy&compress those data) > - if disk space under /var/tmp is not sufficiently large, sosreport can fail > completely > (the sizelimit is configured via option pcplogs (now) / pmmgrlogs (proposed > change) that has default vale 100MB) > > > Is this "pmmgr default logsize to increase from 100MB to 7GB" request still > reasonable? Again, ${PCP_LOG_DIR}/pmmgr/$(hostname) directory is meant here. (In reply to Pavel Moravec from comment #12) > (In reply to Christian Horn from comment #11) > > (In reply to Pavel Moravec from comment #10) > > > - plugin option `pcplogs` renamed to `pmmgrlogs` and applies to size of > > > pmmgr logfiles only (should the limit be still 100MB?). > > I am not sure if the "100mb" here means > > - the decision to include a file or not > > - or truncating after 100MB. > > We should either collect a file completely or nothing, truncated files are > > not usable. Uh me--, I did not notice this is about pmmgr. All of that related to pmlogger files. > What suffixes can occur in that directory? I am not sure, have not looked into that component in detail. My comment #11 should be ignored. > This is a new requirement, so far we changed ${PCP_LOG_DIR}/pmlogger from > 100MB to "12 newest files regardless of size)". It is ok to implement a new > request, but wont the huge size cause more harm than benefits, like: > - increase time of generating sosreport (to copy&compress those data) > - if disk space under /var/tmp is not sufficiently large, sosreport can fail > completely > (the sizelimit is configured via option pcplogs (now) / pmmgrlogs (proposed > change) that has default vale 100MB) If this is "newest 100MB", it will include the uncompressed latest file. - The latest (uncompressed) big archive files in ${PCP_LOG_DIR}/pmlogger are likely most interesting, but might be GB's in size. - I understand that copying over into a tempdir would eat much size. So one could either compress it directly into the temp-dir, or drop it completely. - The huge, uncompressed file ends with .0, the comressed ones with .xz. Example: -rw-r--r--. 1 pcp pcp 1071808 12月 10 09:30 20181210.08.37.0.xz -rw-r--r--. 1 pcp pcp 6772 12月 10 09:30 20181210.08.37.index -rw-r--r--. 1 pcp pcp 60388 12月 10 09:30 20181210.08.37.meta.xz -rw-r--r--. 1 pcp pcp 547673552 12月 10 17:13 20181210.09.30.0 -rw-r--r--. 1 pcp pcp 57772 12月 10 17:13 20181210.09.30.index -rw-r--r--. 1 pcp pcp 6878971 12月 10 17:13 20181210.09.30.meta > Is this "pmmgr default logsize to increase from 100MB to 7GB" request still > reasonable? We did not look at pmmgr. After a chat, the current changes are sufficient: - from pmlogger, collect 12 newest files of whatever size (this might cause disk space issues if the files are huge, but let try now) - from pmmgr, collect up to 100MB of logfiles and dont tail the "file just over limit" <comment #15> > After a chat, the current changes are sufficient: > - from pmlogger, collect 12 newest files of whatever size > (this might cause disk space issues if the files are huge, but let try now) Looking at sos-3.7-1.el7 . - the tail is unchanged (as expected after comment #15) - I see that the sosreport contains exactly 12 files, ok - but it still is simply tail'ing one file, while I think it should with above definition take the file untruncated: Real directory: [..older files..] -rw-r--r--. 1 pcp pcp 714768 4月 3 12:59 20190403.11.30.0.xz -rw-r--r--. 1 pcp pcp 11292 4月 3 12:59 20190403.11.30.index -rw-r--r--. 1 pcp pcp 49684 4月 3 12:58 20190403.11.30.meta.xz -rw-r--r--. 1 pcp pcp 12912 4月 4 03:02 20190404.03.02.0.xz -rw-r--r--. 1 pcp pcp 172 4月 4 03:02 20190404.03.02.index -rw-r--r--. 1 pcp pcp 42608 4月 4 05:02 20190404.03.02.meta.xz -rw-r--r--. 1 pcp pcp 97681448 4月 4 07:54 20190404.05.25.0 -rw-r--r--. 1 pcp pcp 18712 4月 4 07:54 20190404.05.25.index -rw-r--r--. 1 pcp pcp 2412223 4月 4 07:54 20190404.05.25.meta What sos-3.7-1.el7 records: -rw-r--r--. 1 pcp pcp 714768 Apr 3 12:59 20190403.11.30.0.xz -rw-r--r--. 1 pcp pcp 11292 Apr 3 12:59 20190403.11.30.index -rw-r--r--. 1 pcp pcp 49684 Apr 3 12:58 20190403.11.30.meta.xz -rw-r--r--. 1 pcp pcp 12912 Apr 4 03:02 20190404.03.02.0.xz -rw-r--r--. 1 pcp pcp 172 Apr 4 03:02 20190404.03.02.index -rw-r--r--. 1 pcp pcp 42608 Apr 4 05:02 20190404.03.02.meta.xz lrwxrwxrwx. 1 root root 84 Apr 4 07:48 20190404.05.25.0 -> ../../../../../sos_strings/pcp/var.log.pcp.pmlogger.rhel7u6a.20190404.05.25.0.tailed -rw-r--r--. 1 pcp pcp 17712 Apr 4 07:48 20190404.05.25.index -rw-r--r--. 1 pcp pcp 2271181 Apr 4 07:48 20190404.05.25.meta Pavel, any chance to get 20190404.05.25.0 untruncated? Right, in the above example from Comment #15, var.log.pcp.pmlogger.rhel7u6a.20190404.05.25.0.tailed is a corrupt file and cannot be used by PCP. Ouch that's a silly bug: https://github.com/sosreport/sos/blob/master/sos/plugins/pcp.py#L133 collects the files with sizelimit=None (so far so good). But: https://github.com/sosreport/sos/blob/master/sos/plugins/__init__.py#L764-L765 overwrites the limit to --log-size value (defaults to 25) :-/ Trivial fix: s/sizelimit=None/sizelimit=0/ Thanks for testing, marking as FailedQE, will fix in a respin in very few weeks. Hi, could you please re-test the additional fix as well? A yum repository for the build of sos-3.7-2.el7 (task 21377022) is available at: http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.7/2.el7/ You can install the rpms locally by putting this .repo file in your /etc/yum.repos.d/ directory: http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.7/2.el7/sos-3.7-2.el7.repo RPMs and build logs can be found in the following locations: http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.7/2.el7/noarch/ The full list of available rpms is: http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.7/2.el7/noarch/sos-3.7-2.el7.src.rpm http://brew-task-repos.usersys.redhat.com/repos/official/sos/3.7/2.el7/noarch/sos-3.7-2.el7.noarch.rpm Build output will be available for the next 21 days. Hi Pavel, (In reply to Pavel Moravec from comment #23) > [..] > A yum repository for the build of sos-3.7-2.el7 (task 21377022) is available > at: [..] Looks good to me. The 12 last files seem to be taken from /var/log/pcp/pmlogger/<hostname>, and these files are not changed when being taken over into the sosreport. This will make the data useful for us. Thanks! Chris Thanks for feedback, clearing needinfo on me (nthing needed from me, just QE shall mark the BZ as verified now) Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:2295 |