Bug 1304394

Summary: [RFE] add options to sosreport to limit time range of collected logs
Product: Red Hat Enterprise Linux 7 Reporter: Yaniv Lavi <ylavi>
Component: sos Assignee: Pavel Moravec <pmoravec>
Status: CLOSED ERRATA QA Contact: Miroslav Hradílek <mhradile>
Severity: medium Docs Contact: Michal Stubna <mstubna>
Priority: medium    
Version: 7.4 CC: agk, bmr, cww, dfediuck, djasa, jjansky, mhradile, pdwyer, plambri, pmoravec, pstehlik, Rhev-m-bugs, sbonazzo, sbradley, srevivo
Target Milestone: rc Keywords: FutureFeature, Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: https://github.com/sosreport/sos/issues/284
Whiteboard:
Fixed In Version: sos-3.8-6.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1020790
: 1789049 (view as bug list) Environment:
Last Closed: 2020-03-31 20:04:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1594286, 1648022, 1789049    

Description Yaniv Lavi 2016-02-03 13:32:17 UTC
+++ This bug was initially created as a clone of Bug #1020790 +++

Description of problem:
Sometimes it makes sense to limit the time range of collected logs so that large amounts of irrelevant data are omitted from the sosreport archive.

Comment 2 Bryn M. Reeves 2016-02-03 14:09:55 UTC
This is not a simple problem since end-users may not use the default syslog formatting options.

For now we offer the ability to limit text logs by size - this will collect the most recent log entries up to the specified limit.

With journald this is much simpler and more reliable since timestamps are stored in an unambiguous form.

Comment 3 Yaniv Lavi 2016-02-03 14:30:52 UTC
Bryn, what will happen if the log was rotated and you set a log size? Will it do it in a smart way, taking from most recent to rotated by size?
Sandro, what do you think of this suggestion?

Comment 4 Sandro Bonazzola 2016-02-05 07:09:22 UTC
(In reply to Yaniv Dary from comment #3)
> Sandro, what do you think of this suggestion?

If --log-size takes most recent entries, that may be used.
However, note that limiting the size doesn't guarantee that you collect, for example, the last 24 hours of logs: a flood in the last hour may fill the logs, making earlier entries unavailable in the report.

The time based filtering feature has been opened upstream here: https://github.com/sosreport/sos/issues/284 by Bryn 2 years ago so he's aware of this.

If it's acceptable to have a limit on the size instead of on the time (requiring a new sosreport execution if needed logs are not included), using the log-size option is fine for me (adding a big warning about it).
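
A minimal sketch of the flooding concern above, assuming the size limit keeps the newest lines that fit into a byte budget. The helper tail_within_limit() and the sample log lines are hypothetical and are not part of sos:

```python
def tail_within_limit(lines, limit_bytes):
    """Return the most recent lines whose total size fits in limit_bytes.

    Hypothetical helper for illustration; not part of sos.
    """
    kept = []
    used = 0
    # Walk from newest to oldest and stop once the budget is exhausted.
    for line in reversed(lines):
        size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if used + size > limit_bytes:
            break
        kept.append(line)
        used += size
    kept.reverse()  # restore chronological order
    return kept


# One entry from a quiet day, followed by a burst in the last minute.
quiet = ["Feb 04 09:00:00 host app: started"]
flood = ["Feb 05 06:59:%02d host app: error" % s for s in range(60)]
collected = tail_within_limit(quiet + flood, limit_bytes=1024)
print(len(collected), quiet[0] in collected)
```

With a small enough budget, the burst alone fills the limit and the older entry is dropped, which is why a size limit cannot stand in for a time range.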

Comment 7 Bryn M. Reeves 2017-04-27 13:55:46 UTC
We can do this easily (with a little bit of work upstream) for journald logs: the existing Plugin.add_journal() interface supports the underlying --since and --until switches of journalctl:

    def add_journal(self, units=None, boot=None, since=None, until=None,
                    lines=None, allfields=False, output=None, timeout=None):
        """ Collect journald logs from one or more units.

        Keyword arguments:
        units     -- A string, or list of strings specifying the systemd
                     units for which journal entries will be collected.
        boot      -- A string selecting a boot index using the journalctl
                     syntax. The special values 'this' and 'last' are also
                     accepted.
        since     -- A string representation of the start time for journal
                     messages.
        until     -- A string representation of the end time for journal
                     messages.
        lines     -- The maximum number of lines to be collected.
        allfields -- Include all journal fields regardless of size or
                     non-printable characters.
        output    -- A journalctl output control string, for example
                     "verbose".
        timeout   -- An optional timeout in seconds.
        """

Right now there are no users of this part of the interface for RHEL (on Fedora, where /var/log/messages no longer exists, it is used to grab the last three days of logs), but the number of plugins using journal logs is growing quite rapidly:

$ grep add_journal sos/plugins/[a-zA-Z]*.py | wc -l
20
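
As an illustration of the add_journal() keywords quoted above, here is a rough sketch of how a few of them could map onto journalctl switches. The helper journalctl_command() is hypothetical and is not the sos implementation; only the journalctl switches themselves (--unit, --since, --until, --lines, --no-pager) are real:

```python
def journalctl_command(units=None, since=None, until=None, lines=None):
    """Build a journalctl argument list from add_journal-style keywords.

    Hypothetical helper: it only illustrates the mapping and is not
    taken from sos.
    """
    cmd = ["journalctl", "--no-pager"]
    if isinstance(units, str):
        units = [units]
    for unit in units or []:
        cmd += ["--unit", unit]
    if since:
        cmd += ["--since", since]
    if until:
        cmd += ["--until", until]
    if lines:
        cmd += ["--lines", str(lines)]
    return cmd


print(journalctl_command(units="sshd", since="2017-04-24 00:00:00"))
```

Because journald stores timestamps unambiguously, the time range can be passed through as-is, with no parsing of log line formats.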

I've hesitated to wire these up to new global options (e.g. --logsince, --loguntil) to avoid creating an expectation that this will work with syslog too - implementing that is considerably more work, and is more fragile due to the syslog formatting problems mentioned in comment #2.

If it's felt worth hooking this up just for journal logs (e.g. --journalsince...), that's a relatively small piece of work - the main thing is fixing up the 20 plugins that already use the interface to use the options.

It's possible to do this for syslog as well, with some limitations, but that is a larger piece of work. If we're thinking of this for the next update, we should start planning now when the work will get done.

Comment 8 Pavel Moravec 2017-08-30 10:05:37 UTC
(In reply to Yaniv Lavi (Dary) from comment #3)
> Bryn, what will happen if the log was rotated and you set a log size? Will
> it do it in a smart way taking from most recent to rotated by size?
> Sandro, what do you think of this suggestion?

Sadly, this does not work (now). When calling e.g. "sosreport -o logs", it collects /var/log/messages* with a sizelimit but sorts the files alphabetically, so files are added (until the limit is reached) in this order:

/var/log/messages
/var/log/messages-20160101
/var/log/messages-20170101
/var/log/messages-20170828

Anyway changing this shall be simple.

We have the log limit / filter on the sos roadmap but don't have the capacity to implement it - at least within the 7.5 scope. The above can be a feasible workaround - would you appreciate it?

Comment 9 Bryn M. Reeves 2017-08-30 10:45:13 UTC
(In reply to Pavel Moravec from comment #8)
> /var/log/messages
> /var/log/messages-20160101
> /var/log/messages-20170101
> /var/log/messages-20170828
> 
> Anyway changing this shall be simple.

This is a bug but sadly it's not that simple; the current sorting convention the sos file collector uses is alphanumeric. This gives correct results for the "old" rotation naming convention of appending ".N" (older files have higher numbers), but it fails for the "new" convention of appending the rotation date:

$ echo -e 'messages\nmessages.2\nmessages.1' | LC_ALL=C sort
messages
messages.1
messages.2

$ echo -e 'messages-20161102\nmessages-20170801' | LC_ALL=C sort
messages-20161102
messages-20170801

There are two obvious solutions: checking the file name pattern and adapting, and performing a stat(2) check to sort files by mtime. Both of these add some complexity (although I think the stat approach is more complex).
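
A minimal sketch of the second option (ordering by mtime via stat), which is independent of the rotation naming convention. The helper newest_first() is hypothetical and not the sos implementation, and the files and timestamps below are fake:

```python
import os
import tempfile


def newest_first(paths):
    """Order paths from most to least recently modified.

    Hypothetical helper: it relies on os.stat() rather than on the
    rotation naming convention.
    """
    return sorted(paths, key=lambda p: os.stat(p).st_mtime, reverse=True)


# Demonstration with fake files whose mtimes are set explicitly.
with tempfile.TemporaryDirectory() as tmp:
    ages_in_days = {
        "messages": 0,
        "messages-20170801": 30,
        "messages-20161102": 300,
    }
    now = 1_500_000_000  # arbitrary fixed reference time
    paths = []
    for name, age in ages_in_days.items():
        path = os.path.join(tmp, name)
        with open(path, "w"):
            pass
        mtime = now - age * 86400
        os.utime(path, (mtime, mtime))
        paths.append(path)
    # Start from the alphabetical order to show that it is overridden.
    ordered = [os.path.basename(p) for p in newest_first(sorted(paths))]
    print(ordered)
```

The cost is one stat call per candidate file, and the result is only as trustworthy as the mtimes themselves (for example after a restore or a copy that does not preserve them).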

We can try to get this addressed for 3.5 (even if it's a temporary change to improve the behaviour for the two current common conventions - but I would rather have a "proper" fix).

Comment 10 Pavel Moravec 2017-09-01 19:00:51 UTC
My proposal:

- in RHEL 7.5, fix just https://bugzilla.redhat.com/show_bug.cgi?id=1486952 so that the newest logrotated files are always collected properly

- keep this BZ open for the original, reasonable RFE

Updating flags accordingly.

Comment 13 Pavel Moravec 2018-04-02 13:51:22 UTC
No capacity within the limited scope of 7.6.

Comment 14 Pavel Moravec 2018-12-04 08:21:28 UTC
To recap:

- bz1486952 implemented this RFE for journal logs
- the pending request is to have option "collect logfiles not newer/older than .."
  - so collecting of logfiles should be conditional based on these parameters

Sadly, logfiles are collected by the same method as config files (add_copy_spec), while config files must be collected regardless of their age.

So for proper implementation, we would need to:
- have a dedicated method add_copy_log that applies the time range condition
- update almost *all* plugins (all those collecting a log file) accordingly

OK, challenge for Christmas silent period accepted.

preliminary ACKing for 7.7

Comment 15 Pavel Moravec 2019-03-10 13:00:14 UTC
The upstream PR

https://github.com/sosreport/sos/pull/1586

will only make it possible to filter (log)files collected by sosreport by their mtime. sosreport's data collection itself will not change.

If you have ideas about which particular logs to filter based on mtime, please either comment here (soon) or open a new bugzilla, stating:

- logfile pattern
- maxage and/or minage (in hours)
- usual and maximal sizes of files matching such a pattern (do we really want sosreport to collect a 10GB logfile created in the last day?)
- if/what sizelimit should still be applied to that pattern
  - note that sizelimit is applied *independently* of the age limit. So when adding an age limit, we should probably increase (or remove) the sizelimit to move from size to age filtering of the given file pattern.
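
To make the minage/maxage idea concrete, a minimal sketch of an mtime-based age check with both limits given in hours. The helper within_age() is hypothetical; treating both bounds as inclusive is an assumption of this sketch, not something taken from the PR:

```python
import time


def within_age(mtime, now=None, minage=None, maxage=None):
    """Return True if a file modified at mtime passes the age limits.

    minage and maxage are in hours; None disables that bound.
    Inclusive bounds are an assumption of this sketch.
    """
    if now is None:
        now = time.time()
    age_hours = (now - mtime) / 3600.0
    if minage is not None and age_hours < minage:
        return False  # modified too recently
    if maxage is not None and age_hours > maxage:
        return False  # too old
    return True


now = 1_552_222_800  # arbitrary fixed reference time
two_hours_old = now - 2 * 3600
print(within_age(two_hours_old, now=now, minage=1, maxage=3))
```

Note that a check like this says nothing about size: a file can pass the age limits and still be cut by a separately applied sizelimit.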

Comment 16 Pavel Moravec 2019-03-10 13:05:35 UTC
For QE: how I tested it:

1) in sos/plugins/qpid.py, I added:

            "/var/log/cumin"
        ], minage=1, maxage=3)

(or played with those values or specified just one of the params)

2) generated fake logs with fake mtime

date >> /var/log/cumin      # this location is usually dir but.. we just fake something, right?
date >> /var/log/cumin.log
date >> /var/log/qpidd.log

touch -m --date="$(date -d '1 hour ago')"   /var/log/cumin
touch -m --date="$(date -d '3 hours ago')"  /var/log/cumin.log
touch -m --date="$(date -d '1 minute ago')" /var/log/qpidd.log

sosreport -o qpid --batch --build

and check which of these 3 files get collected, based on the minage/maxage setup.

Comment 17 Pavel Moravec 2019-03-29 11:26:47 UTC
Scope of 7.7 closed, rescheduling for potential inclusion in 7.8.

Comment 18 Pavel Moravec 2019-08-11 10:07:01 UTC
The --since option might be implemented in 7.8, but it will suffer from https://github.com/sosreport/sos/issues/1750 (collecting directories instead of files will ignore the --since option). That is an expected limitation so far.

Comment 21 Pavel Moravec 2019-10-09 11:57:28 UTC
Concise specification of the implemented feature: the --since option will filter out logarchive files older than the given timestamp, as well as journal log entries older than the timestamp.

Detailed description:
- if --since option is not used, no change
- if --since YYYYMMDD[HHMMSS] is provided, then:
  - no journal log older than the timestamp will be collected
  - no "logrotated file" older than given timestamp will be collected
    - "logrotated file" = file matching the regular expression at https://github.com/sosreport/sos/blob/master/sos/plugins/__init__.py#L857
  - other files (not matching the expression - ideally all configs or current logs) will be still collected
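
The file-related rules above can be sketched roughly as follows. The ROTATED pattern is an assumed stand-in for the linked expression (the real one may differ), and parse_since() and should_collect() are hypothetical names, not sos functions:

```python
import re
from datetime import datetime

# Assumed stand-in for the "logrotated file" expression; the real
# pattern in sos may differ.
ROTATED = re.compile(r"^.+([-.]\d+)(\.(gz|bz2|xz))?$")


def parse_since(value):
    """Parse YYYYMMDD or YYYYMMDDHHMMSS into a datetime."""
    fmt = "%Y%m%d%H%M%S" if len(value) == 14 else "%Y%m%d"
    return datetime.strptime(value, fmt)


def should_collect(name, mtime, since=None):
    """Decide whether a file is collected under the rules listed above."""
    if since is None:
        return True  # option not used: no change
    if not ROTATED.match(name):
        return True  # configs and current logs are still collected
    return mtime >= since  # rotated files older than --since are skipped


since = parse_since("20191001")
old = datetime(2019, 9, 1)
print(should_collect("messages-20190901", old, since))
print(should_collect("messages", old, since))
```

A name-based test like this also matches any non-log file whose name happens to end in a number, so such files would be filtered by age as well.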

Comment 23 Pavel Moravec 2019-11-05 12:14:53 UTC
Good findings! I raised:

https://github.com/sosreport/sos/issues/1847 (--since option wrongly applied to some configs also)
- this must be somehow resolved in 7.8

https://github.com/sosreport/sos/issues/1848 (journalctl should apply --since as well (I think), or at least remove the --all-logs this/prev boot nonsense)
- I see this as rather optional (say, a known issue), but would like to have it fixed in 7.8 as well

Comment 24 Pavel Moravec 2019-12-12 09:00:46 UTC
This has mostly been fixed in 3.8-1 already thanks to the sos rebase; some final bits are pending commit to dist-git now.

Comment 28 errata-xmlrpc 2020-03-31 20:04:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1127