RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 2203141 - sosreport does not complete tailed log collection (when size of log file exceeds --log-size) when a plugin times out.
Summary: sosreport does not complete tailed log collection (when size of log file exce...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: sos
Version: 8.7
Hardware: All
OS: Linux
unspecified
medium
Target Milestone: rc
: ---
Assignee: Pavel Moravec
QA Contact: Daniel Záležák
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-05-11 10:12 UTC by Ameya Patil
Modified: 2023-06-26 13:56 UTC (History)
8 users (show)

Fixed In Version: sos-4.5.4-1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-26 13:55:53 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github sosreport sos pull 3237 0 None open [plugins] collect strings before commands 2023-05-14 17:15:12 UTC
Red Hat Issue Tracker RHELPLAN-157070 0 None None None 2023-05-12 06:46:57 UTC

Description Ameya Patil 2023-05-11 10:12:31 UTC
Description of problem:

While collecting sosreport , if a plugin gets timeout and at the same time if there a log file collection being done by that plugin and if the size of the file being collected is greater than default size limit (25 MiB as per man page of sosreport ) then the tailed log is not being captured.

To simulate this issue, we can add the "sleep 300" in the plugin file. However this can occur when any command in the plugin times out.
For example,

In file - /usr/lib/python3.6/site-packages/sos/report/plugins/logs.py
~~~
 90             else:
 91                 self.add_copy_spec([
 92                     "/var/log/syslog*",
 93                     "/var/log/kern.log*",
 94                     "/var/log/auth.log*",
 95                 ])
 96         self.add_cmd_output('sleep 300')     <--- Add this line inside setup()
 97 
 98     def postproc(self):
 99         self.do_path_regex_sub(
~~~

Make sure any file this plugin is going to collecting is bigger than 25 MiB.
For example on my system  I had /var/log/messages-20230508 which is greater than 25 MiB

~~~
2023-05-10 05:34:33,310 INFO: [plugin:logs] File '/var/log/messages-20230508' is over size limit, will instead tail the file during collection phase.
~~~

When the sosreport runs the log plugin , it get timeout because of the sleep command which it need to run for 300 seconds to run.
When this happens the log file which is greater than 25 MiB which we expect to  have the tailed logs collection of upto 25 MiB of the file for these file saved in the sos_strings as a symlink file.
However when the plugin timeouts, for example in case of the sleep command it skips the tailed log collection all together.


The collection happens in collect_plugin() method defined in /usr/lib/python3.6/site-packages/sos/report/plugins/__init__.py:
~~~
3184     def collect_plugin(self):
3185         """Collect the data for a plugin."""
3186         start = time()
3187         self._collect_copy_specs()
3188         self._collect_container_copy_specs()
3189         self._collect_tailed_files()
3190         self._collect_cmds()
3191         self._collect_strings()
3192         self._collect_manual()
~~~


Due to this, because of some command which timed out the log collection is incomplete even when the log collection is not getting timed out it not completed because collect_tailed_files() is done before running the collect_cmds() so the tailed log file is read into memory but not written to disk because collect_stings() is done after  the collect_cmds() completes. Which is timing out in our case and hence collect_strings() is not done.


By changing the ordering we were able to work around this issue,

If we change the ordering in the collect_plugin() to add the collection of `_collect_stings()` prior to the collection of `_collect_cmds()` to push the string collection before the tailed logs is captured.
~~~
    def collect_plugin(self):
        """Collect the data for a plugin."""
        start = time()
        self._collect_copy_specs()
        self._collect_container_copy_specs()
        self._collect_tailed_files()
        self._collect_strings()
        self._collect_cmds()
        self._collect_manual()
        fields = (self.name(), time() - start)
        self._log_debug("collected plugin '%s' in %s" % fields)
~~~

As per the seen in my test system, now we see that read() is done (possibly in _collect_tailed_files() )  and then before running command , writing is done (possibly in _collect_strings() ) and then commands (like sleep) are executed (possibly in _collect_cmds() )
~~~
# less sos.strace | grep -e 26214400  -e 'execve("/usr/bin/sleep"'
51071 09:09:44.987002 pread64(5</var/lib/rpm/Packages>, "\0\0\0\0\1\0\0\0\0\31\0\0\0\0\0\0\377\30\0\0\1\0\346\17\0\7\0\0\0K\0\1W\274\0\0\0?\0\0\0\7\0\1Q(\0\0\0\20\0\0\0d\0\0\0\10\0\0\0\0\0\0\0\1\0\0\3\350\0\0\0\6\0\0\0\2\0\0\0\1\0\0\3\351\0\0\0\6\0\0\0\23\0\0\0\1\0\0\3\352\0\0\0\6\0\0\0\34\0\0\0\1\0\0\3\353\0\0\0\4\0\0\0$\0\0"..., 4096, 26214400) = 4096 <0.000009>
51071 09:09:45.318863 pread64(5</var/lib/rpm/Packages>, "\0\0\0\0\1\0\0\0\0\372\0\0\377\371\0\0\1\372\0\0\1\0\346\17\0\7labs/puppet/lib/ruby/vendor_gems/gems/gettext-3.2.2/samples/cgi/po/uk/\0/opt/puppetlabs/puppet/lib/ruby"..., 4096, 262144000) = 4096 <0.000010>
51108 09:09:48.881786 lseek(5</var/log/messages-20230508>, -26214400, SEEK_END) = 8689563 <0.000008>
51108 09:09:48.882011 read(5</var/log/messages-20230508>, "046]: Found cgroup on /sys/fs/cgroup/systemd, legacy hierarchy\nMay  2 07:31:50 rhel8 systemd-user-runtime-dir[2046]: Failed to r"..., 26214401) = 26214400 <0.003822>
51108 09:09:48.925317 write(5</var/tmp/sos.k2a6qni0/sosreport-rhel8-2023-05-10-luhomlu/sos_strings/logs/var.log.messages-20230508.tailed>, "046]: Found cgroup on /sys/fs/cgroup/systemd, legacy hierarchy\nMay  2 07:31:50 rhel8 systemd-user-runtime-dir[2046]: Failed to r"..., 26214400 <unfinished ...>
51108 09:09:48.931311 <... write resumed>) = 26214400 <0.005949>
51123 09:09:50.633524 execve("/usr/bin/sleep", ["sleep", "300"], 0x7ffe085b0e60 /* 45 vars */) = 0 <0.000116>
50982 09:14:49.929088 lstat("/var/tmp/sos.k2a6qni0/sosreport-rhel8-2023-05-10-luhomlu/sos_strings/logs/var.log.messages-20230508.tailed", {st_mode=S_IFREG|0640, st_size=26214400, ...}) = 0 <0.000006>
50982 09:14:49.929835 fstat(12</var/tmp/sos.k2a6qni0/sosreport-rhel8-2023-05-10-luhomlu/sos_strings/logs/var.log.messages-20230508.tailed>, {st_mode=S_IFREG|0640, st_size=26214400, ...}) = 0 <0.000004>
50982 09:14:50.585211 newfstatat(13</var/tmp/sos.k2a6qni0/sosreport-rhel8-2023-05-10-luhomlu/sos_strings/logs>, "var.log.messages-20230508.tailed", {st_mode=S_IFREG|0640, st_size=26214400, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000005>
~~~


OTHER Workaround we see is that , 

- To use "--log-size" parameter to extend this default size limit of 25 MiB based to a size you decide before hand which will not cause the tailed log collection to occur on the file you are intrested in by checking its size so the whole file is copied as is as log as its below the new size limit you define.
- Or use "--all-logs" both of which collect the whole file do not relay on tailed log collection so even when the command times out this copying of the whole file which likely happens in the _collect_copy_specs() which is done at the start and does not face any timeout issue due to commands.


So hence wanted to check if the above change of changing the order at which the _collect_strings() so as to enable this without affecting other functionality or if the there is other-way to do this.


Version-Release number of selected component (if applicable):
sos-4.5.1-3.el8.noarch


How reproducible:
Everytime.

Steps to Reproduce:
1. Add the "sleep 300" command to force the timeout, to a plugin like log.py at /usr/lib/python3.6/site-packages/sos/report/plugins/logs.py

   ~~~
   96         self.add_cmd_output('sleep 300')     <--- Add this line inside setup()
   ~~~

2. Confirm that you have a file /var/log/message greater than 25 MB. If not extend it size by adding some ascii text to it.

    # base64  < /dev/urandom | head -c 25M >> /var/log/messages

    # ls -lah /var/log/message

3. After this run the sosreport command and let it run till the log plugin times output.

    # sosreport -o logs

4. After check the extracted archive and you will see  that there is no tailed log collection done, in face the "sos_strings" directly is not present

   # tar -xvf <sosreport>
   # cd the <sosreport_extracted_directory>

   # ls -lah var/log/messages
   lrwxrwxrwx. 1 root root  55 May  8 03:38 var/log/messages  -> ../../sos_strings/logs/var.log.messages.tailed  <---(This symlink is broken)

   # No directory named "sos_string" present in the sos archive.


Actual results:
Tailed log collection is not saved to sosreport even when it read it not written to disk.


Expected results:
We want the tailed log collection to happen even when the command times out because that data is already read into memory during the before the command collection happens.

Additional info:
This is not only specific to log plugin but any plugin which collect tailed logs and store it in sos_strings for example the audit.py plugin and audit file also show similar behavior.

Comment 1 Pavel Moravec 2023-05-12 09:55:07 UTC
Hello,
thanks a lot for the particular reproducer and detailed analysis including a fix proposal.

Indeed, when a plugin timeouts during commands execution, self._collect_cmds() raises an exception which causes skipping further self._collect_strings() method. While the later method just deals with copy_specs and tailed files, it's execution is not affected by collect_cmds - and even logically it should be called just after tailed_files and tailed_files.

I plan to raise upstream PR for the

3190         self._collect_cmds()
3191         self._collect_strings()

lines swapping (and get credit to the reporter), until they would like to do so on their own.

Comment 2 Pavel Moravec 2023-05-14 17:15:12 UTC
OK, upstream PR created: https://github.com/sosreport/sos/pull/3237 . Thanks again for all the work!

Comment 4 Daniel Záležák 2023-06-15 10:53:13 UTC
Switched to Tested based on positive results from manual test:

`````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````
[root@ci-vm-10-0-137-66 ~]# grep "self.add_cmd_output('sleep 300')" /usr/lib/python3.6/site-packages/sos/report/plugins/logs.py
            self.add_cmd_output('sleep 300')

[root@ci-vm-10-0-137-66 ~]# ls -lah /var/log/messages
-rw-------. 1 root root 26M Jun 15 06:32 /var/log/messages

[root@ci-vm-10-0-137-66 sosreport-ci-vm-10-0-137-66-2023-06-15-mkgefaw]# ls
etc  sos_commands  sos_logs  sos_reports  sos_strings  var  version.txt

[root@ci-vm-10-0-137-66 sosreport-ci-vm-10-0-137-66-2023-06-15-mkgefaw]# ls -lah var/log/messages
lrwxrwxrwx. 1 root root 46 Jun 15 06:32 var/log/messages -> ../../sos_strings/logs/var.log.messages.tailed

[root@ci-vm-10-0-137-66 sosreport-ci-vm-10-0-137-66-2023-06-15-mkgefaw]# ls sos_strings/
logs

[root@ci-vm-10-0-137-66 sosreport-ci-vm-10-0-137-66-2023-06-15-mkgefaw]# ls sos_strings/logs/
var.log.messages.tailed

[root@ci-vm-10-0-137-66 sosreport-ci-vm-10-0-137-66-2023-06-15-mkgefaw]# ls -lah sos_strings/logs/var.log.messages.tailed 
-rw-r--r--. 1 root root 25M Jun 15 06:38 sos_strings/logs/var.log.messages.tailed
``````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````

Thank you Ameya Patil for reproducer.

Comment 7 Daniel Záležák 2023-06-16 08:53:41 UTC
Switched to Tested based on positive results from manual test:

OLD
rhel: RHEL-8.9
sos:  sos-4.5.3-1.el8.noarch
arch: x86_64

[root@ci-vm-10-0-137-223 ~]# grep "self.add_cmd_output('sleep 300')" /usr/lib/python3.6/site-packages/sos/report/plugins/logs.py
        self.add_cmd_output('sleep 300')
[root@ci-vm-10-0-137-223 ~]# ls -lah /var/log/messages
-rw-------. 1 root root 26M Jun 16 04:29 /var/log/messages
[root@ci-vm-10-0-137-223 ~]# cd /var/tmp/sosreport-ci-vm-10-0-137-223-2023-06-16-gujdvsc
[root@ci-vm-10-0-137-223 sosreport-ci-vm-10-0-137-223-2023-06-16-gujdvsc]# ls
etc  sos_commands  sos_logs  sos_reports  var  version.txt
[root@ci-vm-10-0-137-223 sosreport-ci-vm-10-0-137-223-2023-06-16-gujdvsc]# ls -lah var/log/messages
lrwxrwxrwx. 1 root root 46 Jun 16 04:29 var/log/messages -> ../../sos_strings/logs/var.log.messages.tailed
[root@ci-vm-10-0-137-223 sosreport-ci-vm-10-0-137-223-2023-06-16-gujdvsc]# ls sos_strings/
ls: cannot access 'sos_strings/': No such file or directory
`````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````

NEW
rhel: RHEL-8.9
sos:  sos-4.5.4-1.el8.noarch
arch: x86_64

[root@ci-vm-10-0-137-223 ~]# grep "self.add_cmd_output('sleep 300')" /usr/lib/python3.6/site-packages/sos/report/plugins/logs.py
        self.add_cmd_output('sleep 300')
[root@ci-vm-10-0-137-223 ~]# ls -lah /var/log/messages
-rw-------. 1 root root 51M Jun 16 04:49 /var/log/messages
[root@ci-vm-10-0-137-223 ~]# cd /var/tmp/sosreport-ci-vm-10-0-137-223-2023-06-16-wnnzcse
[root@ci-vm-10-0-137-223 sosreport-ci-vm-10-0-137-223-2023-06-16-wnnzcse]# ls
etc  sos_commands  sos_logs  sos_reports  sos_strings  var  version.txt
[root@ci-vm-10-0-137-223 sosreport-ci-vm-10-0-137-223-2023-06-16-wnnzcse]# ls -lah var/log/messages
lrwxrwxrwx. 1 root root 46 Jun 16 04:43 var/log/messages -> ../../sos_strings/logs/var.log.messages.tailed
[root@ci-vm-10-0-137-223 sosreport-ci-vm-10-0-137-223-2023-06-16-wnnzcse]# ls sos_strings/
logs
[root@ci-vm-10-0-137-223 sosreport-ci-vm-10-0-137-223-2023-06-16-wnnzcse]# ls sos_strings/logs/
var.log.messages.tailed
[root@ci-vm-10-0-137-223 sosreport-ci-vm-10-0-137-223-2023-06-16-wnnzcse]# ls -lah sos_strings/logs/var.log.messages.tailed 
-rw-r--r--. 1 root root 25M Jun 16 04:44 sos_strings/logs/var.log.messages.tailed
``````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````
Thank you Ameya Patil for reproducer.

Comment 10 errata-xmlrpc 2023-06-26 13:55:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (sos bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3801


Note You need to log in before you can comment on or make changes to this bug.