Bug 1183244 - sosreport tracebacks with memory error with huge system journal
Summary: sosreport tracebacks with memory error with huge system journal
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sos
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Pavel Moravec
QA Contact: Miroslav Hradílek
URL: https://github.com/sosreport/sos/issu...
Whiteboard:
Duplicates: 1298186 1320182 1326422 1369122 1369123 1537490 (view as bug list)
Depends On:
Blocks: 1295396 1369123 1464262 1473612 1477664 1546815 1552138 1584547 1584548
 
Reported: 2015-01-17 15:35 UTC by David Kutálek
Modified: 2022-03-13 13:55 UTC
CC List: 33 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1584547 1584548 (view as bug list)
Environment:
Last Closed: 2018-10-30 10:31:19 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
potential journalctl sizelimit patch for 7.5 (7.83 KB, patch)
2018-07-23 11:31 UTC, Pavel Moravec
no flags Details | Diff
potential journalctl sizelimit patch for 7.5 (fixed) (8.39 KB, application/mbox)
2018-08-22 14:30 UTC, Pavel Moravec
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2018:3144 0 None None None 2018-10-30 10:33:21 UTC

Internal Links: 1667185

Description David Kutálek 2015-01-17 15:35:58 UTC
Description of problem:

With a huge journal present on the system, the sosreport logs plugin raises a traceback with a MemoryError:

...
 Setting up archive ...
[archive:TarFileArchive] initialised empty FileCacheArchive at '/tmp/tmp.wyaguXLNq8/sosreport-auto-x86-64-001.ss.eng.bos.redhat.com-20150117095533'
[sos.sosreport:setup] executing 'sosreport -v -v --batch --tmp-dir=/tmp/tmp.wyaguXLNq8 -o logs --all-logs --log-size=15'
 Setting up plugins ...
[plugin:logs] added copyspec '/etc/syslog.conf'
[plugin:logs] added copyspec '/etc/rsyslog.conf'
[plugin:logs] added copyspec '/var/log/boot.log'
[plugin:logs] packed command tuple: ('journalctl --all --this-boot --no-pager', 'None', 'None', 300, 'None')
[plugin:logs] added cmd output 'journalctl --all --this-boot --no-pager'
[plugin:logs] packed command tuple: ('journalctl --all --this-boot --no-pager -o verbose', 'None', 'None', 300, 'None')
[plugin:logs] added cmd output 'journalctl --all --this-boot --no-pager -o verbose'
fileobj: /etc/syslog.conf could not be opened
[plugin:logs] added string '...id="18922" x-info="http://www.rsyslog.com"] exiting on signal 15.' as 'var.log.messages.tailed'
[plugin:logs] added copyspec '/var/log/secure'
[plugin:logs] added copyspec '/var/log/boot.log'
[plugin:logs] added string '...id="18922" x-info="http://www.rsyslog.com"] exiting on signal 15.' as 'var.log.everything.tailed'
[plugin:logs] added string '...id="18922" x-info="http://www.rsyslog.com"] exiting on signal 15.' as 'var.log.everything.tailed'
 Running plugins. Please wait ...

  Running 1/1: logs...        
[plugin:logs] collecting path '/var/log/secure'
[plugin:logs] copying path '/var/log/secure' to archive:'/var/log/secure'
[plugin:logs] collecting path '/etc/rsyslog.conf'
[plugin:logs] copying path '/etc/rsyslog.conf' to archive:'/etc/rsyslog.conf'
[plugin:logs] collecting path '/var/log/boot.log'
[plugin:logs] copying path '/var/log/boot.log' to archive:'/var/log/boot.log'
[plugin:logs] unpacked command tuple: ('journalctl --all --this-boot --no-pager', 'None', 'None', 300, 'None')
[plugin:logs] collecting output of 'journalctl --all --this-boot --no-pager'
[plugin:logs] command 'journalctl --all --this-boot --no-pager' timed out after 300s
[plugin:logs] collected output of 'journalctl' in 301.554158926
[plugin:logs] unpacked command tuple: ('journalctl --all --this-boot --no-pager -o verbose', 'None', 'None', 300, 'None')
[plugin:logs] collecting output of 'journalctl --all --this-boot --no-pager -o verbose'
[plugin:logs] command 'journalctl --all --this-boot --no-pager -o verbose' timed out after 300s
[plugin:logs] collected output of 'journalctl' in 536.92585206
logs
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/sos/sosreport.py", line 1164, in collect
    plug.collect()
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 630, in collect
    self._collect_cmd_output()
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 609, in _collect_cmd_output
    timeout=timeout, runat=runat)
  File "/usr/lib/python2.7/site-packages/sos/plugins/__init__.py", line 567, in get_cmd_output_now
    self.archive.add_string(result['output'], outfn)
  File "/usr/lib/python2.7/site-packages/sos/archive.py", line 178, in add_string
    f.write(content)
  File "/usr/lib64/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib64/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
MemoryError


Creating compressed archive...
[archive:TarFileArchive] finalizing archive '/tmp/tmp.wyaguXLNq8/sosreport-auto-x86-64-001.ss.eng.bos.redhat.com-20150117095533' using method 'auto'
[archive:TarFileArchive] built archive at '/tmp/tmp.wyaguXLNq8/sosreport-auto-x86-64-001.ss.eng.bos.redhat.com-20150117095533.tar' (size=149985280)

Your sosreport has been generated and saved in:
  /tmp/tmp.wyaguXLNq8/sosreport-auto-x86-64-001.ss.eng.bos.redhat.com-20150117095533.tar.xz

The checksum is: 7eb98d831bf48c22690e8e8ca96c5a9a

Please send this file to your support representative.

---

# du -h /var/log/journal/6dd863494c2a45f3977173c55cef4a9d/
4.0G    /var/log/journal/6dd863494c2a45f3977173c55cef4a9d/


Version-Release number of selected component (if applicable):

sos-3.2-11.el7.noarch

How reproducible:

Always, with a journal this large

Steps to Reproduce:
1. Prepare huge journal files
2. Run sosreport with a log size limit:
   sosreport -v -v --batch -o logs --all-logs --log-size=15

Actual results:

Traceback

Expected results:

No traceback

Additional info:

Comment 1 Bryn M. Reeves 2015-01-20 11:36:54 UTC
It's not so much the size of the journal files that matters as the size of the data produced for the selected log window (--this-boot in this case). I've got a few systems with ~4GiB journals that do not show this behaviour (although journalctl commands are tediously slow).

Could you redirect the output of 'journalctl --all --this-boot --no-pager -o verbose' on this system to a file so we can see how big it's getting?

Comment 2 David Kutálek 2015-01-20 12:01:15 UTC
[1]+  Done                    journalctl --all --this-boot --no-pager -o verbose > ./journall.out

# uptime
 07:00:20 up 47 days, 16:36,  1 user,  load average: 1.45, 1.84, 1.56

# ls -l ./journall.out 
-rw-r--r--. 1 root root 1702041597 Jan 20 06:58 ./journall.out

Comment 3 Bryn M. Reeves 2015-01-21 11:03:28 UTC
OK, so a little over 1600 MiB. With sos's current IO strategy that is going to put considerable pressure on the host's VM subsystem.

Unfortunately there's no direct way to tell journalctl to limit the volume of data it produces.

Rewriting the command IO paths to use less memory has been on the TODO for a long time but hasn't been a priority (as we've not had common cases where it was really necessary).
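For context, the MemoryError in the description is raised while sos holds the command's entire output (about 1.6 GiB here) as a single string and encodes it during the write. A minimal sketch of the two IO strategies, not actual sos code (the helper names are hypothetical; Python 3 shown for brevity):

import subprocess

def collect_buffered(cmd, outfn):
    # Buffered strategy: the entire output is held in memory before
    # being written out; with ~1.6 GiB of journal text this is what
    # exhausts memory.
    out = subprocess.run(cmd, stdout=subprocess.PIPE).stdout
    with open(outfn, "wb") as f:
        f.write(out)

def collect_streamed(cmd, outfn, bufsize=1024 * 1024):
    # Streamed strategy: copy the output to disk in fixed-size chunks;
    # peak memory stays at roughly one buffer no matter how much the
    # command prints.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    with open(outfn, "wb") as f:
        for chunk in iter(lambda: proc.stdout.read(bufsize), b""):
            f.write(chunk)
    proc.wait()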

Comment 4 Pavel Moravec 2015-11-04 09:18:58 UTC
Scope of RHEL7.2 is closed. Re-scheduling to RHEL7.3.

Comment 6 Pavel Moravec 2016-05-08 12:20:53 UTC
*** Bug 1326422 has been marked as a duplicate of this bug. ***

Comment 7 Lee Yarwood 2016-09-16 10:03:44 UTC
*** Bug 1320182 has been marked as a duplicate of this bug. ***

Comment 8 Pavel Moravec 2016-12-14 21:15:48 UTC
*** Bug 1369122 has been marked as a duplicate of this bug. ***

Comment 10 Lee Yarwood 2017-01-19 09:49:39 UTC
*** Bug 1369123 has been marked as a duplicate of this bug. ***

Comment 13 Pavel Moravec 2017-03-24 16:54:10 UTC
Regrettably, no fix has been provided for this yet, and with the development freeze approaching soon, this is not expected to be fixed in 7.4 :(

Re-scheduling to 7.5

Comment 14 Pavel Moravec 2017-03-24 16:55:28 UTC
*** Bug 1298186 has been marked as a duplicate of this bug. ***

Comment 15 Pavel Moravec 2017-09-01 15:44:50 UTC
No devel resources for this in 7.5, re-scheduling for potential inclusion in 7.6.

Comment 21 Peter Portante 2018-02-27 13:16:40 UTC
Effectively sosreport is DoS'ing a system when it is run.  This should be considered a high priority to fix, if not urgent.  We can't ask our customers to gather data on a problem only to have the tool that gathers that data cause further harm, can we?

Comment 27 Scott Dodson 2018-03-12 15:32:44 UTC
*** Bug 1537490 has been marked as a duplicate of this bug. ***

Comment 28 Pavel Moravec 2018-04-17 07:26:30 UTC
devel_ack+ for 7.6, expecting https://github.com/sosreport/sos/pull/1120 will be merged before sos 3.6

Comment 32 Pavel Moravec 2018-05-31 07:40:02 UTC
7.6 will have a twofold fix:

(A) don't collect "journalctl -o verbose" output (huge data to crunch in memory) by default; collect it only when --all-logs is used

(B) process any command output in batches (see https://github.com/sosreport/sos/pull/1120/files)

The 7.5.z will contain just the first fix, which 1) is really safe even for a z-stream, and 2) shrinks memory requirements significantly.


Verification steps:

(A) 
- ensure that sosreport by default does not collect the sos_commands/logs/journalctl_--no-pager_--all_--boot_--output_verbose file (i.e., it no longer calls the corresponding command)
- --all-logs will still collect that command's output

(B)
run sosreport before and after this BZ is fixed (especially on systems with a big journal or any other huge command output collected by sos), and compare the maximum memory consumed
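One illustrative way to measure that maximum for step (B) - not a prescribed procedure; "/usr/bin/time -v sosreport ..." reports the same figure as "Maximum resident set size" - is to run sosreport as a child process and read the children's peak RSS:

import resource
import subprocess

# Run sosreport as a child process and report the peak RSS of the
# terminated children; on Linux, ru_maxrss is in KiB. The flags below
# are just an example invocation.
subprocess.call(["sosreport", "--batch", "-o", "logs", "--all-logs"])
peak_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print("peak RSS of child processes: %.1f MiB" % (peak_kib / 1024.0))

Run this once with the old sos and once with the fixed one, and compare the two numbers.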

Comment 44 Pavel Moravec 2018-07-23 11:31:42 UTC
Created attachment 1469923 [details]
potential journalctl sizelimit patch for 7.5

> Adding to it: as RHEL 7.6 is not going to be released soon, the customer is insisting on including the fix in RHEL 7.5. But as per the flags it seems the fix will not be included in RHEL 7.5. If so, can you please help with a justification that will help to set customer expectations?

7.6 (to be released in autumn/winter) will have a twofold fix:

(A) don't collect "journalctl -o verbose" output (huge data to crunch in memory) by default; collect it only when --all-logs is used

(B) process any command output in batches (see https://github.com/sosreport/sos/pull/1120/files)

The 7.5.z (that the customer already has) contains just the first fix, which 1) is really safe even for a z-stream, and 2) shrinks memory requirements significantly.

(B) introduces splitting command output and parsing it in batches, which might have unwanted consequences that we would like to test downstream prior to releasing it in the _supported_ sosreport tool.

Reviewing what (B) changes, there is still a safe improvement that I could propose for 7.5 sos: limiting the size of the data collected by "journalctl --unit [unit]" calls (where "journalctl --unit docker" is the problematic one for this customer); the idea is sketched at the end of this comment.

So there are 3 options (until we release supported version of sos-3.6 in RHEL7.6):
1) stay on fully supported sos-3.5-9.el7_5
2) use the unsupported sos-3.6-2
3) use a (supported) hotfixed version on top of 3.5-9, which can be checked to see whether it helps via:

- installing sos-3.5-9.el7_5
- uploading 1120-sizelimit.patch to /tmp directory
- running:
cd /usr/lib/python2.7/site-packages
cat /tmp/1120-sizelimit.patch | patch -p1

Let's try the 3rd option to see if it prevents the OOM killer, and let me know if a hotfix is desired for it.
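The sizelimit idea behind the attached patch can be pictured roughly as follows (a loose sketch of the concept, not the patch code; the helper name is hypothetical):

import subprocess

def collect_limited(cmd, outfn, sizelimit=100 * 1024 * 1024):
    # Consume at most `sizelimit` bytes of the command's output, then
    # stop reading and terminate the command so a huge journal can
    # neither exhaust memory nor bloat the archive.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    remaining = sizelimit
    with open(outfn, "wb") as f:
        while remaining > 0:
            chunk = proc.stdout.read(min(1024 * 1024, remaining))
            if not chunk:
                break
            f.write(chunk)
            remaining -= len(chunk)
        if remaining == 0:
            # Note: this also triggers if the output is exactly
            # sizelimit bytes long; good enough for a sketch.
            f.write(b"\n[output truncated at sizelimit]\n")
    proc.terminate()
    proc.wait()

# e.g. collect_limited(["journalctl", "--no-pager", "--unit", "docker"],
#                      "journalctl_--unit_docker")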

Comment 45 Pushpendra Chavan 2018-08-17 10:36:07 UTC
(In reply to Pavel Moravec from comment #44)

> 
> - installing sos-3.5-9.el7_5
> - uploading 1120-sizelimit.patch to /tmp directory
> - running:
> cd /usr/lib/python2.7/site-packages
> cat /tmp/1120-sizelimit.patch | patch -p1
> 
> Let try the 3rd option if it prevents OOM killer and let me know if a hotfix
> is desired for it.

This worked!

Comment 50 Pavel Moravec 2018-08-22 14:30:59 UTC
Created attachment 1477901 [details]
potential journalctl sizelimit patch for 7.5 (fixed)

(In reply to Pushpendra Chavan from comment #48)
> Created attachment 1477508 [details]
> sosreport generated after applying patch
> 
> sosreport generated after applying patch. Need to check if this is complete
> or not

I apologize, the patch has a typo that causes almost nothing to work. Use this one instead and apply it:

yum reinstall -y sos-3.5-9.el7_5
cd /usr/lib/python2.7/site-packages
cat /tmp/1120.patch | patch -p1

Comment 60 errata-xmlrpc 2018-10-30 10:31:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:3144

