Bug 1326422 - sosreport may run out of memory if the journal has a lot of entries
Summary: sosreport may run out of memory if the journal has a lot of entries
Keywords:
Status: CLOSED DUPLICATE of bug 1183244
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: sos
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Assignee: Pavel Moravec
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-12 15:49 UTC by Evgheni Dereveanchin
Modified: 2019-12-16 05:38 UTC
6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-05-08 12:20:53 UTC
Target Upstream Version:



Description Evgheni Dereveanchin 2016-04-12 15:49:54 UTC
Description of problem:
Currently the "logs" module of sosreport consumes a large amount of RAM; if there are enough entries in the journal, it will run out of memory and crash.

Version-Release number of selected component (if applicable):
sos-3.2-35.el7_2.3

How reproducible:
Reproduced on a system with 32GB RAM and at least 10GB free, given the amount of logs in the journal (produced by OpenShift):
# journalctl -b | wc -l
1647020

Steps to Reproduce:
1. add a million lines to the journal (careful, may kill the system)
# for i in {1..1000000}; do echo "test$i test$i test$i test$i test$i 12345 test$i" | systemd-cat; done
2. try to collect sosreport
# sosreport

Actual results:
...

 Setting up archive ...
 Setting up plugins ...
 Running plugins. Please wait ...

  Running 36/80: logs...        Killed

Expected results:
sosreport collected successfully

Additional info:
On slower systems the logs module will just time out:
  Running 28/67: logs...
[plugin:logs] command 'journalctl --all --this-boot --no-pager -o verbose' timed out after 300s

Comment 2 Evgheni Dereveanchin 2016-04-12 15:56:24 UTC
I've also seen cases when the logs module just fills /tmp due to the high volume of messages in verbose output.

The proposal here would be to check the journal size and only gather the latest entries if there are too many lines present.
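As a rough illustration of this proposal (hypothetical names, not sos code; and as Comment 3 notes, obtaining the line count cheaply is itself the hard part):

```python
# Hypothetical sketch of the proposal: if the journal holds more than
# some threshold of lines, collect only the newest entries.
# LINE_LIMIT and journal_cmd() are illustrative names, not sos API.
LINE_LIMIT = 100000

def journal_cmd(line_count, limit=LINE_LIMIT):
    """Build a journalctl invocation, capped to the newest `limit`
    lines when the journal holds more than that many."""
    cmd = ["journalctl", "--all", "--this-boot", "--no-pager", "-o", "verbose"]
    if line_count > limit:
        cmd += ["--lines", str(limit)]  # newest entries only
    return cmd
```

`--lines` is a standard journalctl option; the open question raised below is how `line_count` could be obtained without reading the whole journal first.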

Comment 3 Bryn M. Reeves 2016-04-13 12:03:13 UTC
> on slower systems the logs module will just time out
>   Running 28/67: logs...
> [plugin:logs] command 'journalctl --all --this-boot --no-pager -o verbose' 

This isn't an sos problem per se; journalctl is just taking too long to write the messages to stdout. We can increase the timeout, but that's really just a workaround.

> I've also seen cases when the logs module just fills /tmp due to the high 
> volume of messages in verbose output.

Sos in RHEL7 does not write to /tmp. Do you mean /var/tmp? If you're actually seeing /tmp filled up then it's unlikely sos is responsible.

> The proposal here would be to check for journal size and only gather the 
> latest info if there's too many lines present.

Presently there is no way to do this (reasonably) with the existing journald tooling: there is no way to request the size (and afaik journalctl itself cannot know without reading in all the records). Teaching sos to inspect the raw journal files directly would be a layering violation.

This would mean we'd have to do everything twice: once to count lines and a second time to capture the data (and this is racy: a process generating messages at a high rate will cause a large discrepancy between the two counts).

Comment 4 Bryn M. Reeves 2016-04-13 12:04:57 UTC
Addressing the OOM condition for very large journals is possible but it involves fairly significant changes to the IO handling in the Plugin class as well as the process IO from sos.utilities. It's something that's on the upstream roadmap but it has not been implemented or evaluated for suitability for a RHEL update at this time.
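The kind of IO-handling change described here might look roughly like the following (a minimal sketch under my own assumptions, not the upstream implementation; `collect_tail` and the chunk size are illustrative): read the child process's output incrementally and keep only a bounded tail, so memory use no longer scales with journal size.

```python
import subprocess
from collections import deque

def collect_tail(cmd, sizelimit, chunk=65536):
    """Run cmd, keeping only the newest `sizelimit` bytes of its stdout.

    Output is read in fixed-size chunks, and the oldest chunks are
    dropped as newer ones arrive, so peak memory stays near `sizelimit`
    instead of growing with the total amount the command writes.
    """
    buf, total = deque(), 0
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    for piece in iter(lambda: proc.stdout.read(chunk), b""):
        buf.append(piece)
        total += len(piece)
        # Drop the oldest chunk while doing so still leaves >= sizelimit.
        while total - len(buf[0]) >= sizelimit:
            total -= len(buf.popleft())
    proc.wait()
    return b"".join(buf)[-sizelimit:]
```

With a scheme like this, a journalctl run producing gigabytes of verbose output would cost roughly `sizelimit` bytes of RAM rather than the full output size.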

Comment 5 Pavel Moravec 2016-05-08 12:20:53 UTC

*** This bug has been marked as a duplicate of bug 1183244 ***

