Bug 1663267
| Field | Value |
|---|---|
| Summary | rsyslog-8.24.0-34 memory leak |
| Product | Red Hat Enterprise Linux 7 |
| Component | rsyslog |
| Version | 7.6 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED DUPLICATE |
| Severity | unspecified |
| Priority | low |
| Reporter | Pim Rupert <pim> |
| Assignee | Jiří Vymazal <jvymazal> |
| QA Contact | Dalibor Pospíšil <dapospis> |
| CC | adam.winberg, charles.v.ferrari.ctr, cwarfiel, dapospis, james, jvymazal, msekleta, orion, ravpatil, rik.theys, rmeggins, santony, sbroz, smykhail, tjaros, uemit.seren, ysoni |
| Target Milestone | rc |
| Target Release | --- |
| Keywords | Reopened, Triaged |
| Doc Type | If docs needed, set a value |
| Last Closed | 2019-07-02 11:06:11 UTC |
| Type | Bug |
| Regression | --- |
| Attachments | memory usage rsyslog on example machine 2 (1519474); memory usage example machine 4 (1519475) |
Description
Pim Rupert
2019-01-03 15:29:29 UTC
---
Can you please provide your rsyslog config? It matters a great deal which modules you are using and what processing is done in the rsyslog config files.

---
Nothing special at all. The rsyslog configuration is untouched. To clarify:

```
# rpm -qV rsyslog && echo "vanilla"
vanilla
```

The drop-in directory /etc/rsyslog.d/ contains one untouched file named `listen.conf`, part of the `systemd` package.

---
(In reply to Pim Rupert from comment #3)
> Nothing special at all. The rsyslog configuration is untouched.

Thanks for sharing this; however, to attempt to reproduce it we would need some information about the type (and volume) of traffic on the affected machines. Could you please share some general information about that? Thanks

---
Created attachment 1519474 [details]
memory usage rsyslog on example machine 2
---
Created attachment 1519475 [details]
memory usage example machine 4
---
> however, to attempt to reproduce it we would need some information about the type (and volume) of traffic on the affected machines.
We see this on different kinds of machines with basic logging from just the localhost using the default config, i.e. to /var/log/messages, maillog, cron, etc. We took samples over the last three days on four different machines (one way to approximate such rates is sketched after the list):
- machine 1: avg. 2.3 lines/sec
- machine 2: avg. 17 lines/sec
- machine 3: avg. 12.5 lines/sec
- machine 4: avg. 0.7 lines/sec
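The thread does not show how these per-machine rates were computed. A minimal sketch that approximates an average message rate from the systemd journal, assuming journalctl retains three days of history (an assumption; the actual measurement tooling is unknown):

```
#!/bin/bash
# Rough lines/sec estimate over the last three days, from the systemd journal.
# Method is an assumption - the actual measurement tooling is not shown above.
lines=$(journalctl --since "3 days ago" -q | wc -l)
awk -v l="$lines" 'BEGIN { printf "%.1f lines/sec\n", l / (3 * 24 * 3600) }'
```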
We also measure RSS usage of the rsyslog process, and we have data from before and after the update. The attached examples are for machine 2 (somewhat higher traffic) and machine 4 (somewhat lower traffic). In the graphs you can see:
- low and stable memory usage before the update was installed on November 20th
- dips in graphs for machine 2 when we restarted rsyslog
- climbing memory usage, as if there is a memory leak, on both machines, more severe on machine 2, which does more log processing
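The sampling mechanism behind the attached RSS graphs is not described. A hedged sketch of one way to record rsyslogd RSS periodically with standard procps tools (the file path and interval are arbitrary choices, not from the thread):

```
#!/bin/bash
# Append a timestamped RSS sample (kB) for rsyslogd to a CSV once a minute.
# A sketch only - the monitoring behind the attached graphs is not shown.
while true; do
    rss=$(ps -C rsyslogd -o rss= | head -n1)
    echo "$(date -u +%FT%TZ),${rss:-0}" >> /tmp/rsyslogd-rss.csv
    sleep 60
done
```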
---
Hello, I have encountered this bug and confirmed it with a reproducible test case. I'm unsure at the moment exactly what is causing it. Originally I thought it might have been omfwd, since along with the memory usage I also found a lot of log spam with lines like:

```
... rsyslogd: action 'action 7' resumed (module 'builtin:omfwd') [v8.24.0-34.el7 ...
... rsyslogd: action 'action 8' resumed (module 'builtin:omfwd') [v8.24.0-34.el7 ...
```

However, I don't believe that is actually this problem but a different one. I commented out the log server lines and restarted rsyslog. The log spam stopped, but the memory leak was still there. I believe the cause of the log spam was broken connectivity to the log servers, a separate bug/issue (there's another thread for that one here on the tracker).

I can't copy my full configuration here, comments and all, due to technical restrictions, but I can hand-type the enabled options. It is a very simple configuration which directs rsyslog to interface with journald. If I made any typos here, it is because I had to type it myself; I can assure you they are not in the system config, and rsyslog's parser does not complain. I am also leaving out the logging server lines, since the bug persists without them.

/etc/rsyslog.conf:

```
$ModLoad imuxsock
$ModLoad imjournal

$OmitLocalLogging on
$WorkDirectory /var/lib/rsyslog
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IMJournalStateFile imjournal.state

*.info;mail.none;authpriv.none;cron.none    /var/log/messages
authpriv.*                                  /var/log/secure
mail.*                                      -/var/log/maillog
cron.*                                      /var/log/cron
*.emerg                                     :omusrmsg:*
uucp,news.crit                              /var/log/spooler
local7.*                                    /var/log/boot.log
```

There is one line in /etc/rsyslog.d/listen.conf, but I commented it out. The above is all of the enabled configuration.

The steps I used to reproduce the bug are as follows:

1. Use the following command line, or similar, to spam messages:

```
counter=1; while true; do msg="Test message ${counter}"; echo -en "${msg}\r"; logger "$msg"; let counter++; done
```

2. In another terminal, watch memory usage grow. I used the following command line to monitor both disk and memory:

```
watch 'top -b -n1 -p$(ps -C rsyslogd -o pid=); echo; df -h /var/log'
```

Regards,
Charles

---
Hello again,

I spoke too soon regarding my testing with the forwarding disabled. Memory usage did go up for a couple of minutes, but I'm seeing it get freed after a certain point (~25k under RES in top). This freeing does not happen when I enable forwarding: memory usage grows linearly without stopping. Looks like it's probably a problem with omfwd after all. The configuration lines I'm using for the forwarding are as follows (addresses redacted):

```
*.* @@x.x.x.x:xxxx
*.* @@x.x.x.x:xxxx
```

Regards,
Charles

---
I'm seeing memory usage grow as well. In one particular case we're running rsyslog as a collector and it grew to over 2 GB in virt size. In that case we were having lots of failed TLS connection attempts due to clients having expired certificates.
```
$ModLoad imtcp

# make gtls driver the default
$DefaultNetstreamDriver gtls

# certificate files
$DefaultNetstreamDriverCAFile /etc/pki/tls/certs/ca.crt
$DefaultNetstreamDriverCertFile /etc/pki/tls/certs/host.crt
$DefaultNetstreamDriverKeyFile /etc/pki/tls/private/host.key

$InputTCPServerStreamDriverAuthMode x509/name
$InputTCPServerStreamDriverPermittedPeer *.nwra.com
$InputTCPServerStreamDriverPermittedPeer *.ad.nwra.com
$InputTCPServerStreamDriverPermittedPeer *.mry.nwra.com
$InputTCPServerStreamDriverPermittedPeer *.cora.nwra.com
$InputTCPServerStreamDriverMode 1     # run driver in TLS-only mode
$InputTCPServerRun 6514               # start up listener at port 6514
$InputTCPServerKeepAlive on           # close connections to clients that disappear

# Need to keep tabs
$EscapeControlCharactersOnReceive off

# Record the time received ("generated locally") for eventlog messages so that
# logwatch will process late messages
template(name="ReceivedFileFormat" type="list") {
    property(name="timegenerated" dateFormat="rfc3339")
    constant(value=" ")
    property(name="hostname")
    constant(value=" ")
    property(name="syslogtag")
    property(name="msg" spifno1stsp="on")
    property(name="msg" droplastlf="on")
    constant(value="\n")
}

user.*      /var/log/eventlog-tls;ReceivedFileFormat
authpriv.*  /var/log/secure-tls
local1.*    -/var/log/suricata-tls
```

The failed connection attempts produce errors like:

```
rsyslogd[3695]: not permitted to talk to peer, certificate invalid: GnuTLS returned no specific reason [v8.24.0-34.el7]
rsyslogd[3695]: invalid cert info: peer provided 1 certificate(s). Certificate 1 info: certificate valid from Fri Jan 19 12:21:53 2018 to Sat Jan 19 12:21:53 2019; Certificate public key: RSA; DN: .....
rsyslogd[3695]: netstream session 0x7f5f725051d0 from .... will be closed due to error [v8.24.0-34.el7 try http://www.rsyslog.com/e/2089 ]
```
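A quick way to check the validity window of certificates like the ones triggering the errors above is openssl. A sketch: the local path matches the example config above, and loghost.example.com is a hypothetical peer hostname:

```
# Validity window of the local certificate (path from the config above).
openssl x509 -in /etc/pki/tls/certs/host.crt -noout -dates

# What a remote peer actually presents on the TLS syslog port;
# loghost.example.com is a hypothetical hostname.
openssl s_client -connect loghost.example.com:6514 </dev/null 2>/dev/null |
    openssl x509 -noout -dates
```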
---
Hi,

We're experiencing similar issues. I have the impression the behavior started with the latest systemd-journald update that fixed the bug in journald. I know Debian released an updated systemd, as their original fix contained a memory leak. Maybe the same fix is not yet included in the RHEL package?

From the Debian changelog:

```
systemd (232-25+deb9u8) stretch-security; urgency=high

  * Non-maintainer upload by the Security Team.
  * Address memory leak in dispatch_message_real()
    In dispatch_message_real() memory allocated by set_iovec_field_free()
    is not free()d. Follow upstream and introduce specific variables
    cmdline1 and cmdline2 and free() those automatically when
    dispatch_message_real() returns.
  * Correctly allocate core_timestamp on the heap and avoid invalid free()
  * Remove unused core* variables in process_kernel()
```

Regards,
Rik

---
(In reply to Rik Theys from comment #14)
> I know Debian released an updated systemd, as their original fix contained
> a memory leak. Maybe the same fix is not yet included in the RHEL package?

Many thanks for the information; this will have to be checked. Michal, do you know more about this, please? Thanks

---
Hi,

It seems the bug is indeed in RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1666690

Regards,
Rik

---
(In reply to Jiří Vymazal from comment #15)
> Many thanks for the information; this will have to be checked. Michal, do
> you know more about this, please?

It is a bug in journald which we introduced while fixing the latest batch of journal CVEs. The fix for it was already released in systemd-219-62.el7_6.3.

---
Thank you for taking the time to report this issue to us. We appreciate the feedback and use reports such as this one to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for requesting support, and we are not able to guarantee the timeliness or suitability of a resolution. If this issue is critical or in any way time sensitive, please raise a ticket through the regular Red Hat support channels to ensure it receives the proper attention and prioritization to assure a timely resolution. For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto

---
No Enterprise support is requested. I just wanted to report an obvious memory leak bug, which still isn't resolved.

---
Pim, are you saying systemd-219-62.el7_6.3 does not fix this problem? We've been experiencing this memory leak too and set up a daily rsyslog restart cron job as a workaround. I'd like to remove that and was planning to test after our upcoming patch window. If you have confirmation that it's not fixed by the systemd update, that will change our plans. Thank you for any information you can provide.

---
Hi James, we've not seen it fixed in the last two updates of systemd. On all machines, on both systemd-219-62.el7_6.3 and the newer systemd-219-62.el7_6.5, we see unchanged behaviour in memory usage. The problem is fairly manageable in our case, since we have plenty of memory available on most nodes and can restart the daemon when necessary, but I can imagine it's a real pain in other situations.

---
We are also still seeing increasing memory usage by journald and rsyslog, currently using systemd-219-62.el7_6.5.x86_64.

---
I would like to add some information from my point of view. We have one server where slapd has logged heavily for more than a year, which leads to many slapd entries in the systemd journal and this repeating line in /var/log/messages:

```
journal: Suppressed ... messages from /system.slice/slapd.service
```

The configuration of both rsyslog and systemd is untouched. After updating from 7.5 to 7.6, both rsyslogd and systemd-journald started to eat memory, which led to the OOM killer killing one of these processes several times. After updating to systemd-219-62.el7_6.5.x86_64 and the last restart of systemd-journald (by the OOM killer and systemd), systemd-journald no longer eats memory; that problem is solved. But despite restarting rsyslogd, it still has the leak and must be restarted regularly by crond, because the OOM killer no longer helps here (given our memory/swap/slapd usage). The server itself has not been restarted since the systemd update.

---
Yes, I believe I see the same as the previous commenter: journald seems to keep memory usage at a consistent level after the systemd update, but rsyslog keeps increasing in memory usage.
---
(In reply to Michal Žejdl from comment #27)
> The server itself has not been restarted since the systemd update.

rsyslogd still has the memory leak even after a server reboot.

---
It's a real nuisance when you close an open bug as a duplicate of a private bug. It makes it impossible to track progress.

---
(In reply to adam winberg from comment #34)
> It's a real nuisance when you close an open bug as a duplicate of a private
> bug. It makes it impossible to track progress.

Sorry, I did not even notice the other one was private. Unless I specifically look for a tiny 5px icon, I have no way of really knowing that. The other one has other trackers and dependencies which would complicate other things and processes, but I will try and see whether it can be made public.
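Several commenters mention restarting rsyslog daily via cron as a workaround, but the actual job is never shown in the thread. A minimal sketch of such an entry; the schedule and file name are arbitrary, hypothetical choices:

```
# /etc/cron.d/rsyslog-restart - hypothetical workaround, not from the thread.
# Restart rsyslog nightly to cap the leaked memory until a fix is available.
0 4 * * * root /usr/bin/systemctl restart rsyslog.service
```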