Bug 1663267
| Field | Value |
|---|---|
| Summary | rsyslog-8.24.0-34 memory leak |
| Product | Red Hat Enterprise Linux 7 |
| Component | rsyslog |
| Version | 7.6 |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED DUPLICATE |
| Severity | unspecified |
| Priority | low |
| Reporter | Pim Rupert <pim> |
| Assignee | Jiří Vymazal <jvymazal> |
| QA Contact | Dalibor Pospíšil <dapospis> |
| CC | adam.winberg, charles.v.ferrari.ctr, cwarfiel, dapospis, james, jvymazal, msekleta, orion, ravpatil, rik.theys, rmeggins, santony, sbroz, smykhail, tjaros, uemit.seren, ysoni |
| Target Milestone | rc |
| Target Release | --- |
| Keywords | Reopened, Triaged |
| Doc Type | If docs needed, set a value |
| Last Closed | 2019-07-02 11:06:11 UTC |
| Type | Bug |
| Regression | --- |
| Attachments | memory usage rsyslog on example machine 2 (1519474); memory usage example machine 4 (1519475) |
Description
Pim Rupert
2019-01-03 15:29:29 UTC
---
Can you please provide your rsyslog config? It matters a great deal which modules you are using and what processing is done in the rsyslog config files.

---
Nothing special at all. The rsyslog configuration is untouched. To clarify:

```
# rpm -qV rsyslog && echo "vanilla"
vanilla
```

The drop-in directory /etc/rsyslog.d/ contains one untouched file named `listen.conf`, part of the `systemd` package.

---
(In reply to Pim Rupert from comment #3)
> Nothing special at all. The rsyslog configuration is untouched.

Thanks for sharing this; however, to attempt to reproduce it we would need some information about the type (and volume) of traffic on the affected machines. Could you please share some general information about that? Thanks

---
Created attachment 1519474 [details]
memory usage rsyslog on example machine 2
---
Created attachment 1519475 [details]
memory usage example machine 4
---
> however, to attempt to reproduce it we would need some information about the type (and volume) of traffic on the affected machines.
We see this on different kinds of machines with basic logging from just the localhost using the default config, i.e. to /var/log/messages, maillog, cron, etc. We took samples over the last three days on four different machines (one way to approximate such rates is sketched after the list):
- machine 1: avg. 2.3 lines/sec
- machine 2: avg. 17 lines/sec
- machine 3: avg. 12.5 lines/sec
- machine 4: avg. 0.7 lines/sec
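The thread does not show how these per-machine rates were computed. A minimal sketch that approximates an average message rate from the systemd journal, assuming journalctl retains three days of history (an assumption; the actual measurement tooling is unknown):

```
#!/bin/bash
# Rough lines/sec estimate over the last three days, from the systemd journal.
# Method is an assumption - the actual measurement tooling is not shown above.
lines=$(journalctl --since "3 days ago" -q | wc -l)
awk -v l="$lines" 'BEGIN { printf "%.1f lines/sec\n", l / (3 * 24 * 3600) }'
```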
We also measure RSS usage of the rsyslog process, and we have data from before and after the update. The attached examples are for machine 2 (somewhat higher traffic) and machine 4 (somewhat lower traffic). In the graphs you can see:
- low and stable memory usage before the update was installed on November 20th
- dips in graphs for machine 2 when we restarted rsyslog
- climbing memory usage, as if there is a memory leak, on both machines, more severe on machine 2, which does more log processing
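The sampling mechanism behind the attached RSS graphs is not described. A hedged sketch of one way to record rsyslogd RSS periodically with standard procps tools (the file path and interval are arbitrary choices, not from the thread):

```
#!/bin/bash
# Append a timestamped RSS sample (kB) for rsyslogd to a CSV once a minute.
# A sketch only - the monitoring behind the attached graphs is not shown.
while true; do
    rss=$(ps -C rsyslogd -o rss= | head -n1)
    echo "$(date -u +%FT%TZ),${rss:-0}" >> /tmp/rsyslogd-rss.csv
    sleep 60
done
```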
---
Hello, I have encountered this bug and confirmed it with a reproducible test case. I'm unsure at the moment exactly what is causing it. Originally I thought it might have been omfwd, since along with the memory usage I also found a lot of log spam with lines like:

```
... rsyslogd: action 'action 7' resumed (module 'builtin:omfwd') [v8.24.0-34.el7 ...
... rsyslogd: action 'action 8' resumed (module 'builtin:omfwd') [v8.24.0-34.el7 ...
```

However, I don't believe that is actually this problem but a different one. I commented out the log server lines and restarted rsyslog. The log spam stopped, but the memory leak was still there. I believe the cause of the log spam was broken connectivity to the log servers, a separate bug/issue (there's another thread for that one here on the tracker).

I can't copy my full configuration here, comments and all, due to technical restrictions, but I can hand-type the enabled options. It is a very simple configuration which directs rsyslog to interface with journald. If I made any typos here, it is because I had to type it myself; I can assure you they are not in the system config, and rsyslog's parser does not complain. I am also leaving out the logging server lines, since the bug persists without them.

/etc/rsyslog.conf:

```
$ModLoad imuxsock
$ModLoad imjournal

$OmitLocalLogging on
$WorkDirectory /var/lib/rsyslog
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$IMJournalStateFile imjournal.state

*.info;mail.none;authpriv.none;cron.none    /var/log/messages
authpriv.*                                  /var/log/secure
mail.*                                      -/var/log/maillog
cron.*                                      /var/log/cron
*.emerg                                     :omusrmsg:*
uucp,news.crit                              /var/log/spooler
local7.*                                    /var/log/boot.log
```

There is one line in /etc/rsyslog.d/listen.conf, but I commented it out. The above is all of the enabled configuration.

The steps I used to reproduce the bug are as follows:

1. Use the following command line, or similar, to spam messages:

```
counter=1; while true; do msg="Test message ${counter}"; echo -en "${msg}\r"; logger "$msg"; let counter++; done
```

2. In another terminal, watch memory usage grow. I used the following command line to monitor both disk and memory:

```
watch 'top -b -n1 -p$(ps -C rsyslogd -o pid=); echo; df -h /var/log'
```

Regards,
Charles

---
Hello again,

I spoke too soon regarding my testing with the forwarding disabled. Memory usage did go up for a couple of minutes, but I'm seeing it get freed after a certain point (~25k under RES in top). This freeing does not happen when I enable forwarding: memory usage grows linearly without stopping. Looks like it's probably a problem with omfwd after all. The configuration lines I'm using for the forwarding are as follows (addresses redacted):

```
*.* @@x.x.x.x:xxxx
*.* @@x.x.x.x:xxxx
```

Regards,
Charles

---
I'm seeing memory usage grow as well. In one particular case we're running rsyslog as a collector and it grew to over 2 GB in virt size. In that case we were having lots of failed TLS connection attempts due to clients having expired certificates.
```
$ModLoad imtcp

# make gtls driver the default
$DefaultNetstreamDriver gtls

# certificate files
$DefaultNetstreamDriverCAFile /etc/pki/tls/certs/ca.crt
$DefaultNetstreamDriverCertFile /etc/pki/tls/certs/host.crt
$DefaultNetstreamDriverKeyFile /etc/pki/tls/private/host.key

$InputTCPServerStreamDriverAuthMode x509/name
$InputTCPServerStreamDriverPermittedPeer *.nwra.com
$InputTCPServerStreamDriverPermittedPeer *.ad.nwra.com
$InputTCPServerStreamDriverPermittedPeer *.mry.nwra.com
$InputTCPServerStreamDriverPermittedPeer *.cora.nwra.com
$InputTCPServerStreamDriverMode 1     # run driver in TLS-only mode
$InputTCPServerRun 6514               # start up listener at port 6514
$InputTCPServerKeepAlive on           # close connections to clients that disappear

# Need to keep tabs
$EscapeControlCharactersOnReceive off

# Record the time received ("generated locally") for eventlog messages so that
# logwatch will process late messages
template(name="ReceivedFileFormat" type="list") {
    property(name="timegenerated" dateFormat="rfc3339")
    constant(value=" ")
    property(name="hostname")
    constant(value=" ")
    property(name="syslogtag")
    property(name="msg" spifno1stsp="on")
    property(name="msg" droplastlf="on")
    constant(value="\n")
}

user.*      /var/log/eventlog-tls;ReceivedFileFormat
authpriv.*  /var/log/secure-tls
local1.*    -/var/log/suricata-tls
```

The failed connection attempts produce errors like:

```
rsyslogd[3695]: not permitted to talk to peer, certificate invalid: GnuTLS returned no specific reason [v8.24.0-34.el7]
rsyslogd[3695]: invalid cert info: peer provided 1 certificate(s). Certificate 1 info: certificate valid from Fri Jan 19 12:21:53 2018 to Sat Jan 19 12:21:53 2019; Certificate public key: RSA; DN: .....
rsyslogd[3695]: netstream session 0x7f5f725051d0 from .... will be closed due to error [v8.24.0-34.el7 try http://www.rsyslog.com/e/2089 ]
```
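A quick way to check the validity window of certificates like the ones triggering the errors above is openssl. A sketch: the local path matches the example config above, and loghost.example.com is a hypothetical peer hostname:

```
# Validity window of the local certificate (path from the config above).
openssl x509 -in /etc/pki/tls/certs/host.crt -noout -dates

# What a remote peer actually presents on the TLS syslog port;
# loghost.example.com is a hypothetical hostname.
openssl s_client -connect loghost.example.com:6514 </dev/null 2>/dev/null |
    openssl x509 -noout -dates
```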
---
Hi,

We're experiencing similar issues. I have the impression the behavior started with the latest systemd-journald update that fixed the bug in journald. I know Debian released an updated systemd, as their original fix contained a memory leak. Maybe the same fix is not yet included in the RHEL package?

From the Debian changelog:

```
systemd (232-25+deb9u8) stretch-security; urgency=high

  * Non-maintainer upload by the Security Team.
  * Address memory leak in dispatch_message_real()
    In dispatch_message_real() memory allocated by set_iovec_field_free()
    is not free()d. Follow upstream and introduce specific variables
    cmdline1 and cmdline2 and free() those automatically when
    dispatch_message_real() returns.
  * Correctly allocate core_timestamp on the heap and avoid invalid free()
  * Remove unused core* variables in process_kernel()
```

Regards,
Rik

---
(In reply to Rik Theys from comment #14)
> I know Debian released an updated systemd, as their original fix contained
> a memory leak. Maybe the same fix is not yet included in the RHEL package?

Many thanks for the information; this will have to be checked. Michal, do you know more about this, please? Thanks

---
Hi,

It seems the bug is indeed in RHEL: https://bugzilla.redhat.com/show_bug.cgi?id=1666690

Regards,
Rik

---
(In reply to Jiří Vymazal from comment #15)
> Many thanks for the information; this will have to be checked. Michal, do
> you know more about this, please?

It is a bug in journald which we introduced while fixing the latest batch of journal CVEs. The fix for it was already released in systemd-219-62.el7_6.3.

---
Thank you for taking the time to report this issue to us. We appreciate the feedback and use reports such as this one to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for requesting support, and we are not able to guarantee the timeliness or suitability of a resolution. If this issue is critical or in any way time sensitive, please raise a ticket through the regular Red Hat support channels to ensure it receives the proper attention and prioritization to assure a timely resolution. For information on how to contact the Red Hat production support team, please visit: https://www.redhat.com/support/process/production/#howto

---
No Enterprise support is requested. I just wanted to report an obvious memory leak bug, which still isn't resolved.

---
Pim, are you saying systemd-219-62.el7_6.3 does not fix this problem? We've been experiencing this memory leak too and set up a daily rsyslog restart cron job as a workaround. I'd like to remove that and was planning to test after our upcoming patch window. If you have confirmation that it's not fixed by the systemd update, that will change our plans. Thank you for any information you can provide.

---
Hi James, we've not seen it fixed in the last two updates of systemd. On all machines, on both systemd-219-62.el7_6.3 and the newer systemd-219-62.el7_6.5, we see unchanged behaviour in memory usage. The problem is fairly manageable in our case, since we have plenty of memory available on most nodes and can restart the daemon when necessary, but I can imagine it's a real pain in other situations.

---
We are also still seeing increasing memory usage by journald and rsyslog, currently using systemd-219-62.el7_6.5.x86_64.

---
I would like to add some information from my point of view. We have one server where slapd has logged heavily for more than a year, which leads to many slapd entries in the systemd journal and this repeating line in /var/log/messages:

```
journal: Suppressed ... messages from /system.slice/slapd.service
```

The configuration of both rsyslog and systemd is untouched. After updating from 7.5 to 7.6, both rsyslogd and systemd-journald started to eat memory, which led to the OOM killer killing one of these processes several times. After updating to systemd-219-62.el7_6.5.x86_64 and the last restart of systemd-journald (by the OOM killer and systemd), systemd-journald no longer eats memory; that problem is solved. But despite restarting rsyslogd, it still has the leak and must be restarted regularly by crond, because the OOM killer no longer helps here (given our memory/swap/slapd usage). The server itself has not been restarted since the systemd update.

---
Yes, I believe I see the same as the previous commenter: journald seems to keep memory usage at a consistent level after the systemd update, but rsyslog keeps increasing in memory usage.
---
(In reply to Michal Žejdl from comment #27)
> The server itself has not been restarted since the systemd update.

rsyslogd still has the memory leak even after a server reboot.

---
It's a real nuisance when you close an open bug as a duplicate of a private bug. It makes it impossible to track progress.

---
(In reply to adam winberg from comment #34)
> It's a real nuisance when you close an open bug as a duplicate of a private
> bug. It makes it impossible to track progress.

Sorry, I did not even notice the other one was private. Unless I specifically look for a tiny 5px icon, I have no way of really knowing that. The other one has other trackers and dependencies which would complicate other things and processes, but I will try and see whether it can be made public.
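Several commenters mention restarting rsyslog daily via cron as a workaround, but the actual job is never shown in the thread. A minimal sketch of such an entry; the schedule and file name are arbitrary, hypothetical choices:

```
# /etc/cron.d/rsyslog-restart - hypothetical workaround, not from the thread.
# Restart rsyslog nightly to cap the leaked memory until a fix is available.
0 4 * * * root /usr/bin/systemctl restart rsyslog.service
```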